Working with data can be a messy process, especially when you need to convert character variables to numeric formats.
You can suppress the invalid-argument note, and keep _ERROR_ from being set, when converting from character to numeric by using the ?? informat modifier.
The Problem with Invalid Character Data
When reading in raw data files, it’s extremely common to encounter non-numeric characters mixed into variables that should be numeric: blanks, special characters, alphabetic values, and more. If you try to convert such a character variable to numeric with the INPUT function, SAS will hit those invalid values and write a note to the SAS log for each one.
data invalid_data;
length char_var $5;
input char_var;
num_var = input(char_var, 5.);
datalines;
12345
-7
abc??
1,234
;
run;
When we run this data step, SAS does not abort. For each value it cannot convert (here abc?? and 1,234), it writes a note beginning "NOTE: Invalid argument to function INPUT" to the log, followed by a listing of that observation's variables with _ERROR_=1.
SAS then assigns a missing value (.) to num_var for that observation and continues processing the data step.
The ?? Informat Modifier to the Rescue
The ?? informat modifier suppresses the invalid-argument note and prevents the automatic variable _ERROR_ from being set to 1 when the INPUT function cannot read a value. The function still returns a missing value for anything that cannot be converted with the specified informat; it simply does so without writing to the log.
Here’s an example of using ??:
data invalid_data;
length char_var $5;
input char_var;
num_var = input(char_var, ??5.);
datalines;
12345
-7
abc??
1,234
;
run;
This is useful when you read data that you already know contains invalid values and you want them quietly set to missing, without an invalid-argument note and an observation dump in the log for every bad value.
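Because ?? removes the log messages, it can help to record failed conversions in the data itself. The following is a minimal sketch, assuming the invalid_data data set from the example above; the data set name flagged and the variable conv_failed are hypothetical names chosen for illustration.

```sas
/* Sketch: convert quietly, but remember which values failed.      */
/* "flagged" and "conv_failed" are illustrative names, not part of */
/* the original example.                                           */
data flagged;
   set invalid_data (keep=char_var);

   /* ?? keeps the log clean; unreadable values become missing */
   num_var = input(char_var, ?? 5.);

   /* 1 when the source text was non-blank but did not convert, else 0 */
   conv_failed = (missing(num_var) and not missing(char_var));
run;

proc print data=flagged;
run;
```

With the four sample rows, conv_failed should be 1 for abc?? and 1,234 and 0 for the other two, so the rows that need attention remain easy to find even though the log stays quiet.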
Note that ?? is an informat modifier. It can be used wherever an informat is applied to read a value, such as in the INPUT function and the INPUT statement, but not with the PUT function, which applies formats rather than informats.
While the ?? informat modifier provides a convenient way to handle invalid data during character-to-numeric conversions, it’s important to use it cautiously.
With ??, any value that cannot be read with the informat silently becomes a missing value (.). This means your resulting numeric variable could end up with many missing values, with nothing in the log to warn you, if there is a significant amount of invalid data in the original character variable.
Missing values can cause issues and skew results in later analysis and calculations on the numeric data. So before blindly applying the ?? modifier, make sure you understand the quality of your character data source and how common invalid values are likely to be.
It may be better to clean and validate the character values first, before converting to numeric. Only use the ?? modifier if you are confident that introducing some missing values due to invalid data is acceptable for your analysis. Otherwise, you risk ending up with a numeric variable that has too many missing values to be analyzed usefully.
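As one possible way to clean before converting, the sketch below strips thousands separators with the COMPRESS function and then counts how many values are still missing after conversion. It assumes the invalid_data data set from the earlier example; the names cleaned and char_clean are hypothetical, and removing commas is only one example of a cleaning rule.

```sas
/* Sketch: clean first, convert second, then measure the damage. */
/* "cleaned" and "char_clean" are illustrative names.            */
data cleaned;
   set invalid_data (keep=char_var);

   /* remove commas so that a value like 1,234 can be read as 1234 */
   char_clean = compress(char_var, ',');

   /* anything still unreadable becomes missing, without log notes */
   num_var = input(char_clean, ?? 5.);
run;

/* N = values converted, NMISS = values that ended up missing */
proc means data=cleaned n nmiss;
   var num_var;
run;
```

Comparing NMISS against the total number of rows gives a quick sense of whether the remaining missing values are few enough to accept or whether the source data needs more cleaning.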