The SUBSTR function in SAS is used to extract part of a string. Apart from extraction, it has another important use: SUBSTR can appear on the left side of an assignment statement as well as on the right side.
The SUBSTR function returns a portion of a string. The portion begins at the character given by the start-position argument and contains the number of characters given by the third argument.
If you omit the third argument, SUBSTR returns everything from start-position to the end of the string.
SUBSTR(character-value,start-position,number-of-characters-to-read)
Note: If the length of the resulting variable has not been assigned earlier, it gets the length of the first argument (the character value).
| Examples | Results |
|---|---|
| SUBSTR("ABC123XYZ", 4, 2) | 12 |
| SUBSTR("ABC123XYZ", 4) | 123XYZ |
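To check the arithmetic of these examples outside SAS, the Python sketch below models the right-side SUBSTR rule. SAS counts positions from 1, so position 4 corresponds to Python index 3.

This is only an illustration under our own assumptions, not SAS code. The function name `sas_substr` is hypothetical, and SAS's blank padding and log NOTEs are not modelled.

```python
from typing import Optional


def sas_substr(value: str, start: int, length: Optional[int] = None) -> str:
    """Hypothetical Python model of SAS SUBSTR on the right side of '='.

    Assumes a valid 1-based start; blank padding and log NOTEs are not modelled.
    """
    begin = start - 1  # SAS positions start at 1, Python indexes start at 0
    if length is None:
        # Third argument omitted: read through the end of the string
        return value[begin:]
    return value[begin:begin + length]


print(sas_substr("ABC123XYZ", 4, 2))  # 12
print(sas_substr("ABC123XYZ", 4))     # 123XYZ
```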
When you use SUBSTR on the left side of the equal sign, SAS replaces part of the variable's value with the value of the expression on the right side. SUBSTR replaces length characters (the third argument), starting at the character given by the position argument (the second argument).
If you use an undeclared variable, it is assigned a default length of 8.
Syntax:
SUBSTR(variable, start <, length>) = character-value
| Examples (pin = "Pin Code 411014") | Results (value of pin) |
|---|---|
| SUBSTR(pin, 4, 5) = ":"; | "Pin:     411014" |
| SUBSTR(pin, 4) = ":"; | "Pin:" followed by trailing blanks |
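The replacement rule can be modelled the same way. In this Python sketch, the function `sas_substr_assign` is a hypothetical name, and it returns a new string because Python strings are immutable.

The right-hand value is cut or padded with blanks to exactly length characters before it overwrites the variable, so the variable keeps its overall length. When length is omitted, the sketch assumes the replacement runs to the end of the variable, which is how we read the second table row.

```python
from typing import Optional


def sas_substr_assign(variable: str, start: int, new_text: str,
                      length: Optional[int] = None) -> str:
    """Hypothetical model of SUBSTR(variable, start, length) = new_text.

    Assumes start and length stay inside the variable.
    """
    begin = start - 1  # convert the 1-based SAS position to a 0-based index
    if length is None:
        # Length omitted: replace everything from start to the end
        length = len(variable) - begin
    # Truncate or blank-pad the new text to exactly `length` characters
    replacement = new_text[:length].ljust(length)
    return variable[:begin] + replacement + variable[begin + length:]


print(repr(sas_substr_assign("Pin Code 411014", 4, ":", 5)))  # 'Pin:     411014'
print(repr(sas_substr_assign("Pin Code 411014", 4, ":")))
```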
Using SUBSTR in a comparison, such as on the left side of an IF condition, is similar to using the SAS colon modifier (=:).
Both methods compare values based on the prefix of a text string.
One advantage of SUBSTR over the colon modifier is that SUBSTR has a macro-language counterpart, %SUBSTR, while the colon modifier cannot be used in macro statements.
Example:
data test;
  length name $20 president $4;
  name = 'GEORGE WASHINGTON'; president = 'YES';  output;
  name = 'THOMAS JEFFERSON';  president = 'yes';  output;
  name = 'BENJAMIN FRANKLIN'; president = 'Nope'; output;
run;
Example Data
| name | president |
|---|---|
| GEORGE WASHINGTON | YES |
| THOMAS JEFFERSON | yes |
| BENJAMIN FRANKLIN | Nope |
Notice that the president variable in the sample data is inconsistent: it holds 'YES', 'yes', and 'Nope'. The code below standardizes the case with UPCASE and turns these values into a readable text message.
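data test1;
  set test;
  length text_msg $60;
  /* Colon modifier: compare only as many characters as the shorter value has */
  if upcase(president) =: 'N' then
    text_msg = catx(' ', name, 'was not President of the USA');
  /* SUBSTR: extract the first character, then compare it */
  else if upcase(substr(president, 1, 1)) = 'Y' then
    text_msg = catx(' ', name, 'was President of the USA');
run;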
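To see why the two tests in the DATA step behave alike, here is a Python model of the colon modifier's rule: the longer value is truncated to the length of the shorter one before the comparison.

The names `colon_equals` and `text_message` are hypothetical. The model also ignores the trailing blanks that SAS adds to fixed-length variables.

```python
def colon_equals(left: str, right: str) -> bool:
    """Hypothetical model of the SAS comparison  left =: right."""
    n = min(len(left), len(right))  # truncate the longer value
    return left[:n] == right[:n]


def text_message(name: str, president: str) -> str:
    """Mirrors the two IF conditions of the DATA step."""
    flag = president.upper()  # UPCASE(president)
    if colon_equals(flag, "N"):
        return f"{name} was not President of the USA"
    if flag[:1] == "Y":  # UPCASE(SUBSTR(president, 1, 1)) = 'Y'
        return f"{name} was President of the USA"
    return ""


for name, president in [("GEORGE WASHINGTON", "YES"),
                        ("THOMAS JEFFERSON", "yes"),
                        ("BENJAMIN FRANKLIN", "Nope")]:
    print(text_message(name, president))
```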

The SUBSTR function expects positive values for its second and third arguments. But what if you need to count the start position (the second argument) from the end of the string rather than from the beginning?
There are two ways you can achieve this.
Method 1: Combining the LENGTH and SUBSTR functions
We will use the SUBSTR function to extract the last three characters from the string.
data fileext;
  fname = "data.csv";
  /* start two characters before the last one and read three characters */
  ext = substr(fname, length(fname)-2, 3);
run;
| fname | ext |
|---|---|
| data.csv | csv |
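The LENGTH function returns the number of characters in fname, excluding trailing blanks, so LENGTH(fname)-2 is the position of the third character from the end (8 - 2 = 6 for "data.csv").
Similarly, to extract the last four characters, adjust both the offset and the length as shown below.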
ext=substr(fname,length(fname)-3,4);
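Method 1 boils down to one formula: the start position for the last n characters is LENGTH(fname) - n + 1. For "data.csv" and n = 3 that gives 8 - 3 + 1 = 6.

The Python model below applies the same formula; `last_n` is a hypothetical name. Like SAS LENGTH, it ignores trailing blanks, so a blank-padded value still yields the right extension.

```python
def last_n(value: str, n: int) -> str:
    """Hypothetical model of SUBSTR(value, LENGTH(value) - n + 1, n)."""
    length = len(value.rstrip(" "))  # SAS LENGTH ignores trailing blanks
    start = length - n + 1           # 1-based start position
    if start < 1:
        raise ValueError("n is longer than the value; SAS SUBSTR would write a NOTE")
    return value[start - 1:start - 1 + n]


print(last_n("data.csv", 3))    # csv
print(last_n("data.csv", 4))    # .csv
print(last_n("data.csv  ", 3))  # csv (trailing blanks are ignored)
```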
Method 2: Using the REVERSE and SUBSTR functions
Another way of achieving this is to reverse the string so that the first character becomes the last.
Then apply the SUBSTR function to extract the first three characters, and reverse that result again to restore the original order.
data fileext;
  fname = "data.csv";
  reverse_fname = reverse(fname);              /* "vsc.atad" */
  first_three = substr(reverse_fname, 1, 3);   /* "vsc" */
  reverse_first_three = reverse(first_three);  /* "csv" */
run;
| fname | reverse_fname | first_three | reverse_first_three |
|---|---|---|---|
| data.csv | vsc.atad | vsc | csv |
All three steps can be combined into a single statement, as shown below.
ext=reverse(substr(reverse(fname),1,3));
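Method 2 assumes the value has no trailing blanks. SAS REVERSE moves the last character to the front even when it is a blank. If fname were stored in a longer variable (say, length $20), the reversed string would begin with blanks, and its first three characters would not be the extension.

The Python model below shows the effect; `last_n_by_reverse` is a hypothetical helper, not SAS code. In SAS, stripping the value first, for example `reverse(substr(reverse(strip(fname)),1,3))`, should avoid the problem.

```python
def last_n_by_reverse(value: str, n: int) -> str:
    """Hypothetical model of REVERSE(SUBSTR(REVERSE(value), 1, n))."""
    return value[::-1][:n][::-1]


print(repr(last_n_by_reverse("data.csv", 3)))            # 'csv'
print(repr(last_n_by_reverse("data.csv  ", 3)))          # 'v  ' (blanks moved to the front)
print(repr(last_n_by_reverse("data.csv  ".strip(), 3)))  # 'csv'
```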
The SUBSTRN function serves the same purpose as SUBSTR, with a few additional features. Its position and length arguments can be 0 or negative without causing an error.
Syntax:
SUBSTRN(character-value, start <, length>)
SUBSTR is one of the most frequently used SAS character functions for extracting a substring. However, it can be frustrating when it writes an unexpected NOTE to the log because the start position or the length reaches beyond the end of the string.
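Preparing the test data
data _null_;
  String1 = 'Hello World';
  a = substr(string1, 1, 5);    /* valid */
  b = substr(string1, 1, 15);   /* length runs past the end of the string */
  c = substr(string1, 15, 10);  /* start position is beyond the end */
  put a= b= c=;
run;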
SAS LOG
NOTE: Invalid third argument to function SUBSTR at line 74 column 5.
NOTE: Invalid second argument to function SUBSTR at line 75 column 5.
a=Hello b=Hello World c=
String1=Hello World a=Hello b=Hello World c= _ERROR_=1 _N_=1
If the design of your code can tolerate an empty result from the substring function, consider using SUBSTRN.
This function provides the same essential functionality as SUBSTR but will not issue a NOTE message if it returns a null result.
The example below shows the use of the SUBSTRN function.
data _null_;
  String1 = 'Hello World';
  a = substrn(string1, 1, 5);
  b = substrn(string1, 1, 15);
  c = substrn(string1, 15, 10);
  put a= b= c=;
run;
OUTPUT
a=Hello b=Hello World c=
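The difference between the two logs can be summarized in a small Python model:

- `substr_model` returns a flag in the cases where SAS would write a NOTE and set _ERROR_=1.
- `substrn_model` keeps only the part of the requested range that lies inside the string.

Both names are hypothetical. The empty result that `substrn_model` returns for a zero or negative length is our own simplifying assumption, not documented SAS behavior.

```python
from typing import Tuple


def substr_model(value: str, start: int, length: int) -> Tuple[str, bool]:
    """Hypothetical model of SUBSTR: returns (result, note_written)."""
    if start < 1 or start > len(value):
        return "", True                 # invalid second argument
    remainder = value[start - 1:]
    if length < 1 or length > len(remainder):
        return remainder, True          # invalid third argument: remainder returned
    return remainder[:length], False


def substrn_model(value: str, start: int, length: int) -> str:
    """Hypothetical model of SUBSTRN: clip [start, start + length - 1] to the string."""
    if length <= 0:
        return ""                       # assumption of this model
    first = max(start, 1)
    last = min(start + length - 1, len(value))
    return value[first - 1:last] if first <= last else ""


for start, length in [(1, 5), (1, 15), (15, 10)]:
    print(substr_model("Hello World", start, length),
          repr(substrn_model("Hello World", start, length)))
```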
The SUBPAD function returns a substring of the length specified in the third argument, padding it with blanks when necessary.
If either position or length has a missing value, SUBPAD returns a string with zero length.
If position is negative, the result is padded with 1 - position leading blanks (for example, two leading blanks when position is -1).
If the specified substring extends beyond the end of the string, the result is padded with trailing blanks.
In a WHERE statement or the SQL procedure, the length of the value returned by the SUBPAD function cannot exceed 200.
Syntax:
SUBPAD(string, position <, length>)
Example
data _NULL_;
  string = "Hello World";
  a = '*' || subpad(string, -1, 0) || '*';
  b = '*' || subpad(string, 1, 0) || '*';
  c = '*' || subpad(string, 1, 20) || '*';
  d = '*' || subpad(string, -1, 7) || '*';
  put a= b= c= d=;
run;
OUTPUT
a=** b=** c=*Hello World         * d=*  Hello*
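The SUBPAD rules come down to one idea. Every position from position to position + length - 1 produces exactly one character: the character from the string when that position exists, and a blank otherwise.

The Python sketch below reproduces the four results from the example. `subpad_model` is a hypothetical name, and None stands in for a SAS missing value.

```python
from typing import Optional


def subpad_model(value: str, position: Optional[int], length: Optional[int]) -> str:
    """Hypothetical model of SUBPAD: exactly `length` characters when length >= 0."""
    if position is None or length is None:
        return ""  # a missing argument gives a zero-length string
    chars = []
    for p in range(position, position + length):
        # Positions before 1 or past the end of the value become blanks
        chars.append(value[p - 1] if 1 <= p <= len(value) else " ")
    return "".join(chars)


for pos, length in [(-1, 0), (1, 0), (1, 20), (-1, 7)]:
    print("*" + subpad_model("Hello World", pos, length) + "*")
```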
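The SUBPAD function is similar to the SUBSTR function except for the padding rules listed above. Where SUBSTR would write a NOTE to the log for an out-of-range position or length, SUBPAD returns a blank-padded or zero-length result instead.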
The CHAR function returns a single character from a specified position in a character string.
Syntax
CHAR(string, position)
| Examples | Results |
|---|---|
| char("Hello World", 0) | (a blank) |
| char("Hello World", 1) | H |
| char("Hello World", 7) | W |
| char("Hello World", 20) | (a blank) |
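A Python model of CHAR is a single bounds check. In this sketch, `char_model` is a hypothetical name. A position outside the string returns a single blank, and the case of a missing position is left out.

```python
def char_model(value: str, position: int) -> str:
    """Hypothetical model of CHAR: one character at a 1-based position."""
    if 1 <= position <= len(value):
        return value[position - 1]
    return " "  # position <= 0 or past the end: a single blank


for pos in (0, 1, 7, 20):
    print(repr(char_model("Hello World", pos)))
```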
The FIRST function returns the first character of a string as a string of length 1. If the string has a length of 0, FIRST returns a single blank.
Syntax
FIRST(string)
| Examples | Results |
|---|---|
| first("Hello World") | H |
| first("") | (a blank) |
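The one-line Python model below makes the empty-string rule explicit. `first_model` is a hypothetical name used only for illustration.

```python
def first_model(value: str) -> str:
    """Hypothetical model of FIRST: the first character, or one blank for an empty string."""
    return value[0] if value else " "


print(repr(first_model("Hello World")))  # 'H'
print(repr(first_model("")))             # ' '
```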
Key Takeaway
So, these are the different SAS character functions to extract a substring from a string.
SUBSTR and SUBSTRN are the most flexible of these functions. Use SUBSTRN when the result can be empty and you do not want NOTEs in the log, and use SUBPAD when you need a result of a fixed length padded with blanks.
Use the FIRST function when you need only the first character of a string, and the CHAR function when you need a single character from any position in the string.
If you have other tips or tricks for extracting substrings, share them in the comments section below, and we may include them in a future blog post.
Thanks for reading!
If you liked this article, you might also want to read SAS Numeric functions and Operators.
