The Ultimate Guide to SUBSTR in SAS

The SUBSTR function in SAS extracts part of a string. It also has a second important use: replacing part of a string.
SUBSTR can appear on the right side of an assignment statement (to extract characters) or on the left side (to replace them).

SUBSTR in SAS used on the right side of the assignment statement

The SUBSTR function returns a portion of the string that you pass as its first argument. The portion begins at the character given by the start-position argument and contains the number of characters given by the length argument.

If the length of the receiving variable has not been defined, it is assigned the length of the first argument.

Syntax

SUBSTR(character-value, start-position <, number-of-characters-to-read>)

  • character-value is any SAS character expression.
  • start-position is the starting position within the string; the first character is position 1.
  • number-of-characters-to-read (optional) is the number of characters to read from the start position. If it is omitted, SUBSTR reads from the start position to the end of the string.

Note: The length of the resulting variable will be the length of the character-value if the length is not previously assigned.

Examples                     Results

SUBSTR("ABC123XYZ",4,2)      12
SUBSTR("ABC123XYZ",4)        123XYZ
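The two calls above can be run in a DATA step as a minimal sketch. The variable names code, digits, tail, and tail_len are illustrative assumptions, and VLENGTH is used only to show the length that the undeclared variable tail inherits from the first argument.

```sas
data _null_;
  length digits $2;               /* declare the result length explicitly */
  code = 'ABC123XYZ';             /* code gets a length of 9 */
  digits = substr(code, 4, 2);    /* 2 characters starting at position 4: 12 */
  tail = substr(code, 4);         /* position 4 to the end of the string: 123XYZ */
  tail_len = vlength(tail);       /* tail was not declared, so it inherits length 9 */
  put digits= tail= tail_len=;
run;
```

Declaring the length of digits up front avoids carrying a 9-byte variable for a 2-character value.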

Check out the article on how to read and extract characters from the end of a string.

SUBSTR in SAS used on the left-hand side of the equal sign

When you use the SUBSTR function on the left side of the equal sign, SAS replaces part of the variable's value with the expression on the right side. SUBSTR replaces length characters, starting at the character that you specify in the position argument (the second argument).

If you use an undeclared variable, it is assigned a default length of 8.

Syntax

SUBSTR(character-variable, start <, length>) = character-value

  • character-variable is the character variable whose contents are modified. On the left side of the equal sign, the first argument must be a variable, not a literal or an expression.
  • start is the position in the variable where the replacement begins.
  • length (optional) is the number of characters to replace. If length is omitted, the characters on the right-hand side of the equal sign replace everything from start to the end of the variable, padded with blanks.

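Examples (pin is a character variable holding "Pin Code 411014")

Examples                      Results

SUBSTR(pin, 4, 5) = ":";      The five characters at positions 4-8 (" Code") are replaced by ":" padded with blanks
SUBSTR(pin, 4) = ":";         Pin: (everything from position 4 onward is replaced by ":" padded with blanks)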
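As a small sketch of left-side SUBSTR, the step below overwrites three characters in place. The variable names pin and masked are assumptions made for illustration; the replacement has the same length as the span it replaces, so no blank padding comes into play.

```sas
data _null_;
  pin = 'Pin Code 411014';
  masked = pin;                    /* work on a copy so pin stays unchanged */
  substr(masked, 10, 3) = '***';   /* overwrite positions 10-12 (the digits 411) */
  put pin= masked=;                /* masked becomes Pin Code ***014 */
run;
```

Note that the first argument on the left side is a variable; a quoted literal cannot be the target of an assignment.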

SUBSTR in comparisons and the colon modifier

Using SUBSTR on the left-hand side of a comparison (rather than an assignment) is similar to using the SAS colon modifier ( =: ).

Both methods compare values based on the prefix of a text string.

The advantage of SUBSTR over the colon modifier is that it has a macro counterpart, %SUBSTR, so the same prefix test can also be written in macro statements.

Example:

data test;
length name $20 president $4;
name = 'GEORGE WASHINGTON'; president = 'YES' ; output;
name = 'THOMAS JEFFERSON' ; president = 'yes' ; output;
name = 'BENJAMIN FRANKLIN'; president = 'Nope'; output;
run;

EXAMPLE DATA

name                 president

GEORGE WASHINGTON    YES
THOMAS JEFFERSON     yes
BENJAMIN FRANKLIN    Nope

Observe that the variable president in the sample data holds inconsistent values: 'YES', 'yes', and 'Nope'. The code below turns this inconsistent data into a useful text message.

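data test1;
set test;
length text_msg $50;
/* Colon modifier: compares only the leading characters of president */
if upcase(president) =: 'N' then text_msg = catx(' ', name, 'was not President of the USA');
/* Same kind of prefix test written with SUBSTR */
if upcase(substr(president, 1, 1)) = 'Y' then text_msg = catx(' ', name, 'was President of the USA');
run;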

OUTPUT

[Image: the test1 data set with the new text_msg variable]
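The macro-language point made in this section can be sketched with %SUBSTR, the macro counterpart of SUBSTR. The macro name check_president and its parameter answer are hypothetical and exist only for this illustration.

```sas
/* Hypothetical macro: tests the first character of a value */
%macro check_president(answer);
  %if %upcase(%substr(&answer, 1, 1)) = Y %then %put &answer is treated as yes;
  %else %put &answer is treated as no;
%mend check_president;

%check_president(yes)
%check_president(Nope)
```

%SUBSTR takes the same position and length arguments as SUBSTR but works on macro variable text, so no DATA step is needed for the comparison.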

SUBSTRN

This function serves the same purpose as the SUBSTR function with a few additional features. The starting position and the length arguments of the SUBSTRN function can be 0 or negative without causing an error.

Syntax:

SUBSTRN(character-value, start <, length>)
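A minimal sketch of zero and negative arguments follows. The variable names are illustrative, and the comments describe the behaviour documented for SUBSTRN: positions before the first character are dropped from the result, and a length of zero gives an empty result.

```sas
data _null_;
  string = 'ABCDE';
  a = substrn(string, 0, 3);    /* asks for positions 0-2; only positions 1-2 exist */
  b = substrn(string, -1, 4);   /* asks for positions -1 to 2; only 1-2 exist */
  c = substrn(string, 2, 0);    /* zero length requested: empty result */
  put a= b= c=;
run;
```

None of these calls should write an invalid-argument note to the log, which is the main reason to prefer SUBSTRN when the arguments are computed at run time.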

Difference between SUBSTR and SUBSTRN

The SUBSTR function is one of the most frequently used character functions for extracting a substring from a string, but it can be frustrating: it writes a NOTE to the log and sets _ERROR_=1 whenever the start or length argument reaches beyond the end of the string.

Preparing the Test data

data _null_;
String1 = 'Hello World';
a = substr(string1,1,5);
b = substr(string1,1,15);
c = substr(string1,15,10);
put a= b= c=;
run;

SAS LOG

NOTE: Invalid third argument to function SUBSTR at line 74 column 5.

NOTE: Invalid second argument to function SUBSTR at line 75 column 5.

a=Hello b=Hello World c=
String1=Hello World a=Hello b=Hello World c= _ERROR_=1 _N_=1

If the design of your code can allow a null result from the sub-string function, consider using SUBSTRN.

This function provides the same essential functionality as SUBSTR but will not issue a NOTE message if it returns a null result. 

The example below shows the use of SUBSTRN function.

data _null_;
String1 = 'Hello World';
a = substrn(string1,1,5);
b = substrn(string1,1,15);
c = substrn(string1,15,10);
put a= b= c=;
run;

OUTPUT

a=Hello b=Hello World c=

SUBPAD

SUBPAD function returns a substring of the length specified in the argument with blank padding.

If either position or length has a missing value, SUBPAD returns a string with a length of zero.

If position is negative, the result is padded with 1-position leading blanks.

If the specified substring extends beyond the end of the string, the result is padded with trailing blanks.

In a WHERE statement, or in the SQL procedure, the length of the value returned by the SUBPAD function cannot exceed 200.

Syntax:

SUBPAD(string, position <, length>)

Example

data _NULL_;
string="Hello World";
a='*'|| subpad(string,-1,0) || '*';
b='*'|| subpad(string,1,0) || '*';
c='*'|| subpad(string,1,20) || '*';
d='*'|| subpad(string,-1,7) || '*';
put a= b= c= d=;
run;

OUTPUT

a=** b=** c=*Hello World * d=* Hello*

Difference between SUBPAD and SUBSTR

The SUBPAD function is similar to the SUBSTR function except for the following differences:

  • If the position argument is 0 or negative, SUBPAD adds leading blanks to the result, whereas SUBSTR writes a note saying that the second argument is invalid, sets the automatic variable _ERROR_=1, and returns the substring extending from the specified position to the end of the string.
  • If the length argument is zero, SUBPAD returns a zero-length string, whereas SUBSTR writes a note saying that the third argument is invalid, sets _ERROR_=1, and returns the substring extending from the specified position to the end of the string.
  • If the specified substring extends beyond the end of the string, SUBPAD pads the result with blanks to match the requested length, whereas SUBSTR writes a note saying that the third argument is invalid, sets _ERROR_=1, and returns the substring extending from the specified position to the end of the string.
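The padding behaviour described above is handy for fixed-width output. The step below is a rough sketch under that assumption: the names and the width of 8 are made up for illustration, and the brackets only make the padding visible in the log.

```sas
data _null_;
  length name $20;
  do name = 'Ann', 'Christopher';
    /* SUBPAD returns exactly 8 characters: short names are padded
       with trailing blanks, long names are cut off at 8 */
    line = '[' || subpad(name, 1, 8) || ']';
    put line;
  end;
run;
```

Because the requested length is always honoured, no invalid-argument note is expected even when the name is shorter than 8 characters.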

CHAR

The CHAR function returns a single character from a specified position in a character string.

Syntax

CHAR(string,position)

Examples                      Results

char("Hello World",0);        (blank)
char("Hello World",1);        H
char("Hello World",7);        W
char("Hello World",20);       (blank)
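One common use of CHAR is walking through a string one character at a time. The loop below is a minimal sketch; the variable names i and ch are assumptions for illustration.

```sas
data _null_;
  string = 'Hello World';
  /* LENGTH ignores trailing blanks, so the loop stops at the last letter */
  do i = 1 to length(string);
    ch = char(string, i);   /* single character at position i */
    put i= ch=;
  end;
run;
```

Position 6 holds the blank between the two words, so ch is blank on that iteration.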


FIRST

The FIRST function returns the first character of a given string, as a value with a length of 1. It returns a single blank if the length of the string is 0.

Syntax 

FIRST(string)

Examples                 Results

first("Hello World")     H
first("")                (blank)
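As a short sketch, FIRST combines naturally with SCAN to build initials from a name. The variable names are illustrative, and the example assumes the name consists of two words separated by a blank.

```sas
data _null_;
  name = 'Thomas Jefferson';
  /* first letter of the whole string plus first letter of the second word */
  initials = cats(first(name), first(scan(name, 2, ' ')));
  put initials=;   /* initials=TJ */
run;
```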


Key Takeaway

So, these are the different SAS character functions for extracting a substring from a string.

SUBSTR and SUBSTRN have more flexibility than the others. Use the SUBSTRN function when the result can legitimately be a null value.

Use the FIRST function if you need to extract only the first character of a string, and use the CHAR function to extract a single character from any position in the string.

If you have suggestions for other tips or tricks to add, let us know in the comments section below. We would be glad to cover them in a future blog post.

Thanks for reading!

If you liked this article, you might also want to read SAS Numeric functions and Operators.
