How to use the SAS SCAN Function?8 min read

SAS has two powerful functions for dividing a string into words. Words can be characters separated by blanks or other delimiters that you specify.

SCAN and SCANQ split strings into words. Both functions are similar. However, the SCANQ function has some additional features, and there is a difference in the default delimiter used in these two functions.

This article contains the SCAN function’s primary usage and real use cases of the SCAN function.

SAS SCAN Function

SAS SCAN function extracts a specified word from a character expression, where the word is the characters separated by a set of specified delimiters.

The length of the returned variables is 200 unless previously defined.

Syntax :

SCAN(character-value, n-word <,’delimiter-list’>,<modifiers>)

  • The n-word is the nth “word” in the string.
  • AnΒ  ‘n’ value greater than the number of words returns a value that contains no characters.
  • For negative ‘n’ values, the character value is scanned from right to left, and a value of zero is invalid.

List of Modifiers

Below are some of the most used modifiers with the SCAN function.

πŸ‘‰Β  a adds alphabetic characters to the list of characters.

πŸ‘‰ b scans backward from right to left instead of from left to right, regardless of the sign of the count argument.

πŸ‘‰ d adds digits to the list of characters.

πŸ‘‰ i ignores the case of the characters.

πŸ‘‰ k causes all characters that are not in the list of characters to be treated as delimiters.

πŸ‘‰ mΒ  With the m modifier you can specify multiple consecutive delimiters, and delimiters at the beginning or end of the string argument, refer to words that have a length of zero.

πŸ‘‰ qΒ  modifier is used when you want to ignore delimiters that are inside substrings that are enclosed in quotation marks.

πŸ‘‰ t trims trailing blanks from the string and charlist arguments. If you want to remove trailing blanks from only one character argument instead of both character arguments, use the TRIM function instead of the SCAN function with the T modifier.

For the complete list of delimiters, refer to the SAS documentation.

When using a modifier with the SCAN function, specify the modifier as the fourth argument, and explicitly specify the delimiter as the third argument so that SAS won’t use your modifier as the delimiter!

Example: Using Scan function to find the nth word – Left to Right Scan

data _null_;
text = "Kenny Green flies brown kites";
third_word = scan(text,3);
put third_word=;
run;

fifth_word=flies

Using Scan function to find the Second Last Word – Right to Left Scan

Using the SCAN function, you also can read from right to left, effectively allowing you to capture the last word in a character string.

To tell SAS to read from right to left, change the count argument to a negative number to indicate the word number you would like to read, starting from the right and moving left.

So, to select the word β€œbrown” in our TEXT variable, we can use a count of -2, as shown here:

data _null_;
text = "Kenny Green flies brown kites";
second_last_word = scan(text,-2);
put second_last_word=;
run;

second_last_word=brown

Alternatively, you can use the β€œb” modifier available with the SCAN function rather than using a negative count. By specifying the β€œb” argument with the SCAN function, you can tell SAS to read from right to left instead of the default left to right.

data _null_;
	text="Kenny Green flies brown kites";
	second_last_word=scan(text,2," ","b");
	put second_last_word=;
run;

Points to remember while using SCAN Function:

  • If the length is not defined previously, it defaults to 200 bytes for the created variable.
  • A missing value is returned if there are fewer than n words in the string.
  • delimiters at the beginning or end of the string argument are ignored
  • Two or more contiguous delimiters are treated as a single delimiter.
  • When the SCAN function is used, Any character or set of characters can serve as delimiters.
  • If n is negative, SCAN selects the word in the character string starting from the string’s end.

Handling Different Word Delimiters

The default word delimiter for SCAN function is the space but the SCAN function still works even with commas as the delimiter.

data _null_;
text = "Kenny,Green,flies,brown,kites";
third_word = scan(text,3);
put third_word=;
run;

third_word=flies

The reason these still works is that by default, with any computer using ASCII characters, the SCAN function will automatically check for any of the following characters as delimiters:

blank ! $ % & ( ) * + , – . / ; < ^ :

When your data contains a delimiter between words not found in the default list, you can use the third argument with the SCAN function to specify your custom delimiter.

For example, if the words in your character string with a plus sign (+), you need to enclose the plus sign in quotations as the third argument to the scan function.

The syntax below demonstrates how to select the fifth word from a plus sign delimited character string:

data _null_;
text = "Kenny+Green+flies+brown+kites";
fifth_word = scan(text,5,"+");
put fifth_word=;
run;

fifth_word=kites

How to handle multiple Delimeters?

By default, SAS will use not just one but all of the delimiters in the default list. The result might not come as expected when your data contains multiple delimiters.

So, you may also want to force SAS to use only one of the default delimiters in some cases.

In the below dataset, the names variable contains a list of first, last, and middle names.

sas scan

The structure is as follows: <last name><blank><middlename><comma><firstname>. You would like to create first name from this data.

Since commas and spaces are default delimiters, if we use the SCAN function without specifying our delimiter in firstname1, SAS will consider space as the delimiter and “Slyke” will become the first word.

To correct this, we can tell SAS only to use the comma as a delimiter so that “Van Slyke” will become the last name and Andy will be the given name:

data one;
    input names $25.;
    firstname1=scan(names,2);
    firstname=scan(names,2,",");
    datalines;
    Van Slyke, Andy 
    Thomas,Andres
    Robidoux, Billy Jo
    Mr. Bruce,Brenly
    Bob,Horner
	;
run;

sas scan

 

Now that the blanks are no longer considered delimiters and only the commas are, we get the desired result in our output data with “Andy” now in the firstname variable.

Using SCAN with DO LOOPS to Parse Long Character Strings

When combined with a simple DO LOOP and a SAS, the SCAN function makes it easy to parse out each word from a character string into separate variables.

In the below example dataset, you would like to parse out each word from the letters variable into five separate variables.

How to use the SAS SCAN Function? 2

data letter;
	input letter $50.;
	call symputx('count', count(letter, ",") + 1);
	datalines;
A,B,C,D,E,F,G
;
run;

A macro variable is created within the data step to hold the count of letters. This macro variable is later used to define the array elements dynamically.

Read :Β 

The code below uses a DO LOOP to scan the letters variable and then create the variables letter1 to letter7.

data one;
	set letter;
	array letters[&count] $15 letter1-letter&count;
	do i=1 to dim(letters);
		letters[i]=scan(letter, i, ", ");
	end;
	drop i;
run;

sas scan

As you can see in the output data shown partially below, we now have five new MODEL variables, with one word per variable:

Separating the comma-separated list horizontally

Sometimes you may need to create separate rows from one value in a column. For example, consider a list of comma-separated email id’s in a variable. You may need to create send separate emails to each of the email ids.

The SCAN function can be used in this scenario where each of the email ID can be separated as a single row.

Below is a simple example where I have used the letter dataset to create separate letters as a row in the dataset.

data two(keep=new_letter);
set letter;
do i = 1 by 1 while (scan(letter,i,',') ^= ' ');
    new_letter = scan(letter,i,',');
    output;
  End;
  run;

and this is the output of the above code snippet.

sas scan

SCANQ

Similar to the SCAN function, SCANQ is also used to extract a specified word from a character expression.

Syntax:

SCANQ(character-value, n-word <,’delimiter-list’>)

Difference between SCAN and SCANQ

Below are some of the differences between SCAN and SCANQ

  1. Default Delimeter set – If you omit delimiter, SCANQ uses white space characters (blank, horizontal and vertical tab, carriage return, line feed, and form feed) as the default delimiters.
  2. A value of 0 for the word count does not result in an error message. when you use the SCANQ
  3. SCANQ also ignores delimiters enclosed in quotation marks.

Example:

data _null_;
	text="Kenny,'Green,flies',brown,kites";
	third_word1=scan(text, 3);
	third_word2=scan(text, 3,",","q");
	third_word3=scanq(text, 3,",");
	put third_word1=third_word2=third_word3=;
run;

third_word1=flies’ third_word2=brown third_word3=brown

If the delimiter is a comma, you have to use the delimiter argument in the SCANQ function.

To ignore the delimiter inside the quotation mark, the q argument can be used with the SCAN function.

Difference between SCAN and SUBSTR

SCAN is used to extract words from a list of words that are separated by delimiters. SUBSTR is used to extract a part of the word by specifying a starting location and the length of the part to be extracted.

When to use the SCAN function?

  • Use the scan function when you know the order of the words in the character value.
  • When you know that the starting position of the word varies.
  • The words are separated by some delimiter.

 

by Subhro Kar
Been in the realm with the professionals of the IT industry. I am passionate about Coding, Blogging, Web Designing and deliver creative and useful content for a wide array of audience.

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Share via
Copy link
Powered by Social Snap