SET Statement SAS

Exploring the SET Statement in SAS

  • Post author:
  • Post category:Base SAS
  • Post comments:4 Comments
  • Reading time:16 mins read

A majority of DATA steps use the SET statement. The primary function of the SET statement in SAS is to read observations from one or more SAS datasets.

Properly using the SET statement in SAS is one of the key techniques for improving the efficiency of SAS programs. The SET statement has options that can be used to control how the data are to be read.

SET statement options

  • END = It is used to detect the last observation from an incoming dataset.
  • KEY = It specifies an index to be used when reading SAS datasets.
  • INSDNAME = It is used to identify the current data source.
  • CUROBS= It creates and names a variable which contains the current observation number
  • NOBS= Creates a temporary variable containing the total number of observations of the input dataset.
  • POINT = Specifies a temporary variable whose value determines which observation is read
  • UNIQUE = It is used with the KEY= option to read from the top of the index.

Using the NOBS= and POINT= options

The SET statement in SAS reads values in a sequential manner. i.e. One observation after another. Using the POINT= option, you can perform a non-sequential read. The POINT= option tells SAS which observation to read next.

The POINT option allows direct access to the specified number of observations. For example, if you have to read only the 5th observation from a dataset.

The Point variable to be accessed must be declared with the observation number before the SET statement.

Since with the POINT, options data is being randomly accessed, you must tell the DATA step when to ‘OUTPUT’ and when to ‘STOP’.

The variable used for the POINT variable is a temporary variable and not added to the final dataset.

Since the POINT = option provides direct access to SAS data sets, it cannot be used with a BY statement or the WHERE= dataset option.

The POINT= and NOBS-= are used together, which returns the number of observations in the dataset.

data class;
	pt=5;
	set sashelp.class point=pt;
	put pt;
	output;
	stop;
run;

You can download the example programs and the datasets from the link provided at the end of this article.

In the above example, the POINT= option tells SAS to read only the 5th observation from the input dataset.

To read more observations from the input data set, the POINT= option can be combined with a DO loop as in the below example.

data class;
	do Sliceobs=2, 3, 6, 10;
		set sashelp.class point=Sliceobs;
		output;
	end;
	stop;
run;

The POINT= and NOBS= options can also be helpful when performing a look-ahead or look-back of the data.

In the following example, we need to extract the student names who have scored less than 40 along with the previous semester and the next semester.

data class2(keep=name score semester cnt pt);
 array flag {100} $1 _temporary_;
 set datasets.class(keep=name score rename=(name=name1));
 cnt+1;
 put cnt;
 if score lt 40 then
  do point=(cnt-1) to (cnt+1);
   put point=;
   pt=point;
   if 1 le point le noobs then
    do;
     set datasets.class point=point nobs=noobs;
     if name1=name and flag{point}=' ' then
      output class2;
     flag{point}='x';
    end;
  end;
run;
POINT option in the SET Statement in SAS

CNT will determine the range of values for the POINT variable.

If any student has a score value of less than 40, we need to print the previous observation and the next observation.

An array can be used to flag an observation once it has been used.

Using the INDSNAME= Option

With the INDSNAME option in the set statement in SAS, you can store the names of datasets from which the current observation is read.

data cars;
set sashelp.cars indsname=dataset;
put dataset=;
run;

The dataset name is created in a temporary variable dataset. It contains a two-level name – SASHELP.CARS.

The default length of the variable created by INDSNAME is 41.

END= option

The END= option can be used to create a numeric (0 or 1) temporary variable that indicates that the last record has been read.

For the last record, the temporary variable would have a value of 1.

data ageTotal;
 set sashelp.class end=eof;
end=eof;
total+age;
put eof=;

if eof then
 put total=;
run;

DATA Step with Two or more SET Statements

The DATA step may contain multiple SET statements. Multiple SET statements can give you more flexibility over the process of reading the data.

By using multiple SET statements, you can vertically stack datasets that will combine observations from two or more data sets into a single observation in the new data set.

The program reads the data from both datasets sequentially, starting with the first observation in the first dataset and ending with the last observation in the last dataset. All variables from both datasets are added to the program’s data vector. The corresponding values will be set to missing if all variables are not present in all datasets.

data data1;
 input name $ gender $;
 datalines;
A F
J M
T M
;
run;

data data2;
input name $ age;
datalines;
B 36
M 35
;
run;

data combined;
set data1 data2;
run;
using multiple set statements
The output of Multiple SET Statements

Using the CUROBS

The CUROBS options create a variable that writes the current observation number in the SAS dataset.

data Female;
 set sashelp.class curobs=cobs;
 where sex='F';
 obs=cobs;
run;

KEY Option

The KEY= in SAS set statement allows us to access observations based on the value of an index variable or a key. The KEY option cannot be used with the POINT option.

This option is helpful if we have to perform a table lookup of values from another dataset. The KEY= option on the SET statement identifies an index that is to be used.

data combine;
   set data.ProductCat;
   set data.ProductDetails key=id/unique;
   if _iorc_ > 0 then price=' ';
run;
Product Price Dataset
Product Price
Product Category
Product Category
Key lookup
Key Lookup

To use the key lookup, the data set must be indexed. The KEY option on the statement identifies the index to be used.

To create an index, you can write the below proc SQL step.

proc sql;
create unique index id on data.ProductPrice;
quit;
proc sql;
create index id on data.ProductCat;
quit;

_IORC_is the automatic variable which stands for INPUT/OUTPUT Return Code. If an index value is found _IORC_ is set to 0.

Also Read :

KEYRESET

With the KEYRESET option in the SET statement in SAS, you can control whether a KEY= search should begin at the top of the index for the data set being read.

The index lookup begins at the top when the value of KEYRESET is 1. If the value of the KEYRESET variable is 0, the index lookup is not reset, and the lookup continues where the prior lookup ended.

data a(index=(i));
 do i=1, 2, 3, 3, 3, 4, 5;
  j=put(int(ranuni(4)*40), 6.)|| byte(int(65+26*ranuni(0)));
  output;
 end;
run;

data b;
input i;
datalines;
3
3
;
run;

data _null_;
set b;
reset=2;
set a key=i keyreset=reset;
put i= j=;
run;

When the KEYRESET is set to 0

i=3 j=27W
 i=3 j=6D

When the KEYRESET is set to 1

i=3 j=27W
i=3 j=27W

IN= Option

The IN = data set option is used with multiple data sets to identify which data set contributed an observation.

DATA newdata;
SET dataset1 ( in = a ) dataset2 ( in = b ) ;
 IF a THEN -------;
 ELSE IF b THEN ----------- ;
RUN ;

A separate IN = variable can be specified for each data set defined with the SET statement.

The variable named by the IN = option has a value of 1 for every observation originating

from the corresponding datasets, and the datasets are read one after the other.Read :

The IN = option only exists for the duration of the DATA step for the variables defined. These variables are not added to the output dataset. So, DROP or KEEP options are not applied to these variables.

With the WHERE = dataset option, SAS selects only those observations that meet the condition specified. It functions identically to the WHERE statement.

The WHERE = data set option is more efficient than the WHERE statement as only the matched observations are loaded into the Program Data Vector.

The WHERE statement reads all observations from the input data set, and non-matched observations are discarded.

DATA newdata ;
  SET dataset1 ( where = ( condition ) )
  dataset2 ( where = ( condition ) ) ;
 RUN ;

If both a WHERE = and a WHERE statement are used in the same data step, the WHERE statement is ignored for those datasets with a WHERE = condition defined.

The WHERE = data set option can not be used with the POINT =, FIRSTOBS =, or OBS = data set options.

Download the entire code from this post here.

The Takeaway:

So, this was our side of the SET Statement We really hope that you have found it useful.

Moreover, if you have any other suggestions, mention them in the comment section below. We would really take those lists in our further blog post.

Thanks for reading!

If you liked this article, you might also want to read Data Set Options In SAS.

Do you have any tips to add Let us know in the comments.

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

This Post Has 4 Comments

  1. kaushik

    Thanks I’ve actually been looking for this. Clear explanation.

    1. Subhro Kar

      Thank You!!

  2. David

    Thanks Subhro! for sharing this. I would like to know if i can set a dataset in sas up until a point and then set another one. Do let me know.

  3. Adrianoamete

    Many thanks, Useful information!