In SAS, the data set is an essential part of the analysis, and you must know all sorts of things about the data.
SAS has many data set options to help you accomplish your goals while working with the datasets.
Data set options modify how a data set is read or written. There are over three dozen of these options, and while you will generally only use a handful of them, you should have a good idea of their scope.
To use these option (s), place them in parentheses immediately following the name of the data set to which they are to be applied.
While you can use data set options virtually anytime the data set is named, some options are situation-dependent. Therefore, you must understand what an option does before applying it.
For instance, options that control how a data set is to be read would not be used on a DATA statement.
In the following example, the KEEP data set option is applied to the data set being used by PROC SORT.
proc sort data=sashelp.cars(keep= make model price)
out=carslist;
by price;
run;
Regardless of how many variables are in SASHELP.CARS, the SORT procedure will only have to deal with the three variables provided in the KEEP= option.
For a SORT procedure, this can substantially speed up the processing.
REPLACE and REPEMPTY Dataset Options
It is possible to create an empty (zero observation) data set. However, we may want to control whether the new table will replace an existing table of the same name.
- REPLACE
- REPEMPTY
REPLACE=NO – It prevents the replacement of a permanent data set. This data set option overrides the system option of the same name. REPLACE=YES is the default.
REPEMPTY option determines whether or not an empty data set can overwrite an existing data set.
These two options are usually used together. Typically, we want to be able to replace permanent data sets unless the new version is empty.
Using the DATASTMTCHK Dataset Option
Traditionally, overwriting a data set with observations with an empty one has been especially problematic when the semicolon is left off the DATA statement.
The missing semicolon in the SET statement is masked in the following DATA step, and three empty datasets are created.
MYLIB.VERYIMPORTANT, WORK.SET and SASHELP.CLASS
options DATASTMTCHK=NONE;
data advrpt.VeryImportant
set sashelp.class;
run;
The missing semicolon causes SAS to see the SET statement part of the DATA statements. The result is that there is no incoming data set; consequently, the created data sets will have no variables or observations.
The DATASTMTCHK system option protects us from this problem by not allowing datasets to be created with names like SET and MERGE. Setting DATASTMTCHCK to NONE removes this protection.
Password Protection DataSet Options
Data sets can be both encrypted and password protected. Password and encryption data set options to include:
ALTER |
Password to alter the data set |
ENCRYPT |
Encrypt the data set |
PW |
Specify the password |
PWREQ |
Password request window |
READ |
Password to read the data set. |
WRITE |
Password to write to the dataset. |
data Login(encrypt=yes pwreq=yes
read=readpwd write=writepwd);
DB='DEApp'; UID='MaryJ'; pwd='12z3'; output;
DB='p127'; UID='Mary'; pwd='z123'; output;
run;
While these password protections can be helpful within SAS, the protected files are still vulnerable to deletion or manipulation using tools outside of SAS.
Including the ENCRYPT option adds another layer of protection from tools other than SAS.
Read: How to Password protect SAS datasets?
KEEP, DROP, and RENAME Dataset Options
When using the KEEP, DROP, or RENAME in a DATA step, you can choose between data set options or DATA step statements.
The following examples highlight the differences between the KEEP statement and the KEEP= data set option.
data class(keep=name age sex);
set sashelp.class(keep=name age sex where=(age>'12'));
run;
- The KEEP statement below n is only applied to the new data set (WORK.CLASS) and in no way affects the Program Data Vector or what variables will be read from the incoming data set (SASHELP.CLASS).
- The KEEP statement variable list is applied to the new outgoing data set.
- The IF statement is executed after the entire observation is read and loaded into the PDV.
The KEEP statement is the same as specifying the KEEP= option on the dataset in the DATA statement.
The KEEP= option on the SET statement is applied before the PDV is built, and only the listed variables will be read from the incoming data set and included in the PDV. It can significantly improve performance when dealing with large datasets.
- The KEEP= data set option only impacts which variables will be written to the new data set.
- On the SET statement, the KEEP= data set option is applied to the incoming data set.
- The WHERE= filter is specified as a data set option and is used before observations are read.
- The RENAME option allows you to change the name of a variable either as it is read or written.
Should you use Dataset options or Statements
During the DATA step, the DROP, KEEP, and RENAME statements, or the DROP=, KEEP=, and RENAME= data set options, tell SAS which variables to process or output.
You can get the results you want by using one or more of these statements and data set options alone or together.
Order of Application
If your program needs you to use more than one data set option or a mix of data set options and statements, it’s helpful to know that SAS drops, keeps, and renames variables in the following order:
- First, SET, MERGE, and UPDATE statements evaluate the options on input data sets from left to right. Before the RENAME= option is used, the DROP= and KEEP= options are used.
- The RENAME statement comes after the DROP and KEEP statements.
- Lastly, the DATA statement looks at the options for output data sets from left to right. Before the RENAME= option is used, the DROP= and KEEP= options are used.
Using FIRSTOBS and OBS Data Set Options
FIRSTOBS and OBS can be used either individually or together to specify which observations should be read or written.
- FIRSTOBS: specifies the number of the first observation to be read
- OBS: specifies the last observation that is to be read.
When these two options are used together, they work independently. When FIRSTOBS is not used, OBS corresponds to the number of observations that will be read. OBS= option counts from observation one regardless of the value of FIRSTOBS.
proc print data=sashelp.class(firstobs=4 obs=6);
run;
Because OBS = 6, only the first six observations are printed. However, the first to be printed in the 4th observation (FIRSTOBS = 4). As a result, only three observations are printed.
In conclusion, SAS offers a wide variety of options for working with datasets, including merging datasets, updating observations, and modifying variables.
These options provide flexibility and allow users to manipulate and analyze their data effectively. It is important to carefully consider the specific needs of a project and choose the appropriate options to ensure accurate and reliable results.