An index in SAS is used to sort your data without physically sorting it logically. If you are sorting and then re-sorting data to accomplish merges, you may find indexes useful.
Creating an Index in SAS
Indexes are created using PROC DATASETS or PROC SQL; they can be created in a DATA step.
Indexing is storing the order of the data like physically sorting If there is an index for a dataset, SAS will be able to access it and allows us to use the data set with the appropriate BY statement without sorting the data.
SAS Indexes are stored in a separate file, and the size of this file can be substantially large, especially as the number of indexes, observations, and variables used to form the indexes are more. However, Indexes can substantially speed up processes.
Creating an index using proc datasets
PROC DATASETS LIBRARY=libref; MODIFY SAS-data-set; INDEX CREATE varlist / UNIQUE NOMISS UPDATECENTILES = ALWAYS | NEVER | integer;
- libref is the SAS data library that is to be modified
- INDEX CREATE statement is the list of variables for which you want to create an index.
- The UNIQUE option specifies that key variable values must be unique within the SAS data set.
- The NOMISS option specifies that no index entries will be built for observations with missing key variable values.
- The
UPDATECENTILES
option allows you to override when SAS updates the index’s centiles.
The UNIQUE, NOMISS, and UPDATECENTILES options are optional.
There are two types of Indexes- Simple and Composite.
A simple index consists of one variable, whereas a composite index contains multiple variables.
data subjects;
length subject_name $3;
do subject_id=100 to 100000;
do j=1 to 3;
substr(subject_name, j)=byte(int(65+26*ranuni(0)));
end;
output;
end;
run;
Creating a single index
proc datasets library=work;
modify subjects;
index create subject_id / unique;
run;
Creating Multiple Index
proc datasets library=work;
modify subjects;
index create idname=(subject_id subject_name) / nomiss;
run;
Creating an index using PROC SQL
CREATE INDEX index-name ON data-set-name(varlist)
Note that the UNIQUE, NOMISS, and UPDATECENTILES options are not available while creating an index using PROC SQL.
/*Single Index*/
proc sql;
create unique index subject_id on work.subjects;
quit;
/*Multiple Index*/
proc sql;
create index idname on work.subject(subject_id subject_name);
quit;
Creating an Index using DATA STEP
DATA data-set-name(INDEX=(varlist / <UNIQUE><NOMISS><UPDATECENTILES = | ALWYAYS | NEVER integer> ));
- Data-set-name is the name of the new SAS data set name. Varlist is the name of the key variable.
- The UNIQUE, NOMISS, and UPDATECENTILES options can be used while creating an index in the data step.
/*Single Index*/
data SubjectIndexed(index=(subject_id / unique));
set work.subject;
/*SAS Statements */
run;
/*Multiple Index*/
data SubjectIndexed(index=(idname=(subject_id subject_name) / nomiss));
set work.subjects;
/* SAS Statements…*/
run;
Using the index in SAS
Once any of the above methods create an Index, you will want to use them. You can use the index in any of the four places described below.
Using Index in a WHERE statement
The WHERE statement can be used in DATA and PROC steps as in the below example.
data subject1;
set subjects;
where subject_id eq 5678;
run;
Using the BY Statement
Using an index in a BY statement is simple. When the BY variable is an index, the index is automatically used, and the data does not need to be sorted.
data class(index=(age));
set sashelp.class;
run;
data class2;
set class;
by age;
if first.age;
run;
Using the KEY= Option
KEY= option can be used to look up a value when an index exists on only the data set that contains the values to be looked up. The KEY= option on the SET statement identifies an index to be used.
Please refer to the article below, which has an example of using the KEY= option for performing table lookup.
More on Indexes
SAS doesn’t need to use an existing index just because Index has been created, even when using a WHERE or BY statement.
SAS first analyzes if using an index will be more efficient than reading the entire data set sequentially.
There are a lot of factors that are considered before using an index. Some of these are dataset size, index availability, and centile information.
Hence, If SAS predicts that it will be more efficient to use a specific index to return observations than to read the entire data set, then it will use that index. Otherwise, it will read the entire data set sequentially to fetch the observations.
Mostly SAS makes a good decision to use an index or not. However, its internal algorithms can sometimes decide not to use Index, considering the resources consumed when reading a large subset of data via an index are greater than reading the entire SAS data set.
Override the SAS System decision about whether to use an index.
The IDXNAME= and IDXWHERE= dataset options override SAS’ default index usage.
You can use any one of the above but not both.
data subject1;
set subjects (idxname=idname);
where idname < 100;
run;
Preventing SAS from using an Index
To prevent SAS from using an index, you can use the IDXWHERE=NO option
data subject1;
set subjects (idxwhere=NO);
where idname < 100;
run;
SAS automatically uses an index when you specify the KEY option on a SET or a MODIFY statement.
How to find if SAS is using the index
Using the below system option, SAS will print the Index used for any operation.
options msglevel=i;
Message in the LOG
INFO: Index idname selected for WHERE clause optimization.
Check for the presence of the Index in the SAS
The index of datasets information is available in DICTIONARY.INDEX and SASHELP.VINDEX, from which you can view to get all the useful information about indexes. Alternatively, you can also use the proc content procedure to check if an index exists for a dataset.
PROC SQL outobs=5;
SELECT *
FROM dictionary.indexes;
QUIT;
DATA TEST;
SET SASHELP.VINDEX;
run;
Output:
Dropping an Index in SAS
To drop an index, specify the index’s name after the DROP statement as below.
proc sql;
drop index idanme from students;
quit;