Using Index in SAS to speed up programs6 min read

An index in SAS is used to logically sort your data without physically sorting it. If you are sorting and then re-sorting data to accomplish merges, you may find that indexes to be useful.

Creating an Index in SAS

Indexes are created using PROC DATASETS or through PROC SQL and even they can also be created in a DATA step.

Indexing is storing the order of the data like physically sorting If there is an index for a dataset, SAS will be able to access it and allows us to use the data set with the appropriate BY statement without sorting the data.

SAS Indexes are stored in a separate file, and the size of this file can be substantial large especially as the number of indexes, observations, and variables used to form the indexes are more. However, Indexes can substantially speed up processes.

Creating an index using proc datasets

PROC DATASETS LIBRARY=libref;
MODIFY SAS-data-set;
INDEX CREATE varlist / UNIQUE NOMISS
UPDATECENTILES = ALWAYS | NEVER | integer;
  • libref the SAS data library that is to be modified
  • INDEX CREATE statement is the list of variables for which you want to create an index.
  • The UNIQUE option specifies that key variable values must be unique within the SAS data set.
  • The NOMISS option specifies that no index entries are to be built for observations with missing key variable values.
  • The UPDATECENTILES option allows you to override when SAS updates the index’s centiles.

The UNIQUE, NOMISS, and UPDATECENTILES options are optional.

There are two types of Indexes- Simple and Composite.

Simple index consistes of one variable whereas composite index contains multiple variables.

data subjects;
 length subject_name $3;
 do subject_id=100 to 100000;
  do j=1 to 3;
   substr(subject_name, j)=byte(int(65+26*ranuni(0)));
  end;
  output;
 end;
run;
/*Creating single index*/
proc datasets library=work;
 modify subjects;
 index create subject_id / unique;
 run;
/*Creating multiple Index*/
proc datasets library=work;
 modify subjects;
 index create idname=(subject_id subject_name) / nomiss;
 run;

Using PROC SQL

CREATE  INDEX index-name ON data-set-name(varlist)

Note that the UNIQUE, NOMISS, and UPDATECENTILES options are not available while creating an index using PROC SQL.

/*Single Index*/
proc sql;
create unique index subject_id on work.subjects;
quit;
/*Multiple Index*/
proc sql;
create index idname on work.subject(subject_id subject_name);
quit;

DATA STEP

DATA data-set-name(INDEX=(varlist / <UNIQUE><NOMISS><UPDATECENTILES = | ALWYAYS | NEVER integer>
));
  • Data-set-name is the name of the new SAS data set name. Varlist is the name of the key variable.
  • The UNIQUE, NOMISS, and UPDATECENTILES options can be used while creating an index in the data step.
/*Single Index*/
data SubjectIndexed(index=(subject_id / unique));
set work.subject;
/*SAS Statements */
run;
/*Multiple Index*/
data SubjectIndexed(index=(idname=(subject_id subject_name) / nomiss));
set work.subjects;
/* SAS Statements…*/
run;

Using the index in SAS

Once, an Index is created by any of the above methods, you will want to use them. You can use the index in any of the four places described below.

  • WHERE statement in a DATA or PROC step
  • BY statement in a DATA or PROC step
  • KEY option on a SET or MODIFY statement

Using Index in a WHERE statement

The WHERE statement can be used in DATA and PROC steps as in the below example.

data subject1;
 set subjects;
 where subject_id eq 5678;
run;

Using the BY Statement

Using an index in a BY statement is simple. When the BY variable is an index, the index is automatically used, and the data does not need to be sorted.

data class(index=(age));
 set sashelp.class;
run;

data class2;
 set class;
 by age;

 if first.age;
run;

Using the KEY= Option

KEY= option can be used to look up a value when an index exists on only the data set that contains the values to be looked up. The KEY= option on the SET statement identifies an index that is to be used.

Please refer to the article below which has an example of using the KEY= option for performing table lookup.

Exploring the SET Statement in SAS

More on Indexes

It is not necessary that SAS will use an existing index just because Index has been created even when using a WHERE or BY statement.

SAS first analyzes if using an index will be more efficient than reading the entire data set sequentially.

There are a lot of factors which are considered before using an index. Some of these are dataset size, the availability of index and centile information.

Hence, If SAS predicts that it will be more efficient to use a specific index to return observations than to read the entire data set, then it will use that index. otherwise, it will read the entire data set, sequentially, to fetch the observations.

Mostly SAS makes a good decision to use an index or not. However, its internal algorithms can sometimes make a decision to not use Index considering the resources consumed when reading a large subset of data via an index are greater than reading the entire SAS data set.

Override the SAS System decision about whether to use an index.

The IDXNAME= and IDXWHERE= dataset option are used to override SAS’ default index usage.

IDXNAME – Using the IDXNAME dataset option will prevent SAS from an attempt to determine if the specified index is the best one or if a sequential search might be more resource-efficient.

IDXWHERE= This option enables you to override the SAS System decision about whether to use an index.

You can use any one of the above but not both.

data subject1;
   set subjects (idxname=idname);
   where idname < 100;
run;

Preventing SAS from using an Index

To prevent SAS from using an index, you can use the IDXWHERE=NO option

data subject1;
   set subjects (idxwhere=NO);
   where idname < 100;
run;

SAS will automatically use an index when you specify the KEY option on either a SET statement or a MODIFY statement.

How to find if SAS is using the index

Using the below system option, SAS will print the Index used for any operation.

options msglevel=i;

Message in the LOG

INFO: Index idname selected for WHERE clause optimization.

Check for the presence of Index in SAS

Index of datasets information is available in DICTIONARY.INDEX and SASHELP.VINDEX which you can view to get all the useful information about indexes. Alternatively, you can also use the proc content procedure to check if an index exists for a dataset.

PROC SQL outobs=5;
  SELECT *
  FROM dictionary.indexes;
  QUIT;
DATA TEST;
SET SASHELP.VINDEX;
run;
SAS INDEX

Dropping an Index in SAS

To drop an in index specify the name of the index after the DROP statement as below.

proc sql;
        drop index idanme
           from students;
     quit;

Important Points on Index in SAS

  • An index cannot be created on a SAS view.
  • Since resources are required to create indexes and these resources should be taken into consideration. It is good to keep the number of indexes to a minimum to reduce disk storage and update costs.
  • It is not beneficial to create an index for small tables since sequential access is faster on small tables.
  • Avoid creating an index on variables which have low cardinality or a small number of distinct values. For, example Gender will have two values of Male and Female.
  • Use indexes for queries that retrieve a relatively small subset of rows—that is, less than 15%.
  • Do not create more than one index that is based on the same column as the primary key.

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Share via
Copy link