SAS data sets consist of a descriptor portion and a data portion that contains the data values. The descriptor portion of a SAS data set holds the detailed information about the data set. This information includes:
- Dataset name and its ember type
- Creation time of the dataset
- The number of observations
- The number of variables
- The engine type
Most of the times we need to count the numbers of observations in a SAS dataset and pass it to a macro variable. Reading the descriptor portion is one of the most efficient and quickest ways of determining the number of observations in a SAS data set.
In this post, we will see various methods to count the number of rows (records) in a SAS dataset.
1. Using PROC SQL
proc sql;
select count(*) into :cnt from sashelp.class;
quit;
%put &cnt.;
Using the PROC SQL method is not considered to be an efficient way as it does not use metadata information of the SAS dataset. Instead, it reads through each record (row) of your SAS dataset which requires processing power. However, it is simple to understand and develop and can be used for smaller datasets.
2. Using END= Statement
data _null_;
set sashelp.class end=eof;
count+1;
if eof then
call symput("nobs", count);
run;
%put &nobs;
This method is also not efficient. It reads the entire data set and increments a counter to pick up the last value of the dataset. The END option returns true or 1 if the observation is the last observation in the dataset.
3. Using the Data Step
data _null_;
set sashelp.class nobs=obs;
call symputx('nobs', obs);
run;
%put &nobs.;
NOBS is a SAS automatic variable that contains the number of rows in a dataset and NOBS = obs holds the count of records in the variable obs.
CALL SYMPUTX is a DATA Step call routine that assigns a value produced in a DATA step to a macro variable.
4. Using IF 0 and STOP statement
data _NULL_;
if 0 then
set sashelp.class nobs=n;
call symputx('totobs', n);
stop;
run;
%put no. of observations = &totobs;
During the compilation phase, the data step reads in variables from the data set in the Set Statement. During execution, SAS reads in the observations from the input data set in sequential order. By using if 0 then set, we can bypass the execution part of the Set Statement which is conditional logic that always fails.
The ‘if 0‘ statement is not processed at all because the IF statement does not hold TRUE. The whole IF-THEN statement is used to pull the header information of the data set and later pass it to the compiler to adjust it to the PDV.
If 0 Then Set can also be coded as If (1=2) Then Set
The STOP statement is used to stop an endless loop.
Read – Exploring the SET Statement in SAS
5. Proc SQL Dictionary Method
proc sql noprint;
select nobs into :totobs separated by ' ' from dictionary.tables
where libname='SASHELP' and memname='CARS';
quit;
%put total records = &totobs.;
The metadata information of a dataset can be accessed with PROC SQL Dictionary.Tables.It is an efficient method as it does not look into each value of a dataset to determine the count.
The LIBNAME= refers to the name of the library in which data is stored. The MEMNAME= refers to the SAS table (dataset). The separated by ‘ ‘ is used in this case to left-align the numeric value.
6. Using the Library table – SASHELP.VTABLE
data _null_;
set sashelp.vtable;
where libname="SASHELP" and memname="CLASS";
Num_obs=Nobs-Delobs;
put "Nobs - Delobs: num_obs = " num_obs;
call symputx('obs_cnt',num_obs);
run;
%put &obs_cnt;
As we know that SAS library tables contain metadata relating to your data sets, and we can get this information from the SASHELP.VTABLE by specifying the LIBNAME and the name of the data set.
The two variables you need from the table are Nobs which contains the total number of observations including ones marked for deletion and Delobs contains only the number of observations marked for deletion.
This method of determining the number of observations in a SAS data set has an advantage over the previous methods described so far. That is if you (or other people) are modifying a data set, you need to know the total number of observations in a data set as well as the number of observations that have been marked for deletion (but are still counted when you use the NOBS= SET option).
7. Using PROC SQL automatic variable – SQLOBS
proc sql noprint;
select * from sashelp.class;
run;
%put &sqlobs.;
Proc SQL automatically creates SALOBS macro variable, when it runs a SQL Step. SQLOBS macro variable will have the number of observations count of the last proc SQL statement executed.
8. Using Data Access Functions
%macro totobs(mydata);
%let mydataID=%sysfunc(OPEN(&mydata.,IN));
%let NOBS=%sysfunc(ATTRN(&mydataID,NOBS));
%let RC=%sysfunc(CLOSE(&mydataID));
&NOBS
%mend;
%put %totobs(sashelp.cars);
References
How Many Observations Are In My Data Set?