Count missing and Non-missing values for each variable – In SAS, we often need to get the count of missing and non-missing values in a SAS dataset. The code in this example uses PROC FORMAT to create the format for character and numeric variables to be either “non-missing” or “missing” and then use that format with PROC FREQ.
We have used SASHELP.HEART dataset as an example.
We then get a compact table showing us the number of missing and non-missing for the variable type.
Below are the steps to help you understand the working of this macro.
Step 1. Create a user-defined format for the group missing and non-missing values.
proc format;
value $missfmt ' '='Missing Values' other='Non-Missing';
value missfmt .='Missing Values' other='Non-Missing';
run;
Step 2. Now, let’s create a macro variable with the count of variables in the dataset.
%let dsid=%sysfunc(open(SASHELP.HEART)); /*Open the dataset*/
%let cntvar=%sysfunc(attrn(&dsid, nvars)); /* The value of cntvar is 17*/
Step 3. Get the list of all variables with their type. I have used the PROC CONTENTS procedure to get the variable names and their type in a SAS dataset.
proc contents data=SASHELP.HEART varnum out=var(keep=name type) noprint;
run;
Below is the results of the output dataset generated using the PROC CONTENT Procedure.
Step 4. Now, we can create macro variables for each value of the SAS dataset.
data _null_;
set var;
suffix=put(_n_, 5.);
call symputx(cats('Name', suffix), Name);
call symputx(cats('Type', suffix), Type);
run;
Step 5. The next step is to find the count of missing and non-missing values using the PROC FREQ procedure.
For simplicity, we have taken only one variable from the dataset.
Notice the /missing keyword. This tells SAS to generate the frequency count of missing values as well.
Step 6. The next step is to apply the user-defined format to group the missing and non-missing values.
%macro test;
proc freq data=SASHELP.HEART;
tables AgeAtDeath /missing nopercent nocum nofreq nopercent
out=out1(drop=percent rename=(AgeAtDeath=value));
format AgeAtDeath
%if type=2 %then
%do;
$missfmt. %end;
%else
%do;
missfmt. %end;
;
run;
%mend;
%test;
Step 7. Enclose the above steps inside a macro, and a do loop to get the variables’ data. The results will be as below.
Step 8. Now, add the variable name as a value in each dataset to combine and create a single dataset.
data out&i;
set out&i;
length varname $32.;
varname="&&name&i";
Step 9. The below step combines all the datasets using the set statement.
data final;
set %do i=1 %to &cntvar;
out&i
%end;
run;
The combined dataset looks as below.
Step 10. Now, reshape the combined dataset using PROC TRANSPOSE.
proc transpose data=final out=combine(drop=_:);
by varname;
id value;
var count;
run;
here is the final code…
data hearts;
set sashelp.heart;
run;
proc format;
value $missfmt ' '='Missing Values' other='Non-Missing';
value missfmt .='Missing Values' other='Non-Missing';
run;
%macro varCounts (dsn=);
%let dsid=%sysfunc(open(&dsn));
%let cntvar=%sysfunc(attrn(&dsid, nvars));
%put &dsid;
%put &cntvar;
proc contents data=&dsn varnum out=var(keep=name type) noprint;
run;
proc print data=var;
data _null_;
set var;
suffix=put(_n_, 5.);
call symputx(cats('Name', suffix), Name);
call symputx(cats('Type', suffix), Type);
run;
%do i=1 %to &cntvar;
proc freq data=&dsn noprint;
tables &&name&i /missing nopercent nocum nofreq nopercent
out=out&i(drop=percent rename=(&&name&i=value));
format &&name&i
%if &&type&i=2 %then
%do;
$missfmt. %end;
%else
%do;
missfmt. %end;
;
run;
data out&i;
set out&i;
varname="&&name&i";
%if &&type&i=1 %then
%do;
value1=put(value, missfmt.);
%end;
%else %if &&type&i=2 %then
%do;
value1=put(value, $missfmt.);
%end;
drop value;
rename value1=value;
run;
%end;
data final;
set %do i=1 %to &cntvar;
out&i %end;
;
run;
proc transpose data=final out=combine(drop=_:);
by varname;
id value;
var count;
run;
proc print data=combine;
%mend;
%varCounts(dsn=hearts);
Very Helpful!
Thank you!!
Hi there, is there any way to prevent the ‘varname’ from being truncated?
yes you need to define the length for varname to maximum value you think is possible.