PROC SUMMARY is a SAS procedure for exploring data through counts, distributions, and other descriptive statistics.
Proc summary in SAS Example
For this article, we use the example dataset SASHELP.SHOES.
We can use the following code to calculate descriptive statistics for the sales variable:
proc summary data=sashelp.shoes;
var sales;
OUTPUT OUT=SUMDS;
run;
proc print data=sumds;
run;
The VAR statement names the numeric variables to be analyzed.
Here’s how to interpret the output table:
- _TYPE_: Identifies which combination of CLASS variables each row summarizes. With no CLASS statement, every row has _TYPE_ = 0, meaning the statistics cover the whole dataset.
- _FREQ_: The number of rows used to calculate each descriptive statistic.
- _STAT_: The name of the descriptive statistic.
- sales: The numerical value for the corresponding descriptive statistic.
Proc Summary with Multiple Variables
To get descriptive statistics for more than one variable at once, list the variable names in the VAR statement.
For example, to find the descriptive statistics for the Sales and Returns variables, we can use the following code:
proc summary data=sashelp.shoes;
var sales returns;
OUTPUT OUT=SUMDS;
run;
proc print data=sumds;
run;
Choose Statistics to Calculate
Most of the time, we don’t want all the default statistics in the SAS data set we’re creating. We can choose which statistics to show in the Output Statement of Proc Summary.
proc summary data=sashelp.shoes;
var sales returns;
/* AUTONAME gives each statistic a unique name, e.g. Sales_Mean and Returns_Sum */
OUTPUT OUT=SUMDS mean= sum= / autoname;
run;
proc print data=sumds;
run;
Proc Summary by group
We can use the class statement to find descriptive statistics for one variable grouped by another.
The CLASS statement in PROC SUMMARY names the character or numeric variables used to classify the data into groups. The variables listed in the CLASS statement should be categorical. That is, they should have a small number of discrete values.
For example, we can use the following code to find descriptive statistics for Sales grouped by Region:
proc summary data=sashelp.shoes;
var sales;
class region;
OUTPUT OUT=SUMDS;
run;
proc print data=sumds;
run;
The ID statement – With the ID statement, you can carry additional variables into the output dataset besides those specified in the CLASS and VAR statements.
Remember that variables that you specify with the ID statement are not summarized. Rather, by default the maximum value of the ID variable within each summarized group is retained (the IDMIN option on the PROC statement keeps the minimum instead).
OUTPUT OUT – This statement names the output SAS data set. It also defines the statistics for the variables and the variable name in the output dataset.
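As a rough sketch of the ID statement described above, the step below carries the Stores variable from SASHELP.SHOES into the output alongside the summed sales for each region. The output dataset name sum_id and the variable name total_sales are made up for illustration.

```sas
/* Sketch: carry Stores into the output without summarizing it. */
proc summary data=sashelp.shoes nway;
  class region;
  id stores;                           /* carried along, not summarized */
  var sales;
  output out=sum_id sum=total_sales;   /* hypothetical output names */
run;

proc print data=sum_id;
run;
```

Each output row then holds one value of Stores per region next to the total, which is handy when you need a reference column but no statistic for it.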
Available options on the Output Statement
- Statistic keywords for the OUTPUT statement include: N, NMISS, SUM, MEAN, MIN, MAX, RANGE, STD, VAR, CV, CSS, USS, SUMWGT, SKEWNESS, KURTOSIS, PRT
See the Difference between Proc Means and Proc Summary.
Automatic Variables In Proc Summary
The SUMMARY procedure creates two variables automatically: _FREQ_ and _TYPE_.
_FREQ_ – This variable stores the number of rows from the input SAS dataset summarized into every row.
_TYPE_ – It contains a numeric value identifying the level of interaction between the variables in the CLASS list.
When a BY statement is used without a CLASS statement, the _TYPE_ variable always equals 0.
When the CLASS statement is used, _TYPE_ is 0 for the overall total row and takes values from 1 through 2^k - 1 (for k class variables) for the different combinations of the variables in the CLASS list.
Proc Summary Options
The most important SUMMARY procedure options are MISSING and NWAY; DESCENDING and ORDER= are also useful.
MISSING – The Missing option instructs the SUMMARY procedure to consider missing values in a class variable when creating summary rows.
NWAY – This option instructs the SUMMARY procedure only to create rows with a combination of all class variables.
DESCENDING (alias of DESCENDTYPES) – Orders the output rows by descending _TYPE_ value, so the most detailed combinations come first (by default, rows are in ascending _TYPE_ order)
ORDER= – Specify sort order of CLASS variables
Note that these options apply to CLASS variables only; they have no effect on BY variables.
When the SUMMARY procedure is used with the BY statement, it produces the same summary rows as the CLASS statement combined with the NWAY option. The differences are that BY requires the input to be sorted (or indexed) by the BY variables, and that _TYPE_ is 0 in the BY output.
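To illustrate the comparison between BY and CLASS with NWAY, here is a minimal sketch on SASHELP.SHOES. The dataset names shoes_sorted, sum_by, and sum_class and the variable name total_sales are assumptions chosen for this example.

```sas
/* BY processing needs the input sorted by the BY variable. */
proc sort data=sashelp.shoes out=shoes_sorted;
  by region;
run;

proc summary data=shoes_sorted;
  by region;
  var sales;
  output out=sum_by sum=total_sales;
run;

/* CLASS with NWAY needs no sort and keeps only the full combination. */
proc summary data=sashelp.shoes nway;
  class region;
  var sales;
  output out=sum_class sum=total_sales;
run;

/* Compare the two results; _TYPE_ is dropped because it differs by design. */
proc compare base=sum_by(drop=_type_) compare=sum_class(drop=_type_);
run;
```

The CLASS version avoids the extra sort step, which is usually the main practical reason to prefer it.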
The Options AUTONAME and AUTOLABEL request the procedure to create unique and meaningful column names and labels for the results.
Example
The below code snippet produces summary statistics like – Total Sales, Average Sales, and Minimum and Maximum Sales for each region from the SASHELP.SHOES dataset.
proc summary data=sashelp.shoes;
class Region;
var sales;
OUTPUT OUT=SUMDS SUM=TOTSales MEAN=AvgSales MIN=
MAX= /autoname autolabel;
run;
How Are The _TYPE_ Values Useful?
The _TYPE_ variable is part of the output SAS data set.
Using the _TYPE_ Variable, you can query against this variable’s value and use it to create different reports, each containing different information, with different levels of detail and summarization.
We highly recommend reading the article by Art Carpenter, which explains the relationship between _TYPE_ and the class variables.
Interpreting _TYPE_ values:
The below code snippet is used to group sashelp.cars dataset by Origin, Type and Drivetrain.
proc summary data=sashelp.cars;
class origin type drivetrain;
var msrp;
output out=cars_summary sum= mean= /autolabel autoname;
run;
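With three class variables, _TYPE_ takes eight values (0 through 7). The rightmost class variable counts as 1, the next as 2, and the leftmost as 4; a row's _TYPE_ is the sum for the class variables it is grouped by:
_TYPE_ = 0 Represents the entire data set (grand total)
_TYPE_ = 1 DriveTrain
_TYPE_ = 2 Type
_TYPE_ = 3 Type by DriveTrain
_TYPE_ = 4 Origin
_TYPE_ = 5 Origin by DriveTrain
_TYPE_ = 6 Origin by Type
_TYPE_ = 7 Origin by Type by DriveTrain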
After a PROC SUMMARY, a series of PROC PRINT steps can select different _TYPE_ values to create several reports, each with a different level of detail.
Suppose you want the total MSRP by Origin, and then by Origin and Type.
title "Summary Statistics by Origin";
proc print data=cars_summary;
where _type_=4;
run;
title "Summary Statistics by Origin and Type";
proc print data=cars_summary;
where _type_=6;
run;
Figuring Out _TYPE_ Values
There are two ways to see your _TYPE_ values:
Apply PROC PRINT to the output SAS data set, which may result in a LOT of output if you have several CLASS variables.
Another way is to manipulate the data to print only the first value of each _TYPE_ variable, which is enough to find out which _TYPE_ values you need (be careful of doing this if you have used the MISSING option on the PROC SUMMARY statement).
/* Keep only the first row of each _TYPE_ value */
DATA cars;
set cars_summary;
BY _TYPE_;
IF FIRST._TYPE_;
RUN;
PROC PRINT DATA=cars;
TITLE1 'PRINTING ONLY THE FIRST OCCURRENCE OF EACH TYPE VALUE';
RUN;
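If you only need to know which _TYPE_ values exist and how many rows each one has, a frequency table is a compact alternative. This is a sketch that is not part of the original walkthrough; it rebuilds cars_summary first so that it runs on its own.

```sas
/* Rebuild the summary so this example is self-contained. */
proc summary data=sashelp.cars;
  class origin type drivetrain;
  var msrp;
  output out=cars_summary sum= mean= / autolabel autoname;
run;

/* One line per _TYPE_ value with its row count. */
proc freq data=cars_summary;
  tables _type_ / nocum nopercent;
run;
```

The row counts also hint at the level of detail: the grand total has a single row, while the highest _TYPE_ value has the most rows.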
Proc Summary NWAY and LEVELS option
As discussed earlier, the NWAY option keeps only the rows for the full combination of all class variables in the output.
To demonstrate this, the example below uses the NWAY option to find the MSRP sum of cars by Origin and Type.
proc summary data=sashelp.cars nway;
class origin type;
var msrp;
output out=summary_nway sum=msrp;
run;
Until now, you have seen that each added class variable doubles the number of _TYPE_ values and so increases the number of rows in the output.
With the NWAY option, the output contains only the rows for the combination of the Origin and Type class variables.
WAYS Statement in Proc Summary
In the WAYS statement, you specify how many class variables are combined to form each group of summary rows.
For example, if you want only the one-way rows, that is, separate summaries for Origin, Type, and DriveTrain on their own, use the following WAYS statement:
proc summary data=sashelp.cars;
class origin type drivetrain;
var msrp;
ways 1;
output out=summary_nway sum=msrp;
run;
You can also request multiple ways. For example, if you want all the rows representing a combination of two class variables – and you want the row representing the total (no combination of any class variables), you can use the following WAYS statement:
WAYS 0 2;
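Put into a full step, the WAYS 0 2 request might look like the sketch below. The output dataset name summary_ways02 and the variable name msrp_sum are placeholders chosen for this example.

```sas
proc summary data=sashelp.cars;
  class origin type drivetrain;
  var msrp;
  ways 0 2;   /* 0 = grand total row, 2 = every pair of class variables */
  output out=summary_ways02 sum=msrp_sum;
run;

proc print data=summary_ways02;
run;
```

With these three class variables, the output contains only the rows with _TYPE_ values 0, 3, 5, and 6: the total plus the three possible pairs.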
Optional Variables
You can add the LEVELS and WAYS options to the OUTPUT statement to include the _LEVEL_ and _WAY_ variables in the output.
The _LEVEL_ variable contains a value from 1 to n that indicates the combination of class variables.
The _WAY_ variable contains a value from 1 to the maximum number of class variables, indicating how many class variables the SUMMARY procedure combines to create a row in the output SAS data set.
To use these options, add them to the OUTPUT statement after a “/”.
proc summary data=sashelp.cars;
class origin type;
var msrp;
output out=summary_type sum=msrp /levels ways;
run;
TYPES Statements
The TYPES statement creates summary rows for combinations of variables that you specify in the CLASS statement.
The TYPES statement does not work with the BY statement.
You use this statement by specifying each combination of class variables you want to be included in the summary output by stating the class variables separated by an asterisk.
For example, if you want only the rows for the combination of Origin and Type plus the rows for Origin alone, use the following TYPES statement:
TYPES Origin*Type Origin;
proc summary data=sashelp.cars;
class origin type;
var msrp;
types origin*type Origin;
output out=summary_type sum=msrp /levels ways;
run;
If you also want the total row, add an empty pair of parentheses to the TYPES statement: TYPES () Origin*Type Origin;
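As a sketch of requesting the total row through TYPES, the step below adds an empty pair of parentheses to the earlier request. The names summary_types_total and msrp_sum are illustrative only.

```sas
proc summary data=sashelp.cars;
  class origin type;
  var msrp;
  types () origin origin*type;   /* () = overall total row */
  output out=summary_types_total sum=msrp_sum;
run;

proc print data=summary_types_total;
run;
```

With Origin and Type as the class variables, this yields rows with _TYPE_ 0 (total), 2 (Origin), and 3 (Origin by Type), and skips the Type-only rows.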
PROC SUMMARY Without a VAR Statement
You can use PROC SUMMARY without a VAR statement. In this case, the output data set holds the number of occurrences of each CLASS variable value, and of each combination of values, in the _FREQ_ variable. This gives PROC SUMMARY counting functionality similar to PROC FREQ.
PROC SUMMARY DATA=sashelp.cars;
CLASS origin drivetrain;
OUTPUT OUT=cars1;
RUN;
PROC PRINT DATA=cars1;
TITLE 'RESULT WITHOUT USING A VAR';
RUN;
PROC SUMMARY Without a CLASS Statement
PROC SUMMARY does not need to have a CLASS statement. PROC SUMMARY must have either a CLASS or a VAR statement, but it does not need to contain both. When no CLASS statement is provided, only a _TYPE_ =0 record is produced, and no other levels of variables are created.
PROC SUMMARY DATA=sashelp.cars;
VAR msrp;
OUTPUT OUT=cars2 SUM(msrp)= / autoname autolabel;
RUN;
The Takeaway:
So, that was our walkthrough of PROC SUMMARY. We hope you found this tutorial useful.
Thanks for reading!
If you liked this article, you might also want to read PROC MEANS and PROC FREQ.