PROC TABULATE in SAS is used to display descriptive statistics in table format. PROC TABULATE has a number of statements that define how this procedure will summarize the data.
The PROC TABULATE procedure is a substitute for PROC MEANS and PROC FREQ.
The PROC TABULATE procedure must have a CLASS statement to specify any categorical variables to summarize. Any numerical variables must be included in the VAR statement of PROC TABULATE.
Proc Report and Proc Tabulate are both used to create summary tables. Although, Proc Report can create detail tables with COMPUTE Blocks which cannot be done using PROC Tabulate.
PROC REPORT, everything is done with the
DEFINE statements, along with
With PROC REPORT, you can have custom lines between groups for example “TOTAL SALES of SUV cars in USA Region” whereas TABULATE Procedure does not have this feature.
For more details on which procedure to use on which scenarios, you can read the article on PROC TABULATE versus PROC REPORT.
The 3 statements are necessary for every Proc Tabulate step.
|CLASS||The CLASS statement is used to form groups within rows and columns.|
|VAR||The VAR statements include the numeric variables which need to be summarized.|
|TABLE||The TABLE statements contain the table definition.|
The TABLE statement is the most important statement for the PROC TABULATE step. It has dimensions and definitions within those dimensions.
The tables generated by TABULATE can have up to three dimensions to their definition which are page, row, and column.
These dimensions always appear in page, row, column order.
|PAGE||Pages defines how the individual pages are formed.|
|ROW||It is the most used dimensions and it defines the rows of the table within each page.|
|COLUMN||Columns are always present and it defines the columns within rows and pages.|
There has to be at least one column dimension and you cannot have a page dimension without having both row and column dimensions.
The general syntax of the TABLE statement is
table page, row, column;
To build the individual page with row, and column dimensions, you will have to use a combination of options or elements. The three types of options are:
|SINGULAR||It is used when a single element is needed.|
|CONCATENATED||Using this option you can join or concatenate multiple elements using a space.|
|NESTED||One element is nested within another to form a hierachy using an asterisk|
A singular element consists of a single variable. A basic example os a singular table is below.
title "Single Column Table Example"; proc tabulate data=sashelp.class; class sex; var weight; table sex,weight; run;
The analysis variable, Weight, is specified in the
VAR the statement, and a single column. Since there is no statistic specified, the default statistic of SUM is displayed.
You can combine multiple elements within columns and/or rows using the concatenated statement. A concatenated definition is formed when two or more space-separated elements are included in the same dimension.
title "Concatenated Elements Example"; proc tabulate data=sashelp.cars format=dollar8.; class Origin Type; var MSRP INVOICE; table Origin Type, MSRP INVOICE; run;
There are two classification variables(Origin and Type) in the above example. The label associated with each analysis variable is by default used in the column header.
The analysis and classification variables can be used in a page, row, or column dimensions.
Nested definitions allow us to create tables within tables. The nested elements can be classification variables, analysis variables, statistics, options, and modifiers.
These are designated as nested elements through the use of the asterisk.
title "Nested Elements Example"; proc tabulate data=sashelp.cars; class ORIGIN TYPE; var MSRP; table ORIGIN,TYPE*MSRP*(N max min); run;
In the above example. the row dimension(ORIGIN) is singular and the column dimension (TYPE) has the analysis variable MSRP nested within a classification variable TYPE.
Three space-separated statistics are concatenated into a group with parenthesis and then the group is nested under the variable TYPE.
Combination of Elements
TABLE statement can contain a combination of nested and concatenated elements.
These will include not only variables and statistics but options as well.
proc tabulate data=sashelp.cars; class Origin Type; var MSRP; table Type='Type of cars' all='Total' origin all='Total', msrp='Price in dollars'*(n max*f=dollar8. min*f=dollar8. mean) /box='Type of cars across regions'; keylabel n='Number of cars' max="Maximum Price" min='Minimum Price' mean='Average Price'; run;
The table definition has two concatenated elements(Type and Origin) in the row dimension.
ALL keyword summarizes across the associated elements which are Type and Origin.
Dollar Format(f=dollar8.) is used for Max and Min Statistics.
BOX= option adds the text in the upper left corner of the table.
KEYLABEL statement allows you to assign a text label to statistics and to the keyword ALL
Calculating Percentages Using PROC TABULATE
The PCTN and PCTSUM options request the calculation of percentages based on the denominator specified using angle brackets.
PCTN calculates the percentages based on counts (N), while PCTSUM calculates the percentages based on the total of an analysis variable.
Percentage based on counts
proc tabulate data=sashelp.cars; class origin drivetrain; table origin,drivetrain *(n pctn<drivetrain>='%') all; run;
Within each value of ORIGIN, the percentage of observations for each value of DRIVETRAIN is calculated.
Since PCTN is nested within ORIGIN, the denominator(DRIVETRAIN) is the total count for that value of ORIGIN.
From the above table, we can interpret that, 21.52% (out of 158 ) of All-Wheel drive is from Asia
Percentage Generation Statistics
Instead of providing denominator in angular brackets, you can also use several percentage generation statistics. For these statistics, the denominator is predetermined.
The denominator can be based on a report, page or column.
|Percentage applies to:||Percent Frequency||Percent Total|
proc tabulate data=sashelp.cars(where=(upcase(type) in ('SUV','SEDAN','SPORTS'))); class origin drivetrain type; tables origin*(type all)all, drivetrain*(n colpctn rowpctn pctn<type all>="Type PCTN" pctn<drivetrain>="DriveTrain PCTN") all(n colpctn rowpctn pctn<type all>); run;
Percentage based on the total of analysis variable
Proc Format; inValue status 'Dead' = 1 'Alive' = 0; Run; data heart; set sashelp.heart; Status_num= input( status, status. ); run; proc tabulate data=heart; class BP_Status sex; var status_num; table sex * (sum='Deaths' pctsum<bp_status>='% of row' pctsum<sex>='% of column' pctsum) all,bp_status*status_num; run;
<bp_status>sums the values of Deaths for all occurrences of BP_STATUS within the same value of SEX. Thus, for SEX=”FEMALE”, the denominator is 529+ 248+82 = 859 and the row percentage is 529/859*100 =61.58.
<BP_STATUS>sums the frequency counts for all occurrences of SEX within the same value of BP_STATUS. Thus, for BP_STATUS=”HIGH”, the denominator is 529+602 =1131 and the Percentage is 529/1131*100 = 46.77%
The third use of
PCTN has no denominator definition and therefore all class variables (BP_STATUS and SEX)are included in the denominator definition.
Thus, for all cells, the denominator is 529 + 602 + 248 + 379 + 82 + 82 = 1922 and the percentage is 529/1922*100 = 27.52%
You can also read Tips for Generating Percentages Using the SAS TABULATE Procedure for more advanced usage of PROC Tabulate.