Learn how to create Summary Reports using PROC Tabulate

0
131

PROC TABULATE in SAS is used to display descriptive statistics in table format. PROC TABULATE has a number of statements that define how this procedure will summarize the data.

PROC MEANS is used to summarize numerical data and PROC FREQ is used to obtain summaries for categorical data.

The PROC TABULATE procedure is a substitute for PROC MEANS and PROC FREQ.

The PROC TABULATE procedure must have a CLASS statement to specify any categorical variables to summarize. Any numerical variables must be included in the VAR statement of PROC TABULATE.

Proc Report and Proc Tabulate are both used to create summary tables. Although, Proc Report can create detail tables with COMPUTE Blocks which cannot be done using PROC Tabulate.

TABULATE produces 3 dimensions of reports which are PAGE, ROW and COLUMN dimension using a TABLE a statement along with a CLASS and VAR a statement which is used for calculating a summary function.

In PROC REPORT, everything is done with the COLUMN and DEFINE statements, along with COMPUTE blocks.

With PROC REPORT, you can have custom lines between groups for example “TOTAL SALES of SUV cars in USA Region” whereas TABULATE Procedure does not have this feature.

For more details on which procedure to use on which scenarios, you can read the article on PROC TABULATE versus PROC REPORT.

The 3 statements are necessary for every Proc Tabulate step.

CLASSThe CLASS statement is used to form groups within rows and columns.
VARThe VAR statements include the numeric variables which need to be summarized.
TABLEThe TABLE statements contain the table definition.

The TABLE statement is the most important statement for the PROC TABULATE step. It has dimensions and definitions within those dimensions.

The tables generated by TABULATE can have up to three dimensions to their definition which are page, row, and column.

These dimensions always appear in page, row, column order.

PAGEPages defines how the individual pages are formed.
ROWIt is the most used dimensions and it defines the rows of the table within each page.
COLUMNColumns are always present and it defines the columns within rows and pages.

There has to be at least one column dimension and you cannot have a page dimension without having both row and column dimensions.

The general syntax of the TABLE statement is

table page, row, column;

To build the individual page with row, and column dimensions, you will have to use a combination of options or elements. The three types of options are:

SINGULARIt is used when a single element is needed.
CONCATENATEDUsing this option you can join or concatenate multiple elements using a space.
NESTEDOne element is nested within another to form a hierachy using an asterisk

Singular Elements

A singular element consists of a single variable. A basic example os a singular table is below.

title "Single Column Table Example";
proc tabulate data=sashelp.class; 
class sex; 
var weight;
table sex,weight;  
run; 
Learn how to create Summary Reports using PROC Tabulate 1

The analysis variable, Weight, is specified in the VAR the statement, and a single column. Since there is no statistic specified, the default statistic of SUM is displayed.

Concatenated Elements

You can combine multiple elements within columns and/or rows using the concatenated statement. A concatenated definition is formed when two or more space-separated elements are included in the same dimension.

title "Concatenated Elements Example";
proc tabulate data=sashelp.cars format=dollar8.; 
class Origin Type; 
var MSRP INVOICE; 
table Origin Type, MSRP INVOICE; 
run;
Learn how to create Summary Reports using PROC Tabulate 3

There are two classification variables(Origin and Type) in the above example. The label associated with each analysis variable is by default used in the column header.
The analysis and classification variables can be used in a page, row, or column dimensions.

Nested Elements

Nested definitions allow us to create tables within tables. The nested elements can be classification variables, analysis variables, statistics, options, and modifiers.

These are designated as nested elements through the use of the asterisk.

title "Nested Elements Example";
proc tabulate data=sashelp.cars; 
class ORIGIN TYPE; 
var MSRP; 
table ORIGIN,TYPE*MSRP*(N max min);
run;
Learn how to create Summary Reports using PROC Tabulate 5

In the above example. the row dimension(ORIGIN) is singular and the column dimension (TYPE) has the analysis variable MSRP nested within a classification variable TYPE.

Three space-separated statistics are concatenated into a group with parenthesis and then the group is nested under the variable TYPE.

Combination of Elements

TABLE statement can contain a combination of nested and concatenated elements.
These will include not only variables and statistics but options as well.

proc tabulate data=sashelp.cars;
 class Origin Type;
 var MSRP;
 table Type='Type of cars' all='Total' origin all='Total', 
  msrp='Price in dollars'*(n max*f=dollar8. min*f=dollar8.
 mean) /box='Type of cars across regions';
 keylabel n='Number of cars' max="Maximum Price" min='Minimum Price' 
  mean='Average Price';
run;
Learn how to create Summary Reports using PROC Tabulate 7

The table definition has two concatenated elements(Type and Origin) in the row dimension.
The ALL keyword summarizes across the associated elements which are Type and Origin.

Dollar Format(f=dollar8.) is used for Max and Min Statistics.

The BOX= option adds the text in the upper left corner of the table.

The KEYLABEL statement allows you to assign a text label to statistics and to the keyword ALL

Calculating Percentages Using PROC TABULATE

The PCTN and PCTSUM options request the calculation of percentages based on the denominator specified using angle brackets.

PCTN calculates the percentages based on counts (N), while PCTSUM calculates the percentages based on the total of an analysis variable.

Percentage based on counts

proc tabulate data=sashelp.cars; 
class origin drivetrain; 
table origin,drivetrain *(n pctn<drivetrain>='%') all;
run;
Learn how to create Summary Reports using PROC Tabulate 9

Within each value of ORIGIN, the percentage of observations for each value of DRIVETRAIN is calculated.

Since PCTN is nested within ORIGIN, the denominator(DRIVETRAIN) is the total count for that value of ORIGIN.

From the above table, we can interpret that, 21.52% (out of 158 ) of All-Wheel drive is from Asia

Percentage Generation Statistics

Instead of providing denominator in angular brackets, you can also use several percentage generation statistics. For these statistics, the denominator is predetermined.

The denominator can be based on a report, page or column.

Percentage applies to:Percent FrequencyPercent Total
Reportreppctnreppctsum
Pagepagepctnpagepctsum
Columncolpctncolpctsum
Rowrowpctnrowpctsum
proc tabulate data=sashelp.cars(where=(upcase(type) in ('SUV','SEDAN','SPORTS')));
class origin drivetrain type;
tables origin*(type all)all,
drivetrain*(n colpctn rowpctn pctn<type all>="Type PCTN" pctn<drivetrain>="DriveTrain PCTN") all(n colpctn rowpctn pctn<type all>);
run;
Learn how to create Summary Reports using PROC Tabulate 11

Percentage based on the total of analysis variable

Proc Format;
inValue status 
'Dead'  = 1
'Alive' = 0;
Run;
data heart;
set sashelp.heart;
Status_num= input( status, status. );
run;

proc tabulate data=heart;
class BP_Status sex;
var status_num;
table sex * (sum='Deaths' pctsum<bp_status>='% of row' pctsum<sex>='% of column' pctsum) all,bp_status*status_num;
run;
Learn how to create Summary Reports using PROC Tabulate 13

<bp_status>sums the values of Deaths for all occurrences of BP_STATUS within the same value of SEX. Thus, for SEX=”FEMALE”, the denominator is 529+ 248+82 = 859 and the row percentage is 529/859*100 =61.58.

<BP_STATUS>sums the frequency counts for all occurrences of SEX within the same value of BP_STATUS. Thus, for BP_STATUS=”HIGH”, the denominator is 529+602 =1131 and the Percentage is 529/1131*100 = 46.77%

The third use of PCTN has no denominator definition and therefore all class variables (BP_STATUS and SEX)are included in the denominator definition.

Thus, for all cells, the denominator is 529 + 602 + 248 + 379 + 82 + 82 = 1922 and the percentage is 529/1922*100 = 27.52%

You can also read Tips for Generating Percentages Using the SAS TABULATE Procedure for more advanced usage of PROC Tabulate.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.