Using PROC RANK for ranking variables


Base SAS

Subhro Kar

May 31, 2020


What is Proc Rank?

Computing rank of numeric variables

Ranking in descending order

Quartile Ranking

Partitioning Observations into Groups Based on Ranks

Ranking with BY group variable

Dealing with TIES

Using Proc Rank to find TOP N rows per BY Group

Ranking variables is often necessary to analyze performance or to see which values are at the top or the bottom. With the PROC RANK procedure, there is no need to write complex code that combines PROC SORT, macro calls, and DATA steps to rank or decile these scores or values.

The problem with such hand-written programs is that they often do not handle tied scores, so two or more equal scores can end up with inconsistent ranks. PROC RANK provides a quick and simple way to rank or decile individuals, and its TIES= option controls how tied values are ranked.

What is Proc Rank?

PROC RANK is a SAS procedure that computes ranks for one or more numeric variables across the observations of a SAS data set and writes those ranks to a new output data set.

PROC RANK does not produce any printed output. It does, however, provide options for choosing the ranking order (DESCENDING), for handling tied values (TIES=), and for binning values into groups (GROUPS=).

Computing rank of numeric variables

In the following example, we use PROC RANK on the SASHELP.CLASS data set to rank students by weight, from lowest to highest.

proc rank data=sashelp.class out=class_r_low;
 var weight;        /* variable to rank */
 ranks r_weight;    /* new variable that holds the ranks */
run;

proc sort data=class_r_low; 
 by weight; 
run;

proc print data=class_r_low;
run;

(Figure: printed output of CLASS_R_LOW, sorted by weight)

The **OUT=** option names the SAS data set in which PROC RANK stores its output.

The VAR statement lists the variable (or variables) you want to rank.

The RANKS statement creates a new variable, r_weight, to hold the rank values. If you omit the RANKS statement, the rank values replace the values of the original variable in the output data set.
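Because VAR and RANKS both accept lists, a single step can rank several variables at once: the first RANKS name holds the ranks of the first VAR variable, the second holds the ranks of the second, and so on. A small sketch (the output data set name CLASS_R_MULTI is only illustrative):

```sas
/* Rank HEIGHT and WEIGHT in one step. RANKS names pair up with the */
/* VAR variables in order: r_height <- height, r_weight <- weight.  */
proc rank data=sashelp.class out=class_r_multi;
   var height weight;
   ranks r_height r_weight;
run;

proc print data=class_r_multi;
   var name height r_height weight r_weight;
run;
```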

By default, PROC RANK ranks values in ascending order. In the printed output, observations 4 and 5 have the same weight, so they share the same rank.

When values are tied, PROC RANK by default gives each of them the mean of the ranks they would otherwise occupy.

But how is 4.5 calculated behind the scenes?

Observations 4 and 5 would occupy raw ranks 4 and 5. PROC RANK adds these raw ranks (4 + 5) and divides the sum by the number of tied observations, which is 2 in this case.

So the average rank is calculated as (4+5)/2 = 4.5
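The same arithmetic works for any tie. If k observations rank strictly below the tied value and m observations share it, the tied observations would occupy raw ranks k + 1 through k + m, and the default TIES=MEAN assigns each of them the average of those ranks:

$$
\bar{r} \;=\; \frac{1}{m}\sum_{j=k+1}^{k+m} j \;=\; k + \frac{m+1}{2}
$$

For observations 4 and 5, k = 3 and m = 2, so the shared rank is 3 + 3/2 = 4.5.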

Ranking in descending order

To rank in descending order, so that the largest value gets rank 1 and the smallest value gets the last rank, add the DESCENDING option to the PROC RANK statement, as in the following example.

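proc rank data=sashelp.class out=class_r_weight descending;
 var weight;
 ranks r_weight;    /* rank 1 = heaviest student */
run;

/* sort the descending ranks (not CLASS_R_LOW from the previous example) */
proc sort data=class_r_weight(keep=name weight r_weight) out=class_r_desc; 
 by r_weight; 
run;

proc print data=class_r_desc;
run;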

(Figure: printed output of the descending weight ranks)

Quartile Ranking

You can assign observations to rank groups such as quartiles (GROUPS=4), quintiles (GROUPS=5), deciles (GROUPS=10), or percentiles (GROUPS=100). With GROUPS=4, the variable named in the RANKS statement contains group numbers from 0 to 3.

proc rank data=sashelp.class out=class_r_low groups=4;
 var age;
 ranks r_age;    /* quartile group: 0, 1, 2 or 3 */
run;

proc sort data=class_r_low; 
 by age; 
run;

proc print data=class_r_low;
run;

![Proc Rank](../../assets/how-to-use-proc-rank-in-sas/proc-rank-example5.png)

How ranks are calculated with groups and ties

Group numbers run from 0 to the number of groups minus 1. Tied values always receive the same group, so a variable with many ties, such as AGE, can produce groups of unequal size.

Partitioning Observations into Groups Based on Ranks

Sometimes you want to split the observations into a few rank groups, for example the lower and upper halves of the data. The GROUPS= option does this directly.

To keep the example easy to follow, we first remove duplicate ages from the input data set with PROC SORT and the NODUPKEY option. I will explain how to handle ties later in this article.

proc sort data=sashelp.class out=class3 nodupkey;
 by age;
run;

![Proc Rank](../../assets/how-to-use-proc-rank-in-sas/proc-rank-example3.png)

Now let's see what happens when we apply the GROUPS= option to this data set.

proc rank data=class3 out=class_r_low groups=2;
 var age;
 ranks r_age;
run;

![Proc Rank](../../assets/how-to-use-proc-rank-in-sas/proc-rank-example4.png)

PROC RANK divides the observations into the number of groups you specify, here 2. Because CLASS3 contains six distinct ages, the three youngest students are assigned to group 0 and the three oldest students to group 1.

If the number of observations in the input data set is a multiple of the number of groups, each group gets the same number of observations, provided no tied values fall on a group boundary.

If the input data set had one more observation, group 1, the higher group, would receive the extra observation.
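According to the SAS documentation for GROUPS=, the group number g of an observation is computed from its rank r, the number of groups k, and the number n of observations with a nonmissing value of the ranking variable. Applying it to CLASS3 (n = 6) and to a hypothetical seven-row input (n = 7) reproduces the group sizes described above:

$$
\begin{aligned}
g &= \left\lfloor \frac{r\,k}{n+1} \right\rfloor \\
n = 6,\ k = 2:&\quad \left\lfloor \tfrac{2r}{7} \right\rfloor = 0,0,0,1,1,1 \qquad (r = 1,\dots,6)\\
n = 7,\ k = 2:&\quad \left\lfloor \tfrac{2r}{8} \right\rfloor = 0,0,0,1,1,1,1 \qquad (r = 1,\dots,7)
\end{aligned}
$$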

Likewise, to look at the top 5% and bottom 5% of the data, specify GROUPS=20: group 0 then holds the lowest 5% of values and group 19 the highest 5%.
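A minimal sketch of the 5% idea. It assumes SASHELP.CARS and its MSRP variable as a stand-in data set, because SASHELP.CLASS has only 19 rows, too few for 20 groups to be meaningful; the data set names CARS_VENTILES and CARS_EXTREMES are hypothetical:

```sas
/* Split MSRP into 20 rank groups (about 5% of the cars each) and */
/* keep only the bottom group (0) and the top group (19).         */
proc rank data=sashelp.cars(keep=make model msrp) out=cars_ventiles groups=20;
   var msrp;
   ranks msrp_grp;
run;

data cars_extremes;
   set cars_ventiles;
   where msrp_grp in (0, 19);
run;

proc sort data=cars_extremes;
   by msrp_grp msrp;
run;

proc print data=cars_extremes;
   by msrp_grp;
run;
```

With many tied prices, the two groups may not contain exactly 5% each, since tied values always stay in the same group.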

Ranking with BY group variable

You can also rank values within groups, for example ranking the ages of students separately for each sex. As with any BY statement, the data must first be sorted by the BY variables.

/*Sorting the data based on sex*/
proc sort data=sashelp.class out=class;
by sex;
run;

proc rank data=class out=class_r_low ties=low; 
by sex; 
var age; 
ranks r_age; 
run;

proc print data=class_r_low n; 
by sex; 
run;

![Proc Rank](../../assets/how-to-use-proc-rank-in-sas/proc-rank-example6.png)

Dealing with TIES

The TIES= option is one of the most useful features of this procedure. It controls which rank is assigned when two or more observations have the same value. There are four TIES= values you can use:

LOW: tied values receive the smallest of their corresponding ranks.
HIGH: tied values receive the largest of their corresponding ranks.
MEAN: tied values receive the mean of their corresponding ranks. This is the default.
DENSE: ranks are consecutive integers that begin with 1 and end with the number of distinct values of the VAR variable; tied values share a rank.

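The following excerpt shows the effect of TIES=LOW, TIES=HIGH, and TIES=DENSE when AGE in SASHELP.CLASS is ranked in descending order (the oldest student gets rank 1):

| NAME | AGE | LOW | HIGH | DENSE |
|---|---|---|---|---|
| Philip | 16 | 1 | 1 | 1 |
| Mary | 15 | 2 | 5 | 2 |
| Janet | 15 | 2 | 5 | 2 |
| Ronald | 15 | 2 | 5 | 2 |
| William | 15 | 2 | 5 | 2 |
| Carol | 14 | 6 | 9 | 3 |
| Henry | 14 | 6 | 9 | 3 |
| Judy | 14 | 6 | 9 | 3 |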
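A comparison like this can be produced with a sketch along the following lines, which runs PROC RANK once per TIES= value and places the four rank columns side by side. The data set and variable names (TIES_LOW, TIES_COMPARE, LOW, HIGH, MEAN, DENSE) are illustrative, and TIES=DENSE assumes a SAS release that supports it:

```sas
/* Rank AGE in descending order once per TIES= option. */
proc rank data=sashelp.class(keep=name age) out=ties_low descending ties=low;
   var age;
   ranks low;
run;

proc rank data=sashelp.class(keep=name age) out=ties_high descending ties=high;
   var age;
   ranks high;
run;

proc rank data=sashelp.class(keep=name age) out=ties_mean descending ties=mean;
   var age;
   ranks mean;
run;

proc rank data=sashelp.class(keep=name age) out=ties_dense descending ties=dense;
   var age;
   ranks dense;
run;

/* PROC RANK keeps the input order, so a one-to-one MERGE (no BY) */
/* lines the four outputs up row by row.                          */
data ties_compare;
   merge ties_low ties_high ties_mean ties_dense;
run;

proc sort data=ties_compare;
   by descending age name;
run;

proc print data=ties_compare noobs;
   var name age low high mean dense;
run;
```

The MEAN column adds what the excerpt leaves out: the four 15-year-olds share raw ranks 2 to 5, so each gets (2 + 3 + 4 + 5) / 4 = 3.5.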

Using Proc Rank to find TOP N rows per BY Group

You can also answer “top N” questions with PROC RANK. The following example selects the three students with the highest weight for each sex.

proc sort data=sashelp.class out=class;
 by sex;
run;

/* rank weights within each sex; rank 1 = heaviest */
proc rank data=class out=r_test descending; 
 by sex; 
 var weight; 
 ranks r_weight; 
run;

/* keep the top 3 per sex and order them by rank */
proc sort data=r_test; 
 by sex r_weight; 
 where r_weight <= 3; 
run;

![Proc Rank TOP N](../../assets/how-to-use-proc-rank-in-sas/top-n.jpg)
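One caveat: with the default TIES=MEAN, two students tied for third place would share ranks 3 and 4, both receive rank 3.5, and both be dropped by the WHERE filter. A variant sketch uses TIES=LOW instead, so every observation tied at the cutoff is kept (which can return more than N rows per group). The macro variable TOP_N and the data set names are hypothetical:

```sas
%let top_n = 3;   /* hypothetical: number of top rows wanted per group */

proc sort data=sashelp.class out=class_by_sex;
   by sex;
run;

/* TIES=LOW gives tied values the smallest of their ranks, */
/* so ties at the cutoff all stay within the top N.        */
proc rank data=class_by_sex out=r_top descending ties=low;
   by sex;
   var weight;
   ranks r_weight;
run;

data top_weight;
   set r_top;
   where r_weight <= &top_n;
run;

proc sort data=top_weight;
   by sex r_weight;
run;

proc print data=top_weight;
   by sex;
run;
```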
