The RETAIN statement tells SAS not to reset a variable to missing at the start of each iteration of the DATA step, so the value carries over from one observation to the next.
Basic Usage of Retain in SAS
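    data Example;
        input profit;
        datalines;
    12
    54
    14
    44
    45
    ;
    run;

    data example1;
        set example;
        cum_sum = cum_sum + profit;
    run;

In the first version of example1 above, there is no RETAIN statement, so SAS resets cum_sum to missing at the start of each iteration. Adding profit to a missing value gives missing, so cum_sum is missing on every observation.

    data example1;
        set example;
        retain cum_sum 0;
        cum_sum = cum_sum + profit;
    run;

With the RETAIN statement and an initial value of 0, the value of cum_sum is carried over to the next iteration, which gives the running totals 12, 66, 80, 124, and 169.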
Points to Remember
- If you do not specify any variable names, then SAS retains the values of all of the variables created in an INPUT or assignment statement.
- SAS sets the initial value of a variable to be retained to missing if you don’t specify an initial value.
- It is also important to understand what RETAIN does and what it does not do.
The following variables do not need to be listed in a RETAIN statement, because their values are implicitly retained in a DATA step:
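- Variables that are read with a SET, MERGE, MODIFY, or UPDATE statement
- Variables whose value is assigned in a sum statement
- Variables that are created by the END= or IN= option in a SET, MERGE, MODIFY, or UPDATE statement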
The RETAIN statement is not an executable statement, so it can appear anywhere in the DATA step.
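To make the points above concrete, here is a minimal sketch that accumulates profit in three ways. It assumes the Example data set from the first section holds the five profit values 12, 54, 14, 44, and 45 as separate observations. The data set name retain_variants and the variables total_a, total_b, and total_c are hypothetical names chosen for illustration.

```sas
data retain_variants;
    set example;

    /* RETAIN without an initial value: total_a starts out missing.      */
    /* The SUM function ignores missing arguments, so it still works.    */
    retain total_a;
    total_a = sum(total_a, profit);

    /* RETAIN with an initial value of 0: the + operator is safe to use. */
    retain total_b 0;
    total_b = total_b + profit;

    /* Sum statement: total_c is retained implicitly and starts at 0,    */
    /* so no RETAIN statement is needed.                                 */
    total_c + profit;
run;

proc print data=retain_variants;
run;
```

Under those assumptions, all three variables should hold the same running totals (12, 66, 80, 124, 169). Had total_a been updated with the + operator instead of the SUM function, it would stay missing, because adding to a missing value yields missing.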
Retain in SAS with BY Groups
Suppose that, for each age group, you want to check whether any BMI recorded for that age is less than 18.5.
    proc sort data=sashelp.bmimen out=bmi;
        by age;
    run;

    data underweight;
        length underweight $3.;
        set bmi;
        by age;
        retain underweight;
        if first.age then underweight="NO";
        if bmi lt 18.5 then underweight="YES";
        if last.age then output;
    run;

    proc print data=underweight(firstobs=215 obs=225);
    run;
This program uses the retained variable Underweight to “remember” whether any observation for an age had a BMI of less than 18.5.
As each new age is processed, Underweight is set to NO.
Then, if any BMI is less than 18.5, Underweight is set to YES. Because this value is retained, it remains equal to YES even if the BMI is 18.5 or greater on all subsequent observations for that age.
When SAS reaches the last observation of each age group, one observation is written to the output data set.
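Because sashelp.bmimen is large, the logic can be easier to trace on a handful of rows. The sketch below applies the same pattern to a made-up data set. The names bmi_toy and underweight_toy and all data values are hypothetical and exist only for illustration.

```sas
/* Hypothetical toy data: two BMI readings per age, already sorted by age. */
data bmi_toy;
    input age bmi;
    datalines;
10 17.9
10 19.2
11 20.1
11 21.4
12 19.0
12 18.1
;
run;

/* KEEP= drops bmi, which would otherwise hold only the last reading per age. */
data underweight_toy(keep=age underweight);
    length underweight $3;
    set bmi_toy;
    by age;
    retain underweight;
    if first.age then underweight = "NO";     /* reset the flag for a new age    */
    if bmi lt 18.5 then underweight = "YES";  /* once set, it stays set          */
    if last.age then output;                  /* write one observation per age   */
run;

proc print data=underweight_toy;
run;
```

With this toy input, ages 10 and 12 should come out as YES and age 11 as NO. Age 10 shows the retained flag at work: the low reading is on the first row, and the flag survives the second row, where the BMI is above 18.5.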
Calculating Cumulative Sum within BY Groups
The FIRST. variable and the RETAIN statement can be used together to calculate a cumulative sum within each BY group.
For example, suppose you would like to determine the cumulative sales within each month. The running total restarts at the beginning of each month, so the last observation within each BY group (month) contains the total sales for that month.
    data example3;
        do i=1 to 20;
            month=rand('integer', 1, 12);
            sales=rand('integer', 1, 100);
            output;
        end;
        drop i;
    run;

    proc sort data=example3;
        by month;
    run;

    data cum_sum;
        set example3;
        by month;
        retain cum_sum;
        if first.month then cum_sum=sales;
        else cum_sum=cum_sum + sales;
    run;
The RETAIN statement tells SAS to keep the value of cum_sum from one observation to the next, and the FIRST.month check restarts the total at the beginning of each BY group.
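If only the monthly totals are needed, one possible variation is to write a single observation per month. The sketch below assumes example3 has already been sorted by month as shown above. The data set name month_total and the variable total_sales are hypothetical.

```sas
/* Assumes example3 is already sorted by month. */
data month_total(keep=month total_sales);
    set example3;
    by month;
    retain total_sales;
    if first.month then total_sales = sales;       /* restart the total for a new month */
    else total_sales = total_sales + sales;        /* add to the retained running total */
    if last.month then output;                     /* keep only the final total per month */
run;
```

The only change from the cumulative-sum step is the conditional OUTPUT statement, which suppresses every row except the last one in each BY group.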
Generate Serial Number by Group
Another use of FIRST. and LAST. variables with the RETAIN statement is to generate sequential numbers within each BY group.
In this case, we create a new variable, COUNT, and tell SAS to retain it. Next, we initialize COUNT to 1 at the start of each BY group using the FIRST.MONTH variable. For every observation after the first in the group, we add 1 to COUNT, always starting from the retained previous value. Here is what the syntax looks like:

    data count;
        set example3;
        by month;
        retain count;
        if first.month then count=1;
        else count+1;
    run;
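Since the increment in the step above is written as a sum statement, the counter is retained implicitly, and a shorter form without RETAIN should behave the same way. This is a sketch under the assumption that example3 is sorted by month. The names count_short and seq are hypothetical.

```sas
/* Assumes example3 is already sorted by month. */
data count_short;
    set example3;
    by month;
    if first.month then seq = 0;  /* restart numbering for each month            */
    seq + 1;                      /* sum statement: seq is retained automatically */
run;
```

Resetting to 0 and then always incrementing avoids the ELSE branch, at the cost of being slightly less explicit about the retention.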