In the previous post, we discussed creating multiple records from a single record in SAS. What if you want to go the other way, that is, create a data set with a single observation per ID from a data set with multiple observations per ID?
The process is simple, and we use the output from the previous example as the input for this one. You can download the example data set and programs from the link at the end of this article.
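If you don't have the downloadable files at hand, the sketch below builds a small stand-in for the input data set. The variable names (id, Visits, Diagnosis) follow the program in this article, but the values are hypothetical. Diagnosis is assumed to be numeric, which matches the numeric variables D1-D3 that `array D{3}` creates. IDs 2 and 3 deliberately have fewer than three visits, and the rows are out of order so that the sort step has something to do.

```sas
/* Hypothetical stand-in for the downloadable input data set:      */
/* one observation per visit, Diagnosis assumed to be numeric.     */
data single_to_multiple;
  input id Visits Diagnosis;
  datalines;
1 2 430
1 1 450
2 1 250
1 3 410
3 1 300
2 2 240
;
run;
```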
First, sort the input data set by ID and Visits.
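proc sort data=single_to_multiple;
  by id Visits;
run;

data multiple_to_single;
  set single_to_multiple;
  by id Visits;                              /* makes FIRST.id and LAST.id available */
  array D{3};                                /* creates numeric D1-D3, one per visit */
  retain D1-D3;                              /* keep values across iterations */
  if first.id then call missing(of D1-D3);   /* clear values left from the previous ID */
  D{Visits} = Diagnosis;                     /* Visits selects D1, D2, or D3 */
  if last.id then output;                    /* one observation per ID */
  keep id D1-D3;
run;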
Output:
Next, set up an array to hold the three D values and retain these three variables. You have to retain them because they don't come from a SAS data set and are, by default, set to missing at the start of each iteration of the DATA step.
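To see what RETAIN protects against, here is a minimal sketch of the same DATA step with the RETAIN and CALL MISSING statements removed. It runs on its own small data set; the names visits and no_retain and all values are hypothetical.

```sas
/* Hypothetical input, already sorted: ID 1 has three visits, ID 2 has two. */
data visits;
  input id Visits Diagnosis;
  datalines;
1 1 450
1 2 430
1 3 410
2 1 250
2 2 240
;
run;

/* Same logic as the article's step, but WITHOUT the RETAIN statement. */
data no_retain;
  set visits;
  by id Visits;
  array D{3};
  /* D1-D3 are not read from a data set, so SAS resets them to */
  /* missing at the top of every iteration.                    */
  D{Visits} = Diagnosis;
  if last.id then output;
  keep id D1-D3;
run;
```

For ID 1 only D3 (410) survives, and for ID 2 only D2 (240), because the values filled in on earlier visits were wiped out when the next iteration began.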
The RETAIN statement prevents D1-D3 from being reset. Next, when you start processing the first visit for each ID, set the three D values to missing.
If you don't do that, an ID with fewer than three visits could end up with a diagnosis carried over from the previous ID. The <a href="https://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=p1iq436yh8838rn1ud38om45n99k.htm&docsetVersion=9.4&locale=en">CALL MISSING</a> routine can set any number of numeric and/or character variables to missing at one time.
As with most SAS functions and CALL routines, if you use a variable list in the form Var1-Varn, you have to precede the list with the keyword OF.
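The short sketch below shows both points: a numbered range list passed with OF, and individual numeric and character variables cleared in a single call. The data set missing_demo and the variables x1-x3, name, and score are hypothetical.

```sas
/* Hypothetical variables, for illustration only. */
data missing_demo;
  x1 = 1; x2 = 2; x3 = 3;
  name  = 'Ron';
  score = 95;

  /* A numbered range list such as x1-x3 needs the keyword OF;  */
  /* without it, SAS would read x1-x3 as the expression x1 - x3. */
  call missing(of x1-x3);

  /* Individual variables need no OF, and numeric and character  */
  /* variables can be cleared in the same call.                  */
  call missing(name, score);
run;
```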
Next, assign the value of Diagnosis to the appropriate D variable (D1 if Visits=1, D2 if Visits=2, and D3 if Visits=3).
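The single assignment `D{Visits} = Diagnosis` does the work of an IF/THEN chain. The sketch below spells that chain out without an array, on a small hypothetical data set (visits and no_array are illustrative names). The LENGTH statement stands in for the array definition and assumes Diagnosis is numeric.

```sas
/* Hypothetical input, sorted by id and Visits. */
data visits;
  input id Visits Diagnosis;
  datalines;
1 1 450
1 2 430
1 3 410
2 1 250
2 2 240
;
run;

/* Equivalent to D{Visits} = Diagnosis, written without an array. */
data no_array;
  set visits;
  by id Visits;
  length D1-D3 8;                            /* define D1-D3 as numeric */
  retain D1-D3;
  if first.id then call missing(of D1-D3);
  if Visits = 1 then D1 = Diagnosis;
  else if Visits = 2 then D2 = Diagnosis;
  else if Visits = 3 then D3 = Diagnosis;
  if last.id then output;
  keep id D1-D3;
run;
```

The result matches the array version, but the array form scales better: handling more visits only means changing the array dimension. One behavioral difference worth knowing: if Visits ever falls outside 1 to 3, the array reference stops the step with an "Array subscript out of range" error, whereas this IF/THEN chain silently skips the value.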
Finally, when the DATA step reaches the last visit for an ID, output a single observation that keeps the variables ID and D1-D3.
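The FIRST.id and LAST.id flags that drive the CALL MISSING and OUTPUT statements are created automatically by the BY statement and are not written to the output data set. To look at them, the sketch below copies them into ordinary variables; the data set names and the variables first_id and last_id are hypothetical.

```sas
/* Hypothetical input, sorted by id and Visits. */
data visits;
  input id Visits Diagnosis;
  datalines;
1 1 450
1 2 430
1 3 410
2 1 250
2 2 240
;
run;

/* Copy the automatic BY-group flags so they can be inspected. */
data by_flags;
  set visits;
  by id Visits;
  first_id = first.id;   /* 1 on the first visit of each id, else 0 */
  last_id  = last.id;    /* 1 on the last visit of each id, else 0  */
run;

proc print data=by_flags noobs;
  var id Visits first_id last_id;
run;
```

The article's step writes an observation only on rows where last_id is 1, so this input would yield two observations, one per ID.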