Most SAS programmers are familiar with the SET and MERGE statements in the Data Step. The SAS UPDATE Statement, on the other hand, is less well-known. Nevertheless, it can result in a much more efficient DATA steps programming under the right circumstances.
With the help of this article, you’ll learn about some of the built-in features of the Update Statement and how you may make use of its effectiveness.
In SAS, you can use either the MERGE statement or the UPDATE statement in a DATA step to update the values of observations in a master data set. The BY statement should follow both statements, which specify the primary key (variable) sorted by the SORT procedure.
The SAS UPDATE statement is a special kind of merge. Its purpose is to apply transactions to a master file in the form of a SAS dataset (observations from another SAS dataset).
The UPDATE statement is used to do the following:
- Update variable values in the master SAS dataset
- Add observations to the master SAS dataset.
Syntax:
UPDATE master-data-set transaction-data-set <UPDATEMODE=MISSINGCHECK | NOMISSINGCHECK>;
BY by-variable;
Master Dataset
It specifies the SAS data set used as the master file.
Transaction dataset
The master and transaction data sets typically contain the same variables. To reduce processing time, create a transaction dataset containing only the variables that need to be updated. The transaction data set may also include new variables added to the output data set.
Each observation in the output data set corresponds to one in the master data set. If any transaction observations do not match master observations, they are added as new observations to the output data set. Observations that will not be updated can be excluded from the transaction data set.
UPDATEMODE=MISSINGCHECK | NOMISSINGCHECK
UPDATEMODE=MISSINGCHECK
It prevents a transaction data set’s missing variable values from being replaced by values in the master data set. It’s the default setting.
UPDATEMODE=NOMISSINGCHECK
It specifies whether missing variable values in a transaction data set are allowed to replace existing variable values in a master data set. However, special missing values are the exception and replace values in the master data set even when MISSINGCHECK (the default) is in effect.
Requirements for Update Statement
The rules and requirements for using the UPDATE statement are:
- The SAS UPDATE statement must be followed by a BY statement that specifies the variables by which observations are matched.
- The data sets listed in the UPDATE statement must be sorted by the BY variable values or have an appropriate index.
- Each observation in the master data set should have a unique value of the BY variable or BY variables. If there are multiple values for the BY variable, only the first observation is updated. The transaction data set can contain more than one observation with the same BY value. (Multiple transaction observations are all applied to the master observation before it is written to the output file.)
A Basic SAS Update Statement Example
The program statements below create a new dataset paylist_new By applying transactions to a master data set (paylist2). The BY variable IdNum must appear in both paylist2 and paylist_update and its values in the master data set should be unique:
option yearcutoff=1930;
data paylist_update;
input IdNum $ gender $ Jobcode $ Salary Birth hired;
informat birth hired date7.;
format birth hired date7.;
datalines;
1401 M TA3 38822 13DEC55 17NOV93
1919 M TA2 34376 12SEP66 04JUN87
2021 F TA3 35000 01MAY89 01JAN22
;
run;
proc sort data=sql.paylist2;
by idnum;
run;
proc sort data=paylist_update;
by idnum;
run;
data Paylist_new;
update sql.paylist2 paylist_update;
by idnum;
run;
Updating By Renaming Variables
This example shows renaming a variable in the health_update data set so that the same variable in the program data vector is not overwritten.
The WEIGHT variable is renamed in the master dataset- health, and a new WEIGHT variable is calculated.
/*Transaction Data Set*/
data health_update;
input IdNum weight;
datalines;
7258 195
6726 130
1025 150
;
run;
/* Sort both data sets by IDNum */
proc sort data=health_update;
by idnum;
quit;
proc sort data=cert.health;
by idnum;
quit;
/* Update Master with Transaction */
data health_new;
length STATUS $11;
update cert.health(rename=(weight=Original) in=a) health_update(in=b);
by idnum;
if a and b then
do;
Change=abs(Original - weight);
if weight<Original then
status='Reduced';
else if weight>Original then
status='Increased';
else
status='Same';
end;
else
do;
status='No Updates';
weight=original;
end;
run;
Updating with Missing Values
The UPDATEMODE=MISSINGCHECK
option is enabled by default, which means that missing values in the transaction data set do not overwrite existing values in the master data set. Specify the UPDATEMODE=NOMISSINGCHECK
option in the UPDATE statement if you want missing values in the transaction data set to replace existing values in the master data set.
data paylist_update;
input IdNum $ gender $ Jobcode $ Salary Birth hired;
informat birth hired date7.;
format birth hired date7.;
datalines;
1401 M TA3 . 13DEC55 17NOV93
1919 M TA2 . 12SEP66 04JUN87
1350 F FA3 33000 31AUG55 29JUL91
1499 M . 23025 26APR74 07JUN92
;
run;
proc sort data=paylist_update;
by idnum;
run;
data Paylist_new;
update paylist2 paylist_update updatemode=nomissingcheck;
by idnum;
run;
Special Missing Values (A-Z)
Suppose you want to update some variables in the transaction data set that have missing values, but not all of them.
You can define a special missing value with the MISSING statement and then place that in the transaction data set where the special character should appear.
Any single character from A to Z is a valid value for these special missing value characters.
Even if UPDATEMODE=MISSINGCHECK is enabled, you can use the MISSING statement with a letter assignment to replace existing values with missing values.
data paylist_update;
input IdNum $ gender $ Jobcode $ Salary Birth hired;
informat birth hired date7.;
format birth hired date7.;
missing Z;
datalines;
1401 M TA3 . 13DEC55 17NOV93
1919 M TA2 Z 12SEP66 04JUN87
1350 F FA3 33000 31AUG55 29JUL91
1499 M Z 23025 26APR74 07JUN92
;
run;
proc sort data=paylist_update;
by idnum;
run;
data Paylist_new;
update sql.paylist2 paylist_update ;
by idnum;
run;
Underscore ( _ )
Suppose you want the resulting value in the master data set to be a regular missing value in the updated master data set. In that case, you can use the underscore (_) character in the MISSING statement. Insert the underscore where you want the regular missing value to appear in the transaction data set.
data paylist_update;
input IdNum $ gender $ Jobcode $ Salary Birth hired;
informat birth hired date7.;
format birth hired date7.;
missing _;
datalines;
1401 M TA3 _ 13DEC55 17NOV93
1919 M TA2 _ 12SEP66 04JUN87
1350 F FA3 33000 31AUG55 29JUL91
1499 M _ 23025 26APR74 07JUN92
;
run;
proc sort data=paylist_update;
by idnum;
run;
data Paylist_new;
update sql.paylist2 paylist_update;
by idnum;
run;
Difference between SAS Update Statement and Merge Statement
The SAS UPDATE statement combines records from two files horizontally, like the MERGE statement. However, there are several significant differences between the two statements.
An essential feature of the Update Statement is dealing with missing values. The option Updatemode=Missingcheck means that missing values in the Transaction data set do not overwrite the Master data set values.
- UPDATE can only process two SAS datasets at a time –the master and transaction datasets. However, the single MERGE statement can process two or more SAS datasets.
- MERGE automatically replaces existing values in the first data set with missing values in the second data set. UPDATE, however, does not do so by default. To cause UPDATE to overwrite existing values in the master data set with missing ones in the transaction data set, you must use UPDATEMODE=NOMISSINGCHECK.
- UPDATE changes or updates the values of selected observations in a master file by applying transactions. UPDATE can also add new observations.
- The BY statement is optional with MERGE (although commonly used), but it is mandatory with an UPDATE Statement.
Key Takeway
Some powerful features documented in the SAS UPDATE statement make it useful in ‘updating’ SAS datasets. This article has presented some of the significant benefits of the UPDATE statement and compares and contrasts it to the MERGE statement.
So, this was our side of using the SAS Update Statement. We hope that you must have found it helpful.
Download the SAS code from here.
Moreover, if you have any other suggestions, suggest them below the comment section. We would take those lists in our further blog post.
Thanks for reading!
If you liked this article, you might also want to read Combining data Vertically in SAS (6 Methods).
Do you have any tips to add? Let us know in the comments?
Please subscribe to our mailing list for weekly updates. You can also find us on Instagram and Facebook.