What are SAS Generation Datasets?
SAS generation datasets are historical copies of a SAS data set. You can request SAS to keep multiple copies of a SAS data set by using the generations feature.
Generation datasets are useful for backing up and recovering a specific data set, and for comparing two or more versions of a data set for audit purposes.
Read the paper "Generation Why: How Generation Data Sets Can Help"
Terminology of SAS Generation Datasets
- Base version – The base version is the most recent version.
- Oldest version – It is the oldest version in a generation group.
- Youngest version – It is the historical version that is chronologically closest to the base version.
- Generation group – is a group of data sets that represent a series of replacements to the original data set. The generation group consists of the base version and a set of historical versions.
- GENMAX – is an output data set option that specifies how many versions (including the base version and all historical versions) to keep.
- GENNUM – is an input data set option that specifies which version of a data set to open. Positive numbers are absolute references to a historical version by its generation number. Negative numbers are relative references to historical versions. For example, GENNUM=-1 refers to the youngest version. GENNUM=0 refers to the current (base) version.
- Generation number – is a monotonically increasing number that identifies one of the historical versions in a generation group. For example, the data set named AIR#272 has a generation number of 272.
- Historical versions – are the older copies of the base version of a data set. Names for historical versions have a four-character suffix for the generation number, such as #003.
- Rolling over – specifies the process of the generation number moving from 999 to 000. When the generation number reaches 999, its next value is 000.
- shift down – specifies a demotion of the base version to be the youngest version and deletion of the oldest version, if applicable. This typically happens when you create a new base version.
- shift up – specifies a promotion of the youngest version to be the base version. This typically happens when you delete the base version.
Creating Generation Datasets
To create generation data sets and specify the number of versions to maintain, use the GENMAX=n output data set option.
For example, the following DATA step creates a new data set and requests that up to four copies be kept (one base version and three historical versions):
Example 1: Creating a generation dataset
data test(genmax=4);
  a=1;
  output;
run;

data test;
  a=5;
run;

data test;
  a=10;
run;

data test;
  a=15;
  b=20;
run;
The first time a data set with generations is replaced, SAS keeps the replaced data set and appends a four-character version number to its member name, consisting of # and a three-digit serial number.
The name of the generation dataset is limited to 28 characters because the last 4 characters are reserved for the version number.
If your data set is named as test, the 1st replaced data set becomes test#001. When the data set is replaced for the second time, the replaced data set becomes test#002; that is, test#002 is the version that is chronologically closest to the base version.
After three replacements, the result is:
test base (current) version
test#003 most recent (youngest) historical version
test#002 second most recent historical version
test#001 oldest historical version.
With GENMAX=4, a fourth replacement deletes the oldest version, which is test#001. Similarly, a fifth replacement deletes test#002.
As replacements continue, SAS never keeps more versions than the number specified in GENMAX=.
For example, after ten replacements, the result is:
test base (current) version
test#010 most recent (youngest) historical version
test#009 2nd most recent historical version
test#008 oldest historical version
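The ten-replacement scenario can be reproduced with a small macro loop. This is only a sketch: the macro name replace_n and the values written to a are made up for illustration, and it assumes one-level names resolve to the WORK library.

```sas
/* Hypothetical helper macro: replaces TEST n times so that */
/* SAS has to shift versions down on every replacement.     */
%macro replace_n(n);
  %do i = 1 %to &n;
    data test;
      a = &i;
    run;
  %end;
%mend replace_n;

/* Create the base version and ask SAS to keep up to four versions. */
data test(genmax=4);
  a = 0;
run;

/* Ten replacements, as in the example above. */
%replace_n(10)

/* List the members of WORK; TEST should appear with its */
/* retained historical versions.                         */
proc datasets library=work memtype=data;
run;
quit;
```

The later DATA steps do not repeat GENMAX=, which follows the pattern in Example 1, where only the first step sets the option.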
The limit for version numbers that SAS can append is #999. That is, after 999 replacements, the youngest version is #999.
So, after 1,000 replacements, SAS rolls over the youngest version number to #000. After 1,001 replacements, the youngest version number is #001.
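The rollover rule amounts to taking the replacement count modulo 1000. The DATA step below is a rough illustration of that arithmetic only; it does not create any data sets, and the variable names n and youngest are chosen for this sketch.

```sas
/* Illustration of the numbering rule described above:          */
/* after n replacements the youngest version carries the suffix */
/* mod(n, 1000), written as three digits.                       */
data _null_;
  length youngest $8;
  do n = 3, 10, 999, 1000, 1001;
    youngest = cats('test#', put(mod(n, 1000), z3.));
    put n= youngest=;
  end;
run;
```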
To view the maximum number of generations set for a dataset, you can use PROC CONTENTS as below.
proc contents data=test; run;
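If the generation settings are needed in a data set rather than in printed output, one option is the OUT= option of PROC CONTENTS. The sketch below assumes the OUT= data set carries the GENMAX, GENNEXT, and GENNUM variables listed in the PROC CONTENTS documentation; the output name test_meta is hypothetical.

```sas
/* Write the metadata for TEST to a data set instead of printing it. */
/* NOPRINT suppresses the usual report.                              */
proc contents data=work.test
              out=work.test_meta(keep=libname memname genmax gennext gennum)
              noprint;
run;

/* The OUT= data set has one row per variable of TEST, and the */
/* generation columns repeat on each row, so one row is enough. */
proc print data=work.test_meta(obs=1);
run;
```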
To view how many generation datasets exist, use PROC DATASETS.
proc datasets library=work memtype=data; run;
Accessing generation datasets
To access one of the archived prior versions, specify the GENNUM=n data set option with the data set name. The value n is the generation number.
If the value is negative, it is a relative generation number, counted back from the base data set.
Accessing the base version
proc print data=test; run;
proc print data=test(gennum=0); run;
Processing the youngest historical version (1 generation back)
proc print data=test(gennum=-1); run;
Processing 2 Generations Back
proc print data=test(gennum=-2); run;
Processing 3 Generations Back
proc print data=test(gennum=-3); run;
Specifying data=test(gennum=-4) causes an error at run time and stops the step, because that generation does not exist in our example: with GENMAX set to 4, SAS keeps at most three historical versions.
ERROR: The version number specified in this file generation group is out of range: Current GENMAX value is 4; current GENNEXT value is 10.
Instead of a relative generation number, an actual generation number can be specified.
This is accomplished by specifying a positive number for GENNUM.
Specifying data=test(gennum=3) gives access to test#003, as long as it is still available.
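Because GENNUM= is an ordinary input data set option, it can also be used to compare two versions for audit purposes. The following is a minimal sketch using PROC COMPARE on the test data set from Example 1; the choice of the LISTALL option is an assumption about what an audit would want to see.

```sas
/* Compare the youngest historical version with the base version.  */
/* LISTALL also reports variables and observations that exist in   */
/* only one of the two versions (for example, variable b, which    */
/* exists only in the latest version created in Example 1).        */
proc compare base=test(gennum=-1)
             compare=test(gennum=0)
             listall;
run;
```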
Modifying the Number of Versions
When you modify the attributes of a data set, you can increase or decrease the number of versions for an existing generation group.
For example, the following MODIFY statement in the DATASETS procedure changes the number of generations for data set test to 5:
proc datasets library=work; modify test(genmax=5); run;
Note: If you decrease the number of versions, SAS deletes the oldest versions so that the group does not exceed the new maximum.
For example, the following MODIFY statement decreases the maximum number of versions of the test dataset from 4 to 2. SAS automatically deletes the two oldest historical versions:
proc datasets library=work nodetails; modify test(genmax=2); run;
Deleting Versions in a Generation Group
When you delete data sets, you can specify a specific version or an entire generation group to delete.
Using the DELETE statement in PROC DATASETS, the following options can be specified:
Example: Delete the base version and shift up the historical versions.
proc datasets library=work; delete test(gennum=0); run;
Example 2: Delete TEST#002.
proc datasets library=work; delete test(gennum=2); run;
Example 3: Delete all historical copies.
proc datasets library=work; delete test(gennum=hist); run;
Example 4: Delete all datasets including historical versions.
proc datasets library=work; delete test(gennum=all); run;
Renaming Versions in a Generation Group
When you rename a data set, you can rename an entire generation group:
Example 5: Renaming dataset name of the generation group.
proc datasets; change test=mydata; run;
You can also rename a single version by including GENNUM=:
Example 6: Renaming the youngest historical version
proc datasets; change test(gennum=-1)=recent; run;