What are SAS Generation Datasets?
SAS generation datasets are historical copies of a SAS data set. You can ask SAS to keep multiple copies of a data set by using the generations feature. Generation datasets are useful for backing up and recovering a specific data set, and for comparing two or more versions of it for audit purposes.
Terminology of SAS Generation Datasets
- Base version – The base version is the most recent version.
- Oldest version – It is the oldest version in a generation group.
- Youngest version – It is the historical version that is chronologically closest to the base version.
- Generation group – is a group of data sets that represent a series of replacements for the original data set. The generation group consists of the base version and a set of historical versions.
- GENMAX= – is an output data set option that specifies how many versions (including the base version and all historical versions) to keep.
- GENNUM= – is an input data set option that specifies which version of a data set to open. Positive numbers are absolute references to a historical version by its generation number. Negative numbers are relative references to historical versions, counted back from the base version. For example, GENNUM=-1 refers to the youngest historical version. GENNUM=0 refers to the current (base) version.
- Generation number – is a monotonically increasing number that identifies one of the historical versions in a generation group. For example, the data set named AIR#272 has a generation number of 272.
- Historical versions – are the older copies of the base version of a data set. Names for historical versions have a four-character suffix for the generation number, such as #003.
- Rolling over – specifies the process of the version number moving from 999 to 000. When the generation number reaches 999, its next value is 000.
- Shift down – specifies a demotion of the base version to be the youngest version and deletion of the oldest version, if applicable. This typically happens when you create a new base version.
- Shift up – specifies a promotion of the youngest version to be the base version. This typically happens when you delete the base version.
Creating Generation Dataset
To create generation data sets and specify the number of versions to maintain, use the GENMAX=n output data set option, where n is the total number of versions to keep.
For example, the following DATA step creates a new data set and requests that up to four copies be kept (one base version and three historical versions):
Example 1: Creating a Generation dataset
data test(genmax=4);
a=1;
output;
run;
data test;
a=5;
run;
data test;
a=10;
run;
data test;
a=15;
b=20;
run;
The first time a data set with generations is replaced, SAS keeps the replaced data set and appends a four-character version number to its member name, consisting of # and a three-digit serial number.
The name of the generation dataset is limited to 28 characters because the last 4 characters are reserved for the version number.
If your data set is named “test”, the 1st replaced data set becomes test#001. When the data set is replaced for the second time, the replaced data set becomes test#002; that is, test#002 is the version that is chronologically closest to the base version.
After three replacements, the result is:
- test – base (current) version
- test#003 – most recent (youngest) historical version
- test#002 – second most recent historical version
- test#001 – oldest historical version
With the GENMAX=4 option, a fourth replacement deletes the oldest version, which is test#001. Similarly, a fifth replacement deletes test#002.
As replacements continue, SAS never keeps more than the number of versions specified in GENMAX=. Each time the limit would be exceeded, the oldest version is deleted.
For example, after ten replacements, the result is:
- test – base (current) version
- test#010 – most recent (youngest) historical version
- test#009 – second most recent historical version
- test#008 – oldest historical version
The limit for version numbers that SAS can append is #999. That is, after 999 replacements, the youngest version is #999.
After 1,000 replacements, SAS rolls over the youngest version number to #000. After 1,001 replacements, the youngest version number is #001.
To view the maximum number of generations for a dataset, you can use PROC CONTENTS as below.
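The numbering rules above can be checked with a little arithmetic. The following is a minimal sketch, not part of SAS itself: it assumes the rollover behaves exactly as described (the suffix wraps from #999 to #000) and uses GENMAX=4 and a handful of replacement counts purely as illustrative values.

```sas
/* Sketch: predict the youngest suffix and the number of historical  */
/* versions kept. genmax and the values of n are illustrative only.  */
data _null_;
  genmax = 4;
  do n = 3, 10, 999, 1000, 1001;
    /* Suffix of the youngest version wraps around after #999 */
    youngest = cats('#', put(mod(n, 1000), z3.));
    /* At most genmax-1 historical versions, and never more than n */
    kept = min(n, genmax - 1);
    put n= youngest= kept=;
  end;
run;
```

The values are written to the SAS log, one line per replacement count, so you can compare them with the lists shown earlier.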
proc contents data=test;
run;
To see how many generation datasets exist, use PROC DATASETS.
proc datasets library=work memtype=data;
run;
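If you prefer a query to a procedure listing, the same information can probably be read from the DICTIONARY.TABLES view in PROC SQL. This is only a sketch: the column names maxgen and gen are assumed to be present in your SAS release, and the DESCRIBE TABLE statement is included so you can confirm that before relying on the query.

```sas
proc sql;
  /* Writes the column list of the view to the log for checking */
  describe table dictionary.tables;

  /* maxgen = maximum generations, gen = generation number (assumed column names) */
  select libname, memname, maxgen, gen
    from dictionary.tables
    where libname = 'WORK' and memname = 'TEST';
quit;
```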
Accessing generation datasets
To access one of the historical versions, specify the GENNUM=n data set option after the dataset name. The value n is the generation number.
If the value is negative, it is a relative generation number, that is, it is counted back from the base data set.
Accessing the base version
proc print data=test;
run;
or
proc print data=test(gennum=0);
run;
Processing the youngest historical version
proc print data=test(gennum=-1);
run;
Processing 2 Generations Back
proc print data=test(gennum=-2);
run;
Specifying data=test(gennum=-4) causes an error at run time and stops the step. With GENMAX=4 there are at most three historical versions, so the fourth version back does not exist in our example:
ERROR: The version number specified in this file generation group is out of range:
Current GENMAX value is 4; current GENNEXT value is 10.
Instead of a relative generation number, an actual generation number can be specified.
This is accomplished by specifying a positive number for GENNUM.
For example, data=test(gennum=3) gives access to test#003, as long as that version still exists.
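Since auditing and recovery are the main uses of generation datasets, here is a minimal sketch of both, assuming the test generation group from Example 1 exists in the WORK library. The output name test_prev is a hypothetical name chosen for illustration.

```sas
/* Compare the youngest historical version with the base version */
proc compare base=test(gennum=-1) compare=test;
run;

/* Copy the youngest historical version into a separate data set */
/* (test_prev is a hypothetical name)                            */
data work.test_prev;
  set test(gennum=-1);
run;
```

Copying into a differently named data set leaves the generation group itself untouched, because test is not replaced.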
Modifying the Number of Versions
When you modify the attributes of a data set, you can increase or decrease the number of versions for an existing generation group.
For example, the following MODIFY statement in the DATASETS procedure changes the number of generations for the data set test to 5:
proc datasets library=work;
modify test(genmax=5);
run;
Note: If you decrease the number of versions, SAS deletes the oldest versions so that the group does not exceed the new maximum.
For example, the following MODIFY statement decreases the maximum number of versions of the test dataset from 4 to 2. If three historical versions exist, SAS automatically deletes the two oldest ones, leaving the base version and the youngest historical version:
proc datasets library=work nodetails;
modify test(genmax=2);
run;
Deleting Versions in a Generation Group
When you delete data sets, you can specify a specific version or an entire generation group to delete.
Using the DELETE statement in PROC DATASETS, the following options can be specified:
Example: Delete the base version and shift up the historical versions
proc datasets library=work;
delete test(gennum=0);
run;
Example 2: Delete a specific generation dataset.
proc datasets library=work;
delete test(gennum=2);
run;
Example 3: Delete all historical copies.
proc datasets library=work;
delete test(gennum=hist);
run;
Example 4: Delete all datasets including historical versions.
proc datasets library=work;
delete test(gennum=all);
run;
Renaming Versions in a Generation Group
When you rename a data set, you can rename an entire generation group:
Example 5: Renaming dataset name of the generation group.
proc datasets;
change test=mydata;
run;
You can also rename a single version by including GENNUM=:
Example 6: Renaming the youngest historical version
proc datasets;
change test(gennum=-1)=recent;
run;
Note: For the CHANGE statement in PROC DATASETS, specifying GENNUM=0 refers to the entire generation group.