Arrays in SAS are a temporary grouping of SAS variables arranged in a particular order and identified by an array name.
Arrays exist only for the duration of the current DATA step and are referenced by the array name and a subscript.
One of the main reasons for using arrays in SAS is to reduce the number of statements required for processing variables.
Syntax
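ARRAY array-name{dimension} <$> <length> <elements> <(initial-values)>;
- array-name specifies the name of the array.
- dimension is the number and arrangement of array elements.
- $ (optional) indicates that the elements are character variables.
- length (optional) specifies the length of new character elements.
- elements are the numeric or character variables to include in the array.
- initial-values (optional) are starting values assigned to the elements in order.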
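As a minimal sketch, the following DATA step uses every part of the syntax in a single ARRAY statement. The dataset work.labels, the array name qname and the variables q1 to q4 are hypothetical names chosen only for illustration.

```sas
/* Hypothetical example: all names here are illustrative */
data work.labels;
/* array-name: qname        dimension: {4}
   $: character elements    2: length of each new variable
   elements: q1-q4          (...): initial values, assigned in order */
array qname{4} $ 2 q1-q4 ('Q1' 'Q2' 'Q3' 'Q4');
run;
```

Because the step reads no input, it executes once and writes a single observation containing q1 to q4.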
Array Dimensions
In the dimension, you need to specify the number and arrangement of elements included in the array. There are several ways to specify the dimension:
You can specify the number of array elements in a one-dimensional array. The array elements are variables you want to reference and process elsewhere in the DATA step.
array sales{4} qtr1 qtr2 qtr3 qtr4;
Instead of specifying the number of array elements in the array dimension, you can specify a range of values for the dimension while defining the array. For example, you can define the range for array Sales as follows:
array sales{96:99} totals96 totals97 totals98 totals99;
An asterisk (*) can also be used to specify the dimension of an array. In this way, SAS determines the dimension of the array by counting the number of elements.
array sales{*} qtr1 qtr2 qtr3 qtr4;
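When the dimension is a range such as {96:99}, the subscripts run from the lower bound to the upper bound, so a loop from 1 to DIM(sales) would fall outside the array. A minimal sketch, assuming a small hypothetical dataset work.totals, loops from LBOUND to HBOUND instead:

```sas
/* Hypothetical input data: one row of yearly totals */
data work.totals;
input totals96-totals99;
datalines;
100 110 120 130
;
run;

data work.growth;
set work.totals;
/* Subscripts run from 96 to 99 instead of 1 to 4 */
array sales{96:99} totals96-totals99;
n_years = dim(sales);        /* DIM still counts the elements: 4 */
first_year = sales{96};      /* same variable as totals96 */
total = 0;
/* LBOUND and HBOUND return 96 and 99 for this array */
do yr = lbound(sales) to hbound(sales);
total = total + sales{yr};
end;
drop yr;
run;
```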
Specifying Array Elements
When specifying the elements of an array, list each variable name you want to include in the array, separating the names with spaces.
array sales{4} qtr1 qtr2 qtr3 qtr4;
As in the example below, array elements can also be specified as a variable list. Variables qtr1 through qtr4 are grouped into a one-dimensional array.
array sales{4} qtr1-qtr4;
Referencing Elements of an Array
When you define an array in a DATA step, each array element is assigned a subscript.
To reference an array element, specify the array name followed by a subscript enclosed in braces, brackets, or parentheses:
array-name{subscript}
The subscript can be a variable, a SAS expression, or an integer, and its value must fall within the lower and upper bounds of the array's dimension.
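A minimal sketch of the two non-constant forms, assuming a hypothetical dataset work.picks whose pick variable holds a quarter number. If a subscript value falls outside the bounds, SAS stops the step with an "Array subscript out of range" error.

```sas
/* Hypothetical data: four quarterly values and the quarter to pick */
data work.picks;
input qtr1-qtr4 pick;
datalines;
10 20 30 40 3
5 15 25 35 1
;
run;

data work.picked;
set work.picks;
array sales{4} qtr1-qtr4;
chosen = sales{pick};          /* subscript is a variable read from the data */
last_q = sales{dim(sales)};    /* subscript is an expression: sales{4} */
run;
```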
The DIM function
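The DIM function returns the number of elements in an array. For a multidimensional array, DIM (or DIM1) returns the size of the first dimension, and DIM2, DIM3, and so on return the sizes of the other dimensions.
DIM(array-name)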
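DIM is most useful when SAS counts the elements for you. A minimal sketch, assuming a hypothetical dataset work.people, upper-cases every character variable using an {*} array built from the _CHARACTER_ variable list:

```sas
/* Hypothetical input data with three character variables */
data work.people;
input fname $ lname $ city $;
datalines;
ann lee boston
bob ray denver
;
run;

data work.people_up;
set work.people;
/* {*}: SAS counts the character variables (3 here) */
array chars{*} _character_;
do i = 1 to dim(chars);      /* DIM(chars) returns 3 */
chars{i} = upcase(chars{i});
end;
drop i;
run;
```

The loop bound never has to change if character variables are added to or removed from the input data.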
Using Arrays
Without an array, each variable needs its own assignment statement:
data emp1;
set mylib.emp;
qtr1=qtr1*1.25;
qtr2=qtr2*1.25;
qtr3=qtr3*1.25;
qtr4=qtr4*1.25;
run;
With an array, a single DO loop processes all four variables:
data emp1;
set mylib.emp;
array incr{4} qtr1-qtr4;
do i =1 to dim(incr);
incr{i} = incr{i}*1.25;
end;
drop i;
run;
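In the example above, the incr array could also be defined in any of the following ways:
array incr[4] qtr1 qtr2 qtr3 qtr4;
array incr{*} qtr1 qtr2 qtr3 qtr4;
array incr(4) qtr1-qtr4;
array incr(4) _NUMERIC_;
The last form works as intended only if qtr1 to qtr4 are the only numeric variables defined at that point in the step, because _NUMERIC_ refers to every numeric variable defined so far.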
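To see what _NUMERIC_ actually picks up, the sketch below assumes a hypothetical version of the employee data that also carries a numeric empid variable, and compares the element counts of a _NUMERIC_ array and an explicit qtr1-qtr4 array:

```sas
/* Hypothetical data: empid is numeric, like the quarterly values */
data work.emp;
input empid qtr1-qtr4;
datalines;
101 100 200 300 400
;
run;

data work.counts;
set work.emp;
array allnum{*} _numeric_;    /* empid qtr1 qtr2 qtr3 qtr4 */
array incr{*} qtr1-qtr4;      /* qtr1 qtr2 qtr3 qtr4 only */
n_all = dim(allnum);          /* 5 */
n_qtr = dim(incr);            /* 4 */
run;
```

With this data, the fixed dimension in array incr(4) _NUMERIC_ would not match the five numeric variables, so the explicit variable list is the safer choice.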
Creating new variables with the ARRAY statement
If you do not specify the elements of the array, SAS automatically creates new variables.
The names of the new variables are formed by appending the numbers 1, 2, 3, and so on to the array name.
To create an array of character variables, add a $ symbol after the dimension.
For example, ARRAY Months{5} $; creates the character variables Months1 through Months5.
The default length is 8, but you can specify your preferred length.
For example: ARRAY Months{5} $ 20;
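A minimal sketch of both behaviors in one step; the dataset work.auto and the array names score and months are hypothetical:

```sas
/* Hypothetical example: no elements are listed, so SAS creates them */
data work.auto;
array score{3};          /* creates numeric variables score1-score3 */
array months{3} $ 20;    /* creates character variables months1-months3, length 20 */
do i = 1 to dim(score);
score{i} = i * 10;
end;
months{1} = 'January';
drop i;
run;
```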
You can assign initial values to the elements of an array when you define it by placing the values after the elements, enclosed in parentheses and separated by blanks. Enclose character values in quotation marks.
For example:
ARRAY Months{3} m1 m2 m3 (1 2 3);
ARRAY Months{3} $ m1 m2 m3 ('Jan' 'Feb' 'Mar');
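Elements that receive initial values are automatically retained, so they keep their values on every iteration even when the step reads data with SET. A minimal sketch, assuming a hypothetical dataset work.sales whose mnum variable holds a month number from 1 to 3:

```sas
/* Hypothetical input data: month number and amount */
data work.sales;
input mnum amount;
datalines;
1 100
3 250
;
run;

data work.labeled;
set work.sales;
/* m1-m3 start as 'Jan' 'Feb' 'Mar' and are retained across iterations */
array months{3} $ 3 m1-m3 ('Jan' 'Feb' 'Mar');
mname = months{mnum};    /* look up the month abbreviation */
run;
```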
Creating variables with arrays
In the employee dataset example above, suppose you also need each quarter's share of the employee's annual total. You can create the new variables with a second array:
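data emp1;
set mylib.emp;
array incr{4} qtr1-qtr4;
array pct{4};              /* creates new variables pct1-pct4 */
/* Apply the 25% increase to every quarter first */
do i=1 to dim(incr);
incr{i}=incr{i}*1.25;
end;
/* Compute the annual total only after all quarters are updated */
Total=sum(of incr{*});
do i=1 to dim(incr);
pct{i}=incr{i}/Total;
end;
drop i;
run;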
Here is another example. I want a new dataset, WeatherChange, that contains all the variables of the dataset Weather plus two new variables, Change1 and Change2, holding each city's temperature change from one month to the next (February minus January, and March minus February).
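data weatherChange;
set mylib.weather;
/* Named temp rather than month: MONTH is also a SAS function,
   and reusing that name as an array name triggers a warning */
array temp{3} January February March;
array change{2};           /* creates new variables change1 and change2 */
do i=1 to dim(change);
change{i}=temp{i+1}-temp{i};
end;
drop i;
run;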
Temporary arrays
A temporary array only exists for the duration of the data step where it is defined. A temporary array is useful for storing constant values used in calculations.
No corresponding variables exist to identify the array elements in a temporary array.
The elements are defined with the keyword _TEMPORARY_, which creates a list of temporary data elements instead of DATA step variables.
The values do not appear in the output data set, and they are automatically retained across iterations of the DATA step.
Because temporary data elements have no names, they can be referenced only through the array name and a subscript.
You must specify explicit array bounds for a temporary array; an asterisk cannot be used as the dimension when defining one.
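data mon;
set sashelp.holiday(where=(month ne 0 and month le 6));
/* Lookup table of month abbreviations; not written to the output data set */
array mname{6} $ 3 _temporary_ ('Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun');
/* MONTH (1 to 6) is used directly as the subscript, so no loop is needed */
mon = mname{month};
run;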
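Temporary arrays also work well for numeric constants. A minimal sketch, assuming a hypothetical one-row employee dataset and hypothetical per-quarter adjustment factors:

```sas
/* Hypothetical input data */
data work.emp;
input name $ qtr1-qtr4;
datalines;
Ann 100 200 300 400
;
run;

data work.adjusted;
set work.emp;
array q{4} qtr1-qtr4;
/* Hypothetical per-quarter factors. _TEMPORARY_ elements have no
   variable names, are retained, and are not written to work.adjusted */
array factor{4} _temporary_ (1.25 1.5 0.5 2);
do i = 1 to dim(q);
q{i} = q{i} * factor{i};
end;
drop i;
run;
```

The output keeps only name and qtr1 to qtr4; the four factors never become variables.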
Multidimensional arrays
Multidimensional arrays are useful when you group your data into a tabular arrangement of rows and columns.
You can think of the first dimension as rows and the second as columns, as in ARRAY array-name{rows, columns}. For each additional dimension, add another comma and the size of that dimension; the extra dimension goes in front of the row dimension.
When you process the array, the first dimension (rows) is the outermost of the nested loops and the columns are the inner loop. SAS fills the array row by row: the listed elements are assigned to every column of the first row, then to every column of the second row, and so on.
do row = 1 to 2;
/* code to process each row */
do column = 1 to 5;
/* code to process each element */
end;
end;
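To make the row-by-row fill order concrete, here is a minimal sketch with a hypothetical 2 x 3 array whose variable names encode their row and column:

```sas
/* Hypothetical example: six variables placed in a 2 x 3 array */
data work.grid;
/* Assigned row by row: g{1,1}-g{1,3} are r1c1-r1c3,
   and g{2,1}-g{2,3} are r2c1-r2c3 */
array g{2,3} r1c1 r1c2 r1c3 r2c1 r2c2 r2c3;
do row = 1 to dim1(g);           /* outer loop: rows */
do col = 1 to dim2(g);           /* inner loop: columns */
g{row, col} = 10 * row + col;    /* e.g. g{2,3} = 23 */
end;
end;
drop row col;
run;
```

After the step, each variable's value matches its name (r1c3 is 13, r2c1 is 21), confirming which variable each subscript pair refers to.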
The example dataset below records temperatures for two cities at five different times. The objective is to round all the temperature values using a multidimensional array.
data temps;
set mylib.temp_records;
array temprg{2,5} c1t1-c1t5 c2t1-c2t5;
do i=1 to dim1(temprg);
do j=1 to dim2(temprg);
temprg{i,j}=round(temprg{i,j});
end;
end;
drop i j;
run;
You can also read the article on Advance Array Processing Techniques, which has examples of other useful applications of arrays.