PROC SQL is a SAS Procedure that combines the functionality of DATA and PROC steps into a single step.
PROC SQL can perform sorting of data, creating summaries of data, subsetting, joining (merge), concatenation of datasets, create new or calculated variables, printing the results or create a new table or view all in a single step.
PROC SQL in SAS can be used to retrieve, update, and report on information from SAS data sets or other database products.
The SELECT Statement and its Clauses
The purpose of a
SELECT the statement is to retrieve data from underlying tables. Although it supports multiple clauses, the SELECT statement has only one clause which is required to be specified –
FROM clause. All the remaining clauses are optional and can be used when needed.
To prevent syntax errors from occurring when using the SELECT statement, the clauses must be specified in the correct order as below.
PROC SQL options; SELECT column(s) FROM table-name | view-name WHERE expression GROUP BY column(s) HAVING expression ORDER BY column(s);
proc sql; select Origin, Make, Model, DriveTrain, MPG_CITY, count(Model) as ModelCount from sashelp.cars where Drivetrain='All' group by Make having MPG_CITY > 20 order by origin, make; quit;
Ordering outputs by column
The PROC SQL has the ability to impose order in a table using the
ORDER BY clause. It orders the query results according to the values in one or more selected columns specified after the
Rows of data can be ordered in ascending (default) or descending (DESC) for each column specified.
proc sql; select * from sashelp.class order by sex, 3, weight desc; quit;
In the above example, the output is first arranged by sex, then within sex by height(3 specifies the column ordinal position) and within height in descending order by weight.
Grouping data using summary functions
GROUP BY clause is used to aggregate and order groups of data using a designated column of the same value.
proc sql; select min(low) as Lowest, max(high) as highest, stock from sashelp.stocks group by stock; quit;
Grouping data and sorting
Without the ORDER BY clause, the SQL procedure automatically sorts the results from a GROUP BY query in the same order as specified in the GROUP BY clause.
When both the ORDER BY and GROUP BY are specified for the same column, no additional processing occurs to satisfy the request.
The Group BY clause first sorts the result in the grouping column and then aggregate the rows of the query by the same grouping column.
If the columns specified in the ORDER BY and GROUP BY are not the same, additional processing may be required which involves remerging summary statistics with the original data.
proc sql; select min(low) as Lowest, max(high) as highest, stock from sashelp.stocks group by stock order by lowest; quit;
Subsetting groups with the having clause
Using the combination of
GROUP BY and
HAVING clause, aggregated data can be filtered out from one group at a time instead of filtering one row at a time.
Suppose that you would like to find out only those product grouping that has an average price greater than 100 from the price data table, then using a HAVING clause restricts the number of groups selected and is always performed after the GROUP BY clause.
proc sql; select productname, avg(price) from sashelp.pricedata group by productname having avg > 100; quit;
Finding duplicate and Unique values
To select only one of several identical values in a table, SAS supports and processes the
DISTINCT keyword and the
proc sql; select distinct make from sashelp.cars; run;
To find the count of unique values, you can use the distinct and count function together as below.
proc sql; select count(make), count(distinct make) from sashelp.cars; run;
The main difference between UNIQUE and DISTINCT in PROC SQL is that Unique ensures that all the values in a column are different while Distinct removes all the duplicate records when retrieving the records from a table. However, Unique can also be used to retrieve unique records from a table.
proc sql; create table subjects(subjid int unique , subject_name varchar(255), subject_age int); quit;
Since the Unique constraint is defined for Subjid, we cannot insert the same value for subjid.
proc sql; insert into subjects values (8, "Subject1", 37); quit; proc sql; insert into subjects values (8, "Subject1", 37); quit;
ERROR: Add/Update failed for data set WORK.SUBJECTS because data value(s) do not comply with integrity constraint UN0001.
Counting the number of missing Values
NMISS function of PROC SQL in SAS is used to find the count of missing values in a column. The advantage of this function is, it works both for the character as well as numeric data types.
select count(*) as count, nmiss(agechddiag) as MissingCount from sashelp.heart; quit;
Character Operations and Functions
proc sql outobs=5; select Make || "," || Model as MakeAndModel from sashelp.cars; quit;
Finding the length of string using Proc SQL
LENGTH function is used to find the length of the character string in Proc SQL. It returns the number equal to the number of characters in the argument.
proc sql outobs=5; select model, length(Model) as Modellength from sashelp.cars; quit;
Finding the occurrence of a pattern using Index
To find the occurrence of a pattern, the
INDEX function is used. The character string passed in the argument is searched from left to right for the first occurrence of the specified value.
If the string is found, the column position of the first character is returned otherwise a value of 0 is returned.
In the below example, we can find cars with a manual transmission by writing the following code. Also, note that the search is case sensitive.
proc sql outobs=5; select make, model from sashelp.cars where index(model, 'manual') > 0; quit;
Extracting information from a string
Sometimes we may need to extract specific information from a string. We can use the
SUBSTR function with a character column by specifying a starting position and the number of characters to extract.
In the following example, we can extract the first and last names from the name column by writing the following code.
proc sql outobs=5; select name,team, substr(name,index(name,',')+1,length(name)) as FirstName, substr(name,1,index(name,',')-1) as LastName from sashelp.baseball; quit;
For extracting the first name, the INDEX function is used with SUBSTR to find the position of “,” in the Name column. The value returned is then passed as the second argument to the SUBSTR function specifying the starting position of the character to be extracted.
Similarly, for extracting las name, the INDEX function is used in the last argument of SUBSTR function which will return the ending position of “,”.
Phonetic matching – Sounds like operator (=*)
With sounds like operator, we can find names that sound alike or have spelling variations. This is performed using the soundex algorithm.
proc sql; select name,team,substrn(name,index(name,',')+1,length(name)) as FirstName from sashelp.baseball where calculated firstname =* ' Andre' ; quit;
Finding the first non-missing value
COALESCE function is used to find the first non-missing value in a column or list. It scans the argument from left to right and returns the first non-missing or null value. If all the values are missing, the result is also missing.
data custDetails; input custid Mobile Landline; datalines; 001 9800231356 9800231356 002 8800230356 . 003 . 9733231250 ; run;
proc sql; select custid, coalesce(mobile, landline) format=best32. as Phone from custDetails; quit;
In the above example, we are instructing SAS to show the Landline Number of the customer, if the Mobile number column is missing otherwise show the Mobile number.
Using the COALESCE function you can process these types of data with a single function call instead of writing multiple IF-THEN/ELSE statements.
Producing a ROW number
A unique feature of PROC SQL in SAS is the ability to obtain the row count using the
MONOTONIC() function which is similar to the
_n_ automatic variable in a data step.
proc sql outobs=5; select monotonic() as Row_Numbers, * from sashelp.class; run;
A row number can also be produced with the documented SQL procedure
NUMBER. Unlike the MONOTONIC(), the NUMBER option does not create a new column in the table.
proc sql outobs=5 number; select * from sashelp.class; run;
Replicate a data set without data
Using PROC SQL in SAS, it is very easy to create a new empty data set while keeping all the structure of the original data set.
proc sql; create table class1 like sashelp.class; quit;