PROC SQL
is a SAS Procedure that combines the functionality of DATA and PROC steps into a single step.
PROC SQL can sort data, create summaries of data, subsetting, join (merge), concatenate datasets, create new or calculated variables, print the results, create a new table, or view all in a single step.
PROC SQL in SAS can be used to retrieve, update, and report on information from SAS data sets or other database products.
The SELECT Statement and its Clauses
The purpose of a SELECT
the statement is to retrieve data from underlying tables. Although it supports multiple clauses, the SELECT statement has only one clause which is required to be specified – FROM
clause. All the remaining clauses are optional and can be used when needed.
To prevent syntax errors from occurring when using the SELECT statement, the clauses must be specified in the correct order as below.
PROC SQL options; SELECT column(s) FROM table-name | view-name WHERE expression GROUP BY column(s) HAVING expression ORDER BY column(s);
Example
proc sql;
select Origin, Make, Model, DriveTrain, MPG_CITY, count(Model) as ModelCount
from sashelp.cars where Drivetrain='All' group by Make having MPG_CITY > 20
order by origin, make;
quit;
Ordering outputs by column
The PROC SQL can impose order in a table using the ORDER BY
clause. It orders the query results according to the values in one or more selected columns specified after the FROM
clause.
Rows of data can be ordered in ascending (default) or descending (DESC) for each column specified.
proc sql;
select * from sashelp.class order by sex, 3, weight desc;
quit;
In the above example, the output is first arranged by sex, then within sex by height(3 specifies the column ordinal position), and within height in descending order by weight.
Grouping data using summary functions
The GROUP BY
clause is used to aggregate and order data groups using a designated column of the same value.
proc sql;
select min(low) as Lowest, max(high) as highest, stock from sashelp.stocks
group by stock;
quit;
Grouping data and sorting
Without the ORDER BY clause, the SQL procedure automatically sorts the results from a GROUP BY query in the same order as specified in the GROUP BY clause.
No additional processing occurs to satisfy the request when the ORDER BY and GROUP BY are specified for the same column.
The Group BY clause first sorts the result in the grouping column and then aggregates the query rows by the same grouping column.
If the columns specified in the ORDER BY and GROUP BY are not the same, additional processing may be required, which involves remerging summary statistics with the original data.
proc sql;
select min(low) as Lowest, max(high) as highest, stock from sashelp.stocks
group by stock order by lowest;
quit;
Subsetting groups with the having clause
Using the combination of GROUP BY
and HAVING
clause, aggregated data can be filtered out from one group instead of one row at a time.
Suppose you would like to find only those product grouping with an average price greater than 100 from the price data table.
A HAVING clause restricts the number of selected groups and is always performed after the GROUP BY clause.
proc sql;
select productname, avg(price) from sashelp.pricedata group by productname
having avg > 100;
quit;
Finding duplicate and Unique values
To select only one of several identical values in a table, SAS supports and processes the DISTINCT
keyword and the UNIQUE
constraint.
proc sql;
select distinct make from sashelp.cars;
run;
To find the count of unique values, you can use the distinct and count function together as below.
proc sql;
select count(make), count(distinct make) from sashelp.cars;
run;
The main difference between UNIQUE and DISTINCT in PROC SQL is that Unique ensures that all the values in a column are different.
In contrast, Distinct removes all the duplicate records when retrieving the records from a table. However, Unique can also retrieve unique records from a table.
proc sql;
create table subjects(subjid int unique , subject_name varchar(255),
subject_age int);
quit;
Since the Unique constraint is defined for Subjid, we cannot insert the same value for subjid.
proc sql;
insert into subjects values (8, "Subject1", 37);
quit;
proc sql;
insert into subjects values (8, "Subject1", 37);
quit;
ERROR:Add/Update failed for data set WORK.SUBJECTS because data value(s) do not comply with integrity constraint UN0001.
Counting the number of missing Values
The <a class="rank-math-link" href="https://9to5sas.com/sas-numeric-functions/#NMISS_function">NMISS</a>
function of PROC SQL in SAS is used to find the count of missing values in a column. The advantage of this function is that it works both for the character and numeric data types.
select count(*) as count, nmiss(agechddiag) as MissingCount from sashelp.heart;
quit;
Character Operations and Functions
Character operations are used with character data. You can also use most data step character functions within PROC SQL.
The below example uses the || – Contanation operator to join two rows, Make and Model.
proc sql outobs=5;
select Make || "," || Model as MakeAndModel from sashelp.cars;
quit;
Finding the length of string using Proc SQL
The LENGTH
function is used to find the length of the character string in Proc SQL. It returns the number equal to the number of characters in the argument.
proc sql outobs=5;
select model, length(Model) as Modellength from sashelp.cars;
quit;
Finding the occurrence of a pattern using the Index
To find the occurrence of a pattern, the <a class="rank-math-link" href="https://9to5sas.com/character-function-in-sas/#INDEX">INDEX</a>
function is used. The character string passed in the argument is searched from left to right for the first occurrence of the specified value.
If the string is found, the column position of the first character is returned. Otherwise, a value of 0 is returned.
In the below example, we can find cars with a manual transmission by writing the following code. Also, note that the search is case-sensitive.
proc sql outobs=5;
select make, model from sashelp.cars where index(model, 'manual') > 0;
quit;
Extracting information from a string
Sometimes we may need to extract specific information from a string. We can use the <a class="rank-math-link" href="https://9to5sas.com/character-function-in-sas/#SUBSTR">SUBSTR</a>
function with a character column by specifying a starting position and the number of characters to extract.
We can extract the first and last names from the name column in the following example by writing the following code.
proc sql outobs=5;
select name,team,
substr(name,index(name,',')+1,length(name)) as FirstName,
substr(name,1,index(name,',')-1) as LastName
from sashelp.baseball;
quit;
For extracting the first name, the INDEX function is used with SUBSTR to find the position of “,” in the Name column.
The value returned is then passed as the second argument to the SUBSTR function specifying the character’s starting position to be extracted.
Similarly, for extracting the last name, the INDEX function is used in the last argument of the SUBSTR function, which will return the ending position of “,”.
Phonetic matching – Sounds like operator (=*).
With sounds like operator, we can find names that sound alike or have spelling variations. This is performed using the Soundex algorithm.
proc sql;
select name,team,substrn(name,index(name,',')+1,length(name)) as
FirstName from sashelp.baseball
where calculated firstname =* ' Andre'
;
quit;
Finding the first non-missing value
The COALESCE
function is used to find the first non-missing value in a column or list. It scans the argument from left to right and returns the first non-missing or null value. If all the values are missing, the result is also missing.
data custDetails;
input custid Mobile Landline;
datalines;
001 9800231356 9800231356
002 8800230356 .
003 . 9733231250
;
run;
proc sql;
select custid, coalesce(mobile, landline) format=best32. as Phone from
custDetails;
quit;
In the above example, we are instructing SAS to show the Landline Number of the customer; if the Mobile number column is missing, show the Mobile number.
Using the COALESCE function, you can process these data types with a single function call instead of writing multiple IF-THEN/ELSE statements.
Producing a ROW number
A unique feature of PROC SQL in SAS is the ability to obtain the row count using the MONOTONIC()
function, which is similar to the _n_
automatic variable in a data step.
proc sql outobs=5;
select monotonic() as Row_Numbers, * from sashelp.class;
run;
A row number can also be produced with the documented SQL procedure NUMBER
. Unlike the MONOTONIC(), the NUMBER option does not create a new column in the table.
proc sql outobs=5 number;
select * from sashelp.class;
run;
Replicate a data set without data
Using PROC SQL in SAS, it is very easy to create a new empty data set while keeping all the structure of the original data set.
proc sql;
create table class1 like sashelp.class;
quit;
The Takeaway:
So, this was our side on Proc SQL in SAS. We hope that you must have found it useful. Moreover, if you have other suggestions, mention them in the comment section below. We would take those lists in our further blog post. Thanks for reading!
If you liked this article, you might also want to read Everything You Need to Know About SQL SET Operators and Intermediate Proc SQL Tutorials with Examples.
Do you have any tips to add? Let us know in the comments.
Please subscribe to our mailing list for weekly updates. You can also find us on Instagram and Facebook.