This article explains the data summarization by rows and columns and how you can use Predicates in PROC SQL to perform comparisons between two conditions or expressions.
Summarizing Data
The SQL procedure is a wonderful tool for summarizing (or aggregating) data. It provides several useful summary (or aggregate) functions to help perform calculations, descriptive statistics, and other aggregating operations in a SELECT statement or HAVING clause.
Proc SQL COUNT
The below example uses the COUNT function with the (*) argument to produce a total number of rows, whether data is missing or not.
The asterisk (*) is specified as the argument to the COUNT function to count all rows in the PURCHASES table.
proc sql;
select count(*) as row_count
from sashelp.shoes;
quit;
Unlike the COUNT(*) function syntax that counts all rows, whether data is missing or not, the following example uses the COUNT function with the (column-name) argument to produce a total number of non-missing rows based on the products column.
proc sql;
select count(product) as non_missing_row_count
from sashelp.shoes;
quit;
Proc SQL outobs and inobs options
You could use the PROC SQL options as an alternative to FIRSTOBS= and OBS=.
INOBS= is used to find out how many rows are to be processed.
OUTOBS= to decide how many rows to include in the report or output dataset.
proc sql inobs = 3 ;
create table class_2 as
select *
from sashelp.class
;
quit ;
proc sql outobs = 3 ;
create table class_1 as
select *
from sashelp.class
where sex = 'M'
;
quit ;
proc sql inobs = 3 ;
create table class_2 as
select *
from sashelp.class
where sex = 'M'
;
quit ;
generates the following in the LOG:
This means that the INOBS= or OUTOBS= options are used to choose rows for the output dataset after the WHERE clause has chosen rows from an internal table.
Summarizing Data Down Rows
The SQL procedure can be used to produce a single aggregate value by summarizing data down rows (or observations).
Suppose you want to know the average sales for all Sport Shoes in the SASHELP.SHOES table containing a variety of products.
The following query computes the average product cost and produces a single aggregate value using the AVG function.
proc sql;
select avg(sales) as average_sales format=dollar10.2 from sashelp.shoes where
upcase(product) in ("SPORT SHOE");
quit;
Summarizing data across columns
When a computation is needed on two or more columns in a row, the SQL procedure can be used to summarize data across columns.
Suppose you want to know the remaining inventory after-sales for each product.
proc sql outobs=5;
select product,sales,inventory,
(inventory-sales) as remaining_inventory format=dollar10.
from sashelp.shoes;
quit;
Predicates
Predicates are used in PROC SQL to perform direct comparisons between two conditions or expressions. Six predicates will be looked at:
- BETWEEN
- IN
- IS NULL
- IS MISSING
- LIKE
- EXISTS
Selecting a range of values
The IN predicate creates an OR condition between each value and returns a Boolean value of True if a column value equals one or more values in the expression list.
proc sql outobs=5;
select *
from sashelp.shoes
where upcase(product) = 'SANDAL';
quit;
Testing for null or missing values
The IS NULL predicate selects one or more rows by evaluating whether a column amount is missing or null.
The IS MISSING predicate performs identically to the IS NULL predicate by selecting one or more rows containing a missing value (null).
data shoes;
set sashelp.shoes;
if _n_ in (4, 7, 8, 9) then
call missing(subsidiary);
run;
proc sql;
select * from shoes where subsidiary is null;
quit;
The only difference is that specifying IS NULL is the ANSI standard way of expressing the predicate, and IS MISSING is commonly used in SAS.
See our guide on Working With Missing Values In SAS for more information on missing values in SAS.
Finding patterns in a string (pattern matching % and _)
Constructing specific search patterns in string expressions can be achieved with the LIKE Predicate.
The % (per cent sign) acts as a wildcard character representing any number of characters, including any combination of upper or lower case characters.
To find patterns in the product name (PRODUCT) containing the characters, ”Men’ in the first three positions followed by any number of characters is specified with the following WHERE clause.
proc sql outobs=5;
select *
from sashelp.shoes
where product like 'Men%';
quit;
The wildcard character “%” precedes and follows the search word to select all products whose name contains the word “DRESS” in its name.
proc sql outobs=5;
select *
from sashelp.shoes
where upcase(product) like '%DRESS%';
quit;
The resulting output contains product types such as “Men’s Dress” and any other products containing the word “Dress”.
LIKE predicate is used to check a column for the existence of trailing blanks.
% (Percent ) – The wildcard character % followed by a blank space is specified as the search argument.
_ (Underscore) – The wildcard _ is used to search for a specific number of characters, and using the LIKE predicate, the underscore (_) provides a way to pattern match character-by-character.
Thus, a single underscore (_) in a specific position acts as a wildcard placement holder for that position only.
Two consecutive underscores (__) act as a wildcard placement holder for those two positions.
Three consecutive underscores act as a wildcard placement holder for those three positions. And so forth.
In the following example, the first position used to search product type contains the character “S”, and the following six positions (represented with six underscores) act as a placeholder for any value.
proc sql outobs=5;
select *
from sashelp.shoes
where upcase(product) like 'S______';
quit;
The following example illustrates a pattern search of the product name (PRODUCT) where the first position is represented as a wildcard; the second position contains the lowercase character “o”, followed by any combination of uppercase or lowercase characters.
proc sql outobs=5;
select *
from sashelp.shoes
where product like '_o%';
quit;
Testing for the existence of a value
The EXISTS predicate is used to test for the existence of a set of values.
In the following example, a subquery is used to check for the existence of customers in the CUSTOMERS table with purchases from the PURCHASES table.
proc sql;
select custnum, custname, custcity from customers c where exists
(select * from purchases p where c.custnum=p.custnum);
quit;
The EXISTS condition is an operator whose right operand is a subquery. The result of an EXISTS condition is true if the subquery resolves to at least one row.
The result of a NOT EXISTS condition is true if the subquery evaluates to zero rows.
Download the entire code from this post here.
The Takeaway:
So, this was our side on some practical use of PROC SQL. We hope that you must have found it useful.
Moreover, if you have other suggestions, mention them in the comment section below. We would take those lists in our further blog post.
Thanks for reading!
If you liked this article, you might also want to read A Comprehensive Guide To PROC SQL In SAS (15 + Examples) and Everything You Need To Know About SQL SET Operators.
Do you have any tips to add? Let us know in the comments.
Please subscribe to our mailing list for weekly updates. You can also find us on Instagram and Facebook.