One of the most common data manipulation tasks is finding the records in table one that also exist in table two, that is, the rows common to both tables. This post covers three methods using PROC SQL and one using a DATA step merge.
Suppose you have two data sets (tables), one and two. You want to find the records that are present in both tables.
Creating two datasets – one and two.
data one;
input id $ value;
datalines;
A 1
B 2
C 3
D 4
E 5
;
run;
data two;
input id $ value;
datalines;
F 6
B 2
G 7
D 4
H 8
;
run;
Method 1: Using Proc SQL Subquery
This method uses a subquery in the WHERE clause to select id from table two. The subquery returns a single column, which may contain many rows. It is evaluated first, and the outer query then keeps only the rows of table one whose id appears in that list.
proc sql;
select * from one where id in (select id from two);
quit;
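The same filter can also be written as a correlated subquery with EXISTS. This is a minimal sketch rather than part of the original method. It assumes the data sets one and two created above, and the aliases a and b are illustrative names only.

```sas
proc sql;
  /* keep each row of one for which at least one row of two has the same id */
  select *
    from one as a
    where exists (select 1 from two as b where b.id = a.id);
quit;
```

With the sample data this should list the same rows as the IN version, ids B and D.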
Method 2: Using PROC SQL Inner Join
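PROC SQL INNER JOIN returns the rows whose key matches in both tables (data sets). The query below returns the rows of table one with ID values B and D, since these are the only IDs present in both data sets one and two. DISTINCT removes the duplicate rows that would otherwise appear if an id occurred more than once in table two.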
proc sql;
select distinct t1.* from one as t1 inner join two as t2 on t1.id = t2.id;
quit;
Method 3: Using INTERSECT Operator
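The INTERSECT operator returns the distinct rows that appear in both query results. It compares every selected column, so with select * a row is returned only when both id and value match in the two tables.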
proc sql;
select * from one intersect select * from two;
quit;
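Because INTERSECT compares whole rows, a row whose id exists in both tables but whose value differs would be dropped. If matching on id alone is the goal, one option is to intersect only the id column. This is a sketch against the same one and two data sets.

```sas
proc sql;
  /* compare on id only; other columns are ignored and not returned */
  select id from one
  intersect
  select id from two;
quit;
```

With the sample data this lists the ids B and D. To bring back the other columns for those ids, the subquery or inner join methods are a better fit.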
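Method 4: Using Data Step Merge
Sort both data sets by id, then merge them by id. The IN= options create temporary flags (t1 and t2) that mark which data set contributed to each row, and the subsetting IF keeps only the ids present in both. Because both data sets contain a variable named value, the merged row carries the value from two.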
proc sort data=one;
by id;
run;
proc sort data=two;
by id;
run;
data final;
merge one(in=t1) two(in=t2);
by id;
if t1 and t2;
run;
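The same IN= flags can route matched and unmatched rows to separate data sets in a single pass. This is a minimal sketch, assuming one and two are already sorted by id as in the steps above. The output names both, only_one and only_two are hypothetical and chosen for illustration.

```sas
data both only_one only_two;
  merge one(in=t1) two(in=t2);
  by id;
  /* t1 and t2 are 1 when the current id came from one or two respectively */
  if t1 and t2 then output both;     /* id found in both data sets */
  else if t1 then output only_one;   /* id found only in one */
  else output only_two;              /* id found only in two */
run;
```

With the sample data, both holds ids B and D, only_one holds A, C and E, and only_two holds F, G and H.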