FIRST and LAST processing are temporary variables created automatically by a DATA step when a BY statement is used.
The values of these Boolean variables will either be 1 for true or 0 for false.
FIRST.”variable”: This variable gets a value of 1 the moment a new group begins within your sorted dataset (based on your BY variables). For all other records within that group, its value remains 0.
LAST.”variable”: This variable acquires a value of 1 for the final record of each group within your sorted dataset. Otherwise, it stays at 0.
“Note”: Input Data set must be sorted or indexed before applying FIRST. and LAST. Variables.
A Simple Example
Suppose you have a dataset with sales figures organized by ‘Region’ and ‘Month’. Here’s how to use FIRST. and LAST. variables to calculate the first sale and total sales per “region”:
data transactions;input customer_id $ date mmddyy10. purchase_amount;format date date9.;datalines;A001 01/20/2024 25.50B005 02/15/2024 12.99A001 02/28/2024 45.00C012 01/12/2024 30.75B005 01/24/2024 8.50;run;proc sort data=transactions;by customer_id;run;data first_last_purchases;set transactions;by customer_id;if first.customer_id then first_purchase_date = date;if last.customer_id then last_purchase_date = date;/* Output if we want to see the results */if first.customer_id or last.customer_id;format first_purchase_date last_purchase_date date9.;run;proc print;

“Explanation”:
Sorting (Required): We sort by customer_id to group the data for FIRST. and LAST. to work correctly.
Detecting First/“Last”:
if first.customer_id then ... : When a new customer ID appears, we store the date in first_purchase_date.
if last.customer_id then ... : When it’s the last record for a customer ID, we store the date in last_purchase_date .
Output (Optional): The last if statement outputs rows only when it’s either the first or last record of each customer, making it easier to see the results.
Calculating Differences Between “Records”: Let’s say you have stock prices over time. You could use FIRST. and LAST. variables to calculate the change in price from the beginning to the end of each day, week, or “month”:
Example 1 : Calculating difference between records
data stock_data;infile datalines truncover;input ticker_symbol $4. date :yymmdd10. price best32.;format date date9.; /* Apply the date format for output */datalines;AAPL 2024-02-22 150.50TSLA 2024-02-21 205.75AAPL 2024-02-21 148.30MSFT 2024-02-22 250.22GOOG 2024-02-22 1200.15AMZN 2024-02-21 97.25GOOG 2024-02-21 1195.00AMZN 2024-02-22 98.50FB 2024-02-22 175.32;run;proc sort data=stock_data;by ticker_symbol date;quit;data price_change;set stock_data;by ticker_symbol;retain first_price last_price;if first.ticker_symbol then first_price = price;if last.ticker_symbol then do;last_price = price;price_change = last_price - first_price;output;end;run;

“Explanation”:
The RETAIN statement prevents first_price and last_price from being reset to missing at the beginning of each iteration of the data step.
if first.ticker_symbol ...: If it’s the first record of a ticker group, store the ‘price’ in first_price.
if last.ticker_symbol ...: If it’s the last record of a ticker “group”:
last_price.price_change (last price - first price).output;: Write a record to the ‘price_change’ dataset.Example “2”: Calculating Running Totals by Group
data sales;input month date9. sales;format month monyy7.;datalines;01Jan2024 10001Jan2024 15001Feb2024 20001Feb2024 25001Mar2024 300;run;proc sort data=sales;by month;run;proc print;data sales_with_total;set sales;by month;retain total_monthly 0 total_running 0;if first.month then total_monthly = 0; /* Initialize total for each month */total_monthly + sales; /* Calculate monthly total */total_running + sales; /* Update running total */if last.month then do;output; /* Output total for each month */total_monthly = .; /* Reset total for next month */end;run;proc print;

“Explanation”:
retain ...;: Prevents total_monthly and total_running from resetting to missing at the start of each data step iteration.
if first.month ...: If it’s the first record of a month, reset total_monthly to 0.
total_monthly + sales;: Accumulates sales for the current month.
total_running + sales;: Accumulates the overall running total.
if last.month ...: When it’s the last record of a “month”:
output;: Writes a record to the ‘sales_with_total’ dataset (containing the calculated totals).total_monthly = .;: Resets total_monthly for the next month.Example “3”: Finding Maximum and Minimum Values
data grades;input student_id grade;datalines;1 901 852 802 953 753 85;run;data max_min_grade;set grades;by student_id;retain max_grade min_grade;if first.student_id then do;max_grade = grade;min_grade = grade;end;else do;if grade > max_grade then max_grade = grade;if grade < min_grade then min_grade = grade;end;if last.student_id;run;

Example “4”: Calculating Group wise Average
proc sort data=sales;by region;quit;data region_average;set sales;by region;retain total count;if first.region then do;total = 0;count = 0;end;total + sales;count + 1;if last.region then do;average = total / count;output;end;run;proc print;

Example 5 : Identifying Duplicate records
You can use first and last variables to identify duplicate records without using the proc sort procedure.
data customer;input customer_id $;datalines;AABCCC;run;data duplicate_records;set customer;by customer_id;if first.customer_id then count = 1;else count + 1;if last.customer_id and count > 1 then output;run;

Always remember to sort your data using a PROC SORT and a BY statement for FIRST. and LAST. variables to function correctly.
FIRST. and LAST. variables are temporary; they exist within the DATA step and aren’t added to the final output dataset unless you explicitly create new variables to store their values.
