Proc Univariate and Proc Means are SAS procedures that calculate statistics for quantitative variables.
Proc Univariate provides a wider variety of statistics and graphs than Proc Means. It helps you discover key information about the distribution of each variable, such as its central tendency, spread, quantiles, extreme values, and whether it is plausibly normal.
Syntax:
PROC UNIVARIATE <options>; <statements>
The commonly used options for PROC UNIVARIATE include DATA=, NORMAL, PLOT, NOPRINT, TRIMMED=, and WINSORIZED=.
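The commonly used statements with PROC UNIVARIATE include:
The BY statement causes PROC UNIVARIATE to calculate statistics separately for each group of observations (for example, one set of statistics per treatment group). The input data must be sorted by the BY variable.
The OUTPUT OUT= statement lets you save the calculated statistics (means, standard deviations, percentiles, and so on) to a new data set.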
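The two statements above can be combined, as in the following minimal sketch. The data set names class_sorted and weight_stats and the output variable names are placeholders chosen for illustration; sashelp.class ships with SAS.

```sas
/* BY-group processing expects the input to be sorted by the BY variable */
proc sort data=sashelp.class out=class_sorted;
    by sex;
run;

/* NOPRINT suppresses the printed tables; only the output data set is created */
proc univariate data=class_sorted noprint;
    by sex;
    var weight;
    output out=weight_stats
           n=n_obs mean=mean_weight std=sd_weight median=median_weight;
run;

/* One row per value of sex, holding the requested statistics */
proc print data=weight_stats;
run;
```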
PROC UNIVARIATE DATA=sashelp.class;
    VAR weight;
RUN;

The first table generated is the Moments table. It provides a list of descriptive statistics for the variable weight, such as the mean, standard deviation, skewness, and kurtosis.
The second table, Basic Statistical Measures, provides several measures of the central tendency and spread of the data.

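The Tests for Location table is used to determine whether the center of the data (the mean for Student's t test, the median for the sign and signed rank tests) is significantly different from a hypothesized value. The hypothesized value is 0 by default and can be changed with the MU0= option.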
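As an illustration of testing against a value other than 0, the sketch below uses the MU0= option. The hypothesized value of 100 is an arbitrary choice for demonstration, and the ODS SELECT statement is only there to limit the output to the one table of interest.

```sas
/* Show only the Tests for Location table */
ods select TestsForLocation;

/* Test whether the center of weight differs from 100 (illustrative value) */
proc univariate data=sashelp.class mu0=100;
    var weight;
run;
```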

In this table, the commonly used quantiles of the data are listed.
The quantiles provide information about the distribution's tails and include the five-number summary for each variable: the minimum, lower quartile, median, upper quartile, and maximum.
You can also calculate custom percentiles with the PCTLPTS= option of the OUTPUT statement. The example below requests the 10th, 20th, 30th, 40th, and 50th percentiles, together with Q3 (the 75th percentile).
proc univariate data=sashelp.iris;
    var sepallength;
    output out=pctds
           q3=quartile3
           pctlpts=10 to 50 by 10
           pctlpre=pct_
           pctlname=P10 P20;
run;

PCTLPRE= specifies the prefix for the names of the variables that hold the PCTLPTS= percentiles, and PCTLNAME= specifies the suffixes. If fewer suffixes than percentiles are given, the remaining variables use the percentile value as the suffix. The OUTPUT statement saves the values in a SAS data set.

The Extreme Observations table lists the smallest and largest values in the data set (five of each by default), along with their observation numbers. This is useful for locating outliers in the data.
Many statistical techniques assume that the data are normally distributed. It is important to check this assumption before running a model.
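To change how many extreme values are listed, or to label them with something more readable than the observation number, a sketch along these lines may help. It assumes the NEXTROBS= option and the ID statement; the choice of 3 and of name as the identifier are for illustration only.

```sas
/* Show only the Extreme Observations table */
ods select ExtremeObs;

/* List the 3 lowest and 3 highest weights, labelled by student name */
proc univariate data=sashelp.class nextrobs=3;
    var weight;
    id name;
run;
```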
There are multiple ways to check normality, including a histogram, the skewness statistic, and formal tests for normality.
A histogram is a commonly used plot for visually examining the distribution of a set of data. You can create a histogram in PROC UNIVARIATE with the following statement.
HISTOGRAM sepallength / NORMAL;
The NORMAL option superimposes a fitted normal curve on the histogram.
proc univariate data=sashelp.shoes noprint;
    var sales;
    histogram / normal(color=red);
run;

Skewness is a measure of the degree of asymmetry of a distribution. If skewness is close to 0, the distribution is roughly symmetric, which is consistent with (but does not by itself prove) normality.
If skewness > 0, the data is positively skewed, meaning that the right tail is longer, typically because of a few unusually large values. In positively skewed data, the mean is usually greater than the median, and the median is usually greater than the mode.
If skewness < 0, the data is negatively skewed, meaning that the left tail is longer, typically because of a few unusually small values.
Rules for Skewness :

In the above example, skewness is close to 0, which suggests that the distribution is roughly symmetric.
A test for normality is another way to assess whether the data is normally distributed. Four test statistics are displayed in the table: Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling.
The NORMAL option is included in the PROC UNIVARIATE statement to request the tests for normality.
The Shapiro-Wilk and Kolmogorov-Smirnov tests are the two most commonly used. The p-values are for testing the null hypothesis that the variable is normally distributed. If the p-value is greater than 0.05, you fail to reject the null hypothesis, so the data is consistent with a normal distribution. If it is below 0.05, you reject normality.
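One way to inspect skewness alongside the mean and median is to write all three to a data set, as in this sketch. The data set name shape_stats and the variable names are placeholders for illustration.

```sas
/* Save mean, median, and skewness of sepallength to a data set */
proc univariate data=sashelp.iris noprint;
    var sepallength;
    output out=shape_stats
           mean=mean_len median=median_len skewness=skew_len;
run;

/* A mean above the median together with positive skewness points to a right tail */
proc print data=shape_stats;
run;
```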
proc univariate data=sashelp.iris normal;
    var sepallength;
run;

The Shapiro-Wilk test gives you a W statistic between 0 and 1. Small values of W, accompanied by a small p-value, indicate that the data is not normally distributed, and you can reject the null hypothesis. PROC UNIVARIATE reports this test only when the sample size is 2000 or less.
The Kolmogorov-Smirnov test is also known as the KS test, and it can handle large sample sizes.
From Wikipedia,
The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples.
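If you want the test results in a data set rather than only in the printed output, a sketch using ODS OUTPUT is shown here. The data set name norm_tests is a placeholder; TestsForNormality is the ODS name of the table produced by the NORMAL option.

```sas
/* Capture the Tests for Normality table in a data set */
ods output TestsForNormality=norm_tests;

proc univariate data=sashelp.iris normal;
    var sepallength;
run;

/* One row per test, with its statistic and p-value */
proc print data=norm_tests;
run;
```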
The ordinary mean is extremely sensitive to even a single outlier. Winsorized and trimmed means are robust alternatives that reduce this sensitivity. For a trimmed mean, a percentage of the data is removed from each tail, and then the mean is calculated. This is especially useful when the data is highly skewed.
The percentage tells you how much of the data to remove from each tail. For example, with a 5% trimmed mean, the data's lowest 5% and highest 5% are excluded. The mean is calculated from the remaining 90% of data points.
Similarly, a 10% trimmed mean removes the values below the 10th percentile and above the 90th percentile, and then calculates the mean of the remaining values.
proc univariate data=sashelp.shoes trimmed=0.1;
    var sales;
    histogram / normal;
run;

In the example above, we are calculating a 10% Trimmed Mean.
10% of the values are trimmed from each tail (lower and upper). Since sashelp.shoes has 395 observations, this works out to 40 values trimmed from the left tail and 40 from the right tail.
The Winsorized mean replaces the extreme values (smallest and largest) with the closest remaining observations, and then the mean is calculated. It is the same as the trimmed mean, except that the extreme values are replaced rather than removed. In other words, we are capping a percentage of values at both ends of the data.
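To see what the trimmed mean is doing, it can be reproduced by hand, as in this sketch. It assumes that sales has no missing values and that the number of trimmed observations is the proportion times n, rounded up; the data set names sorted and trimmed are placeholders.

```sas
/* Order the observations so the tails are at the start and end */
proc sort data=sashelp.shoes(keep=sales) out=sorted;
    by sales;
run;

/* Drop the k smallest and k largest values, where k = ceil(0.1 * n) */
data trimmed;
    set sorted nobs=n;
    k = ceil(0.1 * n);
    if _n_ > k and _n_ <= n - k;
run;

/* The mean of what remains is the 10% trimmed mean */
proc means data=trimmed n mean;
    var sales;
run;
```

The mean reported here can be compared with the Trimmed Means table from PROC UNIVARIATE.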
proc univariate data=sashelp.shoes winsorized=0.2;
    var sales;
    histogram / normal;
run;
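The capping idea behind the Winsorized mean can also be sketched by hand. This is an illustration under the same assumptions as for trimming (no missing values, k = proportion times n rounded up); the macro variable names lo and hi and the data set names are placeholders.

```sas
proc sort data=sashelp.shoes(keep=sales) out=sorted;
    by sales;
run;

/* Find the capping values: the (k+1)th smallest and (k+1)th largest sales */
data _null_;
    set sorted nobs=n;
    k = ceil(0.2 * n);
    if _n_ = k + 1 then call symputx('lo', sales);
    if _n_ = n - k then call symputx('hi', sales);
run;

/* Replace values outside [lo, hi] with the nearest cap instead of dropping them */
data winsorized;
    set sorted;
    sales_w = min(max(sales, &lo), &hi);
run;

/* The mean of the capped values is the 20% Winsorized mean */
proc means data=winsorized n mean;
    var sales_w;
run;
```

Unlike trimming, the number of observations stays the same; only the values in the tails change.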

PROC UNIVARIATE with the PLOT option generates three plots: a horizontal histogram, a box-and-whiskers plot, and a normal probability plot.
proc univariate data=sashelp.shoes plot;
    var sales;
run;

The horizontal histogram (top-left) is a visual representation of the distribution of the sales values. In normally distributed data, the peak will be in the middle, with tails tapering off symmetrically on either side.
The box-and-whiskers plot (top-right) is a graphical representation of the quartiles of the data. The box represents the middle 50% of the data (from the lower quartile to the upper quartile), and the whiskers cover roughly the lower and upper 25% of the data, excluding points drawn separately as extreme values.
The centre line represents the median which is the 50th percentile.
The diamond symbol ◇ indicates the mean.
The circles ◦ that stand at the top of the box plot indicate extreme values.
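The normal probability plot (bottom) plots the ordered data values against the quantiles expected under a normal distribution. If the data is normally distributed, the points lie in a tight scatter around the reference (diagonal) line. Systematic curvature away from the line indicates skewness or heavy tails.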
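A standalone normal probability plot can also be requested directly, as in this sketch using the PROBPLOT statement. MU=EST and SIGMA=EST ask SAS to draw the reference line from the sample mean and standard deviation; using ODS Graphics here is an assumption about your session setup.

```sas
ods graphics on;

/* Normal probability plot of sales with an estimated reference line */
proc univariate data=sashelp.shoes noprint;
    var sales;
    probplot sales / normal(mu=est sigma=est);
run;
```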
