Proc Format in SAS

SAS user-defined formats let you assign labels to the values of variables. PROC FORMAT creates these formats, as well as informats, for character or numeric variables.

For example, if gender is a numeric variable with 1 for males and 2 for females, the user-defined format can be set up so that the values “male” and “female” are printed in SAS output, rather than 1 and 2.

The underlying variables are still numeric, so the same variables can also be used in numeric procedures, such as PROC REG, or PROC MEANS, or PROC FREQ.

The format does not change the underlying values of the variable, only how they are displayed in the SAS output.

Formats can be defined for specific values or for ranges of values, and they can be temporary or permanent.

There are also default SAS formats, such as formats for date variables, that can be used at any time.

This article covers only some of the more basic uses of SAS user-defined formats for numeric variables.

User-defined formats are stored in a separate file called a formats catalog and are not part of a SAS data set.

Proc format;
value format-name Range-1 = 'Label1'
                  Range-2 = 'Label2'
                  Range-3 = 'Label3'
                  .................
                  ;
run;

Rules for defining FORMAT NAME

  • For character formats, the first character must be a dollar sign ($), followed by a letter or underscore. For numeric formats, the name must start with a letter or underscore.
  • The name cannot end with a number.
  • The name cannot be the name of an existing SAS format.
  • The name should not end with a period in the VALUE statement.

Creating Labels for each data value

The most common way of labelling data is to simply assign each unique code its own label. I have taken an example from the sashelp.pricedata data set, where the region column has the values 1, 2 and 3.

We would like to assign the region names USA, EUROPE and ASIA to the values 1, 2 and 3.

You can use the PROC FREQ procedure to find the unique values of region, as below.

proc freq data=sashelp.pricedata;
	table region;
run;
proc format;
	value region 1='USA'
	             2='EUROPE'
	             3='ASIA'
	             ;
run;
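
To see the labels in output, the format can be attached to the variable with a FORMAT statement. The following is a minimal sketch; it redefines the format under the hypothetical name regionfmt (with quoted labels) so that the block runs on its own.

```sas
proc format;
	value regionfmt 1='USA'
	                2='EUROPE'
	                3='ASIA'
	                ;
run;

/* The FORMAT statement applies the labels for this step only;
   the stored values of region remain 1, 2 and 3. */
proc freq data=sashelp.pricedata;
	table region;
	format region regionfmt.;
run;
```

Because the format is attached only inside the PROC FREQ step, the data set itself is left unchanged.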

Creating formats for Multiple values

Beyond just a simple one-to-one mapping as illustrated above, we can also map multiple values to a single label.

Ranges can be multiple values separated by commas.

proc format;
	value line 1,2,3= "Home Appliances" 
	             4,5= "Clothing" 
	             ;
run;

Creating formats with numeric ranges

In addition, we can use ranges and lists of values to create formats. For example, we can apply a single ‘label’ to ranges of values with code like this:

proc format;
 value weight
 0 - 18      = "Underweight"
 18.1 - 24.9 = "Normal weight"
 25 - 29.9   = "Overweight"
 ;
run;

Creating formats with HIGH and LOW

Ranges can include or exclude their bounding values by placing a less-than sign (<) next to the hyphen, and the keywords LOW and HIGH leave a range open-ended:

proc format;
 value weight
 low -< 18   = "Underweight"
 18 -< 25    = "Normal weight"
 25 -< 30    = "Overweight"
 30 - high   = "Obesity"
 other       = "No Records"
 ;
run;

LOW refers to the lowest non-missing value, HIGH refers to the highest value, and OTHER covers every value not listed in another range. In a numeric format LOW does not include missing values, so missing values fall under OTHER unless they are listed explicitly.
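
A quick way to check how boundary values are classified is to pass a few test values through the PUT function. The sketch below assumes a hypothetical format named bmigrp with contiguous ranges; the cut-offs are illustrative only.

```sas
proc format;
	value bmigrp
		low -< 18 = "Underweight"    /* below 18; 18 itself is excluded  */
		18  -< 25 = "Normal weight"  /* 18 up to, but not including, 25  */
		25  -< 30 = "Overweight"
		30 - high = "Obesity"        /* 30 and above                     */
		other     = "No Records"     /* anything else, including missing */
		;
run;

/* Write each test value and its formatted label to the log */
data _null_;
	do bmi = ., 17.9, 18, 24.95, 30;
		grp = put(bmi, bmigrp.);
		put bmi= grp=;
	end;
run;
```

Because each range ends with -< and the next range starts at the same number, no value between the groups is left unlabelled.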

Creating Character Formats

Value labels can also be applied to character/string data values. The most important differences are:

  • The name of the format must start with a dollar sign ($)
  • The code values (on the left of the equals signs) must be quoted.
proc format;
	value $genderFmt 
	     'M'='Male' 
	     'F'='Female' 
	     Other='Error'
	     ;
run;

data class;
	length sex $10.;
	set sashelp.class;
	format sex $genderfmt.;
run;

We can also apply the basic principles of ranges to character formats. We could create the following:

proc format;
  value $alphabets
  'A' -< 'K' = 'First 10'
  'K' -< 'U' = 'Second 10'
  'U' - high = 'Remaining'
  Other = 'Error'
  ;
run;

data a1;
  set a;
  format var1 $alphabets.;
run;

In the above example, any value that sorts at or after A and before K (not including K) is labelled ‘First 10’.

Any value that sorts at or after K and before U (not including U) is labelled ‘Second 10’.

Any value that sorts at or after U is labelled ‘Remaining’. All other values (including missing) are displayed as ‘Error’.

In an alphabetical sort, SAS sorts digits before A. So, in this example, missing values and any values beginning with digits are labelled ‘Error’.

Nesting Formats

In place of a label, you can also supply the name of a SAS format or a user-defined format (or, in an INVALUE statement, an informat), enclosed in square brackets. Here is an example:

You want dates from July 15, 2005, to December 31, 2006, to be displayed with the date9. format. Dates before July 15, 2005, should be formatted as “Not Open” and dates after December 31, 2006, should be formatted as “Too Late.”

You can use nested formats as follows to accomplish this task:

proc format;
value registration 
      low - <'15Jul2005'd = 'Not Open'
      '15Jul2005'd - '31Dec2006'd = [date9.] 
      '01Jan2007'd - high = 'Too Late'
      ;
run;
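
To check a nested format, a few dates can be passed through PUT. This is a minimal sketch that redefines the format under the hypothetical name regwin so the block is self-contained; the three test dates are arbitrary.

```sas
proc format;
	value regwin
		low -< '15Jul2005'd         = 'Not Open'
		'15Jul2005'd - '31Dec2006'd = [date9.]   /* hand off to DATE9. */
		'01Jan2007'd - high         = 'Too Late'
		;
run;

/* One date before, one inside, and one after the registration window */
data _null_;
	do d = '01Jan2005'd, '04Jul2006'd, '15Mar2007'd;
		shown = put(d, regwin.);
		put d= date9. shown=;
	end;
run;
```

Only the middle date is displayed as a date; the other two are replaced by their text labels.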

Multilabel Formats

Under normal circumstances, you get an error message if any of your format ranges overlap. However, you can create a format with overlapping ranges if you use the MULTILABEL option in the VALUE statement.

Certain multilabel enabled procedures can then use the multilabel format to produce tables showing all of the format ranges.

Here is an example:
You want to see the variable Age (from the SURVEY data set) broken down two ways: one in 20-year intervals, the other by a split at 50 years old. You first create a multilabel format like this:

proc format;
   value agegroup (multilabel)
      0  -< 20   = '0 to <20'
      20 -< 40   = '20 to <40'
      40 -< 60   = '40 to <60'
      60 -< 80   = '60 to <80'
      80 - high  = '80 +'
      0  -< 50   = 'Less than 50'
      50 - high  = '> or = to 50';
run;

Creating Informats

Creating informats is similar to creating formats except the VALUE statement is replaced with an INVALUE statement.

One important distinction is that the right-hand side of the equals sign in a VALUE statement (the ‘label’ side) for a format must be a character string.

In creating an informat, the right-hand side of the equals sign in the INVALUE statement can be numeric.

A second important distinction is that for informats the type of informat that we use/create (character vs numeric) indicates the type of the OUTPUT variable. This is the opposite of formats.

For formats, the type we use/create indicates the type of the INPUT variable.

proc format;
 invalue convert 'A+' = 100
 'A' = 96
 'A-' = 92
 'B+' = 88
 'B' = 84
 'B-' = 80
 'C+' = 76
 'C' = 72
 'F' = 65;
run;
data grades;
 input @1 ID $3.
 @5 Grade convert2.;
datalines;
001 A
002 B+
003 F
004 C+
005 A
; 

The name of the informat follows the INVALUE keyword; a leading dollar sign in the name creates a character informat.

Even though you are reading character values with the convert informat, the resulting variable Grade is numeric.

If you use the name $marks, as in the following program, the variable marks is a character variable.

proc format;
 invalue $marks 100 = 'A+'
 96 = 'A' 
 92 = 'A-'
 88 = 'B+'
 84 = 'B'
 80 = 'B-'
 76 = 'C+'
 72 = 'C'
 65 = 'F'
 ;
run;
data marks;
 input @1 ID $3.
 @5 marks $marks.;
datalines;
001 96
002 88
003 65
004 76
005 96
;
run;

INFORMAT options UPCASE and JUST

There are some useful options that you can use when you create your informats. UPCASE and JUST are two such options.

UPCASE, as the name implies, converts the data values to uppercase before checking them against the informat ranges.

JUST left-aligns character values before they are compared, so leading blanks are ignored.

data grades;
 input @1 ID $3.
 @5 Grade convert2.;
datalines;
001 A
002 B+
003 F
004 C+
005 A
; 
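
The definition of an informat that uses these options is not shown above. A minimal sketch, assuming a hypothetical informat named gradein that covers only three of the grades, might look like this:

```sas
proc format;
	invalue gradein (upcase just)
		'A'  = 96
		'B+' = 88
		'F'  = 65
		other = .      /* anything else becomes a numeric missing value */
		;
run;

/* Lowercase grades and a grade with a leading blank still match,
   because the value is uppercased and left-aligned before lookup. */
data grades_demo;
	input @1 ID $3.
	      @5 Grade gradein2.;
datalines;
001 a
002 b+
003  F
;
run;

proc print data=grades_demo;
run;
```

Without UPCASE the first two records would fall into OTHER, and without JUST the third record (a blank followed by F in columns 5 and 6) would not match 'F'.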

In general, it is best to use a format when converting from numeric to character. Informats are better suited to converting character to numeric or character to character.

Using an informat to convert from numeric to character and/or numeric to numeric will generate a note in the log:

NOTE: Numeric values have been converted to character values

A numeric informat can also be used to read a combination of character and numeric data:

proc format;
   invalue readtemp (upcase)
      96 - 106 = _same_
      'N'      = 98.6
      other    = .;
run;
data temperatures;
   input Temp : readtemp5. @@;
datalines;
101 N 97.3 n N 67 104.5
;

The UPCASE option converts any character data to uppercase before it is compared with the ranges. The keyword _SAME_ keeps any numeric value in the range 96 to 106 as it is.

Values of ‘N’ are converted to the numeric value of 98.6 and any values that are not in the range 96 to 106 or equal to ‘N’ are set to a numeric missing value.

USING EXISTING FORMATS/INFORMATS

Using a format is fairly simple. There are two rules to remember:

  1. To apply a format we use the PUT() function, and to apply an informat we use the INPUT() function.
  2. For character variables we must use a character format and for numeric variables we must use a numeric format.

The general form is PUT(variable, format.), where the variable is a character or numeric variable and the format is a SAS or user-defined format. The result of a PUT function is always a character value.

Given the numeric variable region in sashelp.pricedata and the numeric format region. defined earlier, the code to create a new variable looks like this:

 newRegion=put(region, region.);

Similarly, for the character variable sex and the character format $genderfmt., the code to create a new variable looks like this:

newSex=put(sex, $genderfmt.);

Similar code works for informats, replacing the PUT() with an INPUT().
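
As an illustration of PUT() and INPUT() working in opposite directions, here is a small round-trip sketch. The format yesno and the informat yesnoin are hypothetical names introduced only for this example.

```sas
proc format;
	value yesno                 /* numeric format: number -> label  */
		1 = 'Yes'
		0 = 'No';
	invalue yesnoin (upcase)    /* numeric informat: text -> number */
		'YES' = 1
		'NO'  = 0
		other = .;
run;

data _null_;
	flag = 1;
	text = put(flag, yesno.);       /* PUT always returns character */
	back = input(text, yesnoin3.);  /* INPUT with a numeric informat returns a number */
	put flag= text= back=;
run;
```

The width 3 on yesnoin3. tells INPUT how many characters of text to read.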

We can also apply a format/informat directly to a variable by using the FORMAT/INFORMAT statement.

format racecd racef.; or format sexcd $sexf.;

The FORMAT/INFORMAT statement can be used in a DATA step and in most PROCS.

When a format is attached to a variable, most PROC steps group and display the data by the formatted value.

VIEWING EXISTING FORMATS

Once we have located all existing format catalogs, we are most likely curious to see what formats are in these catalogs.

The following code can be used to do that:

proc format library=work fmtlib;
run;

This code prints a map of the existing format catalog (format names, ranges, and labels) to the output listing.

Saving User-Defined Formats in a SAS Formats Catalog:

SAS formats catalogs are saved with the extension .sas7bcat. You can save user-defined formats permanently.

Below are the steps for creating and saving permanent user-defined SAS formats and assigning them to variables in a permanent data set.

  • Submit LIBNAME statements for the data set and for the formats catalog. Both librefs may point to the same folder or to different folders. If the libref for your formats is LIBRARY, SAS searches it automatically; any other libref must be named in the FMTSEARCH= system option.
  • Run PROC FORMAT to create the user-defined formats. PROC FORMAT is used to set up the format definitions. Format names may be up to 32 characters long, and may not end with a number. Creating the formats does not link them to variables in the data set.
  • Create a permanent data set using a DATA step.
  • Run PROC DATASETS to link the formats to the variables in the data set. When assigning formats to variables using PROC DATASETS, be sure to follow the format name with a period.

All SAS formats are stored in a catalog (collection of formats). When we create a Format, it gets stored in the catalog.

If we don’t specify the catalog, then SAS stores formats in the WORK library in a catalog called FORMATS. Like the data sets in the WORK library, this catalog is deleted at the end of the session.

Now to save User-defined formats, we need to specify where to store the catalog and what to call it. This can be achieved by storing formats in a library other than WORK.

  1. First of all, we have to define a library (here I am using SAS University Edition).

Syntax: LIBNAME Library_Name "Path";

LIBNAME MYLIB "/home/subhroster20070/examples";
  2. Use the LIBRARY= option in PROC FORMAT and provide the library name, optionally followed by a catalog name. The catalog name must be a valid SAS name; if it is omitted, the catalog is called FORMATS.

Example:

proc format library=mylib;
	value $genderFmt 
	     'M'='Male' 
	     'F'='Female' 
	     Other='Error'
	     ;
run;    

How to Use stored Format

Now, whenever we want to use a stored format, we have to tell SAS to look for formats in that catalog. This is done with the FMTSEARCH= system option, so before using the format we need to submit an OPTIONS statement.

Syntax: Options fmtsearch = (Library_Name.Catalog_Name);

Example:

Options fmtsearch = (mylib);
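
Putting the pieces together, the sketch below stores a format in a named catalog and then uses it. So that it runs without a site-specific path, it assumes a hypothetical libref permlib that points at the WORK directory; in practice the LIBNAME statement would name a permanent folder. The catalog name myfmts and the format name $sexlbl are also illustrative.

```sas
/* Assumption: reuse the WORK directory so the example runs anywhere */
libname permlib "%sysfunc(pathname(work))";

/* Store the format in the catalog PERMLIB.MYFMTS */
proc format library=permlib.myfmts;
	value $sexlbl
		'M'   = 'Male'
		'F'   = 'Female'
		other = 'Error'
		;
run;

/* Tell SAS where to look for formats */
options fmtsearch=(permlib.myfmts work);

proc print data=sashelp.class(obs=5);
	var name sex;
	format sex $sexlbl.;
run;
```

If the catalog name is left out of FMTSEARCH=, SAS looks for a catalog called FORMATS in that library.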

Creating formats from SAS datasets

In PROC FORMAT, the CNTLIN= option allows us to create a format from a data set rather than a VALUE statement. Before using this option, look at the guidelines below:

  • The input data set must contain the three variables required by PROC FORMAT: START, LABEL and FMTNAME.
  • START is the key field (the value to be formatted) and it must be unique in the input data set (the data set named in the CNTLIN= option).
  • LABEL is the value we want each START value to be mapped to.
  • FMTNAME is the name of the format; it is a character value, so it is written in quotes.
  • After defining the format, we can use the PUT function to create a variable in another data set based on the key field and the format we have defined.
data control;
   set countryCodes(rename=(AlphaCode=Start Country=Label));
   retain Fmtname '$ALPHAFMT' Type 'C';
run;

The RENAME= data set option is used to rename Alphacode to START and Country to LABEL.

A RETAIN statement is used to set FMTNAME to ‘$ALPHAFMT’ and TYPE equal to ‘C’ for Character type format.

Using a RETAIN statement is more efficient than an assignment statement, since these values are set at compile time, whereas an assignment statement executes for each iteration of the DATA step.

proc format cntlin=control fmtlib; 
run;

The CNTLIN= option names this data set, and the FMTLIB option generates a table showing the ranges and format labels.
data Country;
format AlphaCodes $ALPHAFMT.; 
input AlphaCodes : $15. @@;
datalines;
US IN GE IT
;

run;
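
The countryCodes data set is not listed in this article, so here is a self-contained sketch of the whole CNTLIN= workflow. The four rows of lookup data are made up for illustration, and this version names the format without the dollar sign and relies on TYPE='C' to mark it as a character format.

```sas
/* Hypothetical lookup table standing in for countryCodes */
data countryCodes;
	input AlphaCode $2. +1 Country $20.;
datalines;
US United States
IN India
GE Georgia
IT Italy
;
run;

/* Build the control data set: START, LABEL, FMTNAME and TYPE */
data control;
	set countryCodes(rename=(AlphaCode=Start Country=Label));
	retain Fmtname 'ALPHAFMT' Type 'C';
run;

proc format cntlin=control;
run;

/* Look up a code with the new character format */
data _null_;
	code = 'IN';
	name = put(code, $alphafmt.);
	put code= name=;
run;
```

Adding a country later only requires a new row in the lookup table and a rerun of PROC FORMAT; no VALUE statement has to be edited.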

ADVANCED TECHNIQUES

USING THE ERROR OPTION

With the _ERROR_ keyword, whenever one of the specified values is encountered, an error message is issued in the log. The _ERROR_ keyword applies only to informats (not formats).

proc format;
  invalue $racef
  '1'='White'
  '2'='Black'
  '3'='Asian'
  '4','5','6','7','8'=_ERROR_
  '9'='Other'
  ' '='Unknown'
  ;
run;

When we use the informat, if it encounters a value of 4,5,6,7 or 8 an ERROR message will be printed to the log.

NOTE: Invalid argument to function INPUT at line 107 column 8.
var=5 var2= _ERROR_=1 _N_=2

USING THE SAME OPTION

If a value in the data is not included in the informat (or format), when we apply the informat, the omitted value will be displayed exactly as it appears in the data.

The _SAME_ keyword is an alternate method for stating this default.

Like the _ERROR_ keyword above, it is only applicable to informats. Consider this code:

proc format;
  invalue $phasecd
  '1','2','3','4','5','6','7'='Treatment'
  '8','9','10','11','12'='Follow-up'
  '-1','0' = _SAME_
  other = _ERROR_
  ;
run;

The -1 and 0 values will now appear exactly as they are in the data, rather than issuing an error to the log. All other values will still issue an error in the log.

USING THE DEFAULT OPTION

Occasionally, there will appear a NOTE in the LOG about “the length of the format is defaulted to XX”. This happens particularly when the length of the character strings on the output (right) hand side are long or when we use an existing format to create a new format (see below).

The note can be removed by using the DEFAULT= option to specify how long we want the format to be. It is the same general idea as when we apply a length to a variable.

proc format;
  value phasecd (default=15)
  /* -1 and 0 are not listed, so they are displayed unchanged */
  1,2,3,4,5,6,7='Treatment'
  8,9,10,11,12='Follow-up'
  ;
run;

This code sets the length of the formatted values to 15 characters. It is not necessary to include the DEFAULT= option every time, but adding it removes the note.

It can also help reduce dead spaces appearing at the end of the output.

If the length of the label (value on the right-hand side) is greater than the length set by the DEFAULT option then the label will be truncated.

Applying the Multilabel Option

Multilabel format enables you to assign multiple labels to a value or a range of values.

The MULTILABEL option can be specified in the VALUE statement of PROC FORMAT to assign multiple labels.

proc format;
value $genderfmt (multilabel)
'M'='Male'
'F'='Female'
'M','F',' '='Total people';
quit; 

PROC FREQ doesn’t understand multilabel formats. Only PROC MEANS, PROC SUMMARY, and PROC TABULATE can use multilabel formats, through the MLF option. We can use PROC MEANS with the MLF option in the CLASS statement to determine the frequencies we want.
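
A minimal sketch of that PROC MEANS step, run against sashelp.class, could look like the following. The format is redefined under the hypothetical name $sexml so the block runs on its own, and Height is chosen only as a convenient analysis variable.

```sas
proc format;
	value $sexml (multilabel)
		'M'     = 'Male'
		'F'     = 'Female'
		'M','F' = 'Total people'
		;
run;

/* MLF tells PROC MEANS to use every label that a value maps to,
   so each student is counted under Male or Female and under Total people. */
proc means data=sashelp.class n mean maxdec=1;
	class sex / mlf;
	var height;
	format sex $sexml.;
run;
```

The N column gives the frequency for each label, including the overlapping total.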

Hybrid Format

TO REMOVE A FORMAT

You can use a PROC DATASETS step to remove the formats (and labels) associated with a data set. The _all_ keyword removes them from all the variables. Alternatively, you can list variable names to remove the formats for those variables only.

proc datasets lib=mylib memtype=data;
   modify class;
     attrib _all_ label=' ';   /* remove all variable labels */
     attrib _all_ format=;     /* remove all formats */
   contents data=class;
run;
quit;

Subhro Kar
