proc import

PROC IMPORT in SAS: A Complete Guide

  • Post author:
  • Post category:Base SAS
  • Post comments:0 Comments
  • Reading time:20 mins read

Do you want to import data using Proc Import?

Introduction to Proc Import in SAS

If you’ve ever been puzzled about how to seamlessly import data into your SAS environment, then you’ve come to the right place.

PROC IMPORT in SAS is used to read data to SAS. Reading data from an external file is the most frequent task of a SAS programmer.

Before we start, remember that the PROC IMPORT procedure can import data only if SAS and SAS support that data type can import only numeric and character data types.

So, first, look at the syntax of Proc Import.

PROC IMPORT DATAFILE=filename OUT=sas-dataset<data-set-options> 
DBMS= data-source-identifier REPLACE;
SHEET= sheet-name GETNAMES=Yes/No DATAROW=n;
RANGE=range-values
RUN;

Arguments:

DATAFILE=filename: It is used to specify the complete path and filename or fileref for the input file. A fileref is a SAS name associated with the output file’s physical location.

Note: A pathname for a file can have a maximum length of 201 characters.

OUT= SAS data-set: This tells SAS to create a SAS dataset. By default, the SAS dataset is created in your work library. You can also specify a two-level SAS dataset name.

For more information, see our guide on SAS Libraries.

DBMS=data-source-identifier: This is an optional argument used to specify the data type to import. To import a DBMS table, specify DBMS = using a supported database identifier.

For example, you must specify TAB as the identifier to import a tab-delimited file. If you have a delimited file that does not end in .csv, specify the DLM as the identifier and the  delimiter= option. The default delimiter is blank.

The table below shows some of the values frequently used for the DBMS identifier.

Identifier Output Data Source Extension
CSV Delimited file (comma-separated values) .csv
DLM Delimited file (other than blank) .dat or .txt
TAB Delimited file (tab-delimited values) .txt
XLS Excel 97-2003 workbooks .xls
XLSX Microsoft Excel 2007 and later .xlsx
ACCESS Microsoft Access 2000 and later .mdb

You can find the Other DBMS here in the SAS Help Center.

REPLACE: This is also an optional argument. If you specify the REPLACE option, it overwrites an existing SAS data set.

If you do not specify REPLACE, the IMPORT procedure and the dataset already exists, the PROC IMPORT procedure does not overwrite the existing dataset. A NOTE is written in the log with the below message:

NOTE: Import canceled. Output dataset dataset-name already exists. 
Specify REPLACE option to overwrite it.

SHEET: When importing data from a specified sheet in excel, you can use this option to specify the sheet name. By default, SAS will import sheet1 from excel.

GETNAMES: SAS imports the first row of an excel sheet as the Variable name for the SAS variable. Similarly, if you specify the No option, it will tell SAS not to use the first row of data as variable names, and instead, SAS creates variable names as VAR1, VAR2, and so on.

DATAROW= Using this option, you can specify the starting row from where SAS would import the data. If you omit this option, SAS will import data starting from the 1st row of excel.

An important point to note here is:

When GETNAMES=YES, DATAROW must be greater than or equal to 2.
When GETNAMES=NO, DATAROW must be greater than or equal to 1

RANGE= For specifying the range of rows and columns of an excel sheet, use the range option with the sheet range as arguments.

Importing a Delimited File

A delimited file is a plain text file that contains a separator between the data fields. Delimiters can be commas, tabs, or other characters such as a pipe character or an exclamation point. See the example below of a raw text file in which each data field is delimited with the ‘|’ symbol.

FirstName|LastName|Gender|Country|Age
Dulce|Abril|Female|United States|32
Mara|Hashimoto|Female|Great Britain|25
Philip|Gent|Male|France|36
Kathleen|Hanner|Female|United States|25
Nereida|Magwood|Female|United States|58

Reading delimited text files into SAS is a much easier process. The below code snippets are used to import a delimited .txt file where the records are separated by ‘|’.

Note the use of DBMS=DLM and delimiter='|' option.

proc import datafile="/home/9to5sas/examples/data.txt" dbms=dlm 
        out=mydata replace;
    delimiter='|';
    getnames=yes;
run;

proc import in sas

Importing a Tab-Delimited File into SAS

A tab is specified by its hexadecimal value. The value for ASCII systems (UNIX, Windows, and Linux) is ’09’x. For EBCDIC systems (z/OS and MVS), the value is ‘05’x.

As an example, the syntax to specify a tab delimiter on an ASCII system is DLM=’09’x or delimiter='09'x

Note : The positioning of the quotation marks and the x in hexadecimal values is critical. Do not give any space between the x and the quotation marks.

The below code snippet is an example of importing a Tab-delimited file in SAS.

PROC IMPORT DATAFILE= "/home/9to5sas/examples/data.txt"
OUT= mydata replace
DBMS=dlm;
delimiter='09'x;
GETNAMES=YES;
RUN;

Importing a Space-Delimited File

Similar to the tab, the hexadecimal value for a space in the ASCII system is ’20x’. Hence, to import a space-delimited file, specify delimiter = '20'x

PROC IMPORT DATAFILE= "/home/9to5sas/examples/data.txt"
OUT= mydata replace
DBMS=dlm;
delimiter='20'x;
GETNAMES=YES;
RUN;

Importing a Comma-Delimited File with TXT extension

To import a comma-separated file that has a  txt extension, specify delimiter = ‘,’

PROC IMPORT DATAFILE= "/home/9to5sas/examples/data.txt"
OUT= mydata replace
DBMS=dlm;
delimiter=',';
GETNAMES=YES;
RUN;

Importing a Specific Delimited File Using a Fileref

When you use a fileref to specify a delimited file to import, the logical record length (LRECL) defaults to 256 unless you specify the LRECL= option in the FILENAME statement. The maximum LRECL that the PROC IMPORT procedure supports is 32767.

FirstName LastName Gender Country Age
Dulce Abril Female 'United States' 32
Mara Hashimoto Female 'Great Britain' 25
Philip Gent Male France 36
Kathleen Hanner Female 'United States' 25
Nereida Magwood Female 'United States' 58
filename mydata2 '/home/9to5sas/examples/data2.txt' lrecl=100;
proc import datafile=mydata2 dbms=dlm out=mydata2 replace;
    getnames=yes;
run;

Proc Import Output

Importing a Comma-Delimited File with a CSV Extension

By default PROC IMPORT procedure recognizes .csv as an extension for a comma-separated file so if you are importing a .csv file DBMS option is not required. However, it is required if you are importing a.txt file that has comma-delimited data.

proc import datafile="/home/9to5sas/examples/mydata3.csv"
        out=mydata3;
run;

Proc Import Output

Importing a file containing multiple delimiters

It is also possible to I’m[ort a file containing multiple delimiters. If two or more delimiters are present in the input file, quote each delimiter following the delimeter= option.

As in the below example, if the input file has both comma and tab as the delimiter, you need to quote each of the delimiters – delimiter=','09'x '

PROC IMPORT DATAFILE= "/home/9to5sas/examples/mydata.txt"
OUT= outdata
DBMS=dlm
REPLACE;
delimiter=','09'x ';
GETNAMES=YES;
RUN;

Importing records from a specified row

In some scenarios, you can have Variable names and data beginning at later other than row 1. For example, a delimited file has the variable names on row 4 and the data starts on row 5.

PROC IMPORT in SAS: A Complete Guide

Using PROC IMPORT, you can specify the row number at which SAS should begin reading the data using the DATAROW= option.

Although you can specify which record to start within PROC IMPORT, you cannot extract the variable names from any other row except the first row of an external file that is Xls or backwards compatible.

SAS will automatically rename variable names in the form VARx (where x is a sequential number) in the SAS Enterprise Guide and Sequential Alphabets in SAS Studio.

Here is an example.

proc import datafile="/home/9to5sas/external_files/class.xlsx" 
out=class dbms=xlsx replace;
datarow=5;
run;

Data will be read from row 5 due to the DATAROW= option.

PROC IMPORT in SAS: A Complete Guide

Importing variable names other than the first row

Suppose you have variable names starting from the 2nd row in an excel file. In this case, you can use DATAROW= and STARTROW= or NAMEROW and DATAROW=

proc import datafile="/home/9to5sas/external_files/class.xls" 
out=class dbms=xls replace;
NAMEROW=4;datarow=5;
run;

NAMEROW=2 tells SAS to read variable names from the second row, and STARTROW=3 is used to read values starting from the third row. You need to use both options. Otherwise, the variable names will also be read in the 2nd observation.

Note: STARTROW and NAMEROW only work with XLS and not with XLSX format. For XLSX formats, you can use the RANGE= option discussed later in this post.

Using the RANGE= option

The range option is used to subsets a specified section of an Excel file. RANGE="Sheet1$A4:E7" tells SAS to import data from range A4:E7 from sheet1.

If you omit the RANGE= statement, the entire sheet will be imported, and you might get more columns or missing data than you want. The sheet selected also might not be the one that you wanted. Therefore, it is strongly recommended that you use the RANGE= statement.

If you use GETNAMES=YES, then the first row of data in the range is used for the column names, and the data starts from the second row in the range. By default, the data starts from the first row, and column names are generated by the IMPORT procedure.

proc import datafile="/home/subhroster20070/9to5sas/external_files/class.xls" 
        out=class dbms=xls replace;
    range="Sheet1$a4:e7";
run;

Using Name Range

The range- the name is the name that is assigned to a range address within the sheet. Range names are not case-sensitive. The range address is identified by the top-left cell that begins the range and the bottom-right cell that ends the range within the Excel worksheet file. You can create your defined name.

To create a name, select a cell or range of cells, click on the box above Column A, Type any name you want, and press Enter.
In the example below, Range A4:E7 is selected, and then a name of “class” is given.

PROC IMPORT in SAS: A Complete Guide

The beginning and ending cells are separated by two periods. The range address A15:E28 indicates a cell range that begins at Cell a15 ends at Cell E28 and includes all cells in between.

proc import datafile="/home/9to5sas/external_files/class.xls" 
out=class dbms=xls replace;
range="class";
run;

A worksheet in a .xls file can save up to 256 columns and 65,536 rows. Excel 2007 and later files (.xlsx files) have been enhanced to support 16,384 columns and 1,048,576 rows in a worksheet.

Using the GUESSINGROWS= option

SAS scans the first 20 rows for delimited files to determine the variable type and length attributes. You can increase the number of scanned rows by using the GUESSINGROWS= statement.

Here is a scenario.

Suppose you have the first 30 rows numeric, and the remaining are character values. SAS would make all the character values blank. To avoid this, you can use specify the GUESSINGROWS=100 option.

See the column last name and Age in the below input file.

input file

Row Number 29 has a character value in the Age field, and Row Number 35 has a length more than the first 20 rows.

Proc Import guessing rows

While importing this file, SAS writes an error message in the log. This is because SAS has determined the datatype of Age as Numeric by scanning the first 20 rows. And there is a character variable in the Numeric field. Hence the error.

Also, see the output below where there is a missing value at observation 28, and at observation 34, the value of the last name variable has been truncated.

guessingrows

To avoid these errors and truncation of values, use the guessing rows option with a value of more than the rows in your data or use guessingrows=Max.

Note that if your input data contains thousands of records, it will slow the import process. Here, we have used. GUESSINGROWS=100.

proc import datafile="/home/9to5sas/examples/data8.csv"
        out=mydata5 replace;
        guessingrows=100;
run;

So, this is our importing data using the Proc Import Procedure. We hope that you must have found it useful.

Moreover, if you have any other suggestions regarding other tips or tricks, suggest us below the comment section. We would take those lists in our further blog post.

Thanks for reading!

If you liked this article, you might also want to read the article on Column Input In SAS.

Do you have any tips to add? Let us know in the comments.

Please subscribe to our mailing list for weekly updates. You can also find us on Instagram and Facebook.

Every week we'll send you SAS tips and in-depth tutorials

JOIN OUR COMMUNITY OF SAS Programmers!

Subhro

Subhro provides valuable and informative content on SAS, offering a comprehensive understanding of SAS concepts. We have been creating SAS tutorials since 2019, and 9to5sas has become one of the leading free SAS resources available on the internet.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.