
There are three basic input styles:
1. list input
2. column input
3. formatted input

SAS has now reached the end of the DATA step, and the program automatically does
the following:
writes the first observation to the data set
loops back to the top of the DATA step to begin the next iteration
increments the _N_ automatic variable by 1
resets the _ERROR_ automatic variable to 0
except for _N_ and _ERROR_, sets variable values in the program data vector to
missing values, as the following figure shows
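To watch this cycle happen, you can write the automatic variables and the program data vector to the SAS log with PUT statements at the top and bottom of the DATA step. This is a minimal sketch; the data set name PDV_DEMO and the variables Name and Score are hypothetical.

```sas
/* Hypothetical demo: write the program data vector (PDV) to the log */
/* at the top and at the bottom of each DATA step iteration.         */
data pdv_demo;
   /* Declare Name as character up front so its type and length are  */
   /* fixed before any statement refers to it.                       */
   length Name $ 8;
   /* Top of the iteration: _N_ has been incremented, _ERROR_ is 0,  */
   /* and Name and Score have been reset to missing.                 */
   put 'Top:    ' _N_= _ERROR_= Name= Score=;
   input Name $ Score;
   /* After INPUT: the PDV holds the values from the current record. */
   put 'Bottom: ' _N_= _ERROR_= Name= Score=;
   datalines;
Ann 90
Bob 85
;

proc print data=pdv_demo;
   title 'PDV Demo';
run;
```

On each "Top" line in the log, Name and Score should appear as missing, which reflects the reset described in the list above. Note that the top PUT statement also runs at the start of the third iteration, before the INPUT statement finds no more data and the DATA step stops.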

Program: Basic List Input


data club1;
input IdNumber Name $ Team $ StartWeight EndWeight;
datalines;
1023 David red 189 165
1049 Amelia yellow 145 124
1219 Alan red 210 192
1246 Ravi yellow 194 177
1078 Ashley red 127 118
1221 Jim yellow 220 .
;
proc print data=club1;
title 'Weight of Club Members';
run;

List Input: Points to Remember

The points to remember when you use list input are:

Use list input when each field is separated by at least one blank space or delimiter.
Specify each field in the order that they appear in the records of raw data.
Represent missing values by a placeholder such as a period. (Under the default behavior, a blank
field causes the variable names and values to become mismatched.)
Character values cannot contain embedded blanks.
The default length of character variables is eight bytes. SAS truncates a longer value when it writes
the value to the program data vector. (To read a character variable that contains more than eight
characters with list input, use a LENGTH statement.)
Data must be in standard character or numeric format (that is, it can be read without an informat).

When the Data Is Delimited by Characters, Not Blanks


options pagesize=60 linesize=80 pageno=1 nodate;
data club1;
infile datalines dlm=',';
input IdNumber Name $ Team $ StartWeight EndWeight;
datalines;
1023,David,red,189,165
1049,Amelia,yellow,145,124
1219,Alan,red,210,192
1246,Ravi,yellow,194,177
1078,Ashley,red,127,118
1221,Jim,yellow,220,.
;
proc print data=club1;
title 'Weight of Club Members';
run;

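With DLM= alone, SAS treats two delimiters in a row as a single delimiter, so an empty field makes later values shift into the wrong variables. The DSD option on the INFILE statement instead treats consecutive delimiters as a missing value, and it also makes the comma the default delimiter. The sketch below is a hypothetical variation on the club data in which Ravi's Team value has been left out for illustration; the data set name CLUB1_DSD is also hypothetical.

```sas
/* Hypothetical variation on the club data: Ravi's Team value is     */
/* omitted, so his record contains two commas in a row.              */
data club1_dsd;
   /* DSD: consecutive delimiters mark a missing value, and the      */
   /* default delimiter is a comma, so DLM= is not needed here.      */
   infile datalines dsd;
   input IdNumber Name $ Team $ StartWeight EndWeight;
   datalines;
1023,David,red,189,165
1246,Ravi,,194,177
1221,Jim,yellow,220,.
;

proc print data=club1_dsd;
   title 'Weight of Club Members (DSD)';
run;
```

For Ravi, Team should be blank while StartWeight and EndWeight keep their correct values.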

Understanding How to Make List Input More Flexible


Creating Longer Variables and Reading Numeric Data That Contains Special Characters

By simply modifying list input with the colon format modifier (:), you can read


character data that contains more than eight characters
numeric data that contains special characters.
data january_sales;
input Item : $12. Amount : comma5.;
datalines;
Trucks 1,382
Vans 1,235
Sedans 2,391
SportUtility 987
;
proc print data=january_sales;
title 'January Sales in Thousands';
run;
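As the list input points mention, a LENGTH statement is another way to read character values longer than eight characters. This sketch reads the same kind of records, but sets the length of Item with a LENGTH statement instead of the informat after the colon modifier; the data set name JANUARY_SALES2 is hypothetical.

```sas
/* Alternative sketch: set the length of Item with a LENGTH statement. */
data january_sales2;
   length Item $ 12;                 /* Item can hold up to 12 characters */
   /* Item uses plain list input; Amount still needs the colon           */
   /* modifier and COMMA5. to read the embedded comma.                   */
   input Item $ Amount : comma5.;
   datalines;
Trucks 1,382
Vans 1,235
Sedans 2,391
SportUtility 987
;

proc print data=january_sales2;
   title 'January Sales in Thousands';
run;
```

Because the LENGTH statement comes first, it also fixes Item as the first variable in the data set.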

Data set created with modified list input


Reading Character Data That Contains Embedded Blanks

Use the ampersand format modifier (&) to read a variable whose values contain a space in between.
With the ampersand format modifier, you can use list input to read data that contains single embedded
blanks. The only restriction is that at least two blanks must divide each value from the
next data value in the record.

data club2;
input IdNumber Name & $18. Team $ StartWeight EndWeight;
datalines;
1023 David Shaw  red 189 165
1049 Amelia Serrano  yellow 145 124
1221 Jim Brown  yellow 220 .
;
proc print data=club2;
title 'Weight Club Members';
run;

Program: Reading Data Aligned in Columns


data club1;
input IdNumber 1-4 Name $ 6-11 Team $ 13-18 StartWeight 20-22
EndWeight 24-26;
datalines;
1023 David  red    189 165
1049 Amelia yellow 145
1219 Alan   red    210 192
1246 Ravi   yellow     177
1078 Ashley red    127 118
1221 Jim    yellow 220
;
proc print data=club1;
title 'Weight Club Members';
run;

Column Input: Points to Remember


Remember the following rules when you use column input:
Character variables can be up to 32,767 bytes (32KB) in length and are not limited
to the default length of eight bytes.
Character variables can contain embedded blanks.
You can read fields in any order.
A placeholder is not required to indicate a missing data value. A blank field is
read as missing and does not cause other values to be read incorrectly.
You can skip over part of the data in the data record.
You can reread fields or parts of fields.
You can read standard character and numeric data only. Informats are ignored.
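Several of these rules can be seen in one INPUT statement. The sketch below reads Team before IdNumber, never reads the Name columns, and rereads the first two columns of the ID as a separate variable. The data set name CLUB_TEAMS and the variable IdPrefix are hypothetical; the column layout is the one used in the program above.

```sas
/* Hypothetical sketch of three column-input features:               */
/*   - fields read out of order (Team before IdNumber)               */
/*   - fields skipped (Name, columns 6-11, is never read)            */
/*   - columns reread (IdPrefix rereads columns 1-2 of IdNumber)     */
data club_teams;
   input Team $ 13-18 IdNumber 1-4 IdPrefix $ 1-2
         StartWeight 20-22 EndWeight 24-26;
   datalines;
1023 David  red    189 165
1049 Amelia yellow 145
1246 Ravi   yellow     177
;

proc print data=club_teams;
   title 'Club Members by Team';
run;
```

The blank columns for Amelia's EndWeight and Ravi's StartWeight are read as missing values without shifting the other fields.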

Program: Reading Data That Requires Special Instructions


data january_sales;
input Item $ 1-16 Amount comma5.;
datalines;
trucks          1,382
vans            1,235
sedans          2,391
;
proc print data=january_sales;
title 'January Sales in Thousands';
run;

In the next example, SAS reads data lines by using formatted input with a
column-pointer control:

data january_sales;
input Item $10. @17 Amount comma5.;
datalines;
trucks          1,382
vans            1,235
sedans          2,391
;

After SAS reads the first value for the variable Item, the pointer is left in the next
position, column 11. The absolute column-pointer control, @17, then directs the pointer
to move to column 17 in the input buffer. Now it is in the correct position to read the
value for the variable Amount.

In the following program, the relative column-pointer control, +6, instructs the
pointer to move six columns to the right before SAS reads the next data value.

data january_sales;
input Item $10. +6 Amount comma5.;
datalines;
trucks          1,382
vans            1,235
sedans          2,391
;

The data in these two programs is aligned in columns. As with column input, you
instruct the pointer to move from field to field. With column input you use column
specifications; with formatted input you use the length that is specified in the informat
together with pointer controls.

Formatted Input: Points to Remember


Remember the following rules when you use formatted input:
SAS reads formatted input data until it has read the number of columns that the
informat indicates. This method of reading the data is different from list input,
which reads until a blank space (or other defined delimiter character) is reached.
You can position the pointer to read the next value by using pointer controls.
You can read data stored in nonstandard form such as packed decimal, or data
that contains commas.
You have the flexibility of using informats with all the features of column input, as
described in Column Input: Points to Remember.

Testing a Condition before Creating an Observation


Using the Single Trailing @ Line-Hold Specifier

To read from a record twice, you must prevent SAS from automatically placing a new
record into the input buffer when the next INPUT statement executes. Use of a trailing
@ in the first INPUT statement serves this purpose.

The trailing @ is one of two line-hold specifiers that enable you to hold a record in
the input buffer for further processing.

data red_team;
input Team $ 13-18 @;        /* read Team and hold the record */
if Team='red';               /* keep only red-team records */
input IdNumber 1-4 StartWeight 20-22 EndWeight 24-26;
datalines;
1023 David  red    189 165
1049 Amelia yellow 145 124
1219 Alan   red    210 192
1246 Ravi   yellow 194 177
1078 Ashley red    127 118
1221 Jim    yellow 220 .
;
proc print data=red_team;
title 'Red Team';
run;
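A common use of the single trailing @ is to read a record-type code first and then choose which INPUT statement reads the rest of the record. In this sketch the layout (a type code in column 1, W for weight records and T for team records) and the data set names WEIGHTS and TEAMS are hypothetical.

```sas
/* Hypothetical layout: column 1 holds a record type.                */
/*   W records: IdNumber in columns 3-6, StartWeight in 8-10         */
/*   T records: IdNumber in columns 3-6, Team in 8-13                */
data weights(keep=IdNumber StartWeight)
     teams(keep=IdNumber Team);
   input RecType $ 1 @;                /* read the type and hold the record */
   if RecType = 'W' then do;
      input IdNumber 3-6 StartWeight 8-10;
      output weights;
   end;
   else if RecType = 'T' then do;
      input IdNumber 3-6 Team $ 8-13;
      output teams;
   end;
   datalines;
W 1023 189
T 1023 red
W 1049 145
T 1049 yellow
;
```

Because the record is held only until the end of the current iteration, each iteration still starts on a new data line.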

Using the Double Trailing @ Line-Hold Specifier

Sometimes you may need to create multiple observations from a single record of raw
data. One way to tell SAS how to read such a record is to use the other line-hold
specifier, the double trailing at-sign (@@ or double trailing @). The double trailing @
not only prevents SAS from reading a new record into the input buffer when a new
INPUT statement is encountered, but it also prevents the record from being released
when the program returns to the top of the DATA step. (Remember that the trailing @
does not hold a record in the input buffer across iterations of the DATA step.)
data body_fat;
input Gender $ PercentFat @@;
datalines;
m 13.3 f 22
m 22 f 23.2
m 16 m 12
;
proc print data=body_fat;
title 'Results of Body Fat Testing';
run;

Results of Body Fat Testing

Obs   Gender   PercentFat
 1      m         13.3
 2      f         22.0
 3      m         22.0
 4      f         23.2
 5      m         16.0
 6      m         12.0

Each record is held in the input buffer until the end of the record is reached.
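To see why the double trailing @ is needed here, compare a version of the step that uses a single trailing @. Because a single trailing @ releases the record when the DATA step returns to the top, only the first pair on each line is read. The data set name BODY_FAT_SINGLE is hypothetical.

```sas
/* Contrast sketch: a single trailing @ does not hold the record     */
/* across iterations, so each iteration starts on a new data line    */
/* and the second pair on each line is never read.                   */
data body_fat_single;
   input Gender $ PercentFat @;
   datalines;
m 13.3 f 22
m 22 f 23.2
m 16 m 12
;

proc print data=body_fat_single;
   title 'Single Trailing @: One Observation per Line';
run;
```

This step should create three observations (one per line) instead of six.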

Reading Multiple Records to Create a Single Observation


Consider this situation: the information for a single observation is not contained in a
single record of raw data but is scattered across several records.

Method 1: Using Multiple INPUT Statements

data club2;
input IdNumber 1-4;
input;                 /* skip the second record (Team) */
input StartWeight 1-3 EndWeight 5-7;
datalines;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
proc print data=club2;
title 'Weight Club Members';
run;

Method 2: Using the / Line-Pointer Control

data club2;
input IdNumber 1-4 / / StartWeight 1-3 EndWeight 5-7;
datalines;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
proc print data=club2;
title 'Weight Club Members';
run;

Using the #n Line-Pointer Control

data club2;
input #2 Team $ 1-6 #1 Name $ 6-23 IdNumber 1-4
#3 StartWeight 1-3 EndWeight 5-7;
datalines;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
proc print data=club2;
title 'Weight Club Members';
run;

Weight Club Members

Obs   Team     Name              IdNumber   StartWeight   EndWeight
 1    red      David Shaw          1023         189           165
 2    yellow   Amelia Serrano      1049         145           124
 3    red      Alan Nance          1219         210           192
 4    yellow   Ravi Sinha          1246         194           177
 5    red      Ashley McKnight     1078         127           118
 6    yellow   Jim Brown           1221         220             .


Problem Solving: When an Input Record Unexpectedly Does Not Have Enough Values
Understanding the Default Behavior
When a DATA step reads raw data from an external file, problems can occur when
SAS encounters the end of an input line before reading in data for all variables
specified in the INPUT statement. This problem can occur when reading variable-length
records and/or records containing missing values.
The following is an example of an external file that contains variable-length records:

22
333
4444
55555
This DATA step uses the numeric informat 5. to read a single eld in each record of
raw data and to assign values to the variable TestNumber:
data numbers;
infile your-external-file;
input TestNumber 5.;
run;
proc print data=numbers;
title 'Test DATA Step';
run;
The DATA step reads the first value (22). Because the value is shorter than the 5
characters expected by the informat, the DATA step attempts to finish filling the value
with the next record (333). This value is entered into the PDV and becomes the value of
the TestNumber variable for the first observation. The DATA step then goes to the next
record, but encounters the same problem because the value (4444) is shorter than the
value that is expected by the informat. Again, the DATA step goes to the next record,
reads the value (55555), and assigns that value to the TestNumber variable for the
second observation.
The following output shows the results. After this program runs, the SAS log
contains a note to indicate the places where SAS went to the next record to search for
data values.
Output

Test DATA Step

Obs   TestNumber
 1          333
 2        55555

Methods of Control: Your Options

Four Options: FLOWOVER, STOPOVER, MISSOVER, and TRUNCOVER


To control how SAS behaves after it attempts to read past the end of a data line, you
can use the following options in the INFILE statement:
infile your-external-file flowover;
is the default behavior.
infile your-external-file stopover;
causes the DATA step to stop processing if an INPUT statement reaches the end of
the current record without finding values for all variables in the statement.
infile your-external-file missover;
prevents the DATA step from going to the next line if it does not find values in the
current record for all of the variables in the INPUT statement. Instead, the DATA
step assigns a missing value for all variables that do not have values.
infile your-external-file truncover;
causes the DATA step to assign the raw data value to the variable even if the
value is shorter than expected by the INPUT statement. If, when the DATA step
encounters the end of an input record, there are variables without values, the
variables are assigned missing values for that observation.

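The following example uses the MISSOVER option in the INFILE statement of the DATA step
shown earlier. Because each of the first three records is shorter than the five columns
that the informat expects, the DATA step assigns a missing value instead of going to the
next record:

Output

Test DATA Step

Obs   TestNumber
 1            .
 2            .
 3            .
 4        55555

The following example demonstrates the use of the TRUNCOVER option. The DATA step
assigns each shorter value to TestNumber as is:

Output

Test DATA Step

Obs   TestNumber
 1           22
 2          333
 3         4444
 4        55555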
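The programs above need an existing external file. As a self-contained sketch, you can first write the four records to a temporary file with a DATA _NULL_ step and then read that file with each option. The fileref NUMS and the data set names are hypothetical, and the sketch assumes that the host writes variable-length records (the usual default on Windows and UNIX), so the short lines are not padded with blanks. In-stream DATALINES records are padded, which is why this example writes a file instead.

```sas
/* Write the four short records to a temporary file.                 */
filename nums temp;

data _null_;
   file nums;
   put '22';
   put '333';
   put '4444';
   put '55555';
run;

/* Default (FLOWOVER): a short value is finished from the next record */
data numbers_flowover;
   infile nums;
   input TestNumber 5.;
run;

/* MISSOVER: a value shorter than the informat width becomes missing */
data numbers_missover;
   infile nums missover;
   input TestNumber 5.;
run;

/* TRUNCOVER: the shorter value is read as is                        */
data numbers_truncover;
   infile nums truncover;
   input TestNumber 5.;
run;

proc print data=numbers_missover;
   title 'MISSOVER';
run;

proc print data=numbers_truncover;
   title 'TRUNCOVER';
run;
```

Comparing the three data sets side by side shows why TRUNCOVER is usually the safer choice for variable-length records read with formatted input.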
