Professional Documents
Culture Documents
SUMMARY OF PROGRAMMING TASKS 1. Read long records from a flat file, keeping
only selected records
This paper presents efficiency techniques for the
following programming tasks. Task
1. Create a SAS data set by reading long records from Create a SAS data set by reading long records from a flat
a flat file with an INPUT statement. Keep selected file with an INPUT statement. Keep selected records
records based on the values of only a few incoming based on the values of only a few incoming variables.
variables.
Technique
2. Create a new SAS data set by reading an existing
SAS data set with a SET statement. Keep selected First, read only the variables needed to determine if the
observations based on the values of only a few record should be kept. Test the values of the variables,
incoming variables. and only read the rest of the record if necessary.
For OR operators or an IN operator, order the conditions 2. Use a WHERE statement to select observations in
in descending order of likelihood. Put the condition which CTRY is equal to 'czech' or 'slovakia'. The
most likely to be true first, the condition second most following two examples are efficient because they order
likely to be true second, and so on. the values of CTRY from most likely to least likely. The
two WHERE statements are equally efficient.
For AND operators, order the conditions in ascending
order of likelihood. Put the condition most likely to be Using an IN operator.
false first, the condition second most likely to be false
second, and so on. data two ;
set one ;
When the SAS system processes an IF, WHERE, DO where ctry in('czech','slovakia') ;
WHILE, or DO UNTIL statement, it tests the minimum
number of conditions needed to determine if the Using an OR operator.
statement is true or false. This technique is known as
Boolean short circuiting. Prior information is often data two ;
available about data being processed. Use this set one ;
information to improve program efficiency. Order the where ctry = 'czech' or ctry = 'slovakia' ;
conditions in IF, WHERE, DO WHILE, and DO UNTIL
statements to take advantage of Boolean short circuiting. 3. Use a WHERE statement to select observations in
which CTRY is equal to 'czech', INV is greater than 900,
An IN operator and a series of OR operators are equally and TAX is greater than 500. The following example is
efficient, though the IN operator can make programs efficient because it orders the conditions from most
easier to read. likely to be false to least likely to be false.
Previous sections of this paper demonstrated the Example 2. Select observations in which the value of the
WHERE statement. This section describes WHERE character variable NAME is not missing. The following
statement operators, several useful operators that can be three statements are equivalent.
used only in WHERE statements.
where name is not null ;
5.1. BETWEEN-AND operator. where name is not missing ;
where name ne "" ;
The BETWEEN-AND operator is used to test whether the
value of a variable falls in an inclusive range. This The IS MISSING operator allows you to test whether a
operator provides a convenient syntax but no additional variable is missing without knowing if the variable is
functionality. The following three statements are numeric or character, preventing the errors generated by
equivalent. the following statements.
where salary between 4000 and 5000 ; ! This statement generates an error if NAME is a
where salary >= 4000 and salary <= 5000 ; character variable.
where 4000 <= salary <= 5000 ;
where name = . ;
5.2. CONTAINS or question mark (?) operator.
! This statement generates an error if GNP is a
The CONTAINS operator is used to test whether a numeric variable.
character variable contains a specified string.
CONTAINS and ? are equivalent. where gnp = "" ;
A numeric variable whose value is a special missing 5.5. Sounds-like (=*) operator.
value (a-z or an underscore) is recognized as missing by
the IS MISSING operator. The Sounds-like (=*) operator uses the Soundex
algorithm to test whether a character variable contains a
5.4. LIKE operator. spelling variation of a word. This operator can be used
to perform edit checks on character data by checking for
The LIKE operator is used to test whether a character small typing mistakes, and will uncover some but not all
variable contains a specified pattern. A pattern consists of the mistakes.
of any combination of valid characters and the following
wild card characters. In the following example, the WHERE statement keeps
all but the last two input records.
! The percent sign (%) represents any number of
characters (0 or more). data one;
input bankname $1-20;
! The underscore (_) represents any single character. length bankname $20;
cards;
The LIKE operator provides some of the functionality of new york bank
the UNIX command grep. It is much more powerful than new york bank.
the SAS functions INDEX or INDEXC, which must be NEW YORK bank
used multiple times to search for complex character new york bnk
patterns. The LIKE operator is case sensitive; upper and new yrk bank
lower case characters are not equivalent. ne york bank
neww york bank
Example 1. Select observations in which the variable nnew york bank
NAME begins with the character 'J', is followed by 0 or ew york bank
more characters, and ends with the character 'n'. John, new york mets
Jan, Jn, and Johanson are selected, but Jonas, JAN, and ;
AJohn are not selected. run;
Example 2. Select observations in which the variable To identify cases where leading characters are omitted
NAME begins with the character 'J', is followed by any from a character variable, use the Sounds-like operator
single character, and ends with the character 'n'. Jan is multiple times in a WHERE statement, removing one
selected, but John, Jonas, Jn, Johanson, JAN, and AJohn additional character in each clause of the WHERE
are not selected. statement, as follows.
where gnp > 100 and inv < 10 and con< 20 ; data two ;
set one (keep = a) ;
Notes b = a*1000 ;
run;
1. See pages 498-504 in the "SAS Language Reference,
Version 6, First Edition," for more information about In method 1, 1001 variables are read into the PDV from
WHERE statement operators. data set ONE, and 1002 variables are written to data set
TWO. 1000 of the variables are not needed.
(drop = x1-x1000);
4. In method 2, either of the following statements are variables or variables with the same name but different
equivalent to KEEP A B ;. lengths or types (character versus numeric).
Concatenate SAS data set TWO to SAS data set ONE. Method 1, less efficient.
ACKNOWLEDGMENTS
TRADEMARK INFORMATION