SAS Training
SAS software can be used for:
Data entry, retrieval and management
Report writing and graphics
Statistical and mathematical analysis
Business planning, forecasting and decision support
Operations research and project management
Quality improvement
Applications development
The core of the SAS system is base SAS software, which consists of:
SAS Informats
SAS Formats
Variables
Functions
Statements
Miscellaneous (SAS programs, output, log and errors)
SAS GUI
1. Project Designer: Shows the process flow of a project as flowcharts.
2. Project Explorer: Shows the process flow of a project as a drop-down menu.
3. Code Editor: Used to write and edit code.
4. Server List: Shows the physical storage locations of data.
5. Log Window: Shows information about the execution of a program and lists any errors that occur during execution.
6. Output Window: Displays the output produced by executing a program.
SAS Programs
SAS programs can be used to access, manage, analyze, or present your data.
SAS statements can be specified in uppercase or lowercase. In most situations, however, text that is enclosed in quotation marks is case sensitive.
SAS Libraries
Every SAS file is stored in a SAS library. A SAS library is a collection of SAS files, and it is the highest level of organization for information within SAS. In the Windows and UNIX environments, a library is typically a group of SAS files in the same folder or directory.
Temporary library
Permanent library
Depending on the library name that you use when you create a file, SAS files can be stored temporarily or permanently.
Temporary Library:
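A file is stored temporarily when no library name is specified while creating it, or when the library name Work is specified. Files in the Work library are deleted at the end of the SAS session.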
Permanent Library:
A permanent library is the permanent storage location of data files. Permanent SAS libraries are available in subsequent SAS sessions, and the files they contain are stored until you delete them. To store files permanently in a SAS data library, specify a library name other than the default library name Work.
To create a permanent library, use the LIBNAME statement. It creates a reference (libref) to the path where SAS files are stored.
The LIBNAME statement is global, which means that librefs remain in effect until you modify them, cancel them, or end your SAS session.
The LIBNAME statement assigns the libref for the current SAS session only, so you must assign a libref to each permanent SAS data library each time a SAS session starts.
Once the libref is cleared or the SAS session ends, SAS no longer has access to the files in the library through that libref. The contents of the permanent library still exist in the specified path.
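Syntax:
LIBNAME libref 'SAS-data-library';
Where, libref:
can be 1 to 8 characters long
begins with a letter or underscore
contains only letters, numbers, or underscores
and 'SAS-data-library' is the path of the folder, in quotation marks.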
Example:
Here,
taxes - A library reference name (libref)
libname - This keyword assigns the libref taxes to the folder called training in the specified path
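A minimal sketch of the LIBNAME statement in use, assuming a Windows-style folder: the path C:\training and the data set name taxes.rates are hypothetical, so replace the path with a folder that already exists on your system.

```sas
/* Hypothetical folder: replace it with an existing folder on your system */
libname taxes 'C:\training';

/* A two-level name (libref.filename) stores the data set permanently in that folder */
data taxes.rates;
  input Year Rate;
  datalines;
2023 0.25
2024 0.27
;
run;
```

Because the libref lasts only for the current session, the LIBNAME statement has to be submitted again in a new session. Submitting libname taxes clear; removes the libref but leaves the data set file in the folder.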
data lib1.emp;                       /* assumes the libref lib1 has been assigned with LIBNAME */
  length name $ 12;
  informat doj mmddyy8. sal dollar7.;  /* placed before INPUT so that INPUT uses these informats */
  input id name $ doj sal;
  format doj date9. sal dollar7.;
  label id   = 'Employee Id'
        name = 'Employee Name'
        doj  = 'Date of Joining'
        sal  = 'Salary';
  cards;
1076 abcasdayut 12/23/05 $10,000
1983 aaaertgr 07/12/98 $40,000
1723 xyzasdsf 04/15/98 $25,000
;
run;
Rules for SAS data set names: SAS data set names
can be 1 to 32 characters long
must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_)
can continue with any combination of numbers, letters, or underscores
Descriptor Portion:
The descriptor portion of a SAS data set contains information about the data set, including:
The name of the data set
The date and time that the data set was created
The number of observations
The number of variables
Data Set Name: CLINIC.INSURE
Member Type: DATA
Engine: V8
Created: 10:05 Tuesday, March 30, 1999
Observations: 21
Variables: 7
Indexes: 0
Observation Length: 64
Data Portion:
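The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table.
Example: Jones is a data value, the weight 158.3 is a data value, and so on.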
Observations:
Rows are called observations in SAS. An observation is a collection of data values that usually relate to a single object in a SAS data set. The values Jones, M, 48, and 128.6 constitute a single observation in the data set shown below.
Variables:
Columns are called variables in SAS. A variable is a collection of values that describe a particular characteristic. The values Jones, Laverne, Jaffe and Wilson make up the variable Name in the data set shown below.
Missing Values:
If a data value is unknown for a particular observation, a missing value is recorded:
A period (.) indicates a missing value of a numeric variable.
A blank indicates a missing value of a character variable.
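A small sketch showing how missing values look for each variable type. The data set names work.missdemo and work.flags are hypothetical; the MISSING function, which accepts both character and numeric arguments, returns 1 for a missing value and 0 otherwise.

```sas
data work.missdemo;
  length Name $ 10;
  Name = 'Jones'; Age = 48; output;
  Name = ' ';     Age = .;  output;   /* blank character value, period for numeric */
run;

data work.flags;
  set work.missdemo;
  NameMissing = missing(Name);   /* 1 when Name is blank, otherwise 0 */
  AgeMissing  = missing(Age);    /* 1 when Age is ., otherwise 0 */
run;

proc print data=work.flags;
run;
```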
Variable Attributes:
In addition to general information about the data set, the descriptor portion contains information about the attributes of each variable in the data set. The attribute information includes the variable's name, type, length, format, informat and label.
Example: Listing of the attribute information in the descriptor portion of the SAS data set Clinic.Insure
Variable  Type  Length  Format  Informat  Label
Policy    Num   8                         Policy Number
Total     Num   8
Name      Char  20
Name:
Variable names follow exactly the same rules as SAS data set names
Like data set names, variable names:
can be 1 to 32 characters long
must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_)
can continue with any combination of numbers, letters, or underscores
Type:
A variable's type is either character or numeric. Character variables, such as Name (shown below), can contain any values. Numeric variables, such as Policy and Total (shown below), can contain only numeric values (the digits 0 through 9, +, -, ., and E for scientific notation).
Length:
A variable's length (the number of bytes used to store it) is related to its type. Character variables can be up to 32,767 bytes long. In the example below, Name has a length of 20 characters and uses 20 bytes of storage. All numeric variables have a default length of 8: numeric values (no matter how many digits they contain) are stored as floating-point numbers in 8 bytes of storage, unless you specify a different length.
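A minimal sketch of setting lengths with the LENGTH statement (work.lengths and its variables are hypothetical). It shows that a character value longer than the variable's length is truncated, while the numeric variable keeps the default length of 8.

```sas
data work.lengths;
  length Name $ 20 Code $ 3;   /* character lengths are given in bytes */
  Name  = 'Jones';             /* padded with blanks to 20 bytes */
  Code  = 'ABCDE';             /* truncated to 'ABC': only 3 bytes are available */
  Total = 123.45;              /* numeric: default length of 8 bytes */
run;

/* PROC CONTENTS lists Name as Char 20, Code as Char 3 and Total as Num 8 */
proc contents data=work.lengths;
run;
```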
Format:
A format is an instruction that SAS uses to write data values. Formats control the written appearance of data values or, in some cases, group data values together for analysis. SAS software offers a variety of character, numeric, and date and time formats, and you can also create and store your own formats. You can permanently assign a format to a variable in a SAS data set, or temporarily specify a format in a PROC step to determine the way the data values appear in the output.
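A sketch of the two ways of assigning formats described above, plus a user-defined format created with PROC FORMAT. It assumes the SASHELP.CLASS sample data set that ships with SAS; the format name agegrp and the data set work.class are hypothetical.

```sas
/* User-defined format that groups ages (the name agegrp is hypothetical) */
proc format;
  value agegrp low-<13 = 'Child'
               13-high = 'Teen';
run;

/* Permanent: a FORMAT statement in a DATA step is stored in the descriptor portion */
data work.class;
  set sashelp.class;
  format Weight 6.1;
run;

/* Temporary: a FORMAT statement in a PROC step applies only to this step */
proc print data=work.class;
  format Age agegrp.;
run;
```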
Informat:
An informat is an instruction that SAS uses to read data values in certain forms into standard SAS values. It determines how data values are read into a SAS data set. For example, informats are used to read numeric values that contain letters or other special characters.
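A minimal sketch of informats reading values that contain special characters; the data values and the data set name work.readin are made up for illustration. The colon modifier tells SAS to read each value up to the next blank using the given informat.

```sas
data work.readin;
  input Amount :dollar10. JoinDate :mmddyy10.;   /* colon: read up to the next blank */
  format JoinDate date9.;                         /* display the stored SAS date value */
  datalines;
$1,250.50 12/23/2005
$40,000 07/12/1998
;
run;

proc print data=work.readin;
run;
```

The informat only controls reading: Amount is stored as the plain number 1250.5, and JoinDate as a SAS date value that the DATE9. format displays as 23DEC2005.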
Label:
A variable can have a label consisting of descriptive text up to 256 characters long. By default, many reports identify variables by their names. To display more descriptive information about the variable, assign a label to that variable.
Example:
Label Policy as Policy Number, Total as Total Balance, and Name as Patient Name to display these labels in reports
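A sketch of the labels described above, assuming a hypothetical work.insure data set with invented sample values. Labels assigned in the DATA step are stored in the descriptor portion; the LABEL option makes PROC PRINT use them as column headings.

```sas
data work.insure;
  length Name $ 20;
  input Policy Total Name $;
  label Policy = 'Policy Number'
        Total  = 'Total Balance'
        Name   = 'Patient Name';
  datalines;
12345 1250.50 Jones
12346 980.00 Smith
;
run;

proc print data=work.insure label;
run;
```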
Two-Level Names:
Two-level names are used to reference permanent SAS files in SAS programs. A two-level name consists of the libref, a period, and the filename.
Example:
Clinic.Admit is the two-level name for the SAS data set Admit, which is stored in the library named Clinic.
To reference temporary SAS files, specify the default libref Work, a period, and the filename.
Example: The two-level name Work.Test references the SAS data set named Test that is stored in the temporary SAS library Work.
One-Level name
A one-level name (the filename only) can be used to reference a file in the temporary SAS library. When a one-level name is used, the default libref Work is assumed.
Example: The one-level name Test also references the SAS data set named Test that is stored in the temporary SAS library Work.
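A minimal sketch showing that a one-level name and the corresponding Work two-level name refer to the same temporary data set (the data set test is hypothetical).

```sas
/* One-level name: SAS stores the data set in the temporary library as Work.Test */
data test;
  x = 1;
run;

/* Two-level name for the same data set */
proc print data=work.test;
run;
```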
Data Step:
DATA steps typically create or modify SAS data sets, and they can also be used to produce custom-designed reports. DATA steps can also:
compute values
check for and correct errors in data
produce new SAS data sets by subsetting, merging, and updating existing data sets
Proc Step:
PROC steps are pre-written routines that enable you to analyze and process the data in a SAS data set and to present the data in the form of a report. PROC steps sometimes create new SAS data sets that contain the results of the procedure. PROC steps can list, sort, and summarize data.
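A short sketch of a DATA step followed by PROC steps, assuming the SASHELP.CLASS sample data set that ships with SAS (Height in inches, Weight in pounds). The output data set work.bmi and the variable BMI are illustrative names.

```sas
/* DATA step: read an existing data set and compute a new value */
data work.bmi;
  set sashelp.class;
  BMI = 703 * Weight / Height**2;
run;

/* PROC steps: sort the new data set, then list part of it */
proc sort data=work.bmi;
  by descending BMI;
run;

proc print data=work.bmi;
  var Name Age BMI;
run;
```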
DATA steps typically create or modify SAS data sets, and they can also be used to produce custom-designed reports. SAS DATA steps can be used to:
put data into a SAS data set
compute values
check for and correct errors in your data
produce new SAS data sets by subsetting, merging, and updating existing data sets
Data can be put into a SAS data set by:
entering data as input (instream data)
reading existing raw data
accessing external files (files that were created by other software)
The fig below shows how to design and write a DATA step program to create a SAS data set from raw data that is stored in an external file
Syntax:
DATA <dataset1>;
SET <dataset2>;
RUN;
Where, dataset1 is the destination data set and dataset2 is the source data set.
Data can be entered into SAS data set directly through SAS program
Reading instream data is useful when you want to create data and test programming statements on a few observations.
To read instream data use:
the DATALINES statement as the last statement in the DATA step (except for the RUN statement), immediately preceding the data lines
a null statement (a single semicolon) to indicate the end of the input data
Only one DATALINES statement can be used in a DATA step; use separate DATA steps to enter multiple sets of data. If the data contains semicolons, use the DATALINES4 statement, plus a null statement that consists of four semicolons (;;;;) to indicate the end of the input data.
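A minimal sketch of DATALINES4 for data lines that contain semicolons; the data set work.codes and its values are hypothetical. The values contain no blanks, so simple list input is enough to read them.

```sas
data work.codes;
  input Code $ Command $;
  datalines4;
A1 cd;ls
B2 mv;cp
;;;;
run;
```

With a plain DATALINES statement, the semicolon in cd;ls would be taken as the end of the data.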
Syntax:
DATA <datasetname>;
INPUT <variablename1> [$] <variablename2> [$];
DATALINES;
data lines go here
;
RUN;
After the DATALINES statement, type the data values. After the last data line, type a semicolon on a line by itself to indicate the end of the data values. The CARDS statement can be used instead of DATALINES.
Example:
data emp_details;
  input id name & $20. age;   /* & lets name contain a single embedded blank; two blanks end the value */
  datalines;
2458 Murray, W  42
2462 Almers, C  38
2501 Bonaventure, T  48
2523 Johnson, R  39
2539 LaMance, K  45
2544 Jones, M  49
;
run;
Here,
A data set called emp_details is created with the variables id, name and age, and it has 6 observations.
Name is a character variable, which is indicated by the $ sign after name.
The SAS GUI can be used to import data from different file types, such as:
Excel File
Comma separated Files (CSV)
The PROC IMPORT step can be used to import external files of different file types.
Syntax:
proc import datafile = "external-file-path" out = <dataset name> dbms = <file type> replace;
  delimiter = 'special character';
  getnames = <yes/no>;
  datarow = n;
run;
Where,
datafile= specifies the path of the external file to import, in quotation marks
out= specifies the data set to be created from the imported file
dbms= specifies the type of file to be imported, or DLM if a delimited file is imported
replace overwrites the output data set if it already exists
getnames=yes tells SAS to read the variable names from the first line of the data file
delimiter= specifies the delimiter used in the external file; it is specified only when dbms=dlm is specified
datarow=n specifies the row from which SAS starts reading data in the external file, where n is a number
A comma-separated file is a special external file with the file extension .csv (comma-separated values).
Example 1:
proc import datafile="comma.csv" out=mydata dbms=csv replace;
  getnames=no;
run;
Here, a comma-separated file called comma.csv is imported, and a new data set called mydata is created. getnames=no indicates that the first row in the file does not contain variable names. replace tells SAS to overwrite the existing data set mydata.
Example 2:
Another way of reading a comma-delimited file is to treat the comma as an ordinary delimiter. The following program shows how to use dbms=dlm and delimiter=',':
proc import datafile="comma1.txt" out=mydata dbms=dlm replace;
  delimiter=',';
  getnames=yes;
  datarow=5;
run;
Here,
comma1.txt is a comma-separated text file whose variable values are separated by commas
dbms=dlm indicates that comma1.txt is a delimited file
delimiter=',' specifies the comma as the delimiter
datarow=5 tells SAS to start reading data from the 5th row
Here, tab.txt is a tab-separated text file, and dbms=tab indicates that tab.txt is a tab-separated file.
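The program for the tab-separated example is not shown above, so the following is only a sketch of what it may look like. To keep it self-contained, it first writes a small hypothetical tab.txt with a header row (the values are invented), then imports it with dbms=tab.

```sas
/* Only needed to make this sketch self-contained: write a small tab-separated */
/* file with a header row (hypothetical sample values)                          */
data _null_;
  file 'tab.txt';
  length line $ 40;
  line = catx('09'x, 'id', 'name');  put line;
  line = catx('09'x, '1', 'abc');    put line;
  line = catx('09'x, '2', 'xyz');    put line;
run;

/* Import the tab-separated file; the first row supplies the variable names */
proc import datafile='tab.txt' out=mydata dbms=tab replace;
  getnames=yes;
run;
```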
Data Understanding
Proc Contents Step:
The CONTENTS procedure is used to create SAS output that describes either of the following:
The contents of a library The descriptor information for an individual SAS data set
It describes the structure of the data set rather than the data values, and displays valuable information at the data set level, including:
Name
Engine
Creation date
Number of observations
Number of variables
File size (bytes)
Syntax:
PROC CONTENTS DATA=libref._ALL_ NODS;
RUN;
Where,
libref is the libref that has been assigned to the SAS library
_ALL_ requests a listing of all files in the library; a period (.) is used to append _ALL_ to the libref
NODETAILS (NODS) suppresses the printing of detailed information about each file when _ALL_ is specified. Specify NODS only when you specify _ALL_.
Example:
To view the contents of the Mylib library, submit the following PROC CONTENTS step:
proc contents data=mylib._all_ nods;
run;
The output from this step lists only the names, types, sizes, and modification dates for the SAS files in the Mylib library
To view the descriptor information for the Mylib.Admit data set, submit the following PROC CONTENTS step:
proc contents data=mylib.admit;
run;
The output from this step lists information for Mylib.Admit data set, including an alphabetic list of the variables in the data set
Proc Print:
Prints a listing of the values of some or all of the variables in a SAS data set
Syntax:
proc print data = libref.Datasetname [(firstobs = n obs = n)] [split = 'special character'] [double] [label] [n] [noobs];
  [id variable-list;]
  [var variable-list;]
  [by variable-list;]
  [sum variable-list;]
run;
Where,
[ ] are optional
libref is the library in which the data set is stored, and Datasetname is the data set whose values are to be printed
Id - Identifies observations by the formatted values of the listed variables instead of by observation numbers
Var - Selects the variables that appear in the report and determines their order
By - Produces a separate section of the report for each BY group (the data set must be sorted by the BY variables)
Sum - Totals the values of numeric variables
Example:
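proc sort data=candy_products;   /* BY requires sorted data; skip this step if candy_products is already sorted by Category */
  by Category;
run;
proc print data=candy_products (firstobs=1 obs=16) n noobs double label;
  id Prodid;
  var Prodid Product Category Retail_price;
  by Category;
  sum Retail_price;
run;
Here,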
candy_products is the data set, which is stored in the Work library
firstobs=1 and obs=16 - The first through the 16th observations are printed
N - Prints the number of observations
Double - Double-spaces the printed observations (listing output only)
Label - Prints the label of each variable instead of the variable name
Id - Prodid becomes the row identifier instead of the observation number
Var - Only the variables listed here are printed
By - The output is grouped by Category
Sum - Prints the sum of Retail_price
We can create a new data set from an existing SAS data set
To create the new data set, read a data set using the DATA step and use the programming features of the DATA step to manipulate data
Store the manipulated data in a new data set, or in the same data set, which overwrites the existing data.
SAS-data-set in the DATA statement is the name (libref.filename) of the SAS data set to be created (destination data set).
SAS-data-set in the SET statement is the name (libref.filename) of the SAS data set to be read (source data set).
Example:
libname lab23 'c:\drug\allergy\labtests';
libname research 'c:\drug\allergy';
data lab23.drug1h;
  set research.cltrials;
run;
Where,
Lab23 and research are two librefs that are assigned to two different locations. The DATA statement creates the permanent SAS data set Drug1H, which is stored in the SAS data library to which the libref Lab23 has been assigned. The SET statement reads the permanent SAS data set Research.CLTrials.
Syntax 2: Data Transfer from one library to another using Proc Copy
proc copy in = libref1 out = libref2;
  [select ds1 ds2 ...;]
run;
Where,
libref1 is the library from which the data sets are to be copied
libref2 is the library to which the data sets are to be copied
select is an optional statement that selects the data sets ds1, ds2, etc. to copy from libref1 to libref2. If select is not used, all the data sets in libref1 are copied to libref2.
Example:
proc copy in=clinic out=work;
  select admit;
run;
Here,
The data set admit is copied from the clinic library to the temporary library Work.
Some of the options for manipulating data are:
FIRSTOBS=
OBS=
LABEL
RENAME
DELETE
DROP
KEEP
BY groups
POINT= option
OUTPUT
END= option
The FIRSTOBS= and OBS= options are used to select a range of observations from a data set. They can be used in both DATA steps and PROC steps, and they apply to the data set that is being read.
When used in a DATA step, only the selected observations are read into the new data set.
When used in a PROC PRINT step, the output displays the selected observations.
FIRSTOBS= specifies the number of the first observation to be selected.
OBS= specifies the number of the last observation to be selected.
FIRSTOBS= and OBS= can be used together to select a range of observations.
If only FIRSTOBS= is specified, observations from that position to the end of the data set are selected.
If only OBS= is specified, observations from the first to the specified number are selected.
Syntax:
data SAS-Data-Set;
  set SAS-Data-Set (firstobs = n obs = n);
run;
Where,
SAS-Data-Set in the DATA statement is the destination data set
SAS-Data-Set in the SET statement is the source data set
n is any positive integer
FIRSTOBS= specifies the observation to start with
OBS= specifies the last observation
FIRSTOBS= and OBS= are options for the data set being read, so they are specified on the data set in the SET statement, not on the data set named in the DATA statement
Example:
Here,
91 observations are copied from candy_products in the local library to candy_products in the Work library.
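A minimal sketch of FIRSTOBS= and OBS= on the data set being read, assuming the SASHELP.CLASS sample data set; work.subset is a hypothetical name. The same options work in a PROC PRINT step to print only part of a data set.

```sas
/* Read only observations 5 through 10 (6 observations) into a new data set */
data work.subset;
  set sashelp.class (firstobs=5 obs=10);   /* OBS= is the last observation number, not a count */
run;

/* The same options limit what PROC PRINT lists */
proc print data=sashelp.class (firstobs=5 obs=10);
run;
```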
Syntax:
data libref.dataset-name;
  set libref.dataset-name;
  label variable-name = 'variable label';
  rename variable-name = new-variable-name;
run;
or
proc print data = libref.dataset-name label;
  label variable-name = 'variable label';
run;
Where,
The variable label is assigned to the variable specified by variable-name in the LABEL statement.
The new variable name is assigned to the variable specified by variable-name in the RENAME statement.
A LABEL statement in a DATA step stores the label in the descriptor portion of the data set, so it is available every time the data set is used.
A LABEL statement in a PROC step applies only while that PROC step is executing.
The LABEL option must be specified in the PROC PRINT statement for labels to be displayed.
Example:
data demo.class;
  set demo.class;
  label sizehh = 'Size of household';
  rename sizehh = sizehouse;
run;
proc print data = demo1.class1 label;
  label sizehh = 'Size of Household';
run;
Here,
Size of household is assigned as the label for the variable Sizehh in the DATA step, and the variable Sizehh is renamed Sizehouse in the same DATA step.
The label Size of Household is assigned to the variable Sizehh temporarily in the PROC step; it is effective only while that step is executing.
The RENAME statement can be used only in a DATA step, because it modifies the data set.
The DROP= and KEEP= data set options can be used in a DATA step to drop or keep variables in a data set.
Syntax:
SAS-data-set (DROP=variable(s))
SAS-data-set (KEEP=variable(s))
Where, the DROP= or KEEP= option, in parentheses, follows the name of the data set that contains the variables to be dropped or kept, and variable(s) identifies the variables to drop or keep.
Example:
1.
Timemin and Timesec are dropped from the data set clinic.stress:
data clinic.stress (drop=timemin timesec);
  set clinic.stress;
run;
2.
Only Timemin and Timesec are kept in the data set clinic.stress:
data clinic.stress (keep=timemin timesec);
  set clinic.stress;
run;
Another way to exclude variables from a data set is to use the DROP statement or the KEEP statement. Like the DROP= and KEEP= data set options, these statements drop or keep variables. The DROP statement differs from the DROP= data set option in the following ways:
You cannot use the DROP statement in SAS procedure steps.
The DROP statement applies to all output data sets that are named in the DATA statement. To exclude variables from some data sets but not from others, place the appropriate DROP= data set option next to each data set name that is specified in the DATA statement.
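A sketch contrasting the two behaviors described above, assuming the SASHELP.CLASS sample data set; the output data set names are hypothetical.

```sas
/* DROP= on one output data set: Weight is removed only from work.noweight */
data work.all work.noweight (drop=Weight);
  set sashelp.class;
run;

/* DROP statement: Weight is removed from every output data set in the step */
data work.a work.b;
  set sashelp.class;
  drop Weight;
run;
```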
The KEEP statement is similar to the DROP statement, except that the KEEP statement specifies a list of variables to write to output data sets
Use the KEEP statement instead of the DROP statement if the number of variables to keep is significantly smaller than the number to drop
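Syntax:
DROP variable(s);
KEEP variable(s);
Example:
data clinic.stress;
  set clinic.stress;
  drop timemin timesec;
run;
Here, Timemin and Timesec are dropped from the data set clinic.stress.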
The WHERE statement can be used to select observations in both PROC steps and DATA steps. There can be only one WHERE statement in a step.
Syntax: WHERE where-expression;
where-expression specifies a condition for selecting observations and can be any valid SAS expression. The WHERE statement works for both character and numeric variables, and it selects whole observations.
To specify a condition based on the value of a character variable:
enclose the value in quotation marks
write the value with lowercase and uppercase letters exactly as it appears in the data set
The following comparison operators can be used to express a condition in the WHERE statement:
Symbol     Meaning                    Example
= or eq    equal to                   where name='Jones, C.';
^= or ne   not equal to               where temp ne 212;
> or gt    greater than               where income>20000;
< or lt    less than                  where partno lt "BG05";
>= or ge   greater than or equal to   where id>='1543';
<= or le   less than or equal to      where pulse le 85;
The CONTAINS operator selects observations that include the specified substring.
Example:
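The example for CONTAINS is not shown above; a minimal sketch, assuming the SASHELP.CLASS sample data set and a hypothetical output data set work.ja, could look like this. The match is case sensitive, and in this sample data the condition selects James, Jane and Janet.

```sas
/* WHERE with CONTAINS in a DATA step */
data work.ja;
  set sashelp.class;
  where Name contains 'Ja';   /* case-sensitive substring match */
run;

/* The same condition in a PROC step */
proc print data=sashelp.class;
  where Name contains 'Ja';
run;
```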
WHERE statements can be used to select observations that meet multiple conditions
To link a sequence of expressions into compound expressions, use logical operators, including the following:
Operator   Meaning
AND or &   and, both. If both expressions are true, then the compound expression is true.
OR or |    or, either. If either expression is true, then the compound expression is true.
Example:
1.
Where with a PROC step:
proc print data = clinic.admit;
  var age height weight fee;
  where age > 30;
run;
2.
Where with a DATA step:
data clinic.admit;
  set clinic.admit;
  where age > 30 and pulse > 55;
run;
3.
Some examples using logical operators:
where ID>1050 and state='NC';
where actlevel = 'LOW' or actlevel = 'MOD';
where actlevel in ('LOW','MOD');
where fee in (124.80,178.20);
where (age<=55 and pulse>75) or area='A';
The IF-THEN statement executes a SAS statement when the condition in the IF clause is true. Comparison and logical operators can be used in the IF expression. Any numeric value other than 0 or missing is true, and a value of 0 or missing is false.
Syntax:
IF expression THEN statement;
[ELSE IF expression THEN statement;
...
ELSE statement;]
Where,
expression is any valid SAS expression
statement is any executable SAS statement
Example:
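data clinic.stress;
  set clinic.stress;
  length TestLength $ 6;   /* without this, the first value ('Long') sets the length to 4 and 'Normal' is truncated */
  if totaltime > 800 then TestLength = 'Long';
  else if 750 <= totaltime <= 800 then TestLength = 'Normal';
  else if totaltime < 750 then TestLength = 'Short';
run;
Here,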
Long is assigned to the variable TestLength if totaltime is greater than 800. If the first IF expression is not true, SAS checks the next expression; if it is true, Normal is assigned and the remaining ELSE statements are skipped. If neither of the first two expressions is true, SAS checks the third expression and assigns Short to TestLength.
The IF-THEN statement together with the DELETE statement can be used to select observations in a data set and delete them. If the expression is:
true, the DELETE statement executes, and control returns to the top of the DATA step (the observation is deleted)
false, the DELETE statement does not execute, and processing continues with the next statement in the DATA step
Example:
data clinic.stress;
  set clinic.stress;
  if resthr < 70 then delete;
run;
Here,
The IF-THEN and DELETE statements omit any observations whose values for RestHR are lower than 70.
When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is more efficient than using a series of IF-THEN or IF-THEN/ELSE statements, because CPU time is reduced. SELECT groups also make the program easier to read and debug. For programs with few conditions, use IF-THEN/ELSE statements.
Syntax:
SELECT <(select-expression)>;
  WHEN (when-expression-1 <..., when-expression-n>) statement;
  <WHEN (when-expression-1 <..., when-expression-n>) statement;>
  <OTHERWISE statement;>
END;
Where,
SELECT begins a SELECT group. The optional select-expression specifies any SAS expression that evaluates to a single value.
WHEN identifies SAS statements that are executed when a particular condition is true.
when-expression specifies any SAS expression, including a compound expression. You must specify at least one when-expression.
statement is any executable SAS statement.
The optional OTHERWISE statement specifies a statement to be executed if no WHEN condition is met.
END ends a SELECT group.
Example:
data emps (keep=salary group);
  set sasuser.payrollmaster;
  length Group $ 20;
  select (jobcode);
    when ("FA1") group="Flight Attendant I";
    when ("FA2") group="Flight Attendant II";
    when ("FA3") group="Flight Attendant III";
    when ("ME1") group="Mechanic I";
    when ("ME2") group="Mechanic II";
    when ("ME3") group="Mechanic III";
    when ("NA1") group="Navigator I";
    when ("NA2") group="Navigator II";
    when ("NA3") group="Navigator III";
    when ("TA1","TA2","TA3") group="Ticket Agents";
    otherwise group="Other";
  end;
run;
The SELECT group assigns values to the variable Group based on values of the variable JobCode
Concatenating
When data sets are concatenated, the observations in each data set are stacked together, in the order in which the data sets are listed, to form a new data set.
Appends the observations from one data set to another data set
Syntax:
DATA output-SAS-data-set;
  SET SAS-data-set-1 SAS-data-set-2;
RUN;
Where,
output-SAS-data-set names the data set to be created
SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read
the observations of SAS-data-set-1 and SAS-data-set-2 are stacked and copied to output-SAS-data-set
Example:
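The concatenation example is not shown above; a minimal sketch, assuming the SASHELP.CLASS sample data set, first splits the data into two hypothetical data sets and then stacks them again with a single SET statement.

```sas
/* Split the sample data into two hypothetical data sets */
data work.females work.males;
  set sashelp.class;
  if Sex = 'F' then output work.females;
  else output work.males;
run;

/* Concatenate: all observations of work.females, followed by all of work.males */
data work.both;
  set work.females work.males;
run;
```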
Proc Append:
The base file gets appended with the observations from the data file.
No new data set is created. PROC APPEND works only if the base file contains all the variables in the data file; otherwise, use the FORCE option.
Syntax:
PROC APPEND BASE=SAS-data-set-1 DATA=SAS-data-set-2 <FORCE>;
RUN;
Where,
SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read
SAS-data-set-2 gets appended to SAS-data-set-1
FORCE is an optional keyword, used when the data file contains variables that are missing from the base file, to force SAS to append
Example:
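The PROC APPEND example is not shown above; a minimal sketch, assuming the SASHELP.CLASS sample data set and hypothetical data set names, could look like this.

```sas
/* BASE= data set: the first 5 observations of the sample data */
data work.base;
  set sashelp.class (obs=5);
run;

/* DATA= data set: the remaining 14 observations plus a variable that work.base lacks */
data work.extra;
  set sashelp.class (firstobs=6);
  BMI = 703 * Weight / Height**2;
run;

/* FORCE is needed because BMI is not in work.base; BMI is dropped while appending */
proc append base=work.base data=work.extra force;
run;
```

Without FORCE, PROC APPEND would stop instead of appending, because work.extra has a variable that work.base does not.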
Merging
A merge combines observations from two or more SAS data sets based on the values of one or more specified common variables. It creates a new data set (the merged data set). Merging is done in a DATA step with the statements:
MERGE: to name the input data sets
BY: to name the common variable(s) to be used for matching
Requirements:
the input data sets must have a common variable
the input data sets must be sorted by the common variable(s)
Syntax:
DATA output-SAS-data-set;
  MERGE SAS-data-set-1 SAS-data-set-2;
  BY <DESCENDING> variable(s);
RUN;
Where,
output-SAS-data-set names the data set to be created
SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read
variable(s) in the BY statement specifies one or more variables whose values are used to match observations
DESCENDING indicates that the input data sets are sorted in descending order by the variable that is specified. If there is more than one variable in the BY statement, DESCENDING applies only to the variable that immediately follows it.
Each input data set in the MERGE statement must be sorted in order of the values of the BY variable(s).
Each BY variable must have the same type in all data sets to be merged.
PROC SORT can be used to sort data sets in either ascending or descending order.
Syntax:
proc sort data = Data-Set-1 [out = Data-Set-2];
  by [descending] Variable1 [Variable2 ...];
run;
Here,
Data-Set-1 is sorted in either ascending or descending order.
If the OUT= option is specified, Data-Set-1 is copied to Data-Set-2 and sorted there, and the original data set (Data-Set-1) remains unsorted.
The BY statement sorts the data set according to the variables specified.
The DESCENDING option sorts the data set in descending order by the variable that immediately follows it.
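A minimal sketch of PROC SORT with OUT= and DESCENDING, assuming the SASHELP.CLASS sample data set. Because OUT= is used, the source data set is left as it is and the sorted copy is written to the hypothetical work.sorted.

```sas
/* Sex ascending, then Height descending within each value of Sex */
proc sort data=sashelp.class out=work.sorted;
  by Sex descending Height;
run;

proc print data=work.sorted;
  var Name Sex Height;
run;
```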
Example:
During match-merging, SAS sequentially checks each observation of each data set to see whether the BY values match, and then writes the combined observation to the new data set:
data merged;
  merge a b;
  by num;
run;
1. Clinic.Demog
proc sort data=clinic.demog;
  by id;
run;
proc print data=clinic.demog;
run;

Obs  ID    Age  Sex  Date
1    A001  21   m    05/22/75
2    A002  32   m    06/15/63
3    A003  24   f    08/17/72
4    A004  .    f    03/27/69
5    A005  44   f    02/24/52
6    A007  39   m    11/11/57
2. Clinic.Visit
proc sort data=clinic.visit;
  by id;
run;
proc print data=clinic.visit;
run;

Obs  ID    Visit  SysBP  DiasBP  Weight  Date
1    A001  1      140    85      195     11/05/98
2    A001  2      138    90      198     10/13/98
3    A001  3      145    95      200     07/04/98
4    A002  1      121    75      168     04/14/98
5    A003  1      118    68      125     08/12/98
6    A003  2      112    65      123     08/21/98
7    A004  1      143    86      204     03/30/98
8    A005  1      132    76      174     02/27/98
9    A005  2      132    78      175     07/11/98
10   A005  3      134    78      176     04/16/98
11   A008  1      126    80      182     05/22/98
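Example: Merging
Obs  ID    Age  Sex  Date      Visit  SysBP  DiasBP  Weight
1    A001  21   m    11/05/98  1      140    85      195
2    A001  21   m    10/13/98  2      138    90      198
3    A001  21   m    07/04/98  3      145    95      200
4    A002  32   m    04/14/98  1      121    75      168
5    A003  24   f    08/12/98  1      118    68      125
6    A003  24   f    08/21/98  2      112    65      123
7    A004  .    f    03/30/98  1      143    86      204
8    A005  44   f    02/27/98  1      132    76      174
9    A005  44   f    07/11/98  2      132    78      175
10   A005  44   f    04/16/98  3      134    78      176
11   A007  39   m    11/11/57  .      .      .       .
12   A008  .         05/22/98  1      126    80      182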
By default, DATA step match-merging combines all observations in all input data sets
To exclude unmatched observations from output data set, use the IN= data set option and the subsetting IF statement in DATA step.
Use the IN= data set option to create and name a variable that indicates whether the data set contributed data to the current observation, and use the subsetting IF statement to check the IN= values and write to the merged data set only those observations that appear in the data sets for which IN= is specified.
Syntax:
SAS-data-set (IN=variable)
Where,
the IN= option, in parentheses, follows the data set name
variable names the variable to be created. Within the DATA step, the value of the variable is 1 if the data set contributed data to the current observation; otherwise, its value is 0.
Example:
To Match-merge the data sets Clinic.Demog and Clinic.Visit and select only observations that appear in both data sets :
Use IN= to create two temporary variables, indemog and invisit. The first IN= creates the temporary variable indemog, which is set to 1 when an observation from Clinic.Demog contributes to the current observation; otherwise, it is set to 0. Likewise, the value of invisit depends on whether Clinic.Visit contributes to an observation. The IF statement selects only observations that appear in both Clinic.Demog and Clinic.Visit: if the condition is met, the new observation is written to Clinic.Merged; otherwise, the observation is deleted. Because both data sets contain a variable named Date, the RENAME= data set option keeps both dates (BirthDate and VisitDate) in the output.
data clinic.merged;
  merge clinic.demog (in=indemog rename=(date=BirthDate))
        clinic.visit (in=invisit rename=(date=VisitDate));
  by id;
  if indemog=1 and invisit=1;
run;
proc print data=clinic.merged;
run;
Output:
Obs  ID    Age  Sex  BirthDate  Visit  SysBP  DiasBP  Weight  VisitDate
1    A001  21   m    05/22/75   1      140    85      195     11/05/98
2    A001  21   m    05/22/75   2      138    90      198     10/13/98
3    A001  21   m    05/22/75   3      145    95      200     07/04/98
4    A002  32   m    06/15/63   1      121    75      168     04/14/98
5    A003  24   f    08/17/72   1      118    68      125     08/12/98
6    A003  24   f    08/17/72   2      112    65      123     08/21/98
7    A004  .    f    03/27/69   1      143    86      204     03/30/98
8    A005  44   f    02/24/52   1      132    76      174     02/27/98
9    A005  44   f    02/24/52   2      132    78      175     07/11/98
10   A005  44   f    02/24/52   3      134    78      176     04/16/98
Here, X and Y are the IN= variables of the left and right data sets in the MERGE statement.
Condition            Description
No condition         Includes all the observations from both data sets
If Y = 1             Includes all the observations from the right data set
If X = 1             Includes all the observations from the left data set
If X = 1 and Y = 1   Includes all the matching observations from both data sets
If X = 0 or Y = 0    Includes all the non-matching observations from both data sets
If X = 0 and Y = 1   Includes all the non-matching observations from the right data set
If X = 1 and Y = 0   Includes all the non-matching observations from the left data set
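A sketch that applies three of the conditions above in a single DATA step. The small data sets work.left and work.right are hypothetical, built from a few rows of Clinic.Demog and Clinic.Visit; X and Y are the IN= variables as in the table.

```sas
data work.left;
  input ID $ Age;
  datalines;
A001 21
A002 32
A007 39
;
run;

data work.right;
  input ID $ Visit;
  datalines;
A001 1
A001 2
A002 1
A008 1
;
run;

/* Both inputs are already sorted by ID, as MERGE with BY requires */
data work.matched work.leftonly work.rightonly;
  merge work.left (in=x) work.right (in=y);
  by ID;
  if x and y then output work.matched;   /* X = 1 and Y = 1 */
  else if x then output work.leftonly;   /* X = 1 and Y = 0 */
  else output work.rightonly;            /* X = 0 and Y = 1 */
run;
```

Writing each group with an explicit OUTPUT statement lets one pass over the data produce all three results, instead of running a separate DATA step with a subsetting IF for each condition.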