Q1) Explain the compilation and execution phases of SAS DATA step processing.
Q2) Explain the creation and use of the PDV, input buffer, and descriptor portion in the DATA step.
Q3) When is the PDV initialized and reinitialized during SAS DATA step processing?
SAS DATA Step: Compile, Execute, and PDV
A SAS DATA step is a loop with an automatic output and an automatic return to the top.
SAS processes a DATA step in two phases.
Compile Phase:
- SAS checks the syntax of the SAS statements. If this check fails, compilation stops, the step is not executed, and an error message is written to the SAS log.
- Translates the statements to machine code – to be executed later.
- Input buffer: The input buffer is created only when the DATA step reads raw data, that is, when an INFILE statement points to an external file (or the step uses in-stream data).
 SAS checks the INFILE statement and determines the attributes of the file (such as the length of each record). It then sets aside an area in memory, the input buffer, where it places each record read from the input file.
 During execution of the DATA step:
 Data from a raw file is read into the input buffer and then into the PDV, one record at a time.
 When SAS reads from a SAS data set, no input buffer is used; the observation is read directly into the PDV.
- Descriptor portion: SAS determines the data type and storage length of each variable from the INPUT statement and other statements (such as LENGTH or assignment statements). This information is stored as the descriptor portion of the data set.
- Program data vector (PDV): a logical area in memory where SAS builds the data set one observation at a time. When the program executes:
 Data from a raw file is read into the input buffer and then into the PDV, one record at a time.
 When SAS reads from a SAS data set, the observation is read directly into the PDV.
 At the end of each iteration, SAS writes the values in the PDV to the output SAS data set as a single observation.
 Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_.
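- Sets up the automatic variables _N_ and _ERROR_, plus any temporary variables requested by the step: those named in the END=, IN=, and POINT= options, and FIRST.variable and LAST.variable when a BY statement is used.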
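The compile-phase items above can be seen in a small step. This is a minimal sketch, not a definitive example: the data set name work.scores, the variable names, and the in-stream records are all made up for illustration. Because the step reads raw data, SAS creates an input buffer; PROC CONTENTS then prints the descriptor portion (variable names, types, and lengths) that was fixed at compile time.

```sas
/* Hypothetical example: all names and data values are illustrative. */
data work.scores;
   infile datalines;            /* raw data source, so an input buffer is created */
   input name $ score;          /* name: character, score: numeric                */
   length grade $ 4;            /* LENGTH sets the storage length at compile time */
   if score >= 50 then grade = 'PASS';
   else grade = 'FAIL';
   datalines;
Ann 72
Bob 41
;
run;

/* Print the descriptor portion: variable names, types, and lengths. */
proc contents data=work.scores;
run;
```

The automatic variables _N_ and _ERROR_ exist in the PDV while the step runs but should not appear in the PROC CONTENTS listing, since they are not written to the output data set.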
Execution Phase:
A simple DATA step iterates once for each observation that is being created. During the Execution Phase, SAS performs the following functions, in this default order:
- begins with the DATA statement and sets the _N_ variable to 1 (on each new iteration of the DATA step, _N_ is incremented by 1)
- sets variable values to missing in the Program Data Vector
- reads a data record into the input buffer with the INPUT statement (if reading a raw file) OR reads a SAS observation from a SAS data set into the PDV with one of the following statements: SET, MERGE, MODIFY, or UPDATE
 The INPUT statement causes SAS to read the first record of raw data into the input buffer. Then, following the instructions in the INPUT statement, SAS reads the data values in the input buffer and assigns them to variables in the PDV.
- SAS executes any subsequent programming statements for the current record.
- At the end of the statements, an output, return, and reset occur automatically.
 SAS writes an observation to the output SAS data set,
 the system automatically returns to the top of the DATA step,
 and the values of variables created by INPUT and assignment statements are reset to missing in the program data vector.
 By default, the values of variables coded in an INPUT statement and user-defined variables (e.g., with an assignment statement) are not retained across executions of the DATA step, unless referenced in a RETAIN statement. Variables whose values are retained include:
- all SAS special automatic variables
- all variables coded in a RETAIN statement
- all variables read with a SET, MERGE, MODIFY, or UPDATE statement
- accumulator variables in a SUM statement.
For the DATA step as a whole:
- all actions are repeated on each iteration until the end of the input file (or input data set) is reached
- if the DATA step reads no input records, execution performs only one iteration by default
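The reset and retain rules above can be watched in the log with PUT _ALL_, which writes every variable in the PDV. The following is a sketch under assumed names: work.trace, amount, first_amount, total, and doubled are hypothetical, as are the three in-stream records.

```sas
/* Hypothetical example: trace the PDV at the top and bottom of each iteration. */
data work.trace;
   put 'Top of iteration:    ' _all_;   /* PDV before any data is read           */
   input amount;                        /* read from raw data: reset every pass  */
   retain first_amount;                 /* RETAIN: value is kept across passes   */
   if _n_ = 1 then first_amount = amount;
   total + amount;                      /* sum statement: accumulator is retained */
   doubled = amount * 2;                /* assignment: reset every pass          */
   put 'Bottom of iteration: ' _all_;   /* PDV just before the automatic output  */
   datalines;
10
20
30
;
run;
```

At the top of the second and later iterations, amount and doubled should show as missing, while first_amount and total still hold their values from the previous iteration. The step ends when INPUT tries to read past the last record.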
Q4) Does SAS 'compile' (translate) or does it 'interpret'?
 Compile
Q5) In the flow of DATA step processing, what is the first action in a typical DATA step?
When you submit a DATA step, SAS first compiles it (checking syntax and creating the input buffer, PDV, and descriptor portion) and then executes it to create the new SAS data set.
Q6) What is the use of _N_ and _ERROR_ in SAS?
Automatic variables are created automatically by the DATA step or by DATA step statements. These variables are added to the program data vector but are not output to the data set being created. The values of automatic variables are retained from one iteration of the DATA step to the next, rather than set to missing.
Two automatic variables are created by every DATA step: _N_ and _ERROR_. 
_N_ : is initially set to 1. Each time the DATA step loops past the DATA statement, the variable _N_ increments by 1. The value of _N_ represents the number of times the DATA step has iterated.
_ERROR_ : is 0 by default but is set to 1 whenever an error is encountered, such as an input data error, a conversion error, or a math error, as in division by 0 or a floating point overflow. You can use the value of this variable to help locate errors in data records and to print an error message to the SAS log.
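As a hedged illustration of both variables, the sketch below reads three in-stream records, one of which has an invalid numeric value. The data set name work.checked, the variables id and value, and the records themselves are assumptions made for this example.

```sas
/* Hypothetical example: use _N_ and _ERROR_ to report bad input records. */
data work.checked;
   input id value;                       /* 'abc' is not a valid numeric value   */
   if _n_ = 1 then put 'Processing started.';
   if _error_ then                       /* set to 1 by the invalid data above   */
      put 'Problem found on iteration ' _n_;
   datalines;
1 10
2 abc
3 30
;
run;
```

On the second record, SAS should set value to missing and _ERROR_ to 1, so the custom message is written to the log for that iteration only. _ERROR_ is set back to 0 at the start of the next iteration.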
Q7) What is the function of the OUTPUT statement?
Placing an explicit OUTPUT statement in a DATA step overrides the automatic (default) output, so that observations are added to a data set only when the explicit OUTPUT statement is executed.
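A short sketch of explicit output, assuming hypothetical data sets work.high and work.low and made-up records: each observation is written only by the OUTPUT statement that executes, and only to the data set it names.

```sas
/* Hypothetical example: route each observation with an explicit OUTPUT. */
data work.high work.low;
   input name $ score;
   if score >= 50 then output work.high;   /* written only to work.high */
   else output work.low;                   /* written only to work.low  */
   datalines;
Ann 72
Bob 41
Cy 50
;
run;
```

Because the step contains explicit OUTPUT statements, there is no automatic output at the end of each iteration.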
Q8) What does the RUN statement do?
The RUN statement marks the end of a step. When SAS encounters RUN, it processes (compiles, then executes) the preceding DATA or PROC step. You can omit the RUN statement if the DATA or PROC step is followed by another DATA or PROC step, because the next DATA or PROC statement also acts as a step boundary.
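For example, in the sketch below (data set work.a and variable x are hypothetical), the DATA step has no RUN statement of its own. The PROC statement ends it, and only the final RUN is needed.

```sas
/* Hypothetical example: the PROC statement acts as the step boundary. */
data work.a;
   x = 1;
proc print data=work.a;   /* starting a new step ends the DATA step above */
run;                      /* needed here: no further step follows         */
```

Writing RUN after every step is still a common habit, since it makes the step boundaries explicit when reading the code and the log.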
Q9) Name statements that are recognized at compile time only and at execution time only.
Corresponding to the two phases of the DATA step, there are two types of DATA step statements: compile-time statements and execution-time statements.
Some examples of execution time statements are:
• Assignment statements (variable = value;).
• If-then/else statements.
• Do loops.
• Generally any statement which relies upon the values of variables stored in
the PDV.
Some examples of compile time statements are:
• Retain statement.
• Array declarations.
• Drop statement.
• Keep statement.
• Rename statement.
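One consequence of a statement being compile-time only is that its position relative to the executable statements does not change its effect. A minimal sketch, with the hypothetical names work.squares, i, and square:

```sas
/* Hypothetical example: DROP is applied at compile time, wherever it appears. */
data work.squares;
   do i = 1 to 3;          /* execution-time: DO loop             */
      square = i * i;      /* execution-time: assignment          */
      output;
   end;
   drop i;                 /* compile-time: i is never written out */
run;
```

Although DROP is the last statement, the output data set should contain only square. The variable i is still available in the PDV for the loop itself.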
Q10) Name statements that function at both compile and execution time.
FILE, INFILE, INPUT
Q11) Identify statements whose placement in the DATA step is critical.
DATA, INPUT, and RUN (if no other DATA or PROC step follows).
Q12) What is the difference between reading data from an external file and reading data from an existing data set?
The main difference is that when reading an existing data set with the SET statement, SAS retains the values of the variables from one observation to the next, whereas variables read from an external file with INPUT are reset to missing at the start of each iteration.
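This retention is what makes the common "read once, use on every row" pattern work. The sketch below assumes hypothetical data sets work.rates (one observation), work.sales, and work.taxed, with made-up values.

```sas
/* Hypothetical example: a value read with SET stays in the PDV across iterations. */
data work.rates;
   rate = 0.2;
run;

data work.sales;
   input amount;
   datalines;
100
200
300
;
run;

data work.taxed;
   if _n_ = 1 then set work.rates;   /* executed once; rate is retained afterwards */
   set work.sales;                   /* one observation per iteration              */
   tax = amount * rate;
run;
```

Even though work.rates is read only on the first iteration, tax should be non-missing on every observation of work.taxed. If rate had been created by an INPUT or assignment statement executed only when _N_ = 1, it would be missing from the second iteration onward unless named in a RETAIN statement.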
 