Tuesday, 25 March 2014

SAS_ReadingExternalDataset

Q)Name some options used with INFILE and INPUT statement.
INFILE Options
DSD DLM
Missover/Truncover/PAD
INPUT Options
@ and @@


Q)What are the special input delimiters?
The input delimiters are DLM and DSD.


Q)Reading Data from External file. Explain List Vs column Input?

List Input

·         Reading values separated by blanks
Infile ‘C:\...’;
Input Gender $ Age Height Weight;

·         Reading values separated by commas
Infile ‘C:\...’ dsd;
Input Gender $ Age Height Weight;
DSD:
First, it changes the default delimiter from a blank to a comma. Next, if there are two delimiters in a row, it assumes there is a missing value between. Finally, if character values are placed in quotes (single or double quotes), the quotes are stripped from the value.

·         Reading values separated by delimiter other than commas/blanks
Infile ‘C:\...’ dsd dlm=‘:’;

Column Input

·         infile 'c:\books\learning\bank.txt';
input Subj $ 1-3 DOB $ 4-13 Gender $ 14 Balance 15-21;

·         infile 'c:\books\learning\bank.txt';
input @1 Subj $3. @4 DOB mmddyy10. @14 Gender $1. @15 Balance 7.;

Using Informats with List Input

·         data list_example;
infile 'c:\books\learning\list.csv' dsd;
input
Subj : $3.
Name : $20.
DOB : mmddyy10.
Salary : dollar8.;
run;

·         colon (called an informat modifier) tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered.
&(Informat modifier) can be used like colon but the delimiter is now 2 or more blanks instead of just one

 

 

Q)Explain Flowover In Case of Column Input
Q)Explain flaws while reading a file with Flowover(default option) in case of Column Input.

When the INPUT stmt reaches the end of non blank chars without having completely filled all variables,
a new line is read into the Input buffer for each unfilled variable and INPUT attempts to fill the unfilled/partially filled variable starting from col one(in case of patially filled variable-new value will overwrite it)

In the below example
1)Record2-JobDesc=Driver..It is partially filled.So next line is read into Input buffer and jobDesc is filled with new value starting from col 1.
2)Record5-Empid and JobDesc ,both are empty.So for Empid a new line is read into Inputbuffer which sets EMPID=WILLY.
Then a newline is read for empty jobdesc which sets JOBDESC=RAYAT


INPUT

Lavender

Debby

ABB11

Technician

WOODS

GUNILA

ABB15

Driver

LEHMAN

DERRYL

ACC21

 

PLANT

THOMAS

ACC25

Technician

GUPTA

RAMAN

 

 

WILLY

TORY

ASS21

Plumber

RAYAT

SATNAM

ASS25

Pilot

 

PROGRAM

DATA test;
INFILE "c:\a.txt";
INPUT Surname $1-22 Firstn $ 23-32
Empid $33-37 Jobdesc $38-47;
RUN;
proc print data = test;
run;


OUTPUT

Lavender

Debby

ABB11

Technician

WOODS

GUNILA

ABB15

Lehman

PLANT

THOMAS

ACC25

Technician

GUPTA

RAMAN

WILLY

RAYAT

 


Q)Explain about INFILE options  Flowover/Missover/Truncover

FLOWOVER

FLOWOVER is the default option on INFILE statement. Here, when the INPUT statement reaches the end of non-blank characters without having filled all variables, a new line is read into the Input Buffer and INPUT attempts to fill the rest of the variables starting from column one. The next time an INPUT statement is executed, a new line is brought into the Input Buffer.

MISSOVER
When INPUT reads a short line, MISSOVER option on INFILE statement does not allow it to move to the next line. MISSOVER option sets all the variables without values to missing.

 

TRUNCOVER

Causes the INPUT statement to read variable-length records where some records are shorter than the INPUT statement expects. Variables which are not assigned values are set to missing.

 

Difference between TRUNCOVER and MISSOVER


Both will assign missing values to variables if the data line ends before the variable’s field starts. But when the data line ends in the middle of a variable field, TRUNCOVER will take as much as is there, whereas MISSOVER will assign the variable a missing value.

 

If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
By using the option MISSOVER in the infile statement.If the input of some data lines are shorter than others then we use TRUNCOVER option in the infile statement

List Input

Column Input

Flowover

Will not work properly in some cases

Will not work properly in some cases

Missover

ok

Will not work properly in some cases

Truncover

ok

ok

PAD

Will not work properly in some cases

ok

 

Q)What will be the output when flowover,missover and truncover is used with below dataset

For example, an external file with variable-length records contains these records:

----+----1----+----2

1

22

333

4444

55555

The following DATA step reads this data to create a SAS data set. Only one of the input records is as long as the informatted length of the variable TESTNUM.

data numbers;

infile 'external-file';

input testnum 5.;

run;

This DATA step creates the three observations from the five input records because by default the FLOWOVER option is used to read the input records.
If you use the MISSOVER option in the INFILE statement, then the DATA step creates five observations.All the values that were read from records that were too short are set to missing. Use the TRUNCOVER option in the INFILE statement if you prefer to see what values were present in records that were too short to satisfy the current INPUT statement.

infile 'external-file' truncover;

 

OBS

FLOWOVER

MISSOVER

TRUNCOVER

 

 

 

 

1

22

.

1

2

4444

.

22

3

55555

.

333

4

 

.

4444

5

55555

55555

 

 

Q)What SAS statements would you code to read an external raw data file to a DATA step?
INFILE and INPUT statement.

Q)What statement you code to tell SAS that it is to write to an external file?What statement do you code to write the record to the file?
PUT and FILE statements.

Q)If reading an external file to produce an external file, what shortcut to write record without coding every single variable on the record?
Ans
: Put _all _

Q)If you're not wanting any SAS output from a data step, how would you code the data statement to prevent SAS from producing a set?
Data _Null_

Q)How do you read in the variables that you need?
Using Input statement with the column pointers like @5/12-17 etc.



Q)Comparison between trailing @ and subsetting IF
When All variables are different then @ and IF may produce same result

Subsetting IF

Trailing @

Output

data temp;

input Gender$ Name$ Age;

if Gender = 'F';

datalines;

M RAM 23

F MEENA 30

M SHAM 42

F SHEENA 48

F LEENA 55;

proc print data=temp;

run;

data temp;

input Gender$ @;

if Gender = 'F';

input Name$ Age;

datalines;

M RAM 23

F MEENA 30

M SHAM 42

F SHEENA 48

F LEENA 55;

proc print data=temp;

run;

 

 

 

 

Obs Gender Name Age

 

1 F MEENA 30

2 F SHEENA 48

3 F LEENA 55

 

 

In case IF stmt total all 15 variables(5 obs * 3 variables)are read to produce output.

In case of @

1)First All gender variable are read=5 variable

2)Gender=’F’ in 3 obs. So 3*2=6 variable are read

3)Total 5+6=11 variable are read to produce the output

 

 

When All variables are not different then @ and IF produce different result

 

Without Trailing @

Output

data temp;

input @1 Gender $1. ;

if Gender = 'F' then input @3 Name $6. @10 Age 2.;

else if Gender = 'M' then input @3 Name $6. @13 Age 2.;

datalines;

M RAM 23

F MEENA 30

M SHAM 42

F SHEENA 48

F LEENA 55;

proc print data=temp;

run;

 

 

 

 

 

 

Obs Gender Name Age

 

1 M MEENA .

2 M SHEENA .

 

This is not expected result!!!

With Trailing @

Output

data temp;

input @1 Gender $1. @;

if Gender = 'F' then input @3 Name $6. @10 Age 2.;

else if Gender = 'M' then input @3 Name $6. @13 Age 2.;

datalines;

M RAM 23

F MEENA 30

M SHAM 42

F SHEENA 48

F LEENA 55;

proc print data=temp;

run;

 

Obs Gender Name Age

1 M RAM 23

2 F MEENA 30

3 M SHAM 42

4 F SHEENA 48

5 F LEENA 55

 

 

 

 

Q)How would you create multiple observations from a single observation?
Q)What is the difference between @ and @@?

Trailing @: Testing a Condition before Creating an Observation

·         By using @ without specifying a column, SAS will hold the line of data until it reaches either the end of the data step or an INPUT statement that does not end with the trailing@.

·         The trailing @ is also known as column pointer By using the trailing@, in the INPUT statement gives you ability to read a part of your raw data line and hold it in the input buffer,while you t test for some specified condition, and then decide how to read additional data from the same record.

Double Trailing @@:Creating Multiple Observations from a Single Record

·         The double trailing @ not only prevents SAS from reading a new record into the input buffer when a new INPUT statement is encountered, but it also prevents the record from being released when the program returns to the top of the DATA step.
The trailing @ does not hold a record in the input buffer across iterations of the DATA step. @@ hold a record across iterations of data step.
SAS will hold the line of data until it reaches either the end of theline or an INPUT statement that does not end with the trailing@@.