Tuesday, February 6, 2018

A Note on NSS Data Extraction 1

Extracting Data from NSS 62nd Round Using STATA:

First, we shall extract the NSS data on Consumer Expenditure (62_1.0), which is stored in fixed-width ASCII format, using STATA (I used STATA 9).

Step 1: Locate the folder or CD containing the data for NSS Round 62. It should contain three sub-folders: Nss62_1.0 for Consumer Expenditure data, Nss62_2.2 for Manufacturing Enterprises data, and Nss62_10 for Employment and Unemployment data. We are going to use the Consumer Expenditure folder, Nss62_1.0.

Step 2: The folder Nss62_1.0 should contain four items: three folders and one text document (README62_1.0). The State code for the State of interest can be found by following the path:
Nss62_1.0 → Supporting Documents → Instrn. To Field Staff_62 → Appendix-2.

For Uttar Pradesh, the State code is 09 (in the NSS 61st Round it was 25).
Appendix-2 also shows that Uttar Pradesh is divided into four NSS regions. The State-Region codes we are concerned with therefore run from 091 (the 1st NSS region of Uttar Pradesh) to 094 (the 4th NSS region of Uttar Pradesh). These codes will be used to filter the data during extraction.
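Since each State-Region code is the two-digit State code followed by the region number, the two parts can be separated with substr(). The sketch below runs on made-up codes (not NSS records), and the variable names State and Region are my own choices:

```stata
* Made-up State-Region codes, stored as 3-character strings as in the data
clear
input str3 StateReg
"091"
"093"
"094"
"101"
end

* Characters 1-2 give the State code, character 3 the NSS region
gen str2 State  = substr(StateReg, 1, 2)
gen str1 Region = substr(StateReg, 3, 1)
list
```

With these variables, keep if State == "09" is another way of keeping only Uttar Pradesh.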

Step 3: Locate the Excel file “layout_62_1.0” by following the path:
Nss62_1.0 → Supporting Documents → Layout_62_1.0.
We shall call it the Layout file for convenience. The Layout file shows the various “Levels”, each containing “Blocks” of the Schedule that was administered to gather information from respondents. You will notice that there are seven Levels in all. Now the real work begins.

Step 4: Now we need to extract the data for each level separately, as their contents differ. Apart from the first 17 items (up to HHS No.), which are common to all levels, the variables of Level 1 do not appear in the other levels. Thus we need as many dictionary files as there are levels, i.e., seven dictionary files for extracting the Consumer Expenditure data of Uttar Pradesh.

The following is an example of a STATA dictionary file (for Level 1):

dictionary using MH1C01.TXT
{
_column(1) str3 CntrcdRndshft %3s
_column(4) str5 LOTFSU %5s
_column(9) str1 Filler %1s
_column(10) str2 Round %2s
_column(12) str3 Schdlno %3s
_column(15) str1 Sample %1s
_column(16) str1 Sector %1s
_column(17) str3 StateReg %3s
_column(20) str2 District %2s
_column(22) str2 Strtmno %2s
_column(24) str2 Substrtum %2s
_column(26) str1 SubRund %1s
_column(27) str1 Subsmple %1s
_column(28) str4 FODSubReg %4s
_column(32) str1 Segmntno %1s
_column(33) str1 SecndStg %1s
_column(34) str2 HHSno %2s
_column(36) str2 Level %2s
_column(43) str2 SlnoInfor %2s "Serial Number of Informant"
_column(45) str1 Respnscd %1s "Response Code"
_column(46) str1 Survycd %1s
_column(47) str1 Substncd %1s "Substitution Code"
_column(48) str6 DoSurvy %6s
_column(54) str6 DoDespch %6s
_column(60) str3 TCanvs %3s
_column(127) str3 Nss %3s
_column(130) str3 Nsc %3s
_column(133) str10 Mlt %10s
}
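When checking a dictionary against the Layout file, it helps to remember how each line works: _column(n) with a %Ws format reads W characters starting at column n, so StateReg occupies columns 17 to 19 and Level columns 36 and 37. The sketch below mimics this with substr() on a made-up 37-character record (not real NSS data) built from the first 18 items of the Level 1 dictionary. It only illustrates the column arithmetic and is not a replacement for infile:

```stata
* Made-up record: the first 18 items of the Level 1 dictionary, in order
clear
set obs 1
gen str37 rec = "111" + "22222" + "3" + "62" + "010" + "1" + "2" ///
    + "093" + "05" + "01" + "02" + "1" + "1" + "0930" + "1" + "1" ///
    + "07" + "01"

* _column(n) str W %Ws reads W characters from column n, like substr(rec, n, W)
gen str3 StateReg = substr(rec, 17, 3)   // _column(17) str3 StateReg %3s
gen str2 HHSno    = substr(rec, 34, 2)   // _column(34) str2 HHSno %2s
gen str2 Level    = substr(rec, 36, 2)   // _column(36) str2 Level %2s
list StateReg HHSno Level
```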

Step 5: Once all seven dictionary files have been written, store them in one folder, say “NSS extract”.

Step 6: Now we need to write a “do-file” for STATA. I’m pasting below the do-file that I’ve written and used to extract the NSS data for all seven Levels at one go.

* Extract the seven levels for Uttar Pradesh (State-Region codes 091 to 094)
infile using "E:\NSS extract\Level1.dct", using("E:\Nss62_1.0\Data\MH1C01.TXT")
drop if StateReg > "094"   // drop States coded above 09
drop if StateReg < "091"   // drop States coded 01 to 08
keep if Level == "01"
destring, replace
save "E:\How to\Level1.dta", replace
clear

infile using "E:\NSS extract\Level2.dct", using("E:\Nss62_1.0\Data\MH1C01.TXT")
drop if StateReg > "094"
drop if StateReg < "091"
keep if Level == "02"
destring, replace
save "E:\How to\Level2.dta", replace
clear

infile using "E:\NSS extract\Level3.dct", using("E:\Nss62_1.0\Data\MH1C01.TXT")
drop if StateReg > "094"
drop if StateReg < "091"
keep if Level == "03"
destring, replace
save "E:\How to\Level3.dta", replace
clear

infile using "E:\NSS extract\Level4.dct", using("E:\Nss62_1.0\Data\MH1C01.TXT")
drop if StateReg > "094"
drop if StateReg < "091"
keep if Level == "04"
destring, replace
save "E:\How to\Level4.dta", replace
clear

infile using "E:\NSS extract\Level5.dct", using("E:\Nss62_1.0\Data\MH1C01.TXT")
drop if StateReg > "094"
drop if StateReg < "091"
keep if Level == "05"
destring, replace
save "E:\How to\Level5.dta", replace
clear

infile using "E:\NSS extract\Level6.dct", using("E:\Nss62_1.0\Data\MH1C01.TXT")
drop if StateReg > "094"
drop if StateReg < "091"
keep if Level == "06"
destring, replace
save "E:\How to\Level6.dta", replace
clear

infile using "E:\NSS extract\Level7.dct", using("E:\Nss62_1.0\Data\MH1C01.TXT")
drop if StateReg > "094"
drop if StateReg < "091"
keep if Level == "07"
destring, replace
save "E:\How to\Level7.dta", replace
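Because the seven blocks differ only in the level number, the same extraction can also be written as a single loop. This is a sketch, assuming the dictionary files are named Level1.dct to Level7.dct and using the example paths from this post (adjust them to your own). The global macro names dct, raw and out are my own choices, the loop skips any level whose dictionary file is missing, and it keeps codes 091 to 094 with one combined drop condition:

```stata
* Hypothetical loop version of the seven blocks above. Paths use forward
* slashes, which Stata on Windows accepts and which are safer than
* backslashes next to macro names.
clear
global dct "E:/NSS extract"                 // folder with Level1.dct ... Level7.dct
global raw "E:/Nss62_1.0/Data/MH1C01.TXT"   // fixed-width ASCII data
global out "E:/How to"                      // folder for the .dta files

forvalues l = 1/7 {
    * Two-digit level code as stored in the data: "01" to "07"
    local lev = string(`l', "%02.0f")
    capture confirm file "$dct/Level`l'.dct"
    if _rc {
        display as error "$dct/Level`l'.dct not found; skipping level `lev'"
        continue
    }
    clear
    infile using "$dct/Level`l'.dct", using("$raw")
    * Keep only Uttar Pradesh: State-Region codes 091 to 094
    drop if StateReg > "094" | StateReg < "091"
    keep if Level == "`lev'"
    destring, replace
    save "$out/Level`l'.dta", replace
}
```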

Further explanation of the do-file:
“E” is the drive on your PC where you created the folders for the dictionary files and the extracted data. It may just as well be “C” or any other drive.

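“NSS extract” is the folder where the dictionary files are stored. The do-file above saves the extracted .dta files to a separate folder, “E:\How to”; change that path if you prefer to keep them alongside the dictionaries.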

“E:\Nss62_1.0\Data\MH1C01.TXT” is the path to the fixed-width ASCII data. It will differ depending on where you have stored the NSS data to be extracted.

“drop if StateReg > "094"” is there to drop irrelevant data. The imported data also contain records for several other States, since all of them are stored in “MH1C01.TXT”, whereas we need only Uttar Pradesh. Remember that Uttar Pradesh has the NSS State code 09 and is divided into four NSS regions, so the records of interest are those of Regions 1, 2, 3 and 4 of Uttar Pradesh. The Layout file has a variable named “State-Region”, which we named “StateReg” in our dictionary file, and this variable identifies the relevant records. The command “drop if StateReg > "094"” drops from memory every State whose code is higher than that of Uttar Pradesh, because “094” is the highest code assigned to a region of Uttar Pradesh in these data. On its own, however, this command does not remove the States coded 01 to 08, whose State-Region codes are below “091”; they must be removed with a second command, “drop if StateReg < "091"”. The codes are enclosed in quotation marks because all variables were imported in string format; since every code has exactly three digits, string comparison orders them the same way as numeric comparison.
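The effect of the two conditions can be checked on a handful of made-up State-Region codes (not real NSS records), stored as three-character strings just as the dictionary imports them:

```stata
* Made-up State-Region codes, all 3-character strings
clear
input str3 StateReg
"011"
"082"
"091"
"094"
"101"
end

* The first filter on its own removes only codes above 094 ("101")
drop if StateReg > "094"
count    // 4 records left: 011, 082, 091, 094

* Also removing codes below 091 leaves only Uttar Pradesh
drop if StateReg < "091"
list StateReg
```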

“keep if Level == "0X"” then makes STATA drop the records of all levels except Level X, where X is the level number from 1 to 7. The level is written as a two-character string ("01" to "07") because Level was imported as a two-character string variable.

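“destring, replace” converts to numeric every variable whose values are all numeric; this is needed because we imported all the data in string format. Variables containing non-numeric characters are left as strings.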
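A small made-up example (not NSS data) shows what destring does to string-imported variables. The variable Code is hypothetical and included only to show a variable that stays a string:

```stata
* Two made-up records imported as strings, as the dictionary files do
clear
input str3 StateReg str2 Level str3 Code
"093" "01" "A12"
"094" "02" "B07"
end

* StateReg and Level become numeric; Code has letters, so it stays a string
destring, replace
describe
```

Note that destringing removes the leading zeros: afterwards StateReg holds 93 rather than "093", so any later filtering must use numbers, for example inrange(StateReg, 91, 94).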
So, extract data and have fun...!

N.B. I'm obliged to my mentor, who bore the brunt of my experiments with NSS data whenever I went to him to discuss my hopelessly frustrating strategies for extracting the data at one go...!
This is reposted at the request of many users.