Survey data analysis in Stata: A basic step-by-step introduction

// Note: lines beginning with // or * are comments. Every other line is a Stata command.

//By Mr. Rohan Byanjankar

//clear any data currently in memory

clear

//importing data

*syntax 

*import excel "<file_path>",sheet("Data") firstrow

import excel "D:\~~~SPSS session\Materials\2077.06.18 SPSS dataset sudal.xlsx", sheet("Data") firstrow
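
//checking the imported data

*A minimal sketch of a quick check after import. describe and codebook are standard Stata commands, but running them at this step is a suggestion, not part of the original session.

*describe lists variable names, storage types, and labels.

describe

*codebook, compact gives a one-line summary per variable.

codebook, compact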

//browse data

//syntax

*browse

browse

//syntax for browsing a selected range of observations

*browse in 1/20

browse in 1/20

//editing data

//syntax

*edit

edit

//labeling or describing variable

//syntax

*label variable <variable_name> "description of variable"

*Note: label variable is a Stata command.

label variable Household "Household ID"

//renaming variable

//syntax

*rename <old_variable_name> <new_variable_name>

rename Membersinfamily numberoffamily

//dropping variable

//syntax

*drop <variable_name>
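
*A minimal sketch using a hypothetical variable temp_var, created here only so that it can be dropped without touching the survey data.

generate temp_var = 1

drop temp_var

*drop if removes observations instead of variables, for example: drop if Age<10 (shown as a comment so that no data are deleted here).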

//count

count if Age>=40
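
//syntax

*count if <condition>

*Note: Stata treats missing numeric values as larger than any number, so Age>=40 also counts observations with missing Age. A sketch that excludes them, assuming Age is numeric:

count if Age>=40 & !missing(Age)

*count without a condition gives the total number of observations.

count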

//sort dataset in ascending order

//syntax 

*sort <variable_name>

sort Age
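
//sort dataset in descending order

*sort only sorts in ascending order. A short sketch of the descending case: gsort with a minus sign before the variable name.

*gsort -<variable_name>

gsort -Age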

//labeling values of categorical variables

//syntax

*Step 1: Define the label with label define.

*Step 2: Attach the label to the variable with label values.

*label define <label_name> <code> "<label_text>" [<code> "<label_text>" ...]

*label values <variable_name> <label_name>

*Note: label define and label values are Stata commands.

label define Gender 1 "Male" 2 "Female"

label values Gender Gender 

//label list <label_name>

label list Gender

//Labeling for Religion

label define Religion 1 "Hinduism" 2 "Kirat" 3 "Buddhist"

label values Religion Religion

label list Religion

//Labeling for family type

label define Familytype 1 "Nuclear" 2 "Joint"

label values Familytype Familytype

//Labeling for Education

label define Education 1 "Never Attended School" 2 "Attended School" 3 "SLC" 4 "Intermediate" 5 "Bachelors" 6 "Masters"

label values Education Education
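
//checking value labels

*A short sketch for checking the labels defined so far. label list without a name lists every value label in memory. The nolabel option of tab shows the underlying codes instead of the label text.

label list

tab Education, nolabel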

//Follow the same process to label the other variables

//Recoding variable Age

//syntax

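*recode <variable_name> (<range>=<code> "<label>") ..., gen(<new_variable_name>)

*Note: keep the ranges non-overlapping so that each age falls in exactly one group. The ranges below assume Age is recorded in whole years.

recode Age (10/19=1 "10-19")(20/29=2 "20-29")(30/39=3 "30-39")(40/49=4 "40-49")(50/59=5 "50-59")(60/max=6 "60+"), gen(age_group)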

label var age_group "Grouping of Age"
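
//checking the recode

*A minimal sketch for verifying the new groups: the minimum and maximum Age within each age_group should agree with that group's label.

tabstat Age, statistics(min max count) by(age_group)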

//Producing tables

*Frequency tables are meaningful mainly for categorical variables (nominal and ordinal).

//one-way tables

tab Gender

tab Education

//one-way tables for several variables at once

tab1 Gender Education

//two-way tables

tab Gender Education

tab Gender Area

//two-way tables of the first variable against each of the others

tab2 Gender Area Education, firstonly

//two-way table with row percent

tab Gender Education, row

tab Gender Area, row

//two-way table with column percent

tab Gender Education, col

tab Gender Area, col

//two-way table with row percent but no frequency

tab Gender Education, row nofreq

tab Gender Area, row nofreq

//two-way table with column percent but no frequency

tab Gender Education, col nofreq

tab Gender Area, col nofreq
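
//two-way table with cell percent, and with missing values

*A short sketch of two further tab options: cell shows each cell as a percent of the grand total, and missing includes missing values as their own row or column.

tab Gender Education, cell

tab Gender Education, row missing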

//Summary statistics

*Summary statistics are meaningful for numeric (continuous) variables such as income, expenses, and profit.

//syntax

*sum <variable_name>

sum Food_today

sum TotalIncome

//Summary statistics with details

sum Food_today, detail

sum TotalIncome, detail
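
//Summary statistics by group

*A minimal sketch, assuming Gender is coded as in the labeling section: the bysort prefix runs sum separately for each value of Gender.

bysort Gender: sum Food_today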

//Table of summary statistics: mean, count, max, min, range, sd, median, skewness, kurtosis

tabstat Food_today, statistics(mean count max min range sd median skewness kurtosis)

//The same table of summary statistics, by category

tabstat Food_today, statistics(mean count max min range sd median skewness kurtosis) by(Gender)
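
//Summary statistics for several variables in one table

*A short sketch: tabstat accepts more than one variable. columns(statistics) puts the statistics in columns, and format() controls the number of decimals. The format %9.2f is only an illustration.

tabstat Food_today TotalIncome, statistics(mean sd median) columns(statistics) format(%9.2f)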

//correlation

*Correlation is calculated between numeric variables.

//syntax

*corr <var_1> <var_2> ... <var_n>

corr Food_today TotalIncome
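
//pairwise correlation with significance level

*A minimal sketch: corr drops any observation with a missing value in any listed variable, while pwcorr uses all available observations for each pair. sig prints the p-value and obs prints the number of observations.

pwcorr Food_today TotalIncome, sig obs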

To be continued...
