TODO: Video of a created assignment interview on tablet
use
generate / replace
egen rownonmiss
rename
keep
regexm
tempfile
merge
save
reshape wide
reshape long
outsheet / export delimited
zipfile
R tools:
* FOLDERS
local proj_dir "" // folder where all project files are located
local input_dir "`proj_dir'" // folder where input files are found. For example: "`proj_dir'/input/"
local output_dir "`proj_dir'" // folder where output files are found. For example: "`proj_dir'/output/"

* FILES
* input:
local hhold_file_in "households.dta" // household file for this demo
local member_file_in "members.dta" // members file for this demo
* output:

* HOUSEHOLD LEVEL
* file name = questionnaire variable
* to find this name:
* - open the questionnaire in Designer
* - click on SETTINGS
* - copy value from Questionnaire variable
local hhold_file_out "cati_panel_round1.tab"

* NUMBERS ROSTER
* file name = roster ID in Designer
local number_file_out "numbers.tab"

* MEMBERS ROSTER
* file name = (first) roster ID in Designer
local member_file_out "members.tab"
use "`input_dir'/`hhold_file_in'", clear

* compile all types of implicit and explicit missings
* this may require inspection of data and iterative work
* use regular expressions (aka regex) to specify patterns
* see more on Stata's use of regex here: https://www.stata.com/support/faqs/data-management/regular-expressions/
* see more about regex on cheat sheets like this one: https://cheatography.com/davechild/cheat-sheets/regular-expressions/
* note, however, that Stata only allows a subset of regex. see FAQ linked above for more details

// same number for all digits
local numbers = "99[ ]*99[ ]*99[ ]*99|88[ ]*88[ ]*88[ ]*88"
// "none" ("aucun" in French) written in place of number
local none = "[Aa][Uu][Cc][Uu][Nn]"
// SuSo's missing value marker for strings
local missing = "##N/A##"

* combine all missing values into a single regular expression
local implicit_missings = "`numbers'|`none'|`missing'"

* replace matching values with an empty string
local number_vars = "s00q12 s00q14 s00q16 s00q18"
foreach number_var of local number_vars {
    replace `number_var' = "" if regexm(`number_var', "`implicit_missings'")
}
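To see how such a combined pattern behaves, here is a minimal sketch on made-up values. The variable name `phone` and the four example strings are hypothetical and not taken from the survey data; the patterns have the same form as those above.

```stata
* toy data: three kinds of missing markers and one real-looking number
clear
input str20 phone
"99 99 99 99"
"aucun"
"##N/A##"
"76 12 34 56"
end

* one alternative per kind of missing marker, joined with |
local implicit_missings = "99[ ]*99[ ]*99[ ]*99|88[ ]*88[ ]*88[ ]*88|[Aa][Uu][Cc][Uu][Nn]|##N/A##"

* blank out every value that matches any alternative
replace phone = "" if regexm(phone, "`implicit_missings'")
list
```

Note that the pattern is not anchored, so any value that merely contains one of these markers is blanked as well. Whether that is desirable depends on the data.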
* use regular expressions to remove invalid numbers
* see more on Stata's use of regular expressions here: https://www.stata.com/support/faqs/data-management/regular-expressions/

* step 1: compile valid phone number patterns for all telephone providers
* source: International Telecommunication Union (ITU): https://www.itu.int/oth/T0202000083/en
local sotelma_cdma_bamako = "^20[ ]*79"
local sotelma_cdma_regions = "^21[ ]*7"
local sotelma_fix = "^20[ ]*2|^20[ ]*7[0-8]|^21[ ]*2[67]|^21[ ]*[45689]"
local atel_fix = "^40[0-4]"
local orange_fix_bamako = "^44[ ]*[239]"
local orange_fix_regions = "^44[ ]*1"
local atel_mobile = "^50"
local sotelma_mobile = "^6|^9[5-9]|^89"
local orange_mobile = "^7|^9[0-4]|^8[23]"

* step 2: combine all patterns into a single regular expression
local valid_number = "`sotelma_cdma_bamako'|`sotelma_cdma_regions'|`sotelma_fix'|`atel_fix'|`orange_fix_bamako'|`orange_fix_regions'|`atel_mobile'|`sotelma_mobile'|`orange_mobile'"

* step 3: remove numbers that do not fit these patterns
local number_vars = "s00q12 s00q14 s00q16 s00q18"
foreach number_var of local number_vars {
    replace `number_var' = "" if !regexm(`number_var', "`valid_number'")
}
* count the number of valid numbers of the 4 possible numbers provided
egen num_valid = rownonmiss(s00q12 s00q14 s00q16 s00q18), strok

* keep observations with 1 or more valid contacts (i.e., non-missing numbers)
keep if (num_valid > 0)
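A minimal sketch of the same counting step on made-up data may help. The variables `n1` and `n2` are hypothetical stand-ins for the phone number columns; `strok` is what lets `rownonmiss` count string variables.

```stata
* toy data: two string number columns, some empty
clear
input str10 n1 str10 n2
"7612" ""
"" ""
"7612" "6612"
end

* count non-empty values per row; strok allows string variables
egen num_valid = rownonmiss(n1 n2), strok

* drop rows with no usable number
keep if num_valid > 0
list
```

Only the first and third rows survive, with counts of 1 and 2 respectively.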
* apply some selection rule(s) to sample households
* keep if mod(_n, 2) == 0 // for example, keep every second observation

* keep selected households
tempfile households
save "`households'", replace

* keep members associated with selected households
use "`input_dir'/`member_file_in'", clear
merge m:1 hhid using "`households'", ///
    keep(3) /// keep cases present in both files, that is, members for selected households
    keepusing(hhid) /// add no variables from the household file
    nogen noreport // do not create _merge variable and do not report merge result

* save members for selected households
tempfile members
save "`members'", replace
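The merge acts as a filter here: `keep(3)` retains only members whose household is in the selected file. A minimal sketch with made-up IDs (the tempfile name `selected` and all values are hypothetical):

```stata
* toy file of selected households: 1 and 3
clear
input hhid
1
3
end
tempfile selected
save "`selected'"

* toy members file: household 2 was not selected
clear
input hhid pid
1 1
1 2
2 1
3 1
end

* keep(3) drops members whose household is absent from the selected file
merge m:1 hhid using "`selected'", keep(3) keepusing(hhid) nogen noreport
list
```

The member of household 2 is dropped, leaving three observations.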
* SuSo assignments require a unique ID named interview__id
* A simple way to do this is to create a
* sequential number for each observation
use "`households'", clear
gen interview__id = _n
tempfile households
save "`households'", replace

* Add this new ID to each file
use "`members'", clear
merge m:1 hhid using "`households'", /// merge by hhid
    keepusing(interview__id) /// add interview__id from households file
    keep(3) nogen noreport // do not create _merge; do not report; keep matching cases only
tempfile members
save "`members'", replace
use "`households'", clear

* keep necessary variables only
keep interview__id s00q10-s00q18

* rename variables so that they:
* - have expressive stub names for names and numbers
* - contain indices to facilitate reshaping
rename (s00q10 s00q13 s00q15 s00q17) ///
    (name1 name2 name3 name4)
rename (s00q12 s00q14 s00q16 s00q18) ///
    (number1 number2 number3 number4)

* reshape so names and numbers, respectively,
* inhabit their own columns
reshape long name@ number@, i(interview__id) j(number_id)

* retain rows with non-empty contacts only
keep if (!mi(name) & !mi(number))
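As a small illustration of the wide-to-long step, the sketch below reshapes two made-up households with up to two contacts each. All names and numbers are invented for the example.

```stata
* toy data: one row per household, contacts in numbered columns
clear
input interview__id str10 name1 str10 number1 str10 name2 str10 number2
1 "Awa" "7612" "" ""
2 "Moussa" "6612" "Fatou" "5012"
end

* one row per household-contact pair; the suffix goes into number_id
reshape long name number, i(interview__id) j(number_id)

* drop the empty second contact of household 1
keep if !mi(name) & !mi(number)
list
```

The result has three rows: one contact for household 1 and two for household 2.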
* create variables for preloading
* map variables to expected columns
gen number_list = number
gen preload_number = number
gen preload_owner_name = name
gen number_owner_txt = name

* determine who owns number based on source variable
* contacts 1 and 2 are for members
* contacts 3 and 4 are for non-members
gen number_member = .
replace number_member = 1 if inlist(number_id, 1, 2)
replace number_member = 2 if inlist(number_id, 3, 4)

* determine relationship to head by source variable
* contact 1 is for the household head
gen number_rel_mem = .
replace number_rel_mem = 1 if (number_id == 1)
* create ID variables
* household
* interview__id, already in the data set, is used by SuSo
* contact number
* sort by number_id within household so the numbering is reproducible
bysort interview__id (number_id): gen numbers__id = _n

* retain only the necessary variables
keep interview__id numbers__id ///
    number_list ///
    preload_number ///
    preload_owner_name ///
    number_owner_txt ///
    number_member ///
    number_rel_mem
tempfile numbers_roster
save "`numbers_roster'"
use "`numbers_roster'", clear
keep interview__id numbers__id number_list

* decrement each index by 1
* list questions are 0-indexed in SuSo
replace numbers__id = numbers__id - 1

* rename to match list question naming
* list questions are composed of:
* - core variable name: var
* - separator: __
* - index of list element: #
* that is, elements of the form: var__#
rename number_list number_list__

* reshape from long roster to wide list
* see image at right for intuition
reshape wide number_list__, i(interview__id) j(numbers__id)
tempfile numbers_list
save "`numbers_list'"
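A minimal sketch of the roster-to-list reshape on made-up numbers shows where the `var__#` column names come from. The values are invented; the variable names follow the ones used above.

```stata
* toy roster: household 1 has two numbers, household 2 has one
clear
input interview__id numbers__id str10 number_list
1 1 "7612"
1 2 "6612"
2 1 "5012"
end

* shift to a 0-based index so the columns become __0, __1, ...
replace numbers__id = numbers__id - 1

* stub name + index value gives number_list__0, number_list__1
rename number_list number_list__
reshape wide number_list__, i(interview__id) j(numbers__id)
list
```

Household 2 has no second number, so its `number_list__1` is an empty string.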
use "`members'", clear

* Keep only those members that are still part of the household (according to the last survey)
* This is most relevant for panel surveys
* This may also be applicable for multi-visit cross-sectional surveys
* Keep this code only if applicable.
keep if still_member == 1
* rename/create variables to match SuSo

// panel person ID
gen preload_pid = s00q00a

// name
rename s00q00b s2q1
gen s2q1_open = s2q1

// sex
rename s01q01 s2q5
gen preload_sex = s2q5

// age
rename s01q03 s2q6
gen preload_age = s2q6

// relationship
rename s01q02 s2q7
gen preload_relation = s2q7
* sort by household and person IDs
* not strictly needed; just neater
sort interview__id s00q00a

* create a new sequential person identifier
* why?
* SuSo needs a person ID that:
* - starts with 1 (the member list built from it later is 0-indexed)
* - is sequential
* - has no gaps in sequence
* This ensures that member list (next slide) is of correct form
* Person ID may not be sequential for a few reasons:
* - Members may have been dropped (earlier this slide)
* - Gaps exist because of data generation process
* number members in order of their panel ID within each household
bysort interview__id (s00q00a): generate members__id = _n
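To illustrate how renumbering closes gaps, here is a minimal sketch with made-up IDs. The variable `pid` is a hypothetical stand-in for the panel person ID.

```stata
* toy data: household 1 lost member 2, so its person IDs have a gap
clear
input interview__id pid
1 1
1 3
1 4
2 2
end

* number members 1, 2, 3, ... within household, in order of pid
bysort interview__id (pid): generate members__id = _n
list
```

Within household 1, person IDs 1, 3, 4 become 1, 2, 3. The single member of household 2 becomes 1.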
keep interview__id members__id /// IDs
    preload_pid /// panel ID
    preload_sex /// sex
    preload_age /// age
    preload_relation /// relationship
    s2q1 s2q1_open /// name
    s2q5 /// sex
    s2q6 /// age
    s2q7 // relationship
tempfile members_roster
save "`members_roster'"
TODO: Consider 3-column design for rename panel:
code
Remove any that have left
use "`members_roster'", clear
keep interview__id members__id s2q1

* SuSo's list is 0-indexed
* members__id starts with 1
* subtract 1 from each members__id so the two match
replace members__id = members__id - 1

* SuSo's list questions are formatted as follows:
* - each element of the list occupies its own column
* - each list element's name consists of the list question's variable
*   name, __ as a separator, and the list element's index
*   (e.g. var__0 for the 1st element, var__1 for the 2nd, etc.)
* To put the data in SuSo's format:
* rename the variable to be of the form var__
rename s2q1 s2q1__

* reshape the data so that each element is a column and each
* column has a name of the form var__0, var__1, etc.
reshape wide s2q1__, i(interview__id) j(members__id)
tempfile members_list
save "`members_list'"
TODO: Consider adding a stylized image of the operation May need to revise comment width so that not cut off
* start from the selected households, which already contain interview__id
use "`households'", clear

* rename variables to match SuSo
rename s00q10 head_name
rename urban_rural area
rename s00q28 language

* keep only those needed by SuSo
* since SuSo does not know how to
* handle extra variables
keep interview__id hhid head_name area language
tempfile households
save "`households'", replace
use "`households'", clear

* add list questions to household file

// members list
merge 1:1 interview__id using "`members_list'", nogen noreport assert(3) keep(3)

// numbers list
merge 1:1 interview__id using "`numbers_list'", nogen noreport assert(3) keep(3)

tempfile households_plus_lists
save "`households_plus_lists'"
* create a data set of the following form:
* | variable__name |
* | -------------- |
* | "var1"         |
* | "var2"         |

* create empty data set with 2 observations
clear
set obs 2

* populate those observations with names of
* variables whose preloaded values to protect
* typically, these are roster triggers
* like the list questions below
gen variable__name = ""
replace variable__name = "s2q1" in 1
replace variable__name = "number_list" in 2
tempfile protected_vars
save "`protected_vars'"
Imagine there are 3 interviewers:
use "`households_plus_lists'", clear

* SuSo assigns based on the user name found in
* the _responsible column
* Make assignments by matching interviewers to households
* that speak language(s) spoken by the interviewer
* Below is a simple example. Actual assignment may be more complex.
gen _responsible = ""
replace _responsible = "Interviewer_1" if inlist(language, 1, 2, 3, 4)
replace _responsible = "Interviewer_2" if inlist(language, 5, 6, 7, 8)
replace _responsible = "Interviewer_3" if inlist(language, 9, 10, 11)
tempfile households
save "`households'", replace
* check how many interviews are assigned to each interviewer
tab _responsible

* make adjustments, either ad-hoc or based on a rule
* replace _responsible ...
tempfile households
save "`households'", replace
* SuSo can optionally record the audio of an interview
* To record interviews, one mechanism is to
* mark the interview as such in the assignment files
* To do so, add the _record_audio column to the main file, households in this case
* Two values are allowed: 1 (record); 0 (do not record)
* Use some rule to determine which interviews will be recorded
gen _record_audio = .
* for example, every 5th interview in the household data file
replace _record_audio = (mod(_n, 5) == 0)
tempfile households
save "`households'", replace
local expected_vars "interview__id hhid head_name language area number_list__0 s2q1__0"
foreach expected_var of local expected_vars {
    * check whether each expected variable exists in the data set
    * if so, move to the next one
    * if not, fail loudly
    capture confirm variable `expected_var'
    if (_rc != 0) {
        di "Variable `expected_var' not found."
        error 1
    }
}
local expected_vars "interview__id hhid head_name language area number_list__0 s2q1__0"
* interview__id is numeric here because it was generated from _n
local expected_types "numeric double string double double string string"
local num_vars: word count `expected_vars'
forvalues i = 1/`num_vars' {
    local expected_var: word `i' of `expected_vars'
    local expected_type: word `i' of `expected_types'
    * check whether each expected variable is of the right type
    * if so, move to the next one
    * if not, fail loudly
    capture confirm `expected_type' variable `expected_var'
    if (_rc != 0) {
        di "Variable `expected_var' expected as `expected_type', but another type found"
        error 1
    }
}
* every assignment needs a responsible user
capture assert _responsible != ""
if _rc != 0 {
    qui: count if _responsible == ""
    local num_miss = r(N)
    di "All assignments are expected to have a responsible user. But `num_miss' do not."
    error 1
}
* number list
capture assert !inlist(number_list__0, "", " ")
if _rc != 0 {
    qui: count if inlist(number_list__0, "", " ")
    local num_miss = r(N)
    di "Lists should have non-empty entries. But `num_miss' of number_list__0 have empty/null values."
}

* member list
capture assert !inlist(s2q1__0, "", " ")
if _rc != 0 {
    qui: count if inlist(s2q1__0, "", " ")
    local num_miss = r(N)
    di "Lists should have non-empty entries. But `num_miss' of s2q1__0 have empty/null values."
}
* HOUSEHOLD LEVEL
* file name = questionnaire variable
* to find this name:
* - open the questionnaire in Designer
* - click on SETTINGS
* - copy value from Questionnaire variable
use "`households'", clear
outsheet using cati_panel_round1.tab, ///
    nolabel /// save values, not labels
    noquote /// no quotes for strings
    replace
* NUMBERS ROSTER
* file name = roster ID in Designer
* the roster was saved earlier in the tempfile numbers_roster
use "`numbers_roster'", clear
outsheet using numbers.tab, ///
    nolabel /// save values, not labels
    noquote /// no quotes for strings
    replace
* MEMBERS ROSTER
* file name = (first) roster ID in Designer
* the roster was saved earlier in the tempfile members_roster
use "`members_roster'", clear
outsheet using members.tab, ///
    nolabel /// save values, not labels
    noquote /// no quotes for strings
    replace
* VARIABLES TO PROTECT FROM EDITING
* file name = name system expects: protected__variables
use "`protected_vars'", clear
outsheet using protected__variables.tab, noquote replace
TODO: Consider using half of screen to show sources of info in Designer Problem is that code will be a little squished May want a 66/33 L-R breakdown
* change to directory with tab files
* cd "your/path/here/"

* list all files ending in ".tab"
* zip them together
* save as "assignments_r1.zip"
zipfile "*.tab", saving("assignments_r1.zip", replace)