Creating panel assignments
Recipes for CATI surveys

About assignments

Defined by outputs
Video of assignment on tablet

Defined by inputs
Set of tab-delimited files
Zipped

Defined by choices
Which variables to preload

First round

Final product

TODO: Video of a created assignment interview on tablet

Assignment on dashboard
Preloaded phone numbers
Preloaded members

Identifying information

Region
Household ID

Personal information

Phone number(s)
- Valid number(s)
- Whose number
Members
- Names
- Other details (e.g., age, gender, relationship)


Intermediary products
Household data
List of members
Roster of members
List of phone numbers
Roster of phone numbers

Raw ingredients
Household data
- Location
- Phone number(s)

Member data
- Name
- Other details (e.g., age, gender, relationship)

Household data


Member data


Recipe 👨‍🍳
Set up project parameters
Determine which households have valid phone numbers
Draw sample of households
Create interview ID
Create roster of numbers
Create list of numbers
Create roster of people
Create list of people
Create household-level data
Add lists--of numbers and of people--to the household-level data
Create list of variables to protect--the lists
Decide which interviewers get which cases
Decide which interviews to record
Check your work
Save to tab-delimited files
Zip all tab files 

Tools 🔪
use
generate / replace
egen rownonmiss
rename
keep
regexm
tempfile
merge
save
reshape wide
reshape long
outsheet / export delimited
zipfile

R tools:

haven / readr
dplyr
stringr
tidyr
fs
zip

Set up project parameters


* FOLDERS
local proj_dir      ""              // folder where all project files are located
local input_dir     "`proj_dir'"    // folder where input files are found. For example: "`proj_dir'/input/"
local output_dir    "`proj_dir'"    // folder where output files are found. For example: "`proj_dir'/output/"


* FILES
* input:
local hhold_file_in     "households.dta"    // household file for this demo
local member_file_in    "members.dta"       // members file for this demo


* output:
local hhold_file_out    "cati_panel_round1.tab"
    * file name = questionnaire variable
    * to find this name: 
    * - open the questionnaire in Designer
    * - click on SETTINGS
    * - copy value from Questionnaire variable
local number_file_out   "numbers.tab"
    * NUMBERS ROSTER
    * file name = roster ID in Designer
local member_file_out   "members.tab"
    * MEMBERS ROSTER
    * file name = (first) roster ID in Designer



See which households have valid contacts
Look for missings, explicit and implicit
Check that numbers follow known patterns

Look for missings, explicit and implicit

use "`input_dir'/`hhold_file_in'", clear
* compile all types of implicit and explicit missings
* this may require inspection of data and iterative work
* use regular expressions (aka regex) to specify patterns
* see more on Stata's use of regex here: https://www.stata.com/support/faqs/data-management/regular-expressions/
* see more about regex on cheat sheets like this one: https://cheatography.com/davechild/cheat-sheets/regular-expressions/
* note, however, that Stata only allows a subset of regex. see FAQ linked above for more details
// same number for all digits
local numbers = "99[ ]*99[ ]*99[ ]*99|88[ ]*88[ ]*88[ ]*88"
// "none" written in place of number
local none = "[Aa][Uu][Cc][Uu][Nn]"
// SuSo's missing value marker for strings
local missing = "##N/A##"
* combine all missing values into a regular expression
local implicit_missings = "`numbers'|`none'|`missing'"
local number_vars = "s00q12 s00q14 s00q16 s00q18"
foreach number_var of local number_vars {
    replace `number_var' = "" if regexm(`number_var', "`implicit_missings'")
}
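
To see what these patterns catch, the following is a minimal sketch on made-up data. The variable `phone` and its six values are hypothetical and not part of the survey files; the three pattern families mirror those on this slide.

```stata
* toy data: hypothetical phone entries, not from the survey
clear
input str20 phone
"76 12 34 56"
"99 99 99 99"
"88888888"
"Aucun"
"##N/A##"
""
end

* same three pattern families as on the slide
local numbers = "99[ ]*99[ ]*99[ ]*99|88[ ]*88[ ]*88[ ]*88"
local none = "[Aa][Uu][Cc][Uu][Nn]"
local missing = "##N/A##"
local implicit_missings = "`numbers'|`none'|`missing'"

* blank out any entry that matches an implicit-missing pattern
replace phone = "" if regexm(phone, "`implicit_missings'")

* count the entries that survive
count if !mi(phone)
```

Because the patterns are not anchored, they also match when they appear inside a longer entry. Whether that is desirable depends on the data, so inspect what gets blanked.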


Check that numbers follow known patterns

* use regular expressions to remove invalid numbers
* see more on Stata's use of regular expressions here: https://www.stata.com/support/faqs/data-management/regular-expressions/
* step 1: compile valid phone number patterns for all telephone providers
* source: International Telecommunications Union (ITU): https://www.itu.int/oth/T0202000083/en
local sotelma_cdma_bamako = "^20[ ]*79"
local sotelma_cdma_regions = "^21[ ]*7"
local sotelma_fix = "^20[ ]*2|^20[ ]*7[0-8]|^21[ ]*2[67]|^21[ ]*[45689]"
local atel_fix = "^40[0-4]"
local orange_fix_bamako = "^44[ ]*[239]"
local orange_fix_regions = "^44[ ]*1"
local atel_mobile = "^50"
local sotelma_mobile = "^6|^9[5-9]|^89"
local orange_mobile = "^7|^9[0-4]|^8[23]"
* step 2: combine all patterns into a single regular expression
local valid_number = "`sotelma_cdma_bamako'|`sotelma_cdma_regions'|`sotelma_fix'|`atel_fix'|`orange_fix_bamako'|`orange_fix_regions'|`atel_mobile'|`sotelma_mobile'|`orange_mobile'"
* step 3: remove numbers that do not fit these patterns
local number_vars = "s00q12 s00q14 s00q16 s00q18"
foreach number_var of local number_vars {
    replace `number_var' = "" if !regexm(`number_var', "`valid_number'")
}


Keep households with at least 1 valid contact


* count the number of valid numbers of the 4 possible numbers provided
egen num_valid = rownonmiss(s00q12 s00q14 s00q16 s00q18), strok
* keep observations with 1 or more valid contacts (i.e., non-missing numbers)
keep if (num_valid > 0)
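
As an illustration of how `rownonmiss` with `strok` behaves, here is a small sketch on made-up data. The variables `num1` and `num2` are hypothetical stand-ins for the phone number variables.

```stata
* toy data: two hypothetical number variables, three households
clear
input str12 num1 str12 num2
"76123456" ""
"" ""
"" "66123456"
end

* count non-empty strings per row; strok lets egen accept string variables
egen num_valid = rownonmiss(num1 num2), strok

* drop the household with no number at all (the second row)
keep if (num_valid > 0)
```

The second household has no number left after cleaning, so it drops out of the sampling frame.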


Draw sample of households
* apply some selection rule(s) to sample households
* keep if mod(_n, 2) == 0 // for example, keep every second observation
* keep selected households
tempfile households
save "`households'", replace
* keep members associated with selected households
use "`input_dir'/`member_file_in'", clear
merge m:1 hhid using "`households'", ///
    keep(3)         /// keep cases present in both files--that is, members for selected households
    keepusing(hhid) /// keep no variables
    nogen noreport  /// do not create _merge variable and do not report merge result
* keep members for selected households
tempfile members
save "`members'", replace

Out of the scope of this training
Contact your local sampler for guidance


Create new interview ID

* SuSo assignments require a unique ID named interview__id
* A simple way to do this is to create a
* sequential number for each observation
use "`households'", clear
gen interview__id = _n
tempfile households
save "`households'", replace
* Add this new ID to each file
use "`members'", clear
merge m:1 hhid using "`households'",    /// merge by hhid
    keepusing(interview__id)            /// add interview__id from households file
    keep(3) nogen noreport              /// do not create _merge; do not report; keep matching cases only
tempfile members
save "`members'", replace


Create roster of numbers

use "`households'", clear
* keep necessary variables only
keep interview__id s00q10 - s00q18
* rename variables so that: 
* - names and numbers have expressive stub names
* - contain indices to facilitate reshaping
rename  (s00q10 s00q13 s00q15 s00q17) /// 
        (name1 name2 name3 name4)
rename  (s00q12 s00q14 s00q16 s00q18) /// 
        (number1 number2 number3 number4)
* reshape so names and numbers, respectively, 
* inhabit their own columns
reshape long name@ number@, i(interview__id) j(number_id)   
* retain rows with non-empty contacts only
keep if (!mi(name) & !mi(number))


* create variables for preloading
* map variables to expected columns
gen number_list = number
gen preload_number = number
gen preload_owner_name = name
gen number_owner_txt = name
* determine who owns number based on source variable
* contacts 1 and 2 are for members
* contacts 3 and 4 are for non-members
gen number_member = .
replace number_member = 1 if inlist(number_id, 1, 2)    
replace number_member = 2 if inlist(number_id, 3, 4)    
* determine relationship to head by source variable
* contact 1 is for the household head
gen number_rel_mem = .
replace number_rel_mem = 1 if (number_id == 1)


* create ID variables
* household
    * interview__id, already in dset, is used by SuSo
* contact number
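* number the contacts within each household, in order of source variable
* (the sort key in parentheses makes the ordering reproducible)
bysort interview__id (number_id): gen numbers__id = _n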
* retain only the necessary variables
keep interview__id numbers__id ///
    number_list         ///
    preload_number      ///
    preload_owner_name  ///
    number_owner_txt    ///
    number_member       ///
    number_rel_mem
tempfile numbers_roster
save "`numbers_roster'"


Create list of numbers

use "`numbers_roster'", clear
keep interview__id numbers__id number_list
* decrement each index by 1
* list questions are 0-indexed in SuSo
replace numbers__id = numbers__id - 1
* rename to match list question naming
* list questions are composed of:
* - core variable name: var
* - index of list element: #
* - separator: __
* that is, elements of the form: var__#
rename number_list number_list__
* reshape from long roster to wide list
* see image at right for intuition
reshape wide number_list__, i(interview__id) j(numbers__id)
tempfile numbers_list
save "`numbers_list'"
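
For intuition about the long-to-wide step, the following sketch runs the same three operations on a made-up roster. The phone numbers are hypothetical; the variable names match those used above.

```stata
* toy roster: household 1 has two numbers, household 2 has one
clear
input interview__id numbers__id str12 number_list
1 1 "76123456"
1 2 "66123456"
2 1 "79123456"
end

* shift the index so that it starts at 0
replace numbers__id = numbers__id - 1
* add the __ separator to the stub
rename number_list number_list__
* one row per household, one column per list element
reshape wide number_list__, i(interview__id) j(numbers__id)

list
```

The result has one row per household and the columns `number_list__0` and `number_list__1`. Household 2 has an empty string in `number_list__1` because it has only one number.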


Create roster of people

use "`members'", clear
* Keep only those members that are still part of the household (according to the last survey)
* This is most relevant for panel surveys
* This may also be applicable for multi-visit cross-sectional surveys
* Keep this code only if applicable.
keep if still_member == 1

* rename/create variables to match SuSo
// panel person ID
gen preload_pid = s00q00a
// name
rename s00q00b s2q1
gen s2q1_open = s2q1
// sex
rename s01q01 s2q5
gen preload_sex = s2q5
// age
rename s01q03 s2q6
gen preload_age = s2q6
// relationship
rename s01q02 s2q7
gen preload_relation = s2q7


* create a new sequential person identifier
* why?
* SuSo needs a person ID that:
* - starts with 1 here (the list on the next slide subtracts 1, since lists are 0-indexed)
* - is sequential
* - has no gaps in sequence
* This ensures that member list (next slide) is of correct form
* Person ID may not be sequential for a few reasons:
* - Members may have been dropped (earlier this slide)
* - Gaps exist because of data generation process
* number members within household in order of panel person ID;
* the sort key in parentheses keeps the ordering reproducible
bysort interview__id (s00q00a): generate members__id = _n


keep interview__id members__id /// IDs
    preload_pid         /// panel ID
    preload_sex         /// sex
    preload_age         /// age
    preload_relation    /// relationship
    s2q1 s2q1_open      /// name
    s2q5                /// sex
    s2q6                /// age
    s2q7                /// relationship
tempfile members_roster
save "`members_roster'"


TODO: Consider 3-column design for rename panel:

data
Designer
code
Remove any that have left
Rename variables
Recode variables (WHY???)
Create sequential ID that starts from 1
Preserve panel ID
Keep variables

Create list of people


use "`members_roster'", clear
keep interview__id members__id s2q1
* SuSo's list is 0-indexed
* members__id starts with 1
* subtract 1 from each members__id so the two match
replace members__id = members__id - 1
* SuSo's list questions are formatted as follows:
* - each element of the list occupies its own column
* - each list element's name consists of the list question's variable
*   name, __ as a separator, and the list element's index
*   (e.g. var__0 for the 1st element, var__1 for the 2nd, etc)
* To put the data in SuSo's format:
* rename the variable to be of the form var__
rename s2q1 s2q1__
* reshape the data so that each element is a column and each
* column has name of the form var__0, var__1, etc.
reshape wide s2q1__, i(interview__id) j(members__id)
tempfile members_list
save "`members_list'"


TODO: Consider adding a stylized image of the operation May need to revise comment width so that not cut off

Create household-level data


* start from the sampled households, which already carry interview__id
use "`households'", clear
* rename variables to match SuSo
rename s00q10       head_name
rename urban_rural  area
rename s00q28       language
* keep only those needed by SuSo
* since SuSo does not know how to
* handle extra variables
keep interview__id hhid head_name area language
tempfile households
save "`households'", replace


Add lists to the household-level data


use "`households'", clear
* add list questions to household file
// members list
merge 1:1 interview__id using "`members_list'", nogen noreport assert(3) keep(3)
// numbers list
merge 1:1 interview__id using "`numbers_list'", nogen noreport assert(3) keep(3)
tempfile households_plus_lists
save "`households_plus_lists'"


Create list of variables to protect


* create a data set of following form:
* | variable__name  |
* | --------------- |
* | "var1"          |
* | "var2"          |
* create empty data set with 2 observations
clear
set obs 2
* populate those observations with names of
* variables whose preloaded values to protect
* typically, these are roster triggers
* like the list questions below
gen variable__name = ""
replace variable__name = "s2q1" in 1
replace variable__name = "number_list" in 2
tempfile protected_vars
save "`protected_vars'"


Decide which interviewers get which cases

Imagine there are 3 interviewers:

Interviewer_1
Interviewer_2
Interviewer_3

Interviewer_1 speaks:
- Bambara/Malinké (1)
- Peulh/Foulfoulbé (2)
- Sonhrai (3)
- Sarakolé (4)
Interviewer_2 speaks:
- Kassonké (5)
- Sénoufo/Minianka (6)
- Dogon (7)
- Maure (8)
Interviewer_3 speaks:
- Tamacheq (9)
- Bobo / Dafing / Samogo (10)
- Français (11)


use "`households_plus_lists'", clear
* SuSo assigns based on the user name found in
* the _responsible column
* Make assignments by matching interviewers to households
* that speak language(s) spoken by the interviewer
* Below is a simple example. Actual assignment may be more complex.
gen _responsible = ""
replace _responsible = "Interviewer_1" if inlist(language, 1, 2, 3, 4)
replace _responsible = "Interviewer_2" if inlist(language, 5, 6, 7, 8)
replace _responsible = "Interviewer_3" if inlist(language, 9, 10, 11)
tempfile households
save "`households'", replace


* check how many interviews are assigned to each interviewer
tab _responsible
* make adjustments, either ad-hoc or based on a rule
* replace _responsible ...
tempfile households
save "`households'", replace


Decide on workload per interviewer
Capture which language(s) interviewers speak
Match based on common language(s) spoken
Adjust assignments manually
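
One possible way to capture languages and match on them is a lookup table merged onto the households, rather than a chain of `replace` statements. The sketch below assumes each language is covered by exactly one interviewer. The household rows are made up; the interviewer names and language codes follow the example above.

```stata
* lookup table: one row per language code, with the interviewer who speaks it
clear
input language str20 _responsible
1  "Interviewer_1"
2  "Interviewer_1"
3  "Interviewer_1"
4  "Interviewer_1"
5  "Interviewer_2"
6  "Interviewer_2"
7  "Interviewer_2"
8  "Interviewer_2"
9  "Interviewer_3"
10 "Interviewer_3"
11 "Interviewer_3"
end
tempfile speakers
save "`speakers'"

* toy households (hypothetical IDs and language codes)
clear
input interview__id language
1 1
2 5
3 9
4 2
end

* attach the interviewer who speaks each household's language;
* keep unmatched households too, so they can be spotted and fixed
merge m:1 language using "`speakers'", keep(1 3) nogen noreport

* workload per interviewer, as a basis for manual adjustments
tab _responsible
```

If several interviewers share a language, this table would need one more step to split those households among them, which is not shown here.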

Decide which interviews to record


* SuSo can (optionally) record the audio of an interview
* If one wants to record interviews, one mechanism is to
* mark the interview as such in the assignment files
* To do so, add the _record_audio column to the main file, households in this case
* Two values are allowed: 1 (record); 0 (do not record)
* Use some rule to determine which interviews will be recorded
gen _record_audio = . 
* for example, every 5th interview in the household data file
replace _record_audio = (mod(_n, 5) == 0)
tempfile households
save "`households'", replace


Check your work
Overview
Expected names
Expected types
All assigned
Lists complete
Why
Easier to identify and diagnose issues. SuSo warns about issues. Code makes identification faster and remediation easier.
Possible to flag subtle problems. SuSo only looks for issues that prevent preloading. Code can look for other issues (e.g., number "none" should have been dropped)
Helps write better code. At a minimum, passes tests. Ideally, also actively avoids problems.

How
Expected names. SuSo expects certain variable names. Check them.
Expected types. SuSo expects variables be certain types. Check them too.
All assigned. SuSo assigns based on _responsible, if present, and assigns the rest to the default responsible. Assignment to the default may not be desirable.
Lists complete. SuSo expects sequential list entries to be non-empty, in particular the first entry.

local expected_vars "interview__id hhid head_name language area number_list__0 s2q1__0"
foreach expected_var of local expected_vars {
    * check whether each expected variable exists in the data set
    * if so, move to the next one
    * if not, fail loudly
    capture confirm variable `expected_var'
    if (_rc != 0) {
        di "Variable `expected_var' not found."
        error 1
    }
}
local expected_vars "interview__id hhid head_name language area number_list__0 s2q1__0"
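* generic types (numeric, string) avoid false alarms over storage type, e.g. float vs. double
* interview__id is numeric here because it was generated from _n
local expected_types "numeric numeric string numeric numeric string string"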
local num_vars: word count `expected_vars'
forvalues i = 1/`num_vars' {
    local expected_var: word `i' of `expected_vars'
    local expected_type: word `i' of `expected_types'
    * check whether each expected variable is of the right type
    * if so, move to the next one
    * if not, fail loudly
    capture confirm `expected_type' variable `expected_var'
    if (_rc != 0) {
        di "Variable `expected_var' expected as `expected_type', but another type found"
        error 1
    }
}
capture assert _responsible != ""
if _rc != 0 {
    qui: count if _responsible == ""
    local num_miss = r(N)
    di "All assignments are expected to be assigned. But `num_miss' have not been."
    error 1
}
* number list
capture assert !inlist(number_list__0, "", " ")
if _rc != 0 {
    qui: count if inlist(number_list__0, "", " ")
    local num_miss = r(N)
    di "Lists should have non-empty entries. But `num_miss' of number_list have empty/null values."
}
* member list
capture assert !inlist(s2q1__0, "", " ")
if _rc != 0 {
    qui: count if inlist(s2q1__0, "", " ")
    local num_miss = r(N)
    di "Lists should have non-empty entries. But `num_miss' of s2q1__0 have empty/null values."
}
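
In the same spirit, one further check that may be worth adding is whether the ID variables really identify rows. The sketch below uses a made-up numbers roster and is an addition, not one of the four checks listed on this slide.

```stata
* toy roster with hypothetical values
clear
input interview__id numbers__id str12 number_list
1 1 "76123456"
1 2 "66123456"
2 1 "79123456"
end

* each roster row should be uniquely identified by its two IDs;
* isid exits with an error if it is not
isid interview__id numbers__id

* roster IDs should run 1, 2, 3, ... without gaps within each interview
bysort interview__id (numbers__id): assert numbers__id == _n
```

Both commands stop the do-file with an error when the condition fails, which matches the "fail loudly" approach of the checks above.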

Save to tab-delimited format


* HOUSEHOLD LEVEL
* file name = questionnaire variable
* to find this name: 
* - open the questionnaire in Designer
* - click on SETTINGS
* - copy value from Questionnaire variable
use "`households'", clear
outsheet using cati_panel_round1.tab, ///
    nolabel /// save values, not labels
    noquote /// no quotes for strings
    replace


* NUMBERS ROSTER
* file name = roster ID in Designer
use "`numbers_roster'", clear
outsheet using numbers.tab, ///
    nolabel /// save values, not labels
    noquote /// no quotes for strings
    replace


* MEMBERS ROSTER
* file name = (first) roster ID in Designer
use "`members_roster'", clear
outsheet using members.tab, ///
    nolabel /// save values, not labels
    noquote /// no quotes for strings
    replace


* VARIABLES TO PROTECT FROM EDITING
* file name = name system expects: protected__variables
use "`protected_vars'", clear
outsheet using protected__variables.tab, noquote replace


TODO: Consider using half of screen to show sources of info in Designer Problem is that code will be a little squished May want a 66/33 L-R breakdown

Zip all tab files


* change to directory with tab files
* cd "your/path/here/"
* list all files ending in ".tab"
* zip them together
* save as "assignments_r1.zip"
zipfile "*.tab", saving("assignments_r1.zip", replace)


Subsequent rounds

Recipe 1: households that responded
Filter to households that responded
Keep new and remaining members
Remove useless numbers
Create list of members
Create list of numbers
Create variables to facilitate next call
- respondent's name
- preferred number
- contacted number
- head's name
- preferred date and time to contact

Add lists and contact details to household data
Decide which interviewers get which cases
Check your work
Decide whether to activate recording
Save to tab-delimited format
Zip all tab files
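
The first and sixth steps of this recipe might be sketched as follows. All variable names (`responded`, `resp_name`, `contacted_number`) and all values are hypothetical; the actual names depend on the round 1 questionnaire and its exported data.

```stata
* round 1 outcomes: hypothetical variable names and values
clear
input interview__id responded str20 resp_name str12 contacted_number
1 1 "Awa"    "76123456"
2 0 ""       ""
3 1 "Moussa" "66123456"
end

* filter to households that responded
keep if responded == 1

* create variables that facilitate the next call
gen preload_resp_name = resp_name
gen preload_contacted_number = contacted_number

* keep only the ID and the variables to preload
keep interview__id preload_*
```

The remaining steps (lists, assignment, checks, export) would follow the same pattern as in the first round.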

Tools

use
generate / replace
rename
keep
tempfile
merge
save
reshape wide
reshape long
outsheet / export delimited
zipfile

haven
dplyr
stringr
tidyr
lubridate
readr
fs
zip


Recipe 2: households that didn't respond
Decide which households to continue calling
Filter to those households
Determine when they were last "observed"
- Responded to the survey
- Created an assignment

Filter past assignments to those households
Make adjustments as needed
Save to tab-delimited format
Zip all tab files

Tools

insheet / import delimited
generate / replace
rename
keep
tempfile
merge
append
save
reshape wide
reshape long
outsheet / export delimited
zipfile

readr
dplyr
fs
zip
