
Creating panel assignments

Recipes for CATI surveys

1 / 37

About assignments

2 / 37

Defined by outputs

  • Video of assignment on tablet
3 / 37

Defined by inputs

  • Set of tab-delimited files
  • Zipped
4 / 37

Defined by choices

  • Which variables to preload
5 / 37

First round

6 / 37

Final product

TODO: Video of a created assignment interview on tablet

  • Assignment on dashboard
  • Preloaded phone numbers
  • Preloaded members

Identifying information

  • Region
  • Household ID

Personal information

  • Phone number(s)
    • Valid number(s)
    • Whose number
  • Members
    • Names
    • Other details (e.g., age, gender, relationship)
7 / 37

Intermediary products

  • Household data
  • List of members
  • Roster of members
  • List of phone numbers
  • Roster of phone numbers
8 / 37

Raw ingredients

  • Household data
    • Location
    • Phone number(s)
  • Member data
    • Name
    • Other details (e.g., age, gender, relationship)
9 / 37

Household data

10 / 37

Member data

11 / 37

Recipe 👨‍🍳

  1. Set up project parameters
  2. Determine which households have valid phone numbers
  3. Draw sample of households
  4. Create interview ID
  5. Create roster of numbers
  6. Create list of numbers
  7. Create roster of people
  8. Create list of people
  9. Create household-level data
  10. Add lists--of numbers and of people--to the household-level data
  11. Create list of variables to protect--the lists
  12. Decide which interviewers get which cases
  13. Decide which interviews to record
  14. Check your work
  15. Save to tab-delimited files
  16. Zip all tab files
12 / 37

Tools 🔪

  • use
  • generate / replace
  • egen rownonmiss
  • rename
  • keep
  • regexm
  • tempfile
  • merge
  • save
  • reshape wide
  • reshape long
  • outsheet / export delimited
  • zipfile
13 / 37

R tools:

  • haven / readr
  • dplyr
  • stringr
  • tidyr
  • fs
  • zip

Set up project parameters

* FOLDERS
local proj_dir "" // folder where all project files are located
local input_dir "`proj_dir'" // folder where input files are found. For example: "`proj_dir'/input/"
local output_dir "`proj_dir'" // folder where output files are found. For example: "`proj_dir'/output/"
* FILES
* input:
local hhold_file_in "households.dta" // household file for this demo
local member_file_in "members.dta" // members file for this demo
* output:
* HOUSEHOLD LEVEL
* file name = questionnaire variable
* to find this name:
* - open the questionnaire in Designer
* - click on SETTINGS
* - copy value from Questionnaire variable
local hhold_file_out "cati_panel_round1.tab"
* NUMBERS ROSTER
* file name = roster ID in Designer
local number_file_out "numbers.tab"
* MEMBERS ROSTER
* file name = (first) roster ID in Designer
local member_file_out "members.tab"

14 / 37

See which households have valid contacts

  • Look for missings—explicit and implicit
  • Check that numbers follow known patterns
15 / 37

Look for missings—explicit and implicit

use "`input_dir'/`hhold_file_in'", clear
* compile all types of implicit and explicit missings
* this may require inspection of data and iterative work
* use regular expressions (aka regex) to specify patterns
* see more on Stata's use of regex here: https://www.stata.com/support/faqs/data-management/regular-expressions/
* see more about regex on cheat sheets like this one: https://cheatography.com/davechild/cheat-sheets/regular-expressions/
* note, however, that Stata only allows a subset of regex. See the FAQ linked above for more details
// same number for all digits
local numbers = "99[ ]*99[ ]*99[ ]*99|88[ ]*88[ ]*88[ ]*88"
// "none" (French: "aucun") written in place of number
local none = "[Aa][Uu][Cc][Uu][Nn]"
// SuSo's missing value marker for strings
local missing = "##N/A##"
* combine all missing values into a regular expression
local implicit_missings = "`numbers'|`none'|`missing'"
local number_vars = "s00q12 s00q14 s00q16 s00q18"
foreach number_var of local number_vars {
replace `number_var' = "" if regexm(`number_var', "`implicit_missings'")
}
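Outside Stata, the same blanking step can be sketched in Python with the standard `re` module; the simple patterns used here (character classes, `*`, `|`) behave the same way in both engines. This is a minimal, hypothetical illustration rather than part of the original workflow: the household record is invented, and the repeated-digit pattern allows an optional space between every pair of digits.

```python
import re

# Patterns mirror the Stata locals above (illustrative, not an exhaustive list)
REPEATED_DIGITS = r"99[ ]*99[ ]*99[ ]*99|88[ ]*88[ ]*88[ ]*88"
NONE_WORD = r"[Aa][Uu][Cc][Uu][Nn]"  # "aucun" (French for "none"), any letter case
SUSO_MISSING = r"##N/A##"            # SuSo's missing-value marker for strings
IMPLICIT_MISSINGS = re.compile("|".join([REPEATED_DIGITS, NONE_WORD, SUSO_MISSING]))

NUMBER_VARS = ("s00q12", "s00q14", "s00q16", "s00q18")


def blank_implicit_missings(value: str) -> str:
    """Return "" if the value matches any missing pattern, otherwise the value unchanged."""
    # re.search, like Stata's regexm(), looks for a match anywhere in the string
    return "" if IMPLICIT_MISSINGS.search(value) else value


# Hypothetical household record
household = {"s00q12": "99 99 99 99", "s00q14": "Aucun", "s00q16": "##N/A##", "s00q18": "76 12 34 56"}
for var in NUMBER_VARS:
    household[var] = blank_implicit_missings(household[var])
print(household)
```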
16 / 37

Check that numbers follow known patterns

* use regular expressions to remove invalid numbers
* see more on Stata's use of regular expressions here: https://www.stata.com/support/faqs/data-management/regular-expressions/
* step 1: compile valid phone number patterns for all telephone providers
* source: International Telecommunications Union (ITU): https://www.itu.int/oth/T0202000083/en
local sotelma_cdma_bamako = "^20[ ]*79"
local sotelma_cdma_regions = "^21 7"
local sotelma_fix = "^20[ ]*2|^20[ ]*7[0-8]|^21[ ]*2[67]|^21[ ]*[45689]"
local atel_fix = "^40[0-4]"
local orange_fix_bamako = "^44[ ]*[239]"
local orange_fix_regions = "^44[ ]*1"
local atel_mobile = "^50"
local sotelma_mobile = "^6|^9[5-9]|^89"
local orange_mobile = "^7|^9[0-4]|^8[23]"
* step 2: combine all patterns into a single regular expression
local valid_number = "`sotelma_cdma_bamako'|`sotelma_cdma_regions'|`sotelma_fix'|`atel_fix'|`orange_fix_bamako'|`orange_fix_regions'|`atel_mobile'|`sotelma_mobile'|`orange_mobile'"
* step 3: remove numbers that do not fit these patterns
local number_vars = "s00q12 s00q14 s00q16 s00q18"
foreach number_var of local number_vars {
replace `number_var' = "" if !regexm(`number_var', "`valid_number'")
}
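The three steps above (list the prefixes, join them into one alternation, blank whatever does not match) can be sketched in Python as below. The prefixes are copied verbatim from the Stata locals; the sample numbers are hypothetical, and this is an illustration of the technique, not a validated numbering plan.

```python
import re

# Step 1: prefix patterns copied from the Stata locals above
VALID_PATTERNS = [
    r"^20[ ]*79",                                           # Sotelma CDMA, Bamako
    r"^21 7",                                               # Sotelma CDMA, regions
    r"^20[ ]*2|^20[ ]*7[0-8]|^21[ ]*2[67]|^21[ ]*[45689]",  # Sotelma fixed lines
    r"^40[0-4]",                                            # ATEL fixed lines
    r"^44[ ]*[239]",                                        # Orange fixed lines, Bamako
    r"^44[ ]*1",                                            # Orange fixed lines, regions
    r"^50",                                                 # ATEL mobile
    r"^6|^9[5-9]|^89",                                      # Sotelma mobile
    r"^7|^9[0-4]|^8[23]",                                   # Orange mobile
]
# Step 2: one alternation, like the `valid_number' local
VALID_NUMBER = re.compile("|".join(VALID_PATTERNS))


def keep_if_valid(value: str) -> str:
    """Step 3: blank out numbers that match none of the known prefixes."""
    return value if VALID_NUMBER.search(value) else ""


# Hypothetical inputs
numbers = ["76 12 34 56", "20 79 00 00", "11 22 33 44", ""]
cleaned = [keep_if_valid(n) for n in numbers]
print(cleaned)
```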
17 / 37

Keep households with at least 1 valid contact

* count the number of valid numbers of the 4 possible numbers provided
egen num_valid = rownonmiss(s00q12 s00q14 s00q16 s00q18), strok
* keep observations with 1 or more valid contacts (i.e., non-missing numbers)
keep if (num_valid > 0)
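In Python terms, `egen rownonmiss(...), strok` is a per-row count of non-empty strings, followed by a row filter. A minimal sketch on hypothetical records, where `""` plays the role of Stata's string missing:

```python
NUMBER_VARS = ("s00q12", "s00q14", "s00q16", "s00q18")

# Hypothetical households after the cleaning steps above ("" = missing)
households = [
    {"hhid": "H1", "s00q12": "76123456", "s00q14": "", "s00q16": "", "s00q18": "66123456"},
    {"hhid": "H2", "s00q12": "", "s00q14": "", "s00q16": "", "s00q18": ""},
]

for hh in households:
    # like egen rownonmiss(...), strok: count the non-empty string cells in the row
    hh["num_valid"] = sum(1 for var in NUMBER_VARS if hh[var] != "")

# like keep if (num_valid > 0)
households = [hh for hh in households if hh["num_valid"] > 0]
print([(hh["hhid"], hh["num_valid"]) for hh in households])
```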
18 / 37

Draw sample of households

* apply some selection rule(s) to sample households
* keep if mod(_n, 2) == 0 // for example, keep every second observation
* keep selected households
tempfile households
save "`households'", replace
* keep members associated with selected households
use "`input_dir'/`member_file_in'", clear
merge m:1 hhid using "`households'", ///
keep(3) /// keep cases present in both files--that is, members for selected households
keepusing(hhid) /// add no variables from the household file (hhid is the merge key)
nogen noreport // do not create _merge variable and do not report merge result
* keep members for selected households
tempfile members
save "`members'", replace
  • Out of the scope of this training
  • Contact your local sampler for guidance
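The filtering half of this step can be sketched in Python. The selection rule below is only the placeholder from the commented-out Stata line (keep every second household), not a sampling design, and all records are hypothetical. The sketch shows that `merge m:1 ..., keep(3) keepusing(hhid)` acts as a semi-join: members are kept when their household was selected, and no household variables are added.

```python
# Hypothetical input data
households = [{"hhid": h} for h in ("H1", "H2", "H3", "H4")]
members = [
    {"hhid": "H1", "pid": 1},
    {"hhid": "H2", "pid": 1},
    {"hhid": "H2", "pid": 2},
    {"hhid": "H4", "pid": 1},
]

# Placeholder selection rule: keep every second household (Stata's _n is 1-based)
households = [hh for n, hh in enumerate(households, start=1) if n % 2 == 0]

# merge m:1 hhid ..., keep(3) keepusing(hhid): keep members of selected
# households without adding any household variables (a semi-join)
selected = {hh["hhid"] for hh in households}
members = [m for m in members if m["hhid"] in selected]
print(members)
```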
19 / 37

Create new interview ID

* SuSo assignments require a unique ID named interview__id
* A simple way to do this is to create a
* sequential number for each observation
use "`households'", clear
gen interview__id = _n
tempfile households
save "`households'", replace
* Add this new ID to each file
use "`members'", clear
merge m:1 hhid using "`households'", /// merge by hhid
keepusing(interview__id) /// add interview__id from households file
keep(3) nogen noreport // keep matching cases only; do not create _merge; do not report
tempfile members
save "`members'", replace
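The same two moves, numbering households and copying the new ID onto members, are sketched below in Python with hypothetical records. A dictionary lookup stands in for the m:1 merge; because a dictionary silently keeps the last duplicate key, the sketch checks that `hhid` is unique first, which Stata's `merge m:1` would otherwise enforce.

```python
households = [{"hhid": "H2"}, {"hhid": "H4"}]
members = [
    {"hhid": "H2", "pid": 1},
    {"hhid": "H2", "pid": 2},
    {"hhid": "H4", "pid": 1},
    {"hhid": "H9", "pid": 1},  # hypothetical member of an unselected household
]

# like gen interview__id = _n: a sequential ID starting at 1
for n, hh in enumerate(households, start=1):
    hh["interview__id"] = n

id_by_hhid = {hh["hhid"]: hh["interview__id"] for hh in households}
if len(id_by_hhid) != len(households):
    raise ValueError("hhid must uniquely identify households for an m:1 merge")

# merge m:1 hhid ..., keepusing(interview__id) keep(3):
# copy each household's ID onto its members; drop members with no matching household
members = [
    {**m, "interview__id": id_by_hhid[m["hhid"]]}
    for m in members
    if m["hhid"] in id_by_hhid
]
print(members)
```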
20 / 37

Create roster of numbers

use "`households'", clear
* keep necessary variables only
keep interview__id s00q10 - s00q18
* rename variables so that:
* - names and numbers have expressive stub names
* - contain indices to facilitate reshaping
rename (s00q10 s00q13 s00q15 s00q17) ///
(name1 name2 name3 name4)
rename (s00q12 s00q14 s00q16 s00q18) ///
(number1 number2 number3 number4)
* reshape so names and numbers, respectively,
* inhabit their own columns
reshape long name@ number@, i(interview__id) j(number_id)
* retain rows with non-empty contacts only
keep if (!mi(name) & !mi(number))

* create variables for preloading
* map variables to expected columns
gen number_list = number
gen preload_number = number
gen preload_owner_name = name
gen number_owner_txt = name
* determine who owns number based on source variable
* contacts 1 and 2 are for members
* contacts 3 and 4 are for non-members
gen number_member = .
replace number_member = 1 if inlist(number_id, 1, 2)
replace number_member = 2 if inlist(number_id, 3, 4)
* determine relationship to head by source variable
* contact 1 is for the household head
gen number_rel_mem = .
replace number_rel_mem = 1 if (number_id == 1)

* create ID variables
* household
* interview__id, already in dset, is used by SuSo
* contact number
sort interview__id number_id
bysort interview__id: gen numbers__id = _n
* retain only the necessary variables
keep interview__id numbers__id ///
number_list ///
preload_number ///
preload_owner_name ///
number_owner_txt ///
number_member ///
number_rel_mem
tempfile numbers_roster
save "`numbers_roster'"
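For intuition, the reshape-long-then-filter logic can be written as a loop over one household's four name/number pairs. This Python sketch is an illustration under assumptions: the household record is hypothetical and already uses the renamed `name1`-`name4` / `number1`-`number4` variables; the role rules (contacts 1-2 are members, contact 1 is the head) are the ones stated in the Stata comments.

```python
def numbers_roster(hh):
    """Turn one household's four name/number pairs into numbers-roster rows."""
    rows = []
    for number_id in range(1, 5):  # the j(number_id) index from reshape long
        name = hh[f"name{number_id}"]
        number = hh[f"number{number_id}"]
        if name == "" or number == "":  # keep if (!mi(name) & !mi(number))
            continue
        rows.append({
            "interview__id": hh["interview__id"],
            "numbers__id": len(rows) + 1,  # sequential within household, no gaps
            "number_list": number,
            "preload_number": number,
            "preload_owner_name": name,
            "number_owner_txt": name,
            "number_member": 1 if number_id in (1, 2) else 2,  # contacts 1-2: members
            "number_rel_mem": 1 if number_id == 1 else None,   # contact 1: the head
        })
    return rows


# Hypothetical household, after renaming to name1-name4 / number1-number4
household = {
    "interview__id": 1,
    "name1": "Awa", "number1": "76123456",
    "name2": "", "number2": "",
    "name3": "Moussa", "number3": "66123456",
    "name4": "Ali", "number4": "",
}
roster = numbers_roster(household)
for row in roster:
    print(row)
```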

21 / 37

Create list of numbers

use "`numbers_roster'", clear
keep interview__id numbers__id number_list
* decrement each index by 1
* list questions are 0-indexed in SuSo
replace numbers__id = numbers__id - 1
* rename to match list question naming
* list questions are composed of:
* - core variable name: var
* - index of list element: #
* - separator: __
* that is, elements of the form: var__#
rename number_list number_list__
* reshape from long roster to wide list
* see image at right for intuition
reshape wide number_list__, i(interview__id) j(numbers__id)
tempfile numbers_list
save "`numbers_list'"
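The long-to-wide step, including the shift from 1-based roster IDs to 0-based list elements, can be sketched as a small Python helper. The helper name `roster_to_list` and the sample rows are hypothetical.

```python
def roster_to_list(rows, id_var, roster_id, value_var):
    """Reshape long roster rows to one record per interview with value_var__0, value_var__1, ..."""
    wide = {}
    for row in sorted(rows, key=lambda r: (r[id_var], r[roster_id])):
        record = wide.setdefault(row[id_var], {id_var: row[id_var]})
        # roster IDs start at 1 and SuSo list elements at 0, hence the - 1
        record[f"{value_var}__{row[roster_id] - 1}"] = row[value_var]
    return list(wide.values())


# Hypothetical numbers roster
numbers_roster = [
    {"interview__id": 1, "numbers__id": 2, "number_list": "66123456"},
    {"interview__id": 1, "numbers__id": 1, "number_list": "76123456"},
    {"interview__id": 2, "numbers__id": 1, "number_list": "20790000"},
]
numbers_list = roster_to_list(numbers_roster, "interview__id", "numbers__id", "number_list")
print(numbers_list)
```

Unlike `reshape wide`, which creates every column and leaves shorter lists with empty strings, this sketch simply omits absent elements. Calling it with `"members__id"` and `"s2q1"` would build the members list in the same way.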

22 / 37

Create roster of people

use "`members'", clear
* Keep only those members that are still part of the household (according to the last survey)
* This is most relevant for panel surveys
* This may also be applicable for multi-visit cross-sectional surveys
* Keep this code only if applicable.
keep if still_member == 1

* rename/create variables to match SuSo
// panel person ID
gen preload_pid = s00q00a
// name
rename s00q00b s2q1
gen s2q1_open = s2q1
// sex
rename s01q01 s2q5
gen preload_sex = s2q5
// age
rename s01q03 s2q6
gen preload_age = s2q6
// relationship
rename s01q02 s2q7
gen preload_relation = s2q7
* sort by household and person IDs
* not strictly needed; just neater
sort interview__id s00q00a
* create a new sequential person identifier
* why?
* SuSo needs a person ID that:
* - starts with 1
* - is sequential
* - has no gaps in sequence
* This ensures that the member list (next slide) is of correct form
* Person ID may not be sequential for a few reasons:
* - Members may have been dropped (earlier this slide)
* - Gaps exist because of data generation process
* (s00q00a) keeps the original person order within each household
bysort interview__id (s00q00a): generate members__id = _n

keep interview__id members__id /// IDs
preload_pid /// panel ID
preload_sex /// sex
preload_age /// age
preload_relation /// relationship
s2q1 s2q1_open /// name
s2q5 /// sex
s2q6 /// age
s2q7 // relationship
tempfile members_roster
save "`members_roster'"
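The key idea of this slide, dropping departed members and renumbering the rest without gaps while preserving the panel ID, is sketched below in Python. The member records are hypothetical, and only the name is carried over to keep the example short.

```python
def members_roster(members):
    """Keep current members and give them gap-free IDs (1, 2, ...) within each household."""
    current = [m for m in members if m["still_member"] == 1]
    # like bysort interview__id (s00q00a): order by household, then by panel person ID
    current.sort(key=lambda m: (m["interview__id"], m["s00q00a"]))
    counts = {}
    roster = []
    for m in current:
        hh = m["interview__id"]
        counts[hh] = counts.get(hh, 0) + 1
        roster.append({
            "interview__id": hh,
            "members__id": counts[hh],   # new sequential roster ID
            "preload_pid": m["s00q00a"],  # panel person ID, preserved
            "s2q1": m["s00q00b"],         # name
        })
    return roster


# Hypothetical members: person 2 of household 1 has left
members = [
    {"interview__id": 1, "s00q00a": 3, "s00q00b": "Fatou", "still_member": 1},
    {"interview__id": 1, "s00q00a": 1, "s00q00b": "Awa", "still_member": 1},
    {"interview__id": 1, "s00q00a": 2, "s00q00b": "Ali", "still_member": 0},
    {"interview__id": 2, "s00q00a": 1, "s00q00b": "Moussa", "still_member": 1},
]
for row in members_roster(members):
    print(row)
```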
23 / 37

TODO: Consider 3-column design for rename panel:

  • data
  • Designer
  • code

  • Remove any that have left

  • Rename variables
  • Recode variables (WHY???)
  • Create sequential ID that starts from 1
  • Preserve panel ID
  • Keep variables

Create list of people

use "`members_roster'", clear
keep interview__id members__id s2q1
* SuSo's list is 0-indexed
* members__id starts with 1
* subtract 1 from each members__id so the two match
replace members__id = members__id - 1
* SuSo's list questions are formatted as follows:
* - each element of the list occupies its own column
* - each list element's name consists of the list question's variable
* name, __ as a separator, and the list element's index
* (e.g. var__0 for the 1st element, var__1 for the 2nd, etc.)
* To put the data in SuSo's format:
* rename the variable to be of the form var__
rename s2q1 s2q1__
* reshape the data so that each element is a column and each
* column has name of the form var__0, var__1, etc.
reshape wide s2q1__, i(interview__id) j(members__id)
tempfile members_list
save "`members_list'"
24 / 37

TODO: Consider adding a stylized image of the operation. May need to revise comment width so that it is not cut off.

Create household-level data

use "`households'", clear
* rename variables to match SuSo
rename s00q10 head_name
rename urban_rural area
rename s00q28 language
* keep only those needed by SuSo
* since SuSo does not know how to
* handle extra variables
keep interview__id hhid head_name area language
tempfile households
save "`households'", replace
25 / 37

Add lists to the household-level data

use "`households'", clear
* add list questions to household file
// members list
merge 1:1 interview__id using "`members_list'", nogen noreport assert(3) keep(3)
// numbers list
merge 1:1 interview__id using "`numbers_list'", nogen noreport assert(3) keep(3)
tempfile households_plus_lists
save "`households_plus_lists'"
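What `merge 1:1 ..., assert(3) keep(3)` guarantees can be made explicit in a Python sketch: the key must be unique on both sides, every key must appear in both files, and master values win when a variable exists in both. The helper `merge_1_1` and the records are hypothetical.

```python
def merge_1_1(master, using, key):
    """Mimic merge 1:1 key using ..., assert(3) keep(3): keys must be unique and matched."""
    using_by_key = {row[key]: row for row in using}
    master_keys = [row[key] for row in master]
    if len(using_by_key) != len(using) or len(set(master_keys)) != len(master_keys):
        raise ValueError(f"{key} does not uniquely identify records")
    if set(master_keys) != set(using_by_key):
        raise ValueError(f"some values of {key} are not in both files")
    merged = []
    for row in master:
        combined = dict(row)
        for var, value in using_by_key[row[key]].items():
            combined.setdefault(var, value)  # as in Stata, master values win
        merged.append(combined)
    return merged


households = [{"interview__id": 1, "head_name": "Awa"}, {"interview__id": 2, "head_name": "Moussa"}]
members_list = [
    {"interview__id": 2, "s2q1__0": "Moussa"},
    {"interview__id": 1, "s2q1__0": "Awa", "s2q1__1": "Fatou"},
]
households = merge_1_1(households, members_list, "interview__id")
print(households)
```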
26 / 37

Create list of variables to protect

* create a data set of the following form:
* | variable__name |
* | --------------- |
* | "var1" |
* | "var2" |
* create empty data set with 2 observations
clear
set obs 2
* populate those observations with the names of
* variables whose preloaded values should be protected
* typically, these are roster triggers
* like the list questions below
gen variable__name = ""
replace variable__name = "s2q1" in 1
replace variable__name = "number_list" in 2
tempfile protected_vars
save "`protected_vars'"
27 / 37

Decide which interviewers get which cases

Imagine there are 3 interviewers:

  • Interviewer_1
  • Interviewer_2
  • Interviewer_3
  • Interviewer_1 speaks:
    • Bambara/Malinké (1)
    • Peulh/Foulfoulbé (2)
    • Sonhrai (3)
    • Sarakolé (4)
  • Interviewer_2 speaks:
    • Kassonké (5)
    • Sénoufo/Minianka (6)
    • Dogon (7)
    • Maure (8)
  • Interviewer_3 speaks:
    • Tamacheq (9)
    • Bobo / Dafing / Samogo (10)
    • Français (11)
use "`households_plus_lists'", clear
* SuSo assigns based on the user name found in
* the _responsible column
* Make assignments by matching interviewers to households
* that speak language(s) spoken by the interviewer
* Below is a simple example. Actual assignment may be more complex.
gen _responsible = ""
replace _responsible = "Interviewer_1" if inlist(language, 1, 2, 3, 4)
replace _responsible = "Interviewer_2" if inlist(language, 5, 6, 7, 8)
replace _responsible = "Interviewer_3" if inlist(language, 9, 10, 11)
* check how many interviews are assigned to each interviewer
tab _responsible
* make adjustments, either ad-hoc or based on a rule
* replace _responsible ...
tempfile households
save "`households'", replace
28 / 37
  • Decide on workload per interviewer
  • Capture which language(s) interviewers speak
  • Match based on common language(s) spoken
  • Adjust assignments manually
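The matching rule can also be kept as data: one mapping from interviewer to language codes, inverted into a lookup. This Python sketch uses the codes listed above; the households are hypothetical, and code 12 stands in for a language nobody on the team speaks.

```python
from collections import Counter

# Language codes spoken by each interviewer, as listed above
LANGUAGES_BY_INTERVIEWER = {
    "Interviewer_1": {1, 2, 3, 4},
    "Interviewer_2": {5, 6, 7, 8},
    "Interviewer_3": {9, 10, 11},
}
# Invert the mapping: language code -> interviewer user name
INTERVIEWER_BY_LANGUAGE = {
    code: name for name, codes in LANGUAGES_BY_INTERVIEWER.items() for code in codes
}

# Hypothetical households
households = [
    {"interview__id": i, "language": code}
    for i, code in enumerate([1, 6, 11, 2, 12], start=1)
]
for hh in households:
    # "" means unassigned, which SuSo would give to the default responsible
    hh["_responsible"] = INTERVIEWER_BY_LANGUAGE.get(hh["language"], "")

# Workload per interviewer, similar to: tab _responsible
workload = Counter(hh["_responsible"] for hh in households)
print(workload)
```

A single mapping is easier to review than a chain of `replace` statements, where a copied line can silently send one group of languages to the wrong interviewer. Note that `Counter` also counts unassigned (`""`) cases, which Stata's `tab` omits unless the `missing` option is given.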

Decide which interviews to record

* SuSo optionally allows the audio of an interview to be recorded
* If one wants to record interviews, one mechanism is to
* mark the interview as such in the assignment files
* To do so, add the _record_audio column to the main file, households in this case
* Two values are allowed: 1 (record); 0 (do not record)
* Use some rule to determine which interviews will be recorded
gen _record_audio = .
* for example, every 5th interview in the household data file
replace _record_audio = (mod(_n, 5) == 0)
tempfile households
save "`households'", replace
29 / 37

Check your work

Why

  • Easier to identify and diagnose issues. SuSo warns about issues. Code makes identification faster and remediation easier.
  • Possible to flag subtle problems. SuSo only looks for issues that prevent preloading. Code can look for other issues (e.g., number "none" should have been dropped)
  • Helps write better code. At a minimum, passes tests. Ideally, also actively avoids problems.

How

  • Expected names. SuSo expects certain variable names. Check them.
  • Expected types. SuSo expects variables to be of certain types. Check them too.
  • All assigned. SuSo assigns based on _responsible, if present, and assigns the rest to the default responsible. Assignment to the default may not be desirable.
  • Lists complete. SuSo expects sequential list entries to be non-empty, in particular the first entry.
local expected_vars "interview__id hhid head_name language area number_list__0 s2q1__0"
foreach expected_var of local expected_vars {
* check whether each expected variable exists in the data set
* if so, move to the next one
* if not, fail loudly
capture confirm variable `expected_var'
if (_rc != 0) {
di "Variable `expected_var' not found."
error 1
}
}
local expected_vars "interview__id hhid head_name language area number_list__0 s2q1__0"
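* numeric/string accept any storage type (e.g., interview__id = _n is stored as float)
local expected_types "numeric numeric string numeric numeric string string"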
local num_vars: word count `expected_vars'
forvalues i = 1/`num_vars' {
local expected_var: word `i' of `expected_vars'
local expected_type: word `i' of `expected_types'
* check whether each expected variable is of the right type
* if so, move to the next one
* if not, fail loudly
capture confirm `expected_type' variable `expected_var'
if (_rc != 0) {
di "Variable `expected_var' expected as `expected_type', but another type found"
error 1
}
}
capture assert _responsible != ""
if _rc != 0 {
qui: count if _responsible == ""
local num_miss = r(N)
di "All assignments are expected to be assigned. But `num_miss' have not been."
error 1
}
* number list
capture assert !inlist(number_list__0, "", " ")
if _rc != 0 {
qui: count if inlist(number_list__0, "", " ")
local num_miss = r(N)
di "Lists should have non-empty entries. But `num_miss' of number_list have empty/null values."
}
* member list
capture assert !inlist(s2q1__0, "", " ")
if _rc != 0 {
qui: count if inlist(s2q1__0, "", " ")
local num_miss = r(N)
di "Lists should have non-empty entries. But `num_miss' of s2q1__0 have empty/null values."
}
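The name, assignment and list checks translate directly to Python; a minimal sketch on hypothetical household rows follows. Unlike the Stata version, it collects every problem instead of stopping at the first, treats any whitespace-only entry as empty (not just `" "`), and skips the type checks, since plain Python dictionaries have no declared column types.

```python
def check_assignments(rows, expected_vars, list_vars=("number_list__0", "s2q1__0")):
    """Return a list of problems found in household-level assignment rows."""
    problems = []
    for var in expected_vars:
        if any(var not in row for row in rows):
            problems.append(f"Variable {var} not found.")
    unassigned = sum(1 for row in rows if row.get("_responsible", "") == "")
    if unassigned:
        problems.append(f"{unassigned} assignment(s) have no _responsible.")
    for var in list_vars:
        # the first list element must be non-empty; strip() catches any run of spaces
        empty = sum(1 for row in rows if str(row.get(var, "")).strip() == "")
        if empty:
            problems.append(f"{empty} row(s) have an empty {var}.")
    return problems


rows = [
    {"interview__id": 1, "_responsible": "Interviewer_1", "number_list__0": "76123456", "s2q1__0": "Awa"},
    {"interview__id": 2, "_responsible": "", "number_list__0": " ", "s2q1__0": "Moussa"},
]
problems = check_assignments(rows, ["interview__id", "number_list__0", "s2q1__0", "head_name"])
for problem in problems:
    print(problem)
```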
30 / 37

Save to tab-delimited format

* HOUSEHOLD LEVEL
* file name = questionnaire variable
* to find this name:
* - open the questionnaire in Designer
* - click on SETTINGS
* - copy value from Questionnaire variable
use "`households'", clear
outsheet using cati_panel_round1.tab, ///
nolabel /// save values, not labels
noquote /// no quotes for strings
replace

* NUMBERS ROSTER
* file name = roster ID in Designer
use "`numbers_roster'", clear
outsheet using numbers.tab, ///
nolabel /// save values, not labels
noquote /// no quotes for strings
replace

* MEMBERS ROSTER
* file name = (first) roster ID in Designer
use "`members_roster'", clear
outsheet using members.tab, ///
nolabel /// save values, not labels
noquote /// no quotes for strings
replace

* VARIABLES TO PROTECT FROM EDITING
* file name = name system expects: protected__variables
use "`protected_vars'", clear
outsheet using protected__variables.tab, noquote replace
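Writing unquoted tab-delimited text is simple enough to sketch by hand in Python, as below; the helper `to_tab` and the rows are hypothetical. Because nothing is quoted, a tab or line break inside a value would shift columns, so the sketch refuses such values instead of writing a broken file. `None` becomes an empty cell here.

```python
def to_tab(rows, columns):
    """Render rows as tab-delimited text with a header row and no quoting."""
    lines = ["\t".join(columns)]
    for row in rows:
        cells = ["" if row.get(col) is None else str(row.get(col)) for col in columns]
        for cell in cells:
            # with no quoting, an embedded tab or line break would shift columns
            if any(ch in cell for ch in "\t\r\n"):
                raise ValueError(f"cell {cell!r} would break the tab-delimited layout")
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


protected = [{"variable__name": "s2q1"}, {"variable__name": "number_list"}]
print(to_tab(protected, ["variable__name"]), end="")

roster = [
    {"interview__id": 1, "numbers__id": 1, "number_rel_mem": 1},
    {"interview__id": 1, "numbers__id": 2, "number_rel_mem": None},
]
print(to_tab(roster, ["interview__id", "numbers__id", "number_rel_mem"]), end="")
```

To write a file, pass the result to something like `pathlib.Path("protected__variables.tab").write_text(...)`.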
31 / 37

TODO: Consider using half of the screen to show sources of info in Designer. Problem: code will be a little squished. May want a 66/33 L-R breakdown.

Zip all tab files

* change to directory with tab files
* cd "your/path/here/"
* list all files ending in ".tab"
* zip them together
* save as "assignments_r1.zip"
zipfile "*.tab", saving("assignments_r1.zip", replace)
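The zipping step has a close equivalent in Python's standard `zipfile` module. This sketch collects every `.tab` file in one folder and stores each under its bare name, with no folder prefix; the helper name `zip_tab_files` and the demo files are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_tab_files(folder, archive_name="assignments_r1.zip"):
    """Zip every .tab file in folder into folder/archive_name, stored without subfolders."""
    folder = Path(folder)
    archive = folder / archive_name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for tab_file in sorted(folder.glob("*.tab")):
            zf.write(tab_file, arcname=tab_file.name)
    return archive


# Demo in a throwaway folder with hypothetical file contents
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cati_panel_round1.tab", "numbers.tab", "notes.txt"):
        (Path(tmp) / name).write_text("interview__id\n1\n")
    with zipfile.ZipFile(zip_tab_files(tmp)) as zf:
        archived = zf.namelist()
print(archived)
```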
32 / 37

Subsequent rounds

33 / 37

Recipe 1: households that responded

  • Filter to households that responded
  • Keep new and remaining members
  • Remove useless numbers
  • Create list of members
  • Create list of numbers
  • Create variables to facilitate next call
    • respondent's name
    • preferred number
    • contacted number
    • head's name
    • preferred date and time to contact
  • Add lists and contact details to household data
  • Decide which interviewers get which cases
  • Check your work
  • Decide whether to activate recording
  • Save to tab-delimited format
  • Zip all tab files
34 / 37

Tools

  • use
  • generate / replace
  • rename
  • keep
  • tempfile
  • merge
  • save
  • reshape wide
  • reshape long
  • outsheet / export delimited
  • zipfile

R tools:

  • haven
  • dplyr
  • stringr
  • tidyr
  • lubridate
  • readr
  • fs
  • zip
35 / 37

Recipe 2: households that didn't respond

  • Decide which households to continue calling
  • Filter to those households
  • Determine when they were last "observed"
    • Responded to the survey
    • Created an assignment
  • Filter past assignments to those households
  • Make adjustments as needed
  • Save to tab-delimited format
  • Zip all tab files
36 / 37

Tools

  • insheet / import delimited
  • generate / replace
  • rename
  • keep
  • tempfile
  • merge
  • append
  • save
  • reshape wide
  • reshape long
  • outsheet / export delimited
  • zipfile

R tools:

  • readr
  • dplyr
  • fs
  • zip
37 / 37
