KoboToolbox enables grouping of questions, allowing them to be answered multiple times. This feature is particularly useful during household surveys where a set of questions is designed to be answered by each member of the household.
Repeating groups are a powerful tool in survey design, offering several advantages:
- Efficiency: A single set of questions can be used for multiple respondents.
- Flexibility: Surveys can accommodate varying numbers of respondents.
- Data consistency: The same questions are asked for each repetition, ensuring uniform data collection.
- Simplified analysis: The structured format facilitates easier data analysis across respondents.
These benefits make repeating groups essential for surveys dealing with multi-member units like households, schools, or organizations.
Loading data
KoboToolbox implements this feature through the concept of a repeat group, which enables the repetition of a group of questions.
In KoboToolbox forms, begin_repeat and end_repeat are special commands that define the boundaries of a repeating group:
- begin_repeat: Marks the start of a group of questions that can be repeated multiple times.
- end_repeat: Signals the end of the repeating group.
Any questions placed between these commands will be repeated as a
set, allowing for multiple responses to the same group of questions.
This method involves enclosing the questions intended for repetition within a begin_repeat/end_repeat loop.
Furthermore, repeat groups can be nested, so that a group of questions is repeated within another repeat group. This concept can be demonstrated using the project and associated form below.
- Survey questions
type | name | label::English (en) | label::Francais (fr) | repeat_count | calculation |
---|---|---|---|---|---|
start | start | | | | |
end | end | | | | |
today | today | | | | |
begin_repeat | demo | Demographic Characteristics | Caracteristique Demographique | | |
text | name | Name | Nom | | |
integer | age | Age | Age | | |
select_one sex | sex | Sex | Sexe | | |
integer | hobby | How many hobbies does ${name} have? | Combien de hobbies ${name} a ? | | |
select_one yesno | morelang | Does ${name} speak more than one language? | Est-ce que ${name} parle plus d’une langue ? | | |
calculate | name_individual | | | | indexed-repeat(${name}, ${demo}, position(..)) |
begin_repeat | hobbies_list | List of Hobbies | Liste de hobbies | ${hobby} | |
text | hobbies | Hobbies of ${name_individual} | Hobbies de ${name_individual} | | |
end_repeat | | | | | |
begin_repeat | lang_list | List of Languages | Liste de langues | ${morelang} | |
select_multiple lang | langs | Languages spoken by ${name_individual} | Langue parle par ${name_individual} | | |
end_repeat | | | | | |
end_repeat | | | | | |
calculate | family_count | | | | count(${demo}) |
note | family_count_note | Number of family members: ${family_count} | Nombre de membre dans la famille: ${family_count} | | |
begin_repeat | education | Education information | Information sur l’education | ${family_count} | |
calculate | name_individual2 | | | | indexed-repeat(${name}, ${demo}, position(..)) |
select_one edu_level | edu_level | What is ${name_individual2}’s level of education | Quel est le niveau d’education de ${name_individual2} | | |
end_repeat | | | | | |
- Choices
list_name | name | label::English (en) | label::Francais (fr) |
---|---|---|---|
sex | 1 | Male | Homme |
sex | 2 | Female | Femme |
sex | 3 | Prefer not to say | Prefere ne pas dire |
edu_level | 1 | Primary | Primaire |
edu_level | 2 | Secondary | Secondaire |
edu_level | 3 | Higher Secondary & Above | Lycee et superieur |
yesno | 1 | Yes | Oui |
yesno | 0 | No | Non |
lang | 1 | French | Francais |
lang | 2 | Spanish | Espagnol |
lang | 3 | Arabic | Arabe |
lang | 99 | Other | Autre |
Loading the survey
The survey above, named nested_roster, was uploaded to the server. It can be found in the list of assets, asset_list.
library(robotoolbox)
library(dplyr)
# Retrieve a list of all assets (projects) from your KoboToolbox server
asset_list <- kobo_asset_list()
# Filter the asset list to find the specific project and get its unique identifier (uid)
uid <- filter(asset_list, name == "nested_roster") |>
  pull(uid)
# Load the specific asset (project) using its uid
asset <- kobo_asset(uid)
asset
#> <robotoolbox asset> aANhxwX9S6BCsiYMgQj9kV
#> Asset name: nested_roster
#> Asset type: survey
#> Asset owner: dickoa
#> Created: 2022-01-05 21:22:51
#> Last modified: 2023-09-07 07:27:04
#> Submissions: 3
In this code:
- kobo_asset_list() retrieves a list of all assets (projects) available on your KoboToolbox server.
- kobo_asset() loads a specific asset (project) using its unique identifier (uid), allowing you to work with that particular project data and metadata.
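The filter-then-pull step can be tried without a server connection. The following is a minimal sketch on a toy table standing in for the result of kobo_asset_list(); the uids and the reduced column set are hypothetical, and the row-count check is an added safeguard, not something the package requires.

```r
library(dplyr)

# Toy stand-in for the table returned by kobo_asset_list().
# The uids and the reduced column set are hypothetical.
asset_list <- tibble(
  uid = c("aAAA111", "aBBB222", "aCCC333"),
  name = c("household_2021", "nested_roster", "school_census"),
  asset_type = "survey"
)

matches <- filter(asset_list, name == "nested_roster")

# Project names are not guaranteed to be unique, so stop early
# unless exactly one project matches.
stopifnot(nrow(matches) == 1)

uid <- pull(matches, uid)
uid
```

Checking the number of matches before calling pull() avoids silently passing zero or several uids further down the pipeline.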
Extracting the data
The output here is not a standard data.frame. Instead, it lists one table for the main form and one for each repeat group loop present in our form.
df <- kobo_data(asset)
df
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `main`, `demo`, `hobbies_list`, `lang_list`, `education`
#> Columns: 51
#> Primary keys: 5
#> Foreign keys: 4
#> [1] "dm"
The output is a dm object, sourced from the dm package. A dm object is a collection of related data frames that preserves the relationships between different levels of data in repeating groups. It’s particularly useful for repeating groups because:
- It maintains the hierarchical structure of the data, reflecting how repeating groups are nested within the survey.
- It allows for efficient storage and manipulation of data from different levels of the survey without losing the relationships between these levels.
- It provides tools for working with related tables, making it easier to analyze data across different repeating groups.
Using a dm object helps preserve the complex structure of surveys with repeating groups, allowing for more intuitive and accurate data analysis.
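To make the idea of related tables concrete, here is a minimal sketch that builds a small dm by hand. The two toy tables and their values are invented for illustration, and the choice of `_index` as primary key and `_parent_index` as foreign key is an assumption based on the column names that appear in the tables further down, not a statement about how robotoolbox builds its dm internally.

```r
library(dm)

# Toy parent table: one row per submission (hypothetical values).
main <- data.frame(
  `_index` = 1:2,
  today = as.Date(c("2022-01-06", "2022-01-07")),
  check.names = FALSE
)

# Toy child table: one row per repetition of a repeat group.
demo <- data.frame(
  `_index` = 1:3,
  `_parent_index` = c(1L, 1L, 2L),
  name = c("Ana", "Ben", "Cleo"),
  check.names = FALSE
)

# Declare the keys so that dm knows how the tables relate.
toy <- dm(main = main, demo = demo) |>
  dm_add_pk(main, `_index`) |>
  dm_add_fk(demo, `_parent_index`, main)

toy
dm_get_all_fks(toy)
```

Once the keys are declared, functions such as dm_nrow(), dm_filter() and dm_flatten_to_tbl() can follow the relationship between the two tables.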
Manipulating repeat groups as a dm object
A dm object, which is a list of interconnected data.frame instances, can be manipulated using the dm package.
Visualizing the relationship between tables
To comprehend the data storage structure, we can visualize the relationships among tables (repeat group loops) and the schema of the dataset. This schema can be depicted using the dm_draw function.
This visual representation of table relationships can significantly aid in planning your data analysis strategy and ensuring that you’re working with the data in a way that respects its inherent structure.
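As a rough illustration of the call, the sketch below draws the schema of a hand-built toy dm. The tables are invented, the key columns are assumed to be `_index` and `_parent_index`, and rendering relies on the DiagrammeR package being installed. With the survey data loaded earlier, the equivalent call would presumably be dm_draw(df).

```r
library(dm)

# Hypothetical two-level structure: submissions and household members.
main <- data.frame(`_index` = 1:2, check.names = FALSE)
demo <- data.frame(
  `_index` = 1:3,
  `_parent_index` = c(1L, 1L, 2L),
  name = c("Ana", "Ben", "Cleo"),
  check.names = FALSE
)

toy <- dm(main = main, demo = demo) |>
  dm_add_pk(main, `_index`) |>
  dm_add_fk(demo, `_parent_index`, main)

# view_type = "all" shows every column, not only the key columns.
schema <- dm_draw(toy, view_type = "all")
schema
```

The default view_type = "keys_only" gives a more compact diagram, which tends to be easier to read for forms with many questions.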
Number of rows of each table
The dm package offers numerous helper functions for manipulating dm objects. For instance, the dm_nrow function can be used to ascertain the number of rows in each table.
dm_nrow(df)
#> main demo hobbies_list lang_list education
#> 3 7 14 4 7
A dm object is a list of data.frame
A dm object is a list of data.frame. As with any list of data.frame, you can extract each table (data.frame) and analyze it separately. The principal table, the one that contains the first-level repeat groups, is named main.
glimpse(df$main)
#> Rows: 3
#> Columns: 17
#> $ start <dttm> 2022-01-06 15:16:21, 2022-01-06 15:17:18, 2022-0…
#> $ end <dttm> 2022-01-06 15:17:18, 2022-01-06 15:25:11, 2022-0…
#> $ today <date> 2022-01-06, 2022-01-06, 2022-01-06
#> $ `_index` <int> 1, 2, 3
#> $ `_id` <int> 17727380, 17727538, 17727576
#> $ uuid <chr> "ee485fd6655b4e328fdd895ac0451656", "ee485fd6655…
#> $ family_count <dbl> 2, 2, 3
#> $ education_count <dbl> 2, 2, 3
#> $ `__version__` <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGKxBo8G5uQa94oD…
#> $ instanceID <chr> "uuid:c2fbd800-f9d9-4a68-a9da-3917da86c318", "uui…
#> $ `_xform_id_string` <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S6BCsiYMgQj9kV…
#> $ `_uuid` <chr> "c2fbd800-f9d9-4a68-a9da-3917da86c318", "06552d3d…
#> $ `_status` <chr> "submitted_via_web", "submitted_via_web", "submit…
#> $ `_submission_time` <dttm> 2022-01-06 15:17:28, 2022-01-06 15:25:23, 2022-01…
#> $ `_validation_status` <chr> NA, NA, NA
#> $ `_submitted_by` <lgl> NA, NA, NA
#> $ `_attachments` <list> <NULL>, <NULL>, <NULL>
The other tables are named after their associated repeat groups. For instance, the education table is named after the education repeat group.
glimpse(df$education)
#> Rows: 7
#> Columns: 6
#> $ name_individual2 <chr> "Ahmad", "Myriam", "Shannon", "Skip", "Jemelle", …
#> $ edu_level <chr+lbl> "3", "3", "3", "3", "3", "3", "1"
#> $ `_index` <int> 1, 2, 3, 4, 5, 6, 7
#> $ `_parent_index` <int> 1, 1, 2, 2, 3, 3, 3
#> $ `_parent_table_name` <chr> "main", "main", "main", "main", "main", "main…
#> $ `_validation_status` <chr> NA, NA, NA, NA, NA, NA, NA
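The `_parent_index` column is what ties each row of a repeat group table back to a row of its parent table. As a minimal sketch of that link, the toy tables below (invented values, reduced columns) are joined by hand with dplyr, matching `_parent_index` in the child to `_index` in the parent.

```r
library(dplyr)

# Toy parent table: one row per submission (hypothetical values).
main <- tibble(`_index` = 1:2, family_count = c(2, 1))

# Toy child table: one row per household member.
education <- tibble(
  `_index` = 1:3,
  `_parent_index` = c(1L, 1L, 2L),
  edu_level = c("3", "1", "2")
)

# Attach the submission-level columns to each member row.
linked <- left_join(
  education,
  main,
  by = c("_parent_index" = "_index")
)
linked
```

The dm package automates this kind of join once the keys are declared, which matters most when several levels of nesting are involved.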
Filtering data
One key benefit of using the dm package is its capability to dynamically filter tables while maintaining their interconnections. For example, a filter on the main table automatically propagates to the education and demo tables. As the hobbies_list and lang_list tables are linked to the demo table, they will be filtered as well.
df |>
  dm_filter(main = (`_index` == 2)) |>
  dm_nrow()
#> main demo hobbies_list lang_list education
#> 1 2 4 0 2
This approach ensures that your filtered dataset maintains the structural integrity of your survey data, leading to more reliable and consistent analysis results.
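The cascading behaviour can be reproduced on a small scale. The sketch below uses three invented tables chained by assumed `_index` / `_parent_index` keys, and the table = condition form of dm_filter(), which requires dm version 1.0.0 or later.

```r
library(dm)

# Three hypothetical levels: submissions, members, hobbies.
main <- data.frame(`_index` = 1:3, check.names = FALSE)
demo <- data.frame(
  `_index` = 1:5,
  `_parent_index` = c(1L, 1L, 2L, 3L, 3L),
  check.names = FALSE
)
hobbies_list <- data.frame(
  `_index` = 1:4,
  `_parent_index` = c(1L, 3L, 3L, 5L),
  check.names = FALSE
)

toy <- dm(main = main, demo = demo, hobbies_list = hobbies_list) |>
  dm_add_pk(main, `_index`) |>
  dm_add_pk(demo, `_index`) |>
  dm_add_fk(demo, `_parent_index`, main) |>
  dm_add_fk(hobbies_list, `_parent_index`, demo)

# Keep only the second submission; related rows follow the keys.
filtered <- dm_filter(toy, main = (`_index` == 2))
dm_nrow(filtered)
```

Only the member attached to submission 2, and the hobbies attached to that member, should remain in the filtered dm.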
Joining tables
In certain instances, analyzing joined data may prove simpler. The dm_flatten_to_tbl function can be used to join data safely while preserving its structure and the connections between tables. We can merge the education table with the main table using the dm_flatten_to_tbl function, with the operation starting from education.
df |>
  dm_flatten_to_tbl(.start = education,
                    .join = left_join) |>
  glimpse()
#> Rows: 7
#> Columns: 22
#> $ name_individual2 <chr> "Ahmad", "Myriam", "Shannon", "Skip", "…
#> $ edu_level <chr+lbl> "3", "3", "3", "3", "3", "3", "1"
#> $ `_index` <int> 1, 2, 3, 4, 5, 6, 7
#> $ `_parent_index` <int> 1, 1, 2, 2, 3, 3, 3
#> $ `_parent_table_name` <chr> "main", "main", "main", "main", "ma…
#> $ `_validation_status.education` <chr> NA, NA, NA, NA, NA, NA, NA
#> $ start <dttm> 2022-01-06 15:16:21, 2022-01-06 15:16:2…
#> $ end <dttm> 2022-01-06 15:17:18, 2022-01-06 15:17:1…
#> $ today <date> 2022-01-06, 2022-01-06, 2022-01-06, 202…
#> $ `_id` <int> 17727380, 17727380, 17727538, 17727538,…
#> $ uuid <chr> "ee485fd6655b4e328fdd895ac0451656", "e…
#> $ family_count <dbl> 2, 2, 2, 2, 3, 3, 3
#> $ education_count <dbl> 2, 2, 2, 2, 3, 3, 3
#> $ `__version__` <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGKxB…
#> $ instanceID <chr> "uuid:c2fbd800-f9d9-4a68-a9da-3917da86…
#> $ `_xform_id_string` <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S6BC…
#> $ `_uuid` <chr> "c2fbd800-f9d9-4a68-a9da-3917da86c318",…
#> $ `_status` <chr> "submitted_via_web", "submitted_via_web…
#> $ `_submission_time` <dttm> 2022-01-06 15:17:28, 2022-01-06 15:17:2…
#> $ `_validation_status.main` <chr> NA, NA, NA, NA, NA, NA, NA
#> $ `_submitted_by` <lgl> NA, NA, NA, NA, NA, NA, NA
#> $ `_attachments` <list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>,…
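The same one-step join can be sketched on a hand-built toy dm. The tables and values below are invented, and the keys are assumed to be `_index` and `_parent_index`. The result has one row per row of the starting table, with the parent columns appended.

```r
library(dm)
library(dplyr)

# Toy parent and child tables (hypothetical values).
main <- tibble(
  `_index` = 1:2,
  today = as.Date(c("2022-01-06", "2022-01-07"))
)
education <- tibble(
  `_index` = 1:3,
  `_parent_index` = c(1L, 1L, 2L),
  edu_level = c("3", "1", "2")
)

toy <- dm(main = main, education = education) |>
  dm_add_pk(main, `_index`) |>
  dm_add_fk(education, `_parent_index`, main)

# Join the parent columns onto the child table, starting from education.
flat <- dm_flatten_to_tbl(toy, .start = education, .join = left_join)
glimpse(flat)
```

Starting from the child table keeps the unit of analysis at the individual level, which is usually what is wanted for member-level statistics.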
This logic can be extended to create the widest possible table through a cascade of joins, commencing from a deeper table (.start argument) and ending at the main table. Taking .start = hobbies_list as an example, two joins will be performed: hobbies_list will be merged with the demo table, and subsequently, the demo table will be combined with the main table.
df |>
  dm_flatten_to_tbl(.start = hobbies_list,
                    .join = left_join,
                    .recursive = TRUE) |>
  glimpse()
#> Rows: 14
#> Columns: 32
#> $ hobbies <chr> "Basketball", "Video games", "Karaok…
#> $ `_index` <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1…
#> $ `_parent_index.hobbies_list` <int> 1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, …
#> $ `_parent_table_name.hobbies_list` <chr> "demo", "demo", "demo", "demo", "dem…
#> $ `_validation_status.hobbies_list` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ name <chr> "Ahmad", "Ahmad", "Myriam", "Shannon…
#> $ age <dbl> 30, 30, 20, 40, 40, 40, 65, 35, 35, …
#> $ sex <chr+lbl> "1", "1", "2", "1", "1", "1", "1…
#> $ hobby <dbl> 2, 2, 1, 3, 3, 3, 1, 3, 3, 3, 2, 2, …
#> $ morelang <chr+lbl> "1", "1", "0", "0", "0", "0", "0…
#> $ name_individual <chr> "Ahmad", "Ahmad", "Myriam", "Shannon…
#> $ `_parent_index.demo` <int> 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, …
#> $ hobbies_list_count <dbl> 2, 2, 1, 3, 3, 3, 1, 3, 3, 3, 2, 2, …
#> $ lang_list_count <dbl> 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, …
#> $ `_parent_table_name.demo` <chr> "main", "main", "main", "main", "mai…
#> $ `_validation_status.demo` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ start <dttm> 2022-01-06 15:16:21, 2022-01-06 15:…
#> $ end <dttm> 2022-01-06 15:17:18, 2022-01-06 15:…
#> $ today <date> 2022-01-06, 2022-01-06, 2022-01-06,…
#> $ `_id` <int> 17727380, 17727380, 17727380, 177275…
#> $ uuid <chr> "ee485fd6655b4e328fdd895ac0451656", …
#> $ family_count <dbl> 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, …
#> $ education_count <dbl> 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, …
#> $ `__version__` <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGK…
#> $ instanceID <chr> "uuid:c2fbd800-f9d9-4a68-a9da-3917da…
#> $ `_xform_id_string` <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S…
#> $ `_uuid` <chr> "c2fbd800-f9d9-4a68-a9da-3917da86c31…
#> $ `_status` <chr> "submitted_via_web", "submitted_via_…
#> $ `_submission_time` <dttm> 2022-01-06 15:17:28, 2022-01-06 15:…
#> $ `_validation_status.main` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ `_submitted_by` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ `_attachments` <list> <NULL>, <NULL>, <NULL>, <NULL>, <NU…
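The cascade of joins can be checked on a small three-level toy dm. This is a sketch under the same assumptions as before: invented tables and values, with `_index` and `_parent_index` acting as keys. Non-key columns are given distinct names so that no renaming is needed for them.

```r
library(dm)
library(dplyr)

# Three hypothetical levels: submissions, members, hobbies.
main <- tibble(
  `_index` = 1:2,
  today = as.Date(c("2022-01-06", "2022-01-07"))
)
demo <- tibble(
  `_index` = 1:3,
  `_parent_index` = c(1L, 1L, 2L),
  name = c("Ana", "Ben", "Cleo")
)
hobbies_list <- tibble(
  `_index` = 1:4,
  `_parent_index` = c(1L, 1L, 2L, 3L),
  hobbies = c("Chess", "Football", "Reading", "Cooking")
)

toy <- dm(main = main, demo = demo, hobbies_list = hobbies_list) |>
  dm_add_pk(main, `_index`) |>
  dm_add_pk(demo, `_index`) |>
  dm_add_fk(demo, `_parent_index`, main) |>
  dm_add_fk(hobbies_list, `_parent_index`, demo)

# hobbies_list -> demo -> main, one row per hobby.
wide <- dm_flatten_to_tbl(
  toy,
  .start = hobbies_list,
  .join = left_join,
  .recursive = TRUE
)
glimpse(wide)
```

Because the deepest table drives the row count, member-level and submission-level values are repeated on every hobby row, so they should not be summed or averaged directly on the flattened table.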
Conclusion
The integration of robotoolbox with the dm package provides a powerful toolkit for handling complex survey data with repeating groups from KoboToolbox. This approach preserves the hierarchical structure of your data, allows for efficient manipulation and analysis, and offers flexibility in how you view and work with your survey results. By maintaining the relationships between different levels of your survey data, it ensures accurate and meaningful analyses, from simple filtering to complex joins. Whether you’re dealing with household surveys, multi-level organizational data, or any other nested data structure, this workflow offers a robust solution for managing and analyzing your KoboToolbox data in R.
You can learn more about the dm package from its detailed documentation.