Skip to contents

Kobotoolbox supports questions that can be grouped to be answered repeatedly. A typical use case is when implementing a household survey, with some series of questions for each household member.

Loading data

In Kobotoolbox, this feature is implemented by repeating a group of questions. It uses the concept of repeat group. It works by inserting the questions you want to repeat inside a begin_repeat/end_repeat loop. repeat group can be nested, making it possible to have a group of questions repeated within another repeat group instance. It can be illustrated with the following project and associated form.

Survey questions

type name label::English (en) label::Francais (fr) repeat_count calculation
start start
end end
today today
begin_repeat demo Demographic Characteristics Caracteristique Demographique
text name Name Nom
integer age Age Age
select_one sex sex Sex Sexe
integer hobby How many hobbies does ${name} have? Combien de hobbies ${name} a ?
select_one yesno morelang Does ${name} speak more than one language? Est-ce que ${name} parle plus d’une langue ?
calculate name_individual indexed-repeat(${name}, ${demo}, position(..))
begin_repeat hobbies_list List of Hobbies Liste de hobbies ${hobby}
text hobbies Hobbies of ${name_individual} Hobbies de ${name_individual}
end_repeat
begin_repeat lang_list List of Languages Liste de langues ${morelang}
select_multiple lang langs Languages spoken by ${name_individual} Langue parle par ${name_individual}
end_repeat
end_repeat
calculate family_count count(${demo})
note family_count_note Number of family members: ${family_count} Nombre de membre dans la famille: ${family_count}
begin_repeat education Education information Information sur l’education ${family_count}
calculate name_individual2 indexed-repeat(${name}, ${demo}, position(..))
select_one edu_level edu_level What is ${name_individual2}’s level of education Quel est le niveau d’education de ${name_individual2}
end_repeat

Choices

list_name name label::English (en) label::Francais (fr)
sex 1 Male Homme
sex 2 Female Femme
sex 3 Prefer not to say Prefere ne pas dire
edu_level 1 Primary Primaire
edu_level 2 Secondary Secondaire
edu_level 3 Higher Secondary & Above Lycee et superieur
yesno 1 Yes Oui
yesno 0 No Non
lang 1 French Francais
lang 2 Spanish Espagnol
lang 3 Arabic Arabe
lang 99 Other Autre

Loading the project asset

The above form was uploaded to the server, it’s named nested_roster, and can be loaded from the list of asset asset_list.

library(robotoolbox)
library(dplyr)
asset_list <- kobo_asset_list()
uid <- filter(asset_list, name == "nested_roster") |>
  pull(uid)
asset <- kobo_asset(uid)
asset
#> <robotoolbox asset>  aANhxwX9S6BCsiYMgQj9kV 
#>   Asset name: nested_roster
#>   Asset type: survey
#>   Asset owner: dickoa
#>   Created: 2022-01-05 21:22:50
#>   Last modified: 2022-01-06 15:15:30
#>   Submissions: 3

Extracting the data

We can see that the output is not a standard a data.frame. It’s a listing of each repeat group loop in our form.

df <- kobo_data(asset)
df
#> ── Metadata ────────────────────────────────────────────────────────────────────
#> Tables: `main`, `demo`, `hobbies_list`, `lang_list`, `education`
#> Columns: 52
#> Primary keys: 5
#> Foreign keys: 4

The output is a dm object, from the the dm package.

#> [1] "dm"

Manipulating repeat group as dm object

A dm object is a list of linked data.frame, and it can be manipulated using the dm package.

Visualizing the relationship between tables

In order to understand, how it’s stored we can visualize the schema and relationship between tables (repeat groups loop). The schema can be drawn using the the dm_draw function.

Number of rows of each table

The dm package has a lot of helper functions to manipulated dm object. We can get the number of rows of each table using dm_nrow

dm_nrow(df)
#>         main         demo hobbies_list    lang_list    education 
#>            3            7           14            4            7

A dm object is a list of data.frame

A dm object is a list. As in any list of data.frame, you can extract each table, and analyze it separately. The principal table, where you have the first repeat group is named main.

glimpse(df$main)
#> Rows: 3
#> Columns: 22
#> $ start                <chr> "2022-01-06T15:25:12.023-00:00", "2022-01-06T15:1…
#> $ end                  <chr> "2022-01-06T15:28:36.835-00:00", "2022-01-06T15:2…
#> $ today                <chr> "2022-01-06", "2022-01-06", "2022-01-06"
#> $ family_count         <int> 3, 2, 2
#> $ `_index`             <int> 1, 2, 3
#> $ `_id`                <int> 17727576, 17727538, 17727380
#> $ uuid                 <chr> "ee485fd6655b4e328fdd895ac0451656", "ee485fd6655b…
#> $ demo                 <list> [<data.frame[3 x 10]>], [<data.frame[2 x 9]>], [<…
#> $ education_count      <int> 3, 2, 2
#> $ education            <list> [<data.frame[3 x 2]>], [<data.frame[2 x 2]>], [<d…
#> $ `__version__`        <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGKxBo8G5uQa94oD…
#> $ instanceID           <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d6505d617", "uui…
#> $ `_xform_id_string`   <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S6BCsiYMgQj9k…
#> $ `_uuid`              <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d617", "06552d3d…
#> $ `_attachments`       <list> <>, <>, <>
#> $ `_status`            <chr> "submitted_via_web", "submitted_via_web", "submi…
#> $ `_geolocation`       <list> <NA, NA>, <NA, NA>, <NA, NA>
#> $ `_submission_time`   <chr> "2022-01-06T15:28:47", "2022-01-06T15:25:23", "20…
#> $ `_tags`              <list> <>, <>, <>
#> $ `_notes`             <list> <>, <>, <>
#> $ `_validation_status` <list> <>, <>, <>
#> $ `_submitted_by`      <lgl> NA, NA, NA

The other tables uses their form names, for example education for the education repeat group.

glimpse(df$education)
#> Rows: 7
#> Columns: 5
#> $ name_individual2     <chr> "Jemelle", "Fatim", "Ali", "Shannon", "Skip", "Ah…
#> $ edu_level            <chr+lbl> "3", "3", "1", "3", "3", "3", "3"
#> $ `_index`             <int> 1, 2, 3, 4, 5, 6, 7
#> $ `_parent_index`      <int> 1, 1, 1, 2, 2, 3, 3
#> $ `_parent_table_name` <chr> "main", "main", "main", "main", "main", "main…

Filtering data

One major advantage of using the dm package, is the ability to dynamically filter tables while keeping the links between them. In this example, filtering the main table will propagate to the education and demo tables. The hobbies_list and lang_list tables are linked to the demo table, and will be filtered too.

df |>
  dm_filter(main = (`_index` == 2)) |>
  dm_nrow()
#>         main         demo hobbies_list    lang_list    education 
#>            1            2            4            0            2

Joining tables

In some cases, it’s simpler to analyze the joined data, dm_flatten_to_tbl can be used for safely joining the data keeping its structure and the links between tables. We can join the education table to the main table, by usign dm_flatten_to_tbl and starting from education.

df |>
  dm_flatten_to_tbl(.start = education,
                    .join = left_join) |>
  glimpse()
#> Rows: 7
#> Columns: 26
#> $ name_individual2     <chr> "Jemelle", "Fatim", "Ali", "Shannon", "Skip", "Ah…
#> $ edu_level            <chr+lbl> "3", "3", "1", "3", "3", "3", "3"
#> $ `_index`             <int> 1, 2, 3, 4, 5, 6, 7
#> $ `_parent_index`      <int> 1, 1, 1, 2, 2, 3, 3
#> $ `_parent_table_name` <chr> "main", "main", "main", "main", "main", "main…
#> $ start                <chr> "2022-01-06T15:25:12.023-00:00", "2022-01-06T15:2…
#> $ end                  <chr> "2022-01-06T15:28:36.835-00:00", "2022-01-06T15:2…
#> $ today                <chr> "2022-01-06", "2022-01-06", "2022-01-06", "2022-0…
#> $ family_count         <int> 3, 3, 3, 2, 2, 2, 2
#> $ `_id`                <int> 17727576, 17727576, 17727576, 17727538, 17727538,…
#> $ uuid                 <chr> "ee485fd6655b4e328fdd895ac0451656", "ee485fd6655b…
#> $ demo                 <list> [<data.frame[3 x 10]>], [<data.frame[3 x 10]>], […
#> $ education_count      <int> 3, 3, 3, 2, 2, 2, 2
#> $ education            <list> [<data.frame[3 x 2]>], [<data.frame[3 x 2]>], [<d…
#> $ `__version__`        <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGKxBo8G5uQa94oD…
#> $ instanceID           <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d6505d617", "uui…
#> $ `_xform_id_string`   <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S6BCsiYMgQj9k…
#> $ `_uuid`              <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d617", "c4723152…
#> $ `_attachments`       <list> <>, <>, <>, <>, <>, <>, <>
#> $ `_status`            <chr> "submitted_via_web", "submitted_via_web", "submi…
#> $ `_geolocation`       <list> <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>,…
#> $ `_submission_time`   <chr> "2022-01-06T15:28:47", "2022-01-06T15:28:47", "20…
#> $ `_tags`              <list> <>, <>, <>, <>, <>, <>, <>
#> $ `_notes`             <list> <>, <>, <>, <>, <>, <>, <>
#> $ `_validation_status` <list> <>, <>, <>, <>, <>, <>, <>
#> $ `_submitted_by`      <lgl> NA, NA, NA, NA, NA, NA, NA

We can push this logic to have the widest table possible through a cascade of joins from a deeper table (start argument) to the main table. Using hobbies_list as an example, it’ll perform two joins. hobbies_list will be joined to the demo table, and the demo table will be joined to the main table.

df |>
  dm_flatten_to_tbl(.start = hobbies_list,
                    .join = left_join,
                    .recursive = TRUE) |>
  glimpse()
#> Rows: 14
#> Columns: 37
#> $ hobbies                           <chr> "Reading", "MMA", "Origami", "Soccer…
#> $ `_index`                          <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1…
#> $ `_parent_index.hobbies_list`      <int> 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 6, …
#> $ `_parent_table_name.hobbies_list` <chr> "demo", "demo", "demo", "demo", "dem…
#> $ name                              <chr> "Jemelle", "Jemelle", "Jemelle", "Fa…
#> $ name_individual                   <chr> "Jemelle", "Jemelle", "Jemelle", "Fa…
#> $ age                               <int> 35, 35, 35, 15, 15, 9, 9, 40, 40, 40…
#> $ sex                               <chr+lbl> "2", "2", "2", "2", "2", "1", "1…
#> $ hobby                             <int> 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 1, 2, …
#> $ morelang                          <chr+lbl> "1", "1", "1", "1", "1", "1", "1…
#> $ `_parent_index.demo`              <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, …
#> $ hobbies_list_count                <int> 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 1, 2, …
#> $ hobbies_list                      <list> [<data.frame[3 x 1]>], [<data.frame…
#> $ lang_list_count                   <int> 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, …
#> $ lang_list                         <list> [<data.frame[1 x 1]>], [<data.frame…
#> $ `_parent_table_name.demo`         <chr> "main", "main", "main", "main", "mai…
#> $ start                             <chr> "2022-01-06T15:25:12.023-00:00", "20…
#> $ end                               <chr> "2022-01-06T15:28:36.835-00:00", "20…
#> $ today                             <chr> "2022-01-06", "2022-01-06", "2022-01…
#> $ family_count                      <int> 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, …
#> $ `_id`                             <int> 17727576, 17727576, 17727576, 177275…
#> $ uuid                              <chr> "ee485fd6655b4e328fdd895ac0451656", …
#> $ demo                              <list> [<data.frame[3 x 10]>], [<data.fram…
#> $ education_count                   <int> 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, …
#> $ education                         <list> [<data.frame[3 x 2]>], [<data.frame…
#> $ `__version__`                     <chr> "vcs3hEpGKxBo8G5uQa94oD", "vcs3hEpGK…
#> $ instanceID                        <chr> "uuid:c4723152-3b1c-4f60-ac30-0f3d65…
#> $ `_xform_id_string`                <chr> "aANhxwX9S6BCsiYMgQj9kV", "aANhxwX9S…
#> $ `_uuid`                           <chr> "c4723152-3b1c-4f60-ac30-0f3d6505d61…
#> $ `_attachments`                    <list> <>, <>, <>, <>, <>, <>, <>, <>, <>,…
#> $ `_status`                         <chr> "submitted_via_web", "submitted_via_…
#> $ `_geolocation`                    <list> <NA, NA>, <NA, NA>, <NA, NA>, <NA, …
#> $ `_submission_time`                <chr> "2022-01-06T15:28:47", "2022-01-06T1…
#> $ `_tags`                           <list> <>, <>, <>, <>, <>, <>, <>, <>, <>,…
#> $ `_notes`                          <list> <>, <>, <>, <>, <>, <>, <>, <>, <>,…
#> $ `_validation_status`              <list> <>, <>, <>, <>, <>, <>, <>, <>, <>,…
#> $ `_submitted_by`                   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, …

You can learn a lot about the dm package going through its extensive documentation.