A qualitative study examining the validity and comprehensibility of physical activity items: developed and tested in children with juvenile idiopathic arthritis

Background Not all physical activity (PA) questionnaires (PAQs) gather information regarding PA intensity, duration, and modes, and only a few were developed specifically for children. We assessed children’s comprehension of items derived from two published PAQs used in children, along with three items designed to ascertain PA intensity, in order to identify response errors. We modified items to create a new PAQ for children (ASCeND). We hypothesized that children would have comprehension difficulties with some original PAQ items and that ASCeND would be easier to comprehend and would improve recall and reporting of PA.
Methods For this qualitative study, we recruited 30 Swedish children with juvenile idiopathic arthritis (JIA) [ages 10–16 years; mean age = 13.0 (SD = 1.8); median disease activity score = 4.5 (IQR 2.2–9.0); median disease duration = 5.0 years (IQR 2.6–10.8)] from a children’s hospital-based rheumatology clinic. We conducted cognitive interviews to identify children’s comprehension of PAQ items. Interviews were audiotaped, transcribed, and independently analyzed. In phase one, 10 children were interviewed and items modified based on feedback. In phase two, an additional 20 children were interviewed to gather more feedback and further refine the modified items, to create the ASCeND.
Results The median interview time was 41 min (IQR 36–56). In phase one, 219 comments were generated regarding directions for recording PA duration, transportation use, walking, dancing, weight-bearing exercise and cardio fitness. Based on feedback we modified the survey layout, clarified directions and collapsed or defined items to reduce redundancy. In phase two, 95 comments were generated. Most comments related to aerobic fitness and strenuous PA. Children had difficulty recalling total walking and other activities per day. Children used the weather on a particular day, sports practice, or gym schedules to recall time performing activities. The most comments regarding comprehension were generated about the 3-item PA intensity survey, suggesting children had problems responding to intensity items.
Conclusions The newer layout facilitated recall of directions or efficiency in answering items. The 3-item intensity survey was difficult to answer. Sports-specific items helped children more accurately recall the amount of daily PA. The ASCeND appeared to be easy to answer and to comprehend.
Electronic supplementary material The online version of this article (10.1186/s12969-019-0317-6) contains supplementary material, which is available to authorized users.


Background
Inadequate physical activity (PA) is increasingly common in younger people [1,2] and especially among children with chronic disability [3][4][5]. Juvenile Idiopathic Arthritis (JIA) is a systemic autoimmune disease with a prevalence of 0.16% in the Swedish population in children up to the age of 16 years [6]. JIA is traditionally characterized by articular and extra-articular symptoms (ocular, cardiac, pulmonary and hematopoietic) that may impact physical and psychosocial function. PA levels in children with JIA are rarely studied but trends show these children demonstrate lower PA levels than their healthy peers [7][8][9]. These signs of lower PA levels are alarming as physical activity positively impacts health, social engagement and development, and reduces joint symptoms and stress [10]. Additionally, physical activity at a younger age influences later cardiovascular health as indicated by the fact that as children with JIA become adults, they demonstrate a higher prevalence of arterial calcification when compared with healthy peers [11]. At present, interventions to improve PA levels of these children reveal no clear effect on function during activities nor on lifestyle (habitual) PA [10].
Accurate assessment of PA is required to identify actual PA engagement and the impact of interventions at the individual or population level [12]. Self-report measurements are considered feasible for epidemiological studies as they are easy to administer, are low cost, have minimal participant burden and are generally well accepted [13]. However, self-report measures are influenced by recall and response bias and may not capture absolute levels of PA. Direct PA measures (i.e. calorimetry, motion sensors or direct observation) are often considered more capable of precisely estimating energy expenditure and remove the inherent issues of recall and response bias [14]. Despite the greater accuracy of direct measures, they are time and cost intensive, and rely on patient adherence (e.g. wearing a monitor or using a smartphone to collect data) rendering them less useful in epidemiologic settings. In children, adherence can be challenging, as many schools do not allow children to have devices on their person when participating in classes, sports, or other physical activities. As such, no single "gold standard" for assessing and validating PA in children exists today [15].
A number of PA questionnaires exist for children (see Additional file 1) [16][17][18][19][20][21][22][23][24][25][26][27][28][29][30]. Some assess PA during the school year, others assess PA during the school week, some assess modes of activities but not frequency, and others assess frequency and modes but over a 7-day week. Presently, no pediatric measurement exists that adequately captures habitual PA in all its dimensions (intensity, frequency, mode, and duration). Even though the study by Singh-Grewal et al. [31] demonstrated that PA intensity level did not change the outcomes of children with JIA, it is important to measure exercise intensity to determine whether PA levels of these children meet current PA Guidelines. As PA is believed to mediate disease activity and bodily function in JIA [10], there is a need for an accurate, cost-effective and feasible instrument to assess all aspects of PA in these children. This study aimed to evaluate the appropriateness, comprehensibility, and sources of response errors of items derived from two PA questionnaires (PAQ-A [19] and Active-Q [17]) and modified to include three items ascertaining PA intensity when administered to Swedish children with JIA aged 10 to 16 years. We hypothesized that children would have difficulty with the appropriateness and comprehensibility of some original PAQ items and that the new "Activity Scale for Children with Different Abilities" (ASCeND) would be easier to comprehend and would improve recall and description of total PA levels.

Sampling method and recruitment
Institutional approval was obtained for this qualitative study. We recruited consecutive patients with a primary diagnosis of JIA, ages 10 to 16 years, who came to the rheumatology clinic and met our study criteria. This clinic was located in a high-volume, urban, tertiary-care pediatric medical center. We aimed to recruit 30 Swedish children as this sample size has been shown to be effective in studies of survey comprehensibility among children these ages [32][33][34][35]. Children provided assent and their parents provided informed consent. Children did not receive compensation for participation. To obtain equal representation by age and gender, children were purposefully sampled in blocks by age (10 to < 12, 12 to < 14, 14 to 16 years) and gender (Fig. 1). Baseline demographic data were collected from medical records and included type of JIA, medications, disease duration, and disease activity status, assessed by the clinical Juvenile Arthritis Disease Activity Score (JADAS-71) [36]. JADAS-71 scores are interpreted as follows for oligoarthritis: ≤ 1 inactive disease; 1.1-2.0 low disease activity; 2.1-4.2 moderate disease activity; and > 4.2 high disease activity; and for polyarthritis: ≤ 1 inactive disease; 1.1-3.8 low disease activity; 3.9-10.5 moderate disease activity; and > 10.5 high disease activity [37]. During interviews we also collected self-reported sports and play.
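The JADAS-71 interpretation bands quoted above can be written as a small lookup. The following is a minimal sketch, not the scoring software used in the study; the function name `classify_jadas` and the dictionary layout are illustrative assumptions, and scores are assumed to be reported to one decimal place so that each band's upper bound can be treated as inclusive.

```python
# Inclusive upper bounds for each band, taken from the cutoffs quoted in the text.
# Anything above the last bound is classified as "high".
JADAS_CUTOFFS = {
    "oligoarthritis": [(1.0, "inactive"), (2.0, "low"), (4.2, "moderate")],
    "polyarthritis": [(1.0, "inactive"), (3.8, "low"), (10.5, "moderate")],
}


def classify_jadas(score: float, course: str) -> str:
    """Return the disease activity band for a JADAS-71 score (hypothetical helper)."""
    if score < 0:
        raise ValueError("JADAS-71 scores cannot be negative")
    if course not in JADAS_CUTOFFS:
        raise ValueError(f"unknown disease course: {course!r}")
    for upper_bound, label in JADAS_CUTOFFS[course]:
        if score <= upper_bound:
            return label
    return "high"


# The same score falls in different bands depending on the disease course.
print(classify_jadas(4.5, "oligoarthritis"))
print(classify_jadas(4.5, "polyarthritis"))
```

The example shows why the disease course must accompany the score: a value of 4.5 lies above the oligoarthritis cutoff for high disease activity but within the moderate band for polyarthritis.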
Cognitive interviewing, the gold standard in survey development and assessment, was used to ascertain children's comprehension of items and directions, what they believed the items intended to ask, what processes they used to answer items, and whether they were able to find an answer that fit how they wished to express themselves when answering an item [32]. Children under 10 years of age were not included due to their limited ability to use lexical comprehension when answering self-reported questionnaires [32]. During interviews, we used both concurrent "think aloud" and retrospective probing to ascertain comprehension of the measures. Research indicates that such approaches, when delivered in a standardized format by trained researchers, are an effective and unbiased way to uncover the thinking associated with responses to surveys [38].
Prior to the interviews, we extracted items from two previously published PAQs [17,19]. Items extracted from these scales are similar in content, although the layouts, inclusion of intensity, and response sets vary. We included all activity-specific items that were consistent across the PAQs as well as unique items from individual PAQs. We then modified the survey directions to ensure activity reporting was not limited to the school week/school year but to the past 7 days. In addition, we included 3 new items regarding the amount of physical activity per week stratified by intensity level. Next, an independent person fluent in English and Swedish translated the survey into Swedish. A second person who was also fluent in both English and Swedish independently back-translated the Swedish version of the survey to ensure accuracy in translation. Items were then reviewed for discrepancies and changes made based on consensus. We included the 3-item PA intensity survey as this format of ascertaining physical activity has been suggested as a more efficient way of gathering these data. The study was divided into two phases. In phase one, survey items were modified based on the volume of feedback from cognitive interviews of the first 10 children enrolled (mean age = 12.7 years, SD = 1.5). In the second phase, the next 20 children enrolled (mean age = 13.1 years, SD = 2.0) provided feedback on the modified survey to refine items and formatting and test the newly developed survey, the Physical Activity Scale for Children with Different Abilities (ASCeND). There were no differences with respect to demographic features of participants in phase one and phase two (Table 1).

Standardized training of interviewers
An experienced behavioral scientist (MDI), who has extensive experience in the application of cognitive interviews for survey development, created the interview protocol and the specific and general verbal probes [32,33]. The behavioral scientist then trained the interviewer (AF) in cognitive interviewing procedures and techniques until he was deemed proficient in the interview process. This protocol has been utilized in multiple previous studies [33,34,39,40].

Interview procedure
Parents were allowed to remain present during the interview, but were instructed not to respond for the child. The interviewer explained the purpose of the interview and asked a few basic questions about the primary diagnosis and current PA level: (1) "How long have you had your diagnosis?" (2) "What typical physical activities do you do in your spare time?" (3) "Where do you live?" Following these introductory questions, each child was given the PAQ and asked to read the directions aloud before answering the survey items. The child was then asked to describe the directions in his or her own words. The interviewer observed the child while the child completed the surveys and recorded whether the child hesitated or demonstrated signs of difficulty completing the items. Next, using standardized and general verbal probes, the interviewer discussed the child's interpretation of the survey's directions, items, and item responses (Table 2) [38][41][42][43]. All interviews were audiotaped and transcribed. Transcripts were then examined and coded based on four content areas: (1) item comprehension (language/jargon/lexical), (2) information retrieval and recall strategies, (3) decision-making processes, and (4) response mapping to identify how the child's answers aligned with the response sets provided [32,33].

Analysis
Transcripts were read, coded, and analyzed independently, then synthesized using thematic coding by four research team members (AF, JvH, RN, MDI). The team consisted of a behavioral scientist (MDI) and a pediatric orthopedic surgeon (JvH), both experienced in cognitive interviewing and survey development [32][33][34][35], and two physical therapists (AF and RN) who had 3.5 and 2.5 years of experience in pediatrics and arthritis, respectively. The analysis focused on problem detection, which included counts of issues arising when the children completed the questionnaires [38,44]. Problematic items were then sorted to identify the specific source of error (comprehension, mapping, response or stem format) and shared with the team members. Additionally, the team reviewed children's comments regarding general and specific directions, survey format, and difficult words using a normative group process. The team came to a consensus on word and format changes to improve survey readability. Based on the data gathered in the cognitive interviews and from specific recommendations by children for rephrasing items, we created a new preliminary survey. The study team reviewed the survey modifications using a normative process and came to a consensus about the changes. Thus, the survey format was revised to enable children to see the directions for the section along with a larger list of PA items.
Once modifications were made to the ASCeND, we tested the survey among the next consecutively recruited 20 children (50% female, mean disease duration = 6.2 years, SD = 4.7) using the same cognitive interviewing and analytic techniques to assess comprehension of the new measure.
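The problem-detection step described above, counting issues per item and sorting them by source of error, can be illustrated with a short tally. This is a minimal sketch under stated assumptions: the coded comments below are invented for illustration and are not study data, and the function name `tally_problems` is hypothetical.

```python
from collections import Counter

# Hypothetical coded comments as (survey item, source of error) pairs.
# These values are illustrative only and do not come from the study transcripts.
coded_comments = [
    ("PA intensity", "comprehension"),
    ("PA intensity", "response mapping"),
    ("chores", "comprehension"),
    ("PA intensity", "comprehension"),
    ("walking", "stem format"),
]


def tally_problems(comments):
    """Count coded comments per survey item and per source of error."""
    by_item = Counter(item for item, _ in comments)
    by_source = Counter(source for _, source in comments)
    return by_item, by_source


by_item, by_source = tally_problems(coded_comments)
# The item with the most comments is the first candidate for revision.
most_problematic = by_item.most_common(1)[0]
print(most_problematic)
print(by_source)
```

Keeping the item and the error source as separate tallies mirrors the two questions the team asked of the transcripts: which items caused trouble, and what kind of trouble it was.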

Statistical analysis
We used the statistical program SPSS for Mac 24.0 (SPSS Inc., Chicago, IL, www.spss.com) to calculate descriptive statistics and perform inferential tests. Continuous data are presented as means (min-max); categorical data, as frequencies and percentages. We examined demographic data to determine whether significant differences existed between the subjects in Phase One and Phase Two, using Fisher's exact tests, t-tests and Mann-Whitney U tests, as appropriate. We tested for differences in the number of comments between genders using a Mann-Whitney U test and for differences in the number of comments between the three age groups using a Kruskal-Wallis test. Additionally, as a preliminary assessment of concurrent validity of PAQ items, we compared the total minutes of self-reported PA from activity-based items (excluding sedentary activity) with the total minutes of PA reported using the 3 PA intensity items among subjects in Phase Two, using Spearman rank correlation coefficients.
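For readers unfamiliar with the Spearman rank correlation used in the concurrent validity comparison, the following is a minimal pure-Python sketch. It is not the SPSS procedure used in the study and computes no p-value; the function names and the example minutes are illustrative assumptions. Tied values receive the average of the ranks they span, and the coefficient is the Pearson correlation of the two rank vectors.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical weekly minutes from activity-based items and from intensity items.
activity_minutes = [300, 420, 150, 600, 240]
intensity_minutes = [200, 500, 180, 450, 260]
print(round(spearman_rho(activity_minutes, intensity_minutes), 2))
```

A rank-based coefficient suits this comparison because self-reported minutes are typically skewed and the two item sets need only agree on the ordering of children, not on absolute minutes.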

Results
The mean age of the 30 children was 13.0 years (SD = 1.8) and median disease duration was 5.0 years (IQR 2.6-10.8).

Phase one
In phase one, the 10 children made 219 comments about the survey. Transcript data were sorted into the following categories: comprehension of language (e.g. reading level/phrasing/lexical) and medical jargon, item format (content of directions and timeframe), double-barreled items (e.g. asking about more than one symptom in a single item), response set format (terminology), and response mapping (responses available but not considered by the children to be suitable). The most problematic items were those ascertaining PA intensity level (32 comments in total), followed by amount of time performing chores, time spent sitting or lying down, directions for recording amount of time spent engaging in specific activities, time spent watching shows on the computer, phone or other device, and items that asked children to report the amount of time performing an activity for travel or for fun (e.g. walking the dog versus walking to school). All children found that the survey layout interfered with their ability to recall the directions for the activity-specific items. Items requesting the amount of time traveling via various vehicles were collapsed into a general category of traveling by vehicle, regardless of the vehicle type. Suggestions for wording directions were incorporated to clarify what was asked of children. We also provided examples of cardio exercise and weight-bearing exercises and removed the general category "general exercise".
Item ordering was modified so that general PA items (aerobic fitness, sitting and reading) were placed after specific activities. The change in item ordering addressed the issue of double counting time when reporting on specific and general activities. For example, when asked to explain his answers for cardio-related activities, one child responded, "Mhm, on Mondays its... When we play football during recess, then I do 20-30 min; the same on Wednesday and on Tuesday we had football-practice as well." and then answered the same time-quantities for ball-sports and cardio-activities.
We modified the phone item, allowing children to decide whether their time on the phone accounts for texting, reading, playing games, or watching shows. Children found everyday tasks to be the most problematic when reporting time. However, activities that were hobbies or joyful seemed easier for them to recount. For example, one child stated, "[…] if I ride (a horse) at the stable then maybe I ride for 45-50 minutes, and I know that because I've been riding for very long. I know about how much I ride, […]." Another child said, "[…] When you go shopping on your shopping-spree, it is easy to keep track of time. I can be in a store for an hour or so. So it's not hard to be in a store for an hour […]."

Phase two: feedback and refinement of ASCeND
Using an iterative design approach, this revised survey was tested among the remaining 20 consecutively enrolled children. In total, 95 comments were generated during these interviews. Table 3 illustrates the comments per item by category (lexical, response format, etc.). Issues related primarily to response format, stem format, and comprehension. Items that generated the most comments (5-6 comments each) were aerobics or cardio exercise and the general time spent performing various intensities of PA (e.g. mild, moderate, and strenuous). To enhance clarity, directions were added across the top of each page of the survey rather than only at the beginning of each section. We also added directions to the section on sedentary activities (e.g. sitting, sleeping and lying down) to emphasize that the amount of time spent doing these activities should be tallied regardless of what children were doing in these positions. Of all the activities, squash generated the most confusion, as several children found the word foreign and did not understand its meaning. Squash was therefore removed from the activity list, as it is not commonly played in Sweden and the "other racquet-sports" item would likely cover this specific activity. Next, we refined the wording on the three intensity items.
Fitness and aerobic activities remained problematic throughout phase two. For example, one child commented, "I counted that time in biking as it is fitness" and "Aah! I counted that when I was working out [...] handball was here". The aerobic item was then moved from the beginning of the sports-related questions to the end of the survey to reduce duplication in reporting time. Finally, we made sure to move all sports-specific activities to the beginning of the survey to reduce duplication in reporting total PA. For final versions of the ASCeND survey in English and Swedish please refer to Additional files 2 and 3.
We examined the data to determine whether there were any differences in the total number of comments generated by age or by gender. There were no differences by gender or age group in the total number of comments generated during phase two of the interviews: the median number of comments for both boys and girls was 2 (p = 0.85), and the median number of comments by age group was 4 (IQR = 1-19) for ages 10 to < 12, 1 (IQR = 0-4) for ages 12 to < 14, and 2 (IQR = 1-3) for ages 14 to 16 years (p = 0.37). When assessing concurrent validity, we found a moderate but significant correlation between the amounts of PA self-reported using the activity-based PA items and the PA intensity items in the ASCeND (r = 0.43; p = 0.048).

Discussion
This study aimed to evaluate the appropriateness, comprehensibility, and sources of response errors of items obtained from two PAQs [17,19] as well as 3 items inquiring about intensity of PA when administered to Swedish children with JIA aged 10 to 16 years. We confirmed our hypotheses that children would have difficulty with the appropriateness and comprehensibility of some original PAQ items and that the new PAQ, the Physical Activity Scale for Children (ASCeND), would be more comprehensible and would improve both recall and description of total PA levels. Physical activity engagement includes the frequency, intensity, mode, and duration of activities. Outcomes of PA include metabolic equivalents (one metabolic equivalent is defined as the amount of oxygen consumed while sitting at rest, equal to 3.5 ml O2 per kg body weight per minute), amount of time engaged in PA, and intensity of PA. Physical activity can be measured using tracking devices such as pedometers or accelerometers or by PAQs. However, some PAQs ascertain rate of exertion instead of frequency or duration of activities (e.g. Holtebekk [45]). Thus, variability in the measurement of self-reported PA influences concurrent validity testing [3].
The ability of children to recall PA participation is debated [46][47][48]. In a recent study by Armbrust et al. [48], children with JIA ages 8 to 13 years were asked to record their PA using activity diaries while concurrently wearing an accelerometer. The authors found that activity diaries had low to moderate associations with accelerometer measures of PA, with activity diaries overestimating PA levels. A review by White et al. found that the most reliable, valid, and common recall period for self-report of PA in children with disabilities is 7 days [47]. Many PAQs use a 7-day recall period but collect PA in different ways. Some collect data on PA engagement during a typical week over the past year (ACTIVE-Q [17]); others ask how often the child engages in 60 min of PA over a typical or usual week (PACE [28]); others collect data on activities during school lunch hour, the frequency of sports before and after school, and activity engagement by normal school week and weekend (ASAQ [24]); and others collect the frequency of engagement in sports on evenings and weekends (e.g. PAQ-A & PAQ-C [19,20]). The PAQ-C also includes frequency of PA engagement in recess and physical education class, as these are typically part of a younger child's day in school. Our survey, ASCeND, uses a 7-day recall period as this appears to be the best format for recall of PA, but stipulates that children should report PA over the past 7 days rather than the past week. We believe this is an important distinction to make as children use different anchors for a week (a school week, which begins on Monday, or a regular week, which starts on Sunday) (KOOS-child [33] and pedi-IKDC [39]).
Previous studies have also demonstrated that children's recall of PA is less specific than that of adults, perhaps because some of their activity behaviors are less structured (e.g. play) and/or because the variability in their daily activities is so high [49][50][51]. Some PAQs ask children to write in what activities they engage in and record the time spent on these activities (7 day PAR [21]). Other PAQs provide a long list of specific sports and leisure activities and ask children to report the frequency with which they engage in these activities (3DPAR [16], ACTIVE-Q [17], PAQ-A [19], PAQ-C [20], CLASS [18], AQAA [23], ASAQ [24]). Each of these approaches has strengths and limitations. The ASCeND provides a list of specific activities and asks children to report the frequency of PA engagement over the past 7 days. The modified format of ASCeND helps children report many activities on a page while reducing response burden. The horizontal grid format for each activity also made it easier for children to align the response sets with corresponding items. Our results suggest that activity-specific items were easier for children to answer than the 3 PA intensity items, as noted by the markedly fewer comments on activity-specific items compared to intensity items. We can only speculate about the reasons for the differences in amount of PA recorded using activity-specific items versus the 3-item PA intensity survey. We believe children require prompts in order to recall the activities they engage in over time as their activity levels are highly variable [52].
Current recommendations for PA in children emphasize the need to include multiple modes of exercise (strengthening, aerobic, and flexibility) and a total of 60 min of PA per day. Different modes of exercise provide different physiologic benefits, as does the intensity of exercise. Differences in exercise intensity also yield different health benefits (e.g. cardiovascular versus strengthening). Thus, understanding the intensity of bicycling would indicate whether the physiologic benefit is strength or aerobic fitness. The use of sports-specific items may therefore be more beneficial when assessing PA in children.
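To illustrate how a 7-day recall could be compared against the 60 min per day recommendation mentioned above, the following minimal sketch counts the days on which a child's reported total meets the threshold. The function name, the constant, and the example week are hypothetical assumptions; this is not part of the ASCeND scoring.

```python
GUIDELINE_MINUTES_PER_DAY = 60  # daily total cited in the text


def days_meeting_guideline(daily_minutes, threshold=GUIDELINE_MINUTES_PER_DAY):
    """Count the days in a 7-day recall on which reported PA reaches the threshold."""
    if len(daily_minutes) != 7:
        raise ValueError("expected one total per day for the past 7 days")
    return sum(1 for minutes in daily_minutes if minutes >= threshold)


# Hypothetical totals (minutes of PA per day) for one child over the past 7 days.
week = [75, 30, 60, 0, 90, 45, 120]
print(days_meeting_guideline(week))  # days at or above 60 min
print(sum(week) / len(week))         # weekly mean in minutes per day
```

In this invented week the mean is exactly 60 min per day, yet only four of the seven days reach the threshold, which suggests why a per-day tally can be more informative than a weekly average when reporting against a daily recommendation.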
In follow-up discussions, the study participants were asked if they felt any question was missing or redundant. The first version of the ASCeND used questions based on "the purpose" of the activity (e.g. walking at work, walking for leisure per published measures) and listed activities without a proposed purpose (e.g. walking), as two separate activities. Several adult PA-instruments use this format, such as the IPAQ [53]. Children reported issues of double-barreling their responses during the first phase of the testing. In phase 2, we did not separate activities based on their purpose but rather listed them based on the type of activity. No child expressed difficulty responding to these items in phase-2 testing.
Physical activity in JIA populations has been studied previously [5,7,9][54][55][56][57][58][59]. However, the instruments used in these studies have not been specifically developed and tested among children with disabilities. Research indicates that pain can mediate the ability to recall activities [60] and the prevalence of pain is a major differentiator between children with JIA and healthy peers [61][62][63]. The testing of comprehensibility and construct validity is key to improving questionnaires [38]. We based the ASCeND on well-established items from current questionnaires and ascertained understanding of these items among children with JIA.

Strengths and limitations
This study has several strengths. Children were recruited from a pediatric rheumatology clinic in a large tertiary medical center, allowing greater variability in disease severity and activity and possible seasonal variations in self-report of PA. Purposefully sampling children based on age and sex allowed for equal representation of children across these strata. While no rule exists for sample size in qualitative studies, the number of items, subjective factors, and the internal consistency of the questionnaire are important aspects to consider to reach data saturation [64]. Based on prior studies of this nature, we believe we have a robust number of children included in the study. Adequate data quantity and quality were met in both phases, with phase 2 requiring a larger dataset to reach data saturation. The interviewer was trained by an experienced behavioral scientist, who has used cognitive interviewing for over 15 years to create and refine patient-reported outcomes. Members of the research team independently coded data and, when discrepancies occurred, a normative group process was used to reach consensus.
All interviews were initiated by stating the purpose of the study. A child or adolescent may not naturally question or doubt questions provided by people in authority. To address this potential limitation, the interviewer gradually took a more informal approach to interviews to encourage the children to think and speak freely about the questions throughout the interview process. We focused this study on Swedish children as a PAQ could then potentially be added to the national JIA registry; thus, these results may be limited to Swedish children with JIA. Additional limitations include the fact that we did not assess the comprehensibility of these PAQ items among healthy children, nor did we test the English version of the ASCeND on native English-speaking children, although this is a focus of future studies. We are currently in the process of assessing the convergent validity of the ASCeND with accelerometer estimates of PA in Swedish children with JIA and will be expanding assessment of the ASCeND to other groups of children.

Conclusions
This study highlights the importance of assessing children's comprehension of existing PAQ items, especially among children with JIA. We developed a new questionnaire, the ASCeND, that can be used in Swedish children with JIA to measure their PA. Assessment of PA in JIA is an area that has not been thoroughly researched. Our data indicate there are numerous issues associated with using PAQ intensity items in children, related to the concepts of strenuous, moderate and light intensity activities and that formatting of items in a survey can positively affect children's comprehension of these items. Formatting PAQs to enable easy alignment with response options appears to reduce issues with tracking responses.