The ReACCh-Out cohort has been previously described in detail [10, 11]. In brief, 1497 patients newly diagnosed with JIA were recruited at 16 pediatric rheumatology centers across Canada from January 2005 to December 2010. The first visit occurred as soon as possible after diagnosis, but the time from diagnosis to the first visit could be as long as 1 year. Follow-up visits were scheduled every 6 months for 2 years and then yearly up to 5 years, or until May 2012. At each official study visit, full clinical information was collected, including the American College of Rheumatology (ACR) core variables [12], treatment information, and patient-reported outcomes. Erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP) levels were measured only if clinically indicated. At interim clinic visits between study visits, a reduced dataset was collected, including the number of active joints, limited joints or enthesitis sites, treatment information, and ESR and CRP levels if measured. ReACCh-Out was approved by the Research Ethics Boards of all participating institutions and was conducted in accordance with the Declaration of Helsinki; written informed consent was obtained.
The Nordic Cohort recruited 500 patients newly diagnosed with JIA in defined geographical areas of Norway, Sweden, Finland, and Denmark in 1997–2000. The first visit occurred approximately 6 months after disease onset, the next at 12 months, and subsequent visits every 1–3 years, with an obligatory visit approximately 8 years after disease onset (available for 440 subjects) [13].
Patients
For the current study, the goal was to select patients recruited in ReACCh-Out who were as similar as possible to the population used to develop the original Nordic prediction models. We considered including only patients with information at the 5-year follow-up, but this would have reduced our sample size considerably. Moreover, because ReACCh-Out did not follow patients into adulthood, many children who entered the cohort as teenagers would have been excluded, resulting in under-representation of JIA categories commonly seen in teenagers. Instead, we included patients who were recruited within 3 months of diagnosis and who had enough information at the 3-year visit to ascertain the outcomes of interest.
Outcomes
Our primary outcome was non-achievement of remission at the 3-year visit. We could not use exactly the same outcome definition as the original Nordic study because the visit schedules and other features differed between the two cohorts. We therefore designated a primary definition of remission and examined several alternative definitions. The primary definition of remission was clinical inactive disease for at least 12 months while off treatment [14]. We also examined the model's ability to predict a severe disease course as defined by Guzman et al. [7], based on a cluster analysis of changes in pain, health-related quality of life, number of active joints, medication requirements, and medication side effects over 5 years.
Clinical inactive disease was defined as no active joints, no active extra-articular manifestations (no enthesitis, uveitis, or systemic manifestations), and a physician global assessment of disease activity (PGA) of < 1 cm on a 10-cm visual analogue scale (VAS). This definition was based on the 2004 Wallace criteria [14] and has been used previously by our group [11, 15]. The main differences from the current ACR provisional criteria [16] are that morning stiffness of 15 min or less and normal acute-phase reactants were not required.
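As an illustration only, the outcome definitions above can be written as rules over a child's visit records. The sketch below is in Python rather than the R used for the analyses. The `Visit` record and its field names are hypothetical, as is the convention that "at least 12 months" means at least 365 days spanned by an unbroken run of qualifying visits ending at the assessment visit; disease is also assumed to stay inactive between consecutive qualifying visits.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Visit:
    # Hypothetical record layout; field names are illustrative only.
    visit_date: date
    active_joints: int
    enthesitis: bool
    uveitis: bool
    systemic_features: bool
    pga_cm: float       # physician global assessment on a 10-cm VAS
    on_treatment: bool


def clinical_inactive_disease(v: Visit) -> bool:
    """No active joints, no active extra-articular features, PGA < 1 cm.
    Morning stiffness and ESR/CRP are deliberately not checked."""
    return (
        v.active_joints == 0
        and not (v.enthesitis or v.uveitis or v.systemic_features)
        and v.pga_cm < 1.0
    )


def remission_off_treatment(visits: List[Visit], min_days: int = 365) -> bool:
    """Primary remission definition evaluated at the last (assessment) visit.

    Assumes `visits` is sorted by date and that inactive disease persisted
    between consecutive qualifying visits (an illustrative convention)."""
    run_start = None
    # Walk backwards from the assessment visit while visits still qualify.
    for v in reversed(visits):
        if clinical_inactive_disease(v) and not v.on_treatment:
            run_start = v.visit_date
        else:
            break
    if run_start is None:
        return False
    return (visits[-1].visit_date - run_start).days >= min_days
```

Under these assumptions, the primary outcome (non-achievement of remission) for a child is simply `not remission_off_treatment(visits)`.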
We defined functional disability as a Childhood Health Assessment Questionnaire (CHAQ) disability index [17] greater than 0 at the 3-year visit. This is the same instrument and cutoff used in the Nordic study, but at a different follow-up time. The Nordic study also developed a model to predict functional disability defined by the Child Health Questionnaire physical summary score [18], but the Canadian cohort did not use that instrument.
Model validation
For each subject in the Canadian cohort, we first computed the predicted probabilities of non-achievement of remission and of functional disability using the Nordic models exactly as published (i.e., with the same intercept and coefficients). We compared these predictions with the observed outcomes to assess predictive accuracy (discrimination), measured by the C-index with confidence intervals (details below). If the resulting C-index was substantially lower than the value originally published for the Nordic cohort, we proceeded to fine-tune the models. Fine-tuning here means re-estimating the model's intercept and coefficients to better fit a new population while keeping the same predictors and the same logistic regression framework for combining them. The intercept and coefficients were re-estimated using multiple splits of the Canadian cohort.
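Applying a published logistic model without refitting means evaluating its fixed linear predictor for each child and passing it through the logistic function; discrimination can then be summarized with a C-index, which for a binary outcome equals the area under the ROC curve. In the minimal Python sketch below, the predictor names and the intercept and coefficients in `HYPOTHETICAL_MODEL` are placeholders, not the published Nordic values.

```python
import math
from typing import Dict, Sequence

# Placeholder values for illustration only; NOT the published Nordic coefficients.
HYPOTHETICAL_MODEL = {
    "intercept": -1.2,
    "coefs": {"active_joints": 0.08, "esr_mm_h": 0.015},
}


def predict_probability(model: Dict, x: Dict[str, float]) -> float:
    """p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j))) with coefficients held fixed."""
    lp = model["intercept"] + sum(b * x[name] for name, b in model["coefs"].items())
    return 1.0 / (1.0 + math.exp(-lp))


def c_index(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; tied predictions count as 1/2."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


# Two made-up children; 1 = outcome observed (e.g., no remission).
patients = [{"active_joints": 12, "esr_mm_h": 40}, {"active_joints": 1, "esr_mm_h": 5}]
observed = [1, 0]
probs = [predict_probability(HYPOTHETICAL_MODEL, x) for x in patients]
print(c_index(probs, observed))  # 1.0
```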
In pre-specified sensitivity analyses, we assessed the ability of the Nordic model to predict alternative definitions of remission, including inactive disease while off treatment (i.e., without the 12-month requirement) and inactive disease for > 6 months irrespective of treatment. In an analysis that was not pre-specified, we also examined the model's ability to predict a severe disease course, as defined by Guzman et al. [7]. Similar to the analysis reported for the Nordic cohort [8], we examined the performance of prediction models that excluded the laboratory variables. Additional post hoc analyses assessed the models' performance after excluding patients with systemic JIA and in the subsample of patients who attended the 5-year follow-up. Lastly, we examined the predictive ability of a model that included only the active joint count at baseline.
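For the last analysis, a useful property is that the logistic function is strictly increasing: a model whose only predictor is the baseline active joint count ranks children exactly as the raw count does (provided the fitted coefficient is positive), so its C-index equals the concordance of the count itself. The Python sketch below checks this with made-up counts and arbitrary placeholder coefficients.

```python
import math
from typing import Sequence


def c_index(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance for a binary outcome; tied scores count as 1/2."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0
                for e in events for n in non_events)
    return score / (len(events) * len(non_events))


# Made-up baseline active joint counts and outcomes (1 = no remission).
joint_counts = [1, 3, 3, 0, 2]
no_remission = [0, 1, 0, 0, 1]

# Placeholder intercept and slope; any positive slope gives the same ranking.
b0, b1 = -2.0, 0.4
probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * k))) for k in joint_counts]

print(c_index(joint_counts, no_remission))  # 0.75
print(c_index(probs, no_remission))         # 0.75
```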
Statistical analysis
All analyses were conducted using R software. Overall, 10% of baseline data were missing in the Canadian cohort. Missing baseline data were imputed in 20 datasets using multiple imputation by chained equations (MICE) [19]; outcome data were not imputed. The average C-indices and average coefficient estimates we report are unweighted means across all 20 imputed datasets. We followed Rubin's rules [20] to compute standard errors (SEs) for all quantities across the 20 imputed datasets.
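Rubin's rules pool a quantity estimated separately in each of m imputed datasets: the pooled estimate is the unweighted mean of the m estimates, and the total variance is the mean within-dataset variance plus the between-dataset variance multiplied by (1 + 1/m). A minimal Python sketch with made-up inputs (three imputations instead of 20):

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_pool(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool per-imputation estimates and SEs with Rubin's rules.

    Returns (pooled estimate, pooled SE). Requires at least 2 imputations."""
    m = len(estimates)
    q_bar = mean(estimates)                      # unweighted mean across imputations
    within = mean(se ** 2 for se in std_errors)  # average within-imputation variance
    between = variance(estimates)                # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Illustrative values only.
est, se = rubin_pool([0.70, 0.72, 0.74], [0.03, 0.03, 0.03])
print(round(est, 3), round(se, 4))  # 0.72 0.0379
```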
To validate the original, un-tuned Nordic models in Canadian children, we applied each model to all observations in each of the 20 imputed datasets. From each dataset, we computed the C-index and its SE. We then combined these within-dataset SEs with the between-dataset variability of the C-index, following Rubin's rules, to produce the overall SE of the C-index.
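The section does not state how the SE of the C-index was obtained within each imputed dataset. One common closed-form option for a binary outcome is the Hanley and McNeil (1982) approximation; the Python sketch below uses it purely as an assumed example, not as the method used here. The resulting per-dataset SEs would then be pooled with Rubin's rules.

```python
import math


def hanley_mcneil_se(c: float, n_events: int, n_non_events: int) -> float:
    """Approximate SE of a C-index (ROC AUC) for a binary outcome,
    following Hanley & McNeil (1982). Used here as an assumption."""
    q1 = c / (2 - c)           # approx. P(two random events both outrank one non-event)
    q2 = 2 * c ** 2 / (1 + c)  # approx. P(one event outranks two random non-events)
    var = (
        c * (1 - c)
        + (n_events - 1) * (q1 - c ** 2)
        + (n_non_events - 1) * (q2 - c ** 2)
    ) / (n_events * n_non_events)
    return math.sqrt(var)


# Illustrative: C = 0.75 with 100 events and 200 non-events in one imputed dataset.
print(round(hanley_mcneil_se(0.75, 100, 200), 4))  # 0.0315
```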
For the fine-tuned models, we needed to ensure that model-evaluation statistics were computed on data not used to estimate the coefficients. We followed the procedure published by Jiang et al. [21], modified to compute the C-index. For a given imputed dataset, we estimated the average C-index using leave-one-out cross-validation (LOOCV), as they recommend. To estimate the within-dataset standard error, we used their recommended nested cross-validation within a bootstrap (the BCCV algorithm). We created B = 25 bootstrap samples from each imputed dataset. Within each bootstrap sample, we removed one original observation (if it occurred multiple times in the bootstrap sample, we removed all copies), fitted the model to the remaining observations, and predicted the removed observation using that fit. We repeated this process for each observation in turn to obtain predictions for every case, and then computed a C-index on all predicted values of that bootstrap sample. The standard deviation (SD) of the B = 25 bootstrap C-indices served as the estimate of the within-dataset SD of the C-index. The within-dataset and between-dataset variances were then combined using Rubin's rules [20] to produce the overall multiple imputation SE.
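A minimal Python sketch of the two steps, assuming a single imputed dataset held in plain lists: `loocv_c_index` gives the point estimate, and `bccv_c_index_sd` nests LOOCV inside each bootstrap sample, dropping every copy of the held-out observation before refitting. The gradient-ascent `fit_logistic` is a hypothetical stand-in for R's `glm`, chosen only to keep the example dependency-free; all names and defaults are illustrative.

```python
import math
import random
import statistics
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta: Sequence[float], xi: Sequence[float]) -> float:
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))


def fit_logistic(X, y, lr: float = 0.5, n_iter: int = 500) -> List[float]:
    """Hypothetical stand-in for glm(family = binomial): gradient ascent on the
    mean log-likelihood. Returns [intercept, b1, ..., bp]."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            resid = yi - predict(beta, xi)
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def c_index(probs, outcomes) -> float:
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    s = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e in events for n in non_events)
    return s / (len(events) * len(non_events))


def loocv_c_index(X, y) -> float:
    """Point estimate: predict each observation from a model fitted without it."""
    probs = []
    for i in range(len(y)):
        keep = [k for k in range(len(y)) if k != i]
        beta = fit_logistic([X[k] for k in keep], [y[k] for k in keep])
        probs.append(predict(beta, X[i]))
    return c_index(probs, y)


def bccv_c_index_sd(X, y, B: int = 25, seed: int = 1) -> float:
    """Within-dataset SD of the C-index: LOOCV nested inside each bootstrap sample."""
    rng = random.Random(seed)
    n = len(y)
    c_values = []
    while len(c_values) < B:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({y[k] for k in idx}) < 2:
            continue  # a C-index needs both outcome classes; draw again
        pred = {}
        for i in set(idx):
            train = [k for k in idx if k != i]  # drop every copy of observation i
            beta = fit_logistic([X[k] for k in train], [y[k] for k in train])
            pred[i] = predict(beta, X[i])
        # Each row of the bootstrap sample gets the prediction for its original observation.
        c_values.append(c_index([pred[k] for k in idx], [y[k] for k in idx]))
    return statistics.stdev(c_values)
```

Because every held-out observation requires a refit, the cost grows roughly with B × n fits per imputed dataset; a pure-Python fitter like this one is practical only for toy data, and an optimized GLM routine would be needed at realistic sample sizes.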
To obtain SEs of the coefficients, we fitted the model on each of B = 25 bootstrap samples from each imputed dataset (a total of 500 fits). For each imputed dataset, we estimated the within-dataset SE of each coefficient as the SD of its estimates, obtained with the glm function in R, across the 25 bootstrap samples. Again, we combined these within-dataset estimates with the between-dataset variance using Rubin's rules to obtain the overall SE.
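The coefficient SEs can be sketched the same way in Python: refit the model on B bootstrap samples of each imputed dataset, take the SD of each coefficient, and pool across datasets with Rubin's rules. Here `fit_logistic` is a hypothetical gradient-ascent stand-in for `glm`, and taking each dataset's point estimate from a fit to the full imputed dataset is an assumption.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr: float = 0.5, n_iter: int = 500) -> List[float]:
    """Hypothetical stand-in for glm(family = binomial): gradient ascent
    on the mean log-likelihood. Returns [intercept, b1, ..., bp]."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def bootstrap_coef_se(X, y, B: int = 25, seed: int = 1) -> Tuple[List[float], List[float]]:
    """For one imputed dataset: (point estimates, bootstrap SD of each coefficient).
    Assumption: the point estimate comes from a fit to the full imputed dataset."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(fit_logistic([X[k] for k in idx], [y[k] for k in idx]))
    sds = [statistics.stdev(col) for col in zip(*draws)]
    return fit_logistic(X, y), sds


def pool_coefficients(
    per_dataset: Sequence[Tuple[List[float], List[float]]]
) -> List[Tuple[float, float]]:
    """Rubin's rules per coefficient across m >= 2 imputed datasets.
    Returns (pooled estimate, pooled SE) for each coefficient."""
    m = len(per_dataset)
    pooled = []
    for j in range(len(per_dataset[0][0])):
        est = [coefs[j] for coefs, _ in per_dataset]
        within = statistics.mean(ses[j] ** 2 for _, ses in per_dataset)
        between = statistics.variance(est)
        pooled.append((statistics.mean(est), math.sqrt(within + (1 + 1 / m) * between)))
    return pooled
```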