Study design and participants
Eighty consecutive patients recruited from August 2016 to June 2017 at the outpatient clinic of the Danish Hospital for Rheumatic Diseases, Sønderborg, Denmark, formed the overall FLAre-in-RA (FLARA) cohort. Patients aged ≥ 18 years were eligible if they met the American College of Rheumatology (ACR) 1987 or ACR/EULAR 2010 criteria for RA, were anti-cyclic citrullinated peptide antibody (anti-CCP) and/or rheumatoid factor (RF) positive, had a Disease Activity Score based on C-reactive protein (DAS28CRP) < 3.2 and no swollen joints at baseline, were on stable disease-modifying anti-rheumatic drug (DMARD) treatment with no intra-articular glucocorticoid injections in the 4 weeks prior to study entry, and had no contraindications for MRI.
The cohort was followed for 1 year. Patients were asked to contact the outpatient clinic in case of a flare in a hand or wrist (a “hand flare”) accompanied by at least one tender and swollen joint, as perceived by the patient. All 80 patients underwent clinical, laboratory, MRI, and US examinations at baseline. Patients with hand flares were included in the present study and underwent assessments at four extra follow-up visits (FV1–FV4) (Fig. 1). The first follow-up visit (FV1), also called the flare visit, was scheduled within 72 h of the patient’s contact with the hospital. Seven to 10 days later, patients were seen at the second follow-up visit (FV2). Finally, after 2–3 months, the patients were assessed twice, 7 to 10 days apart, at follow-up visits 3 and 4 (FV3 and FV4).
Intra-articular or intra-muscular glucocorticoid injections were only allowed at FV2 and FV4 once clinical evaluation and imaging procedures were completed.
Clinical, biochemical, and patient-reported outcomes
At baseline, information about age, gender, disease duration, and ongoing therapy for RA was collected. The patients were tested for RF and anti-CCP positivity. At each visit, a senior rheumatologist or a trained study nurse, blinded to imaging findings and patient reports, carried out a clinical assessment of 28 joints for swelling (SJC28) and tenderness (TJC28). The high-sensitivity CRP level (mg/l) was determined to calculate DAS28CRP. Patients indicated swollen and tender joints on a mannequin. Visual analog scales (VAS) were used to evaluate pain and the patient’s and evaluator’s global assessments (PGA and EGA, respectively). The Danish version of the Health Assessment Questionnaire (HAQ) was applied to assess physical function. Patient-reported flare was defined by the anchor question “Are you experiencing a flare of your RA at this time?” (yes/no) [17, 18]. Patients filled in the OMERACT (Outcome Measures in Rheumatology) Rheumatoid Arthritis Flare Questionnaire (RA-FQ) [17, 18]. Patients reporting flare rated flare severity on a scale from 0 to 10 and reported flare duration according to four categories: 1–3, 4–7, 8–14, or > 14 days. As the reports were collected at each follow-up visit, the flare duration category may have changed in an individual patient as a consequence of the increasing observation period. Moreover, the dates of hand flare onset and termination were recorded.
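For illustration, the DAS28CRP calculation referred to above can be sketched as follows. The sketch uses the widely published four-variable DAS28-CRP formula and assumes that the patient's global assessment is recorded on a 0–100 mm VAS; neither the formula variant nor the VAS range is stated in this section, and the function names are hypothetical.

```python
import math


def das28_crp(tjc28: int, sjc28: int, crp_mg_l: float, pga_mm: float) -> float:
    """Four-variable DAS28-CRP (standard published formula).

    Assumptions not stated in the text: the four-variable variant is used
    and the patient's global assessment (PGA) is a 0-100 mm VAS.
    """
    if not (0 <= tjc28 <= 28 and 0 <= sjc28 <= 28):
        raise ValueError("joint counts must be between 0 and 28")
    if crp_mg_l < 0 or not (0 <= pga_mm <= 100):
        raise ValueError("CRP must be >= 0 and PGA must be in 0-100 mm")
    return (
        0.56 * math.sqrt(tjc28)
        + 0.28 * math.sqrt(sjc28)
        + 0.36 * math.log(crp_mg_l + 1)  # natural logarithm
        + 0.014 * pga_mm
        + 0.96
    )


def meets_baseline_activity_criterion(score: float, sjc28: int) -> bool:
    """Baseline eligibility as described: DAS28CRP < 3.2 and no swollen joints."""
    return score < 3.2 and sjc28 == 0
```

A patient with no tender or swollen joints, a CRP of 0 mg/l, and a PGA of 0 mm would score the constant term alone, 0.96.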
MRI
An ONI OrthOne 1.0 Tesla (T) MR unit was used for all MRI examinations. At baseline, MRIs of the wrists and bilateral second to fifth metacarpophalangeal (MCP) and proximal interphalangeal (PIP) joints were conducted. At each FV, a unilateral MRI of wrist and hand was repeated on the side of the initial patient-reported flare. In case of flare in both hands, the most affected side, according to the patient, was chosen. Regarding baseline MRIs, only unilateral scans of the side examined at follow-up visits were included in the analyses. A coronal T1-weighted three-dimensional gradient echo sequence (T1w 3D GE), allowing multiplanar reconstruction, before and after gadolinium contrast injection [0.1 mmol gadoteric acid per kg body weight] and a coronal short-tau inversion recovery (STIR) sequence before contrast injection were acquired, following the recommendations of the OMERACT RA MRI Scoring system (RAMRIS) [19].
Parameters of the MRI sequences were as follows. For T1w 3D GE: flip angle 25°, repetition time (TR) 40 milliseconds (ms), echo time (TE) 18 ms, slice thickness (ST) 0.8 millimeters (mm), matrix 216 × 216, and field of view (FOV) 100 mm. For STIR: flip angle 90°, TR 4100 ms, TE 40 ms, ST 3 mm, matrix 256 × 256, and FOV 160 mm.
Two readers, one experienced (DG) and one newly trained (DK) in RAMRIS scoring, evaluated the MRIs. Both were blinded to the chronology of the scans and to clinical, laboratory, patient-reported outcome (PRO), and US imaging data. Scans from the five time points were read simultaneously for inflammatory lesions, i.e., synovitis, bone marrow edema (BME), and tenosynovitis, according to the RAMRIS [19, 20]. Synovitis was assessed in three wrist regions (the distal radioulnar joint, the radiocarpal joint, and the intercarpal and carpometacarpal joints) and in the second to fifth MCP and PIP joints, respectively, on a scale of 0–3. Tenosynovitis in the wrist was assessed separately at extensor tendon compartments I–VI and three flexor tendon compartments, and in the hands, the second to fifth flexor tendons were assessed at the level of the MCP and PIP joints, respectively [19, 20]. Each bone was scored separately for BME on a scale of 0–3 based on the proportion of bone volume affected by BME [19]. Sum scores for MRI synovitis, tenosynovitis, and BME, respectively, were calculated. The average scores from the two readers were used in the analyses.
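The sum scores and reader averaging described above could be computed as in the minimal sketch below. The site labels and function names are hypothetical, and the 0–3 range check is applied to every item, although this section states that range explicitly only for synovitis and BME.

```python
from statistics import mean
from typing import Dict


def sum_score(item_scores: Dict[str, int]) -> int:
    """Add the per-site scores of one reader for one lesion type at one visit."""
    for site, score in item_scores.items():
        if not 0 <= score <= 3:  # assumed range for every item
            raise ValueError(f"score for {site} is outside 0-3")
    return sum(item_scores.values())


def averaged_sum_score(reader_a: Dict[str, int], reader_b: Dict[str, int]) -> float:
    """Average the two readers' sum scores, as used in the analyses."""
    return mean([sum_score(reader_a), sum_score(reader_b)])


# Hypothetical synovitis scores for one hand (3 wrist regions shown, 2 joints).
reader_1 = {"distal_radioulnar": 1, "radiocarpal": 2, "intercarpal_cmc": 0,
            "mcp2": 1, "pip3": 0}
reader_2 = {"distal_radioulnar": 1, "radiocarpal": 2, "intercarpal_cmc": 1,
            "mcp2": 1, "pip3": 0}
synovitis_score = averaged_sum_score(reader_1, reader_2)
```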
For intra-reader agreement analysis, scans of five patients from all time points were re-anonymized and rescored.
US examination
A General Electric Logiq E9 US machine with a multifrequency linear array transducer 6-15 was used for all examinations. Color Doppler (CD) settings were kept unchanged throughout the study, with a Doppler frequency of 7.5 MHz, a pulse repetition frequency of 0.4 MHz, and Doppler gain just below the noise threshold, for detection of slow flow according to recommendations [21]. US examinations were conducted at baseline and at each FV. The US examiner was blinded to the clinical, laboratory, MRI, and patient-reported data. The US protocol included multiplanar scanning of 22 joints/regions: bilateral wrists (radiocarpal, midcarpal, and distal radioulnar joints, dorsal recesses), the first to fifth MCP joints, dorsal recesses, the first interphalangeal (IP) joint, the second to fifth PIP joints, dorsal and volar recesses, extensor tendon compartments I–VI, three flexor tendons/groups (flexor carpi radialis, flexor pollicis longus, and combined flexor digitorum superficialis and profundus), and finger flexors of the second through fifth finger. Finger flexors were evaluated so as to cover the whole synovial sheath-covered area of the tendons and were scored only once, without distinction between the level of MCP and PIP. Synovitis and tenosynovitis were defined according to the OMERACT definitions, assessed by CD and gray scale (GS), and graded semi-quantitatively 0–3 [22–25]. One combined GS and Doppler OMERACT-EULAR score was generated for synovitis and tenosynovitis [23, 24]. The scores from single joints/regions were added into a Global OMERACT-EULAR Synovitis Score (GLOESS) [26]. For the wrist, one single score was used, corresponding to the maximum combined score from any of the joints evaluated in this region. By analogy, scores from single tendons/tendon compartments were added into a total tenosynovitis score.
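The aggregation rule described above (one wrist score per side, taken as the maximum over the wrist joints, plus the sum of the finger joint scores) can be illustrated with a minimal sketch. The joint labels, data layout, and function names are hypothetical; the sketch assumes each entry is the combined GS and Doppler score graded 0–3.

```python
from typing import Dict, Iterable, Tuple

HandScores = Tuple[Dict[str, int], Dict[str, int]]  # (wrist joints, finger joints)


def hand_synovitis_score(wrist_joints: Dict[str, int],
                         finger_joints: Dict[str, int]) -> int:
    """One side: maximum combined score over the wrist joints + sum of finger joints."""
    for joint, score in {**wrist_joints, **finger_joints}.items():
        if not 0 <= score <= 3:
            raise ValueError(f"combined score for {joint} is outside 0-3")
    return max(wrist_joints.values()) + sum(finger_joints.values())


def gloess(hands: Iterable[HandScores]) -> int:
    """Add the per-side scores into one global synovitis score."""
    return sum(hand_synovitis_score(wrist, fingers) for wrist, fingers in hands)


# Hypothetical combined scores (only a few finger joints shown per side).
left = ({"radiocarpal": 2, "midcarpal": 1, "distal_radioulnar": 0},
        {"mcp2": 1, "pip3": 2})
right = ({"radiocarpal": 0, "midcarpal": 0, "distal_radioulnar": 0},
         {"mcp1": 1})
total = gloess([left, right])
```

Taking the maximum at the wrist means that the three wrist joints contribute at most 3 points per side, so a heavily scanned region does not dominate the global score.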
Variables for the analyses
The dependent variables were the imaging measures of inflammation: the sum scores of synovitis, tenosynovitis, and BME, respectively, on MRI, and GLOESS and the tenosynovitis sum score on US. Patient-reported flare was the main explanatory variable. The choice of covariates was based on external evidence of associations with imaging inflammation found in previous studies, i.e., CRP and SJC28 [16, 27], and, among PROs, pain, PGA, and HAQ [28, 29]. Due to the limited sample size, no additional covariates were included.
Statistical analysis
Data are reported as mean (SD) or numbers (%), as appropriate. All outcomes assessed at the follow-up visits were compared to baseline using t tests or Wilcoxon signed-rank tests, depending on the distribution of the data as evaluated by quantile plots. The evolution of inflammatory changes by the two imaging modalities was illustrated by mean plots.
We compared all outcomes at FV3 and FV4 in non-flaring patients, to explore whether the outcomes differed when measured 1 week apart.
Because the data consisted of serial measurements on the same individuals, linear mixed models were used, as they explicitly allow for clustering of observations from the same individual and analyze associations across all time points simultaneously. We included a random effect for each patient in all the models. Our analysis plan was based on three different scenarios: a univariate model, a full multivariate model, and a final model after backward selection from the full model.
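As an illustration, a model of this kind can be written as below. This is a sketch only: it assumes the patient-level random effect is a random intercept, and the notation ($y_{ij}$ for the imaging score of patient $i$ at visit $j$, $\text{flare}_{ij}$ for patient-reported flare, $\mathbf{x}_{ij}$ for the remaining covariates) is introduced here and does not appear in the original text.

```latex
y_{ij} = \beta_0 + \beta_1\,\text{flare}_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $u_i$ is shared by all observations from patient $i$, which is what induces the within-patient correlation, and in a univariate model the term $\mathbf{x}_{ij}^{\top}\boldsymbol{\gamma}$ is omitted.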
First, a series of univariate models was fitted, one for each explanatory variable, using MRI or US inflammatory markers as the dependent variable. As a second step, all explanatory variables were included in the full multivariate model with age, sex, and disease duration at baseline as possible confounders. Finally, we conducted a backward selection from the full model with a 0.05 p value cutoff to reach the final models. As a sensitivity analysis, flare duration (four categories) was used as an explanatory variable.
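The backward selection step can be sketched generically as below. This is not the Stata procedure used in the study: `fit_pvalues` is a hypothetical callable that refits the mixed model on a given set of variables and returns one p value per variable, and the optional `forced` argument (variables that are never dropped) is an assumption, since the text does not say whether the confounders were retained throughout.

```python
from typing import Callable, Dict, List, Sequence


def backward_select(
    candidates: Sequence[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    forced: Sequence[str] = (),
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant candidate until all remaining have p < alpha."""
    kept = list(candidates)
    while kept:
        # Refit on the current variable set (hypothetical model-fitting callable).
        pvals = fit_pvalues(list(forced) + kept)
        worst = max(kept, key=lambda name: pvals[name])
        if pvals[worst] < alpha:
            break  # every remaining candidate meets the cutoff
        kept.remove(worst)
    return kept
```

Because the model is refitted after every removal, the p values of the remaining variables can change from step to step, which is why the loop does not simply filter on the p values of the full model.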
The validity of the models’ assumptions was tested, checking variance homogeneity and normality of random effects and residuals by diagnostic plots. Logarithmic transformations of outcomes were applied if appropriate and collinearity was checked for in the multivariate models. As mixed effects regression models take into account missing outcome observations by design, we did not otherwise model missing data.
For MRI reliability assessment, both inter-reader and intra-reader agreement analyses utilized intraclass correlation coefficients (ICCs; two-way mixed effects model, absolute agreement). Moreover, the smallest detectable change (SDC) was calculated for the change in score between baseline and FV1.
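The text does not specify how the SDC was derived. One commonly used formulation, given here as an assumption, is SDC = 1.96 × SD of the between-reading differences in change scores / (√2 × √k), where k is the number of readings averaged in the analysis. A minimal sketch with hypothetical names:

```python
import math
from statistics import stdev
from typing import Sequence


def smallest_detectable_change(
    change_a: Sequence[float], change_b: Sequence[float], k: int = 2
) -> float:
    """SDC from paired change scores (baseline to FV1) of two readings.

    Assumed formula: 1.96 * SD(differences in change) / (sqrt(2) * sqrt(k)),
    with k the number of readings whose average is analyzed.
    """
    if len(change_a) != len(change_b) or len(change_a) < 2:
        raise ValueError("need at least two paired change scores")
    diffs = [a - b for a, b in zip(change_a, change_b)]
    return 1.96 * stdev(diffs) / (math.sqrt(2) * math.sqrt(k))
```

With k = 2, matching the use of the two readers' average scores, the denominator equals 2; setting k = 1 gives the value for a single reading.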
Coefficients are reported with 95% confidence intervals (95% CI). p values < 0.05 were considered statistically significant. The analyses were carried out using Stata 15.0 (StataCorp, TX, USA).