Predicting persistent inflammatory arthritis amongst early arthritis clinic patients in the UK: is musculoskeletal ultrasound required?

Introduction: Analyses of large clinical datasets from early arthritis cohorts permit the development of algorithms that may be used for outcome prediction in individual patients. The value added by routine use of musculoskeletal ultrasound (MSUS) in an early arthritis setting, as a component of such predictive algorithms, remains to be determined.
Methods: The authors undertook a retrospective analysis of a large, true-to-life, observational inception cohort of early arthritis patients in Newcastle upon Tyne, UK, which included patients with inflammatory arthralgia but no clinically swollen joints. A pragmatic, 10-minute MSUS assessment protocol was developed, and applied to each of these patients at baseline. Logistic regression was used to develop two "risk metrics" that predicted the development of a persistent inflammatory arthritis (PIA), with or without the inclusion of MSUS parameters.
Results: A total of 379 enrolled patients were assigned definitive diagnoses after ≥12 months follow-up (median 28 months), of whom 162 (42%) developed a persistent inflammatory arthritis. A risk metric derived from 12 baseline clinical and serological parameters alone had an excellent discriminatory utility with respect to an outcome of PIA (area under receiver operator characteristic (ROC) curve 0.91; 95% CI 0.88 to 0.94). The discriminatory utility of a similar metric, which incorporated MSUS parameters, was not significantly superior (area under ROC curve 0.91; 95% CI 0.89 to 0.94). Neither did this approach identify an added value of MSUS over the use of routine clinical parameters in an algorithm for discriminating PIA patients whose outcome diagnosis was rheumatoid arthritis (RA).
Conclusions: MSUS use as a routine component of assessment in an early arthritis clinic did not add substantial discriminatory value to a risk metric for predicting PIA.


Introduction
National and international guidelines increasingly emphasise the importance of early diagnosis in the management of new onset inflammatory arthritis and support the establishment of dedicated early arthritis (EA) clinics [1,2]. However, despite new classification criteria for rheumatoid arthritis (RA) [3,4], a substantial proportion of EA clinic attendees have un-classifiable disease, and are labelled as having undifferentiated arthritis (UA) [5,6].
Analyses of large clinical datasets from early arthritis cohorts permit the development of algorithms that may be used for outcome prediction in individuals. This approach has yielded a validated "prediction rule" for use in UA patients, in which a range of baseline clinical and laboratory parameters are weighted and combined to yield a score that relates to RA progression risk [7-9]. It has been suggested that the more general goal of predicting persistent inflammatory arthritis (PIA) amongst EA patients as a whole is every bit as clinically important as identifying individuals specifically destined for RA [10-12]. Hence, a predictive tool applicable to all EA clinic attendees, which distinguishes chronic inflammatory from non-inflammatory/self-limiting disease, could accelerate access to disease-modifying antirheumatic drugs (DMARDs) for those most likely to benefit from them. Musculoskeletal ultrasound (MSUS) has shown promise as an evaluation tool in the setting of early arthritis [13,14], but the value it adds to a thorough clinical assessment, for example, as a component of a predictive algorithm, remains to be quantified.
The Newcastle EA Clinic accepts patients clinically suspected of having new-onset inflammatory arthritis by their referring physician [15]. Blood test results and/or the presence of clinically inflamed joints are not required at the time of referral, ensuring inclusion of an important group of patients with new-onset inflammatory arthralgia into the resultant cohort. Such patients typically describe joint pain with morning stiffness, but have no clinically inflamed/swollen joints on examination. In this observational cohort study, we used the principles adopted by van der Helm-van Mil et al. [16] to construct a predictive algorithm for PIA amongst EA clinic attendees. We then asked whether the addition of the most predictive element(s) of a short, pragmatic MSUS screening protocol improved its predictive utility.

Subjects and data collection
Consenting patients ≥16 years of age and presenting with new-onset arthralgia to the Newcastle EA clinic between September 2006 and November 2009 were included in the study. Detailed baseline demographic and clinical parameters were recorded during the patients' first EA clinic visit by an experienced specialist nurse, at which time routine blood tests included acute phase markers and autoantibodies. A "joint pattern score" (JPS) between 0 and 2.5 was also recorded for each patient, reflecting the localisation and distribution of symptomatic joints at presentation according to the system described by van der Helm-van Mil et al. [16]. An initial diagnosis was assigned to each patient by their consulting rheumatologist according to a "working diagnosis proforma" (Table 1) [15]. RA was diagnosed only where 1987 ACR classification criteria [17] were fulfilled; UA was defined as a "suspected inflammatory arthritis where RA remained a possibility, but where established classification criteria for any rheumatological condition remained unmet". This initial diagnosis was updated by the rheumatologist at each subsequent clinic visit for the duration of the study, which was greater than 12 months for all patients. Knowledge of outcome diagnoses was used to further categorise patients according to whether or not they developed a persistent inflammatory arthritis (PIA versus non-PIA). Hence, RA, psoriatic arthritis, enteropathic arthritis, ankylosing spondylitis, undifferentiated spondyloarthropathy, connective tissue disease and other inflammatory arthritides constituted PIA outcomes. A subset of individuals assigned the "self-limiting inflammatory/reactive arthritis" outcome, who had definite reactive arthritis warranting DMARD treatment, were also included in the PIA grouping. Remaining EA clinic attendees, diagnosed with self-limiting inflammatory arthritis, crystal pathologies, osteoarthritis or non-inflammatory arthralgia at follow-up, formed the non-PIA category.
Enrolled patients consented to participate in the study, which received a favourable review by the Newcastle and North Tyneside Local Research Ethics Committee.

MSUS protocol
The same Aplio™ Diagnostic Ultrasound System (Toshiba Medical Systems Corporation, Tochigi-Ken, Japan), fitted with a 12 MHz probe, was employed for all MSUS assessments. The screening protocol used in the study could be completed in approximately 10 minutes, and was performed in the EA clinic by one of three experienced MSUS practitioners. A total of 16 peripheral small joints were routinely evaluated: the second to fourth metacarpophalangeal (MCP) and proximal interphalangeal (PIP) joints bilaterally (dorsal and volar longitudinal planes, neutral and flexed position) and the first and second metatarsophalangeal (MTP) joints bilaterally (dorsal longitudinal plane only). Semi-quantitative scores were assigned at each site for three "domains": grey-scale synovitis, power Doppler signal and bony erosion, according to consensus definitions [18,19]. For each domain at each hand joint, only the higher of the dorsal and volar scores was recorded. Each domain was scored on a 0 to 3 semi-quantitative scale, based on the system first suggested by Szkudlarek et al. [20], but adopting the modification of Scheel et al. in respect of grey-scale synovitis, whereby effusion and synovial thickening were grouped to give a single, combined score [21]. In recognition that a degree of grey-scale synovitis at MTP1 may be physiological [13], a semi-quantitative score of 1 at this joint did not count towards the overall score for this domain. Once complete MSUS datasets had been obtained for 397 patients, five potentially useful dichotomous variables were identified with respect to diagnostic outcome.
Broadly, these parameters represented either "synovitis load" within a defined number of joints (for example, the sum of semi-quantitative grey-scale synovitis scores of the 16 specified joints), or "proportionate joint involvement" (for example, the proportion of joints in which any power Doppler signal was present); optimal cut-off values for each parameter were determined from the available datasets. During preliminary work for the study, inter-observer agreement between assessors was determined. For example, 16 small hand/foot joints of 20 EA clinic patients (a total of 316 joints) were independently scanned by two practitioners (PNP and AGP), each being blinded to the recorded findings of the other at the time of documentation. Semi-quantitative scores of both grey-scale synovitis and power Doppler synovitis were dichotomised into the presence or absence of a score ≥1 for each joint, and the resultant kappa statistics (0.56 for grey-scale and 0.64 for power Doppler synovitis) demonstrated moderately good inter-observer agreement, which is comparable with that seen in other settings [22]. Comparable kappa statistics were obtained between other practitioners using the same method. Further preliminary work confirmed an excellent intra-observer reliability for small joint MSUS measurements. For example, anonymised images from 18 patients, scored by AGP on separate occasions (two weeks apart) in random sequence for the presence/absence of grey-scale and power Doppler semi-quantitative scores ≥2, yielded kappa values of 0.85 and 0.91, respectively.
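The kappa statistics quoted above measure agreement beyond that expected by chance when two assessors each dichotomise a joint as positive or negative. As a minimal sketch of how Cohen's kappa is obtained from a 2 × 2 agreement table, the following Python function may be helpful; the function name and the example counts are hypothetical and are not the study's data (the original analysis was carried out in SPSS).

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two raters and a dichotomous rating.

    both_pos: joints scored positive by both raters
    a_only:   joints scored positive by rater A only
    b_only:   joints scored positive by rater B only
    both_neg: joints scored negative by both raters
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n          # observed agreement
    a_pos = (both_pos + a_only) / n               # rater A positive rate
    b_pos = (both_pos + b_only) / n               # rater B positive rate
    # Agreement expected by chance if the raters were independent
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 100 joints, for illustration only
example_kappa = cohen_kappa(both_pos=40, a_only=10, b_only=10, both_neg=40)
print(round(example_kappa, 2))
```

In this made-up example the raters agree on 80% of joints, chance agreement is 50%, and kappa is therefore 0.6.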

Statistical analysis
Student's t-tests and Mann-Whitney U tests (parametric and non-parametric data, respectively), contingency table statistics (kappa tests and Pearson's chi-squared test, including Yates' continuity correction and effect size calculations for 2 × 2 tables), logistic regression analyses and the construction of receiver operator characteristic (ROC) curves were carried out using SPSS version 15 (SPSS Inc., Chicago, IL, USA).
For the primary logistic regression analyses of the complete Early Arthritis Clinic (EAC) cohort, a backward selection approach was used to identify the most significant independent variables, with a P-value of 0.1 having been set as the removal criterion. Where variables were available in either continuous or categorical formats, the choice of format in which to enter a given variable was made through an iterative process, whereby the accuracies of derived models for discriminating EA-PIA versus EA-non-PIA were compared; only the derivation of the final model is outlined here. In the resultant regression models, the predicted probability of PIA was related to the covariates via the following prognostic index: $B_1 x_1 + B_2 x_2 + \dots + B_n x_n$, where $x_i$ refers to a specific covariate, $n$ is the total number of covariates in the model, and the regression coefficient ($B_i$) of each covariate indicates an estimate of the relative magnitude of its prognostic power. Using this prognostic index, it was possible to calculate the predicted probability of PIA developing for every EA patient. For ease of use, the values of regression coefficients (B) incorporated into the index were doubled and then rounded to the nearest 0.5 to provide a simplified prognostic index for clinical application, without substantially altering the prognostic utility of the tool. Where data were missing for individual subjects (within the "minimum dataset" constraints outlined above), median values from the final study cohort were imputed to enable multivariate analysis of the complete dataset. Variables requiring modification in this way (number, percentage of individuals) were: C-reactive protein (CRP) (5, 1%), erythrocyte sedimentation rate (ESR) (9, 2%), anti-citrullinated peptide autoantibody (ACPA) (9, 2%), and rheumatoid factor (RF) (3, 0.8%).
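The steps described above (forming the prognostic index, converting it to a predicted probability, and simplifying the coefficients) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' SPSS procedure: the coefficient values and covariate names are hypothetical, and the intercept term is an assumption based on the standard logistic model, since the index given in the text lists only the covariate terms.

```python
import math


def prognostic_index(coefficients, covariates):
    """Return B_1*x_1 + ... + B_n*x_n for one patient."""
    return sum(b * covariates[name] for name, b in coefficients.items())


def predicted_probability(index, intercept=0.0):
    """Logistic transform of the index.

    The intercept is an assumption (standard logistic regression);
    it does not appear in the index as written in the text.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + index)))


def simplify_coefficient(b):
    """Double a coefficient, then round to the nearest 0.5 (ties round up)."""
    return math.floor(2 * b * 2 + 0.5) / 2


# Hypothetical coefficients and patient values, for illustration only
coefficients = {"covariate_a": 0.8, "covariate_b": 1.3}
patient = {"covariate_a": 1, "covariate_b": 0}

index = prognostic_index(coefficients, patient)
simplified = {name: simplify_coefficient(b) for name, b in coefficients.items()}
print(index, simplified)
```

With these made-up values, a coefficient of 0.8 is doubled to 1.6 and rounded to 1.5, and a coefficient of 1.3 is doubled to 2.6 and rounded to 2.5, giving weights that can be summed by hand in clinic.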
An entirely analogous approach was then used for the derivation of regression models for predicting an RA diagnostic outcome amongst the sub-cohort of patients classified as having PIA, and for predicting RA amongst those who presented with UA.

EA patient cohort and univariate analysis of baseline characteristics
A total of 389 eligible patients were recruited between September 2006 and April 2009 inclusive, and were followed up for a minimum of 12 months (median 27; range 12 to 44 months); 10 had arthritis that remained undifferentiated at the end of the follow-up period, and were excluded from analysis. The diagnostic evolution of the remaining 379 patients is presented in Table 2. The baseline clinical and serological characteristics of patients in whom PIA did or did not develop are compared in Table 3, each of 12 variables being considered both categorically and continuously where possible; these were: age, sex, smoking status, symptom duration, tender joint count, swollen joint count, joint pattern score, early morning stiffness (EMS), ESR, CRP, RF and ACPA status. Five MSUS parameters for synovitis were identified as having potential discriminatory utility at baseline with respect to an outcome of PIA; namely, (i) "sum of scores (grey-scale domain) for total of 16 scanned joints", (ii) "sum of scores (grey-scale domain) for 6 scanned joints of worst-affected hand", (iii) "number out of 16 total scanned joints scoring ≥1 (grey-scale domain)", (iv) "sum of scores (power Doppler domain) for total of 16 scanned joints" and (v) "number out of 16 total scanned joints scoring >1 (power Doppler domain)" (see also Methods). Findings in relation to these parameters, as well as the presence/absence of MSUS erosions, were also compared (as dichotomous variables, according to pre-determined optimal cut-offs) between comparator groups (Table 3).
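To make the derivation of such parameters concrete, the sketch below computes a "sum of scores" and a "number of joints scoring ≥1" from per-joint grey-scale scores, applying the rule from Methods that a score of 1 at MTP1 is not counted. The joint labels, data layout and example scores are hypothetical; the cut-off of 3 joints is used because the parameter "≥3/16 specified joints with any grey-scale synovitis" is the one reported in the multivariate Results.

```python
# Hypothetical joint labels: MCP2-4 and PIP2-4 of both hands, MTP1-2 of both feet
HAND_JOINTS = [f"{side}_{kind}{n}"
               for side in ("L", "R")
               for kind in ("MCP", "PIP")
               for n in (2, 3, 4)]
FOOT_JOINTS = [f"{side}_MTP{n}" for side in ("L", "R") for n in (1, 2)]
ALL_JOINTS = HAND_JOINTS + FOOT_JOINTS  # 12 hand joints + 4 foot joints = 16


def adjust_grey_scale(scores):
    """Discount a grey-scale score of 1 at MTP1 (may be physiological)."""
    return {joint: (0 if joint.endswith("MTP1") and score == 1 else score)
            for joint, score in scores.items()}


def sum_of_scores(scores):
    """'Synovitis load': sum of 0-3 semi-quantitative scores over all joints."""
    return sum(scores.values())


def joints_scoring_at_least(scores, threshold=1):
    """Number of joints whose score is at or above the threshold."""
    return sum(1 for score in scores.values() if score >= threshold)


# Made-up grey-scale scores for one patient, for illustration only
grey = {joint: 0 for joint in ALL_JOINTS}
grey.update({"L_MCP2": 2, "R_MCP3": 1, "R_PIP2": 1, "L_MTP1": 1})

adjusted = adjust_grey_scale(grey)
load = sum_of_scores(adjusted)                  # L_MTP1 score of 1 is discounted
n_involved = joints_scoring_at_least(adjusted)  # joints with any grey-scale synovitis
meets_cutoff = n_involved >= 3                  # dichotomised at ">=3/16 joints"
print(load, n_involved, meets_cutoff)
```

For this made-up patient the grey-scale load is 4 and three joints are involved, so the dichotomous parameter would be recorded as positive.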

Predictive algorithm for PIA: no MSUS
The 12 clinical/serological variables were entered into a backward stepwise logistic regression analysis, with PIA versus non-PIA outcome as the dependent variable (Table 4). Amongst 379 EA clinic patients, and after sequential removal of non-significant parameters, seven variables were independently associated with PIA. The final model containing these predictors was significantly associated with PIA (χ² (7 degrees of freedom) = 240.4; P < 0.001), and explained between 47% (Cox and Snell R squared) and 63% (Nagelkerke R squared) of the variance in diagnostic outcome [23]. A simple "risk metric" for clinical use could be calculated for each of the 379 individuals in the EA cohort, based in each case on the values of the seven independent predictive variables in the regression model [16] (Figure 1 and see also Methods). This risk metric was shown to have an excellent discriminatory ability in the current dataset through the construction of a ROC curve, the area under which was 0.91 (standard error of mean (SEM) = 0.015, P < 0.001; Figure 2).
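The area under a ROC curve can be read as the probability that a randomly chosen patient who develops PIA has a higher risk metric than a randomly chosen patient who does not. A minimal sketch of that pairwise calculation follows; the scores are invented for illustration and are not the study's data, and the function name is an assumption.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Invented risk-metric values, for illustration only
pia_scores = [5.0, 6.5, 3.5, 8.0]
non_pia_scores = [1.0, 3.5, 2.0, 4.0]

auc = roc_auc(pia_scores, non_pia_scores)
print(auc)
```

An area of 0.5 corresponds to a metric that discriminates no better than chance, and 1.0 to perfect separation of the two outcome groups.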
The incidence of PIA in our cohort was 162/379, or 0.42. Taking this to represent the prior probability of PIA amongst our EA cohort, and employing a single cut-off value for the prediction score, the positive and negative predictive values (PPV and NPV) of a score ≥4.0 were 0.83 in both cases (95% CIs 0.75 to 0.88 and 0.78 to 0.88, respectively), the positive and negative likelihood ratios (+LR and -LR) being 6.4 (4.41 to 9.25) and 0.27 (0.20 to 0.35), respectively. In the absence of external, independent validation, studies which rely on statistical modelling by logistic regression may lead to overoptimistic assessment of predictive utilities due to the phenomenon of "over-fitting" [23]. By employing a stringent 10-fold random sub-sampling cross-validation to account for this possibility [24], our model's PPV with respect to PIA reduced to 0.72 (0.68 to 0.76), but the NPV was maintained at 0.85 (0.83 to 0.88) (+LR and -LR corrected to 3.56 (3.10 to 4.08) and 0.22 (0.18 to 0.26), respectively).
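The link between the prior probability, the likelihood ratios and the predictive values quoted above (before cross-validation) can be checked with a short calculation on the odds scale. The sketch below uses only the figures reported in this paragraph (162/379, +LR 6.4, -LR 0.27); the function name is illustrative.

```python
def post_test_probability(prior, likelihood_ratio):
    """Convert a prior probability to a post-test probability via odds."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prior_pia = 162 / 379  # incidence of PIA in the cohort

# Probability of PIA given a score >= 4.0 (positive predictive value)
ppv = post_test_probability(prior_pia, 6.4)
# Probability of non-PIA given a score < 4.0 (negative predictive value)
npv = 1 - post_test_probability(prior_pia, 0.27)

print(round(ppv, 2), round(npv, 2))
```

Both values round to 0.83, in line with the PPV and NPV reported for a score ≥4.0.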

Predictive algorithm for PIA is not improved by incorporation of MSUS parameter(s)
Backward stepwise logistic regression analysis was repeated, with the addition of the five discriminatory MSUS parameters as independent variables alongside the seven clinical/serological variables previously identified (and listed in Table 4). The results of this revised multivariate analysis are presented in Table 5, which identifies a predictive model very similar to that previously derived, comprising seven independent predictors of PIA, but with "joint pattern score" being replaced by the MSUS parameter, "≥3/16 specified joints with any grey-scale synovitis". This model was again significantly associated with outcome (χ² (7 degrees of freedom) = 255.8; P < 0.001), explaining between 49% (Cox and Snell R squared) and 66% (Nagelkerke R squared) of the variance in diagnostic outcome [23]. The simplified risk metric for clinical use was then revised to incorporate the new MSUS parameter in place of joint pattern score (Figure 3). Based on the ROC curves constructed from metrics that did or did not include MSUS parameter(s), the diagnostic utility of the new metric (which required MSUS) was seen to be equivalent to, but not superior to, that derived from more readily obtainable clinical and serological parameters alone (area under both curves 0.91; SEM 0.015; P < 0.001) (Figure 2).
Table caption: Baseline data for study cohort, and univariate analysis amongst patients with persistent inflammatory versus non-inflammatory diagnostic outcomes. Clinical and laboratory parameters are considered in both continuous (cont.) and categorical (cat.) formats. ACPA, anti-citrullinated peptide antibody; EMS, early morning stiffness; IQR, inter-quartile range; JPS, joint pattern score; RF, rheumatoid factor; SD, standard deviation; SxDur, symptom duration; T/SJC, tender/swollen joint count (see Methods for derivation). Baseline MSUS data and corresponding univariate analysis (employing predetermined, optimal cut-off values for each dichotomous parameter) are also presented. ∑, sum of; semi-quant, semi-quantitative; PIA, persistent inflammatory arthritis. *Mann-Whitney U/Student's t tests for skewed/normally-distributed continuous data respectively; Pearson's χ² with Yates' continuity correction for dichotomous data.

A predictive algorithm for RA amongst the PIA subcohort is not improved by the incorporation of MSUS parameter(s)
Using the principles applied to the early arthritis cohort as a whole, described above, simplified risk metrics were developed to discriminate between PIA patients diagnosed with RA versus alternative inflammatory arthritides, exploring the value added for such purposes by the incorporation of MSUS. Hence, excluding MSUS parameters, backward stepwise regression identified five variables independently associated with an outcome of RA in this sub-cohort (n = 162); namely age, swollen joint count, joint pattern score, ESR and ACPA status (Table 6). By additionally entering five discriminatory MSUS parameters into the regression analysis a similar predictive model was derived, comprising five independent variables, which differed only in that a dichotomous power Doppler semi-quantitative score (total ≥1 for total of 16 scanned joints) replaced swollen joint count (SJC) ≥1 (Table 7). However, the respective discriminatory abilities of simplified risk metrics derived from the two predictive models were comparable, suggesting that MSUS did not add value to clinical and serological parameters alone in identifying PIA patients who develop RA (Figure 4). Further additive roles for MSUS in predicting clinically relevant outcomes in early arthritis patients were explored. For example, using similar approaches, we were unable to demonstrate its enhanced ability to predict PIA in a sub-cohort of EA clinic attendees who present without clinical evidence of swollen/inflamed joints, or to predict progression to RA amongst those individuals who present with UA (n = 204 and n = 91, respectively; data not shown).

Discussion
We present a predictive algorithm, developed in a "true-to-life" EA clinic setting in the UK (Figure 1), which may be used to estimate the probability that an individual EA clinic attendee will develop a PIA. Predicting persistent inflammatory disease amongst EA clinic attendees, rather than mere fulfilment of classification criteria for RA (based on the 1987 ACR classification criteria for the disease), sets our study's objectives apart from those of van der Helm-van Mil et al. [7,16], but our approach bears similarities with those of others. In the well-known example of Visser et al., an 8-point scoring system for use amongst EA patients was developed (maximum score 13) that permitted the probability of PIA to be calculated for an individual patient [11]. Unsurprisingly, the component parameters identified in that analysis overlapped with those of the current study (Figure 1), particularly emphasising the predictive importance of symptom duration, distribution of involved joints and ACPA status for PIA. Unlike the cohort studied by Visser et al., however, it is important to note that ours included some patients with no objectively swollen joints at baseline; this was in order to capture the clinically important "inflammatory arthralgia" group. This probably explains the inclusion of swollen joint count and CRP parameters in our own (but not in Visser et al.'s) predictive model. Our study addressed a highly relevant clinical question: whether MSUS, considered alongside more readily obtainable clinical and laboratory measurements, helps to predict PIA amongst unselected EAC patients referred from primary care with recent-onset joint pain. The comprehensiveness of any MSUS screening protocol used for this purpose needs to be balanced by the feasibility of its use as part of a time-constrained, routine clinical assessment, and we developed a protocol that could in most cases be completed within 10 minutes, focussing on small peripheral joints.
In our hands, MSUS parameters provided no additional discriminatory value under these circumstances (Figure 2). The independent contribution of ACPA status to predictive models for PIA was remarkable, being reflected in the magnitude of its associated regression coefficient (4.89; Table 4), implying that the adoption of autoantibody testing as a diagnostic tool in this setting provides superior discriminatory utility over the MSUS screening parameters presented here. With 57% of EAC patients having non-inflammatory outcomes, it may be argued that the discriminatory utility of MSUS would be better defined when discriminating patients with an outcome of classifiable RA amongst the sub-cohort with PIA, but this was not found to be the case in our study (Figure 4). Neither were we able to demonstrate an additive predictive utility of MSUS when predicting progression of (i) arthralgia (in the absence of clinically evident synovitis) to PIA, or (ii) UA to RA, although our study lacked power to exclude either of these comprehensively. It is nonetheless noteworthy that the backward selection procedure employed during regression analyses suggested that the independent association amongst PIA patients of RA outcome with the presence of one or more clinically swollen joints was less strong than it was with the presence of any power Doppler signal in the 16 screened peripheral joints (compare respective odds ratios, Tables 6 and 7). The quantitative lack of additive discrimination provided by MSUS in these settings does not, therefore, negate the value of this imaging modality as an extension of the clinical examination in the evaluation of early arthritis.
Our observational study has a number of limitations, primarily reflecting the opportunistic manner in which data were collected in a real-time, real-life and busy EA clinic. Data published by others during the course of our study have suggested that the wrists and (albeit in the longitudinal evaluation of established disease) the fifth MTPJs may have particular value as sensitive markers of inflammatory arthritis [13,25], but these joints were not included in our routine screening algorithm. Conversely the first MTPJ frequently contains an effusion of equivocal significance and is the least informative MTPJ [13]. Although we discounted a semiquantitative greyscale score of 1 at this site (see Methods), our inclusion of the first MTPJ may, therefore, have influenced our findings, and our routine screening protocol might have yielded superior discriminatory value had it been informed by the detailed pilot work of Filer et al. [13]. Further work is required to define an optimal short MSUS screening protocol for use in the early arthritis setting, and, in particular, the extent to which it should include extra-articular sites, for example, to determine the presence of tenosynovitis or enthesitis. Finally, the definition of PIA was pragmatic, being based upon outcome diagnoses at follow-up, and the classification of each diagnosis for this purpose was not intuitive in all cases. For example, it was decided to define patients with crystal arthropathy (gout and calcium pyrophosphate deposition) as non-PIA, since such patients are not considered candidates for DMARDs, and their appropriate management in this cohort ensured that inflammatory manifestations were, for the most part, self-limiting. Their number was relatively small (n = 17), and classifying them alternatively as PIA did not materially alter the overall outcome of the analysis.
Notwithstanding the above, we set out to address whether MSUS parameters added to more readily obtainable clinical and laboratory data in the generation of predictive models in the EAC. In this regard, our large EAC cohort adds a substantial dataset to inform current knowledge and debate.

Conclusions
A "risk metric" using baseline clinical parameters predicts early arthritis patients who develop PIA. The incorporation of MSUS parameters does not add discriminatory value to the risk metric. Further studies will determine the role of MSUS in the diagnosis of early arthritis.