Ultrasound composite scores for the assessment of inflammatory and structural pathologies in Psoriatic Arthritis (PsASon-Score)

Introduction This study was performed to develop ultrasound composite scores for the assessment of inflammatory and structural lesions in Psoriatic Arthritis (PsA). Methods We performed a prospective study on 83 PsA patients undergoing two study visits scheduled 6 months apart. B-mode and Power Doppler (PD) findings were semi-quantitatively scored at 68 joints (evaluating synovia, perisynovial tissue, tendons and bone) and 14 entheses. We constructed bilateral and unilateral (focusing the dominant site) ultrasound composite scores selecting relevant sites by a hierarchical approach. We tested convergent construct validity, reliability and feasibility of inflammatory and structural elements of the scores as well as sensitivity to change for inflammatory items. Results The bilateral score (termed PsASon22) included 22 joints (6 metacarpophalangeal joints (MCPs), 4 proximal interphalangeal joints (PIPs) of hands (H-PIPs), 2 metatarsophalangeal joints (MTPs), 4 distal interphalangeal joints (DIPs) of hands (H-DIPs), 2 DIPs of feet (F-DIPs), 4 large joints) and 4 entheses (bilateral assessment of lateral epicondyle and distal patellar tendon). The unilateral score (PsASon13) compromised 13 joints (2 MCPs, 3 H-PIPs, 1 PIP of feet (F-PIP), 2 MTPs, 1 H-DIP and 2 F-DIPs and 2 large joints) and 2 entheses (unilateral lateral epicondyle and distal patellar tendon). Both composite scores revealed a moderate to high sensitivity (bilateral composite score 43% to 100%, unilateral 36% to 100%) to detect inflammatory and structural lesions compared to the 68-joint/14-entheses score. The inflammatory and structural components of the composite scores correlated weakly with clinical markers of disease activity (corrcoeffs 0 to 0.40) and the health assessment questionnaire (HAQ, corrcoeffs 0 to 0.39), respectively. Patients with active disease achieving remission at follow-up yielded greater reductions of ultrasound inflammatory scores than those with stable clinical activity (Cohen’s d effect size ranging from 0 to 0.79). Inter-rater reliability of bi- and unilateral composite scores was moderate to good with ICCs ranging from 0.42 to 0.96 and from 0.36 to 0.71, respectively for inflammatory and structural sub-scores. The PsASon22 and PsASon13 required 16 to 26 and 9 to 13 minutes, respectively to be completed. Conclusion Both new PsA ultrasound composite scores (PsASon22 and PsASon13) revealed sufficient convergent construct validity, sensitivity to change, reliability and feasibility. Electronic supplementary material The online version of this article (doi:10.1186/s13075-014-0476-2) contains supplementary material, which is available to authorized users.


Introduction
Current EULAR recommendations on the use of imaging techniques in rheumatoid arthritis (RA) recognize the high sensitivity of sonography to detect joint pathologies, suggesting the use of this technique for a more accurate assessment of patients' disease activity as compared to clinical examination alone [1]. In routine practice and clinical trials of psoriatic arthritis (PsA), disease activity is still monitored by RA-specific clinical composite scores [2]. These measures, however, are of questionable value for PsA because of the heterogeneous nature of the disease characterized by various articular and extra-articular manifestations [3].
We recently reported good performance of sonography for the assessment of disease activity in PsA as determined by the investigation of 68 synovial sites and periarticular structures, as well as 14 entheses [4]. A comprehensive ultrasound assessment as performed in this study, however, is not feasible in daily routine practice, whereas a reduced ultrasound composite score might enable sonographic scoring of PsA patients in interventional trials and clinical practice.
A single ultrasound composite score has been developed for the assessment of PsA patients so far: the Italian so-called five-targets score focuses on joints, tendons, entheses, skin and nails; however, as only one site is assessed for each item, this index is of limited value to determine overall disease activity [5]. The German U7 score, primarily developed for RA, has occasionally been used to monitor disease activity in PsA patients in interventional studies; however, this score does not include important PsA manifestations such as enthesitis or distal interphalangeal joint (DIP) arthritis [6,7].
The aim of this study was to develop ultrasound composite scores that (1) include all currently defined ultrasound pathologies of PsA, (2) are sensitive for detection of inflammation and structural damage as compared to the assessment of 68 joints and 14 entheses and (3) are feasible in clinical practice. We tested the construct validity, reliability and feasibility of inflammatory and structural elements of the new composite scores and investigated the sensitivity to change for the inflammatory items.

Patients
We performed a prospective study on 83 consecutive PsA patients between July 2011 and May 2013. All patients fulfilled the classification for psoriatic arthritis (CASPAR) criteria and had peripheral articular manifestations [8]. The institutional review board of the Medical University Graz approved the study and written informed consent was obtained from each patient.
Two study visits were scheduled 6 months apart. At each visit, complete history and clinical assessments were performed by one of three rheumatologists (AF, JG and JH) unaware of the ultrasound results [4]. The following parameters were recorded: number of tender joints (TJ) and swollen joints (SJ) according to the 66/68 articular index, presence of enthesitis according to the Leeds enthesitis index (LEI) and a clinical counterpart of the Madrid sonographic enthesis index (MASEI) plus lateral epicondyle (cMASEI + E) [9,10]. Patients' global assessment of disease activity (PGA), patients' pain assessment (Ptpain) and the evaluator's global assessment of disease activity (EGA) were determined on visual analogue scales (range 0 to 100 mm). We also recorded patients' questionnaires as previously described [4]. Blood samples were routinely tested for erythrocyte sedimentation rate (ESR, range 0 to 10 mm/first hour) and C-reactive protein (CRP) (range 0 to 5 mg/L).
We calculated the following clinical composite scores: the disease activity index for psoriatic arthritis (DAPSA), the composite psoriatic disease activity index (CPDAI) and a modified psoriatic arthritis disease activity score (PASDAS) omitting the short form health survey 36 (SF-36) component, which was not available in our patients [11][12][13]. In addition, we applied the minimal disease activity (MDA) criteria (5 of the 7 following items: TJ ≤1, SJ ≤1, Psoriasis activity and severity index (PASI) ≤1, Ptpain ≤15 mm, PGA ≤20 mm, health assessment questionnaire (HAQ) ≤0.5, tender enthesal points ≤1]) [14] and asked the evaluating rheumatologist for an overall judgment of patients' clinical disease activity without using formal criteria or scores (two possible levels: active disease or remission).
Enthesitis was assessed according to the MASEI investigating the presence and/or extent of erosions, enthesophytes, PD-signals (PD-e) and GS-changes [9].
We considered GS-changes of enthesitis (GSE) as a combined feature including the loss of fibrillar pattern, hypoechoic aspect, bursitis (infrapatellar and/or retrocalcaneal bursa) and/or enthesal thickening. The following anatomical sites were scanned: the insertion of the common extensor tendon at the lateral epicondyle, the distal insertion of the triceps into the olecranon, the quadriceps insertion into the upper pole of the patella, the patellar tendon insertion into the lower pole of the patella and into the tibial anterior tuberosity, the insertion of the Achilles tendon as well as the insertion of plantar aponeurosis into the calcaneal bone [4]. GSS/ GSE refers to GSS and GSE and PD-j/e combines PD-j and PD-e scores.

Construction of the ultrasound composite scores
Our aim was to develop ultrasound composite scores (1) that corroborate all currently defined ultrasound pathologies in PsA including GSS, GSE, PD-j, PD-e, GS-perisyn, PD-perisyn, GS-teno, PD-teno, erosions and ostheophytes/ enthesophytes at small, middle/large and DIP joints as well as entheses (as appropriate); (2) that have a high sensitivity to detect these PsA characteristic ultrasound features as compared to the comprehensive assessment of 68 joints and 14 entheses; and (3) whose completion is feasible in clinical practice.
We used the 68-joints/14-entheses ultrasound score as a reference for the development of the new composite scores acknowledging that this instrument has not been validated for monitoring PsA patients yet. We constructed a bilateral and a unilateral composite score (focusing on the dominant site) using the Spanish 12-joint and the German 7-joint scores, respectively, as examples [6,19]. For the inclusion of relevant joints and entheses, we applied a hierarchical approach as exemplarily depicted in Additional file 1 and described in the following paragraph.
We conducted separate cycles of the procedure for each of the following anatomical regions and ultrasound pathologies: Each cycle started with the identification of the joint/ enthesis most commonly revealing the ultrasound lesion of interest (for example, the joint that most frequently showed PD-positivity among small joints -MTP1). For the subsequent procedure, we excluded patients with a positive result at this step (in the example: patients with PD signals at MTP1). Among remaining cases, we identified the most frequently affected joint/enthesis again (in the example: cases with a positive PD result at MCP2). The rationale for excluding patients (and not simply searching for the second most commonly affected site among all cases) was to eliminate strong correlations between joint/ enthesis, thus, maximizing the gain of sensitivity by any new site. We repeated this procedure n-times until a combination of sites reached ≥90% sensitivity to detect the corresponding ultrasound abnormality as compared to the 68-joint/14-entheses score. In case the gain of sensitivity by the inclusion of a new item was <20% or <10% for the bilateral or unilateral composite scores, respectively, or the overall prevalence of the finding at a given site was <5%, the selection process was terminated earlier. The rationale for premature termination was the prevention of mechanistic inclusion of sites with a negligible contribution to the overall performance of the composite scores potentially limiting the feasibility of the scores.
Next, we tested whether dorsal or palmar/plantar scans at H-PIPs, H-DIPs, MTPs, F-PIPs, F-DIPs and wrists, as well as medial/lateral or suprapatellar scans at knees, were dispensable for the composite scores. For this purpose, we analysed the proportion of ultrasound abnormalities exclusively detected by dorsal or palmar/plantar scans as well as by medial/lateral or suprapatellar scans of knees. Scans with a yield of <20% were omitted. MCPs were not subject to this analysis, because both palmar and dorsal scans are required to investigate the composite core elements peri-and tenosynovitis, respectively.

Statistical analysis and validation of the ultrasound composite scores
Statistical analysis was performed using SPSS (version 20.0). Descriptive statistics were used to summarize the data. For continuous non-parametric data, we show the median and range whereas for parametric data, the mean and standard deviation are depicted. Comparisons between independent groups were conducted using the Mann-Whitney U-test and paired data were analysed with the Wilcoxon test (non-parametric data) or Student's t-test (parametric data). Paired categorical data were analysed with the McNemar test.
Convergent construct validity was investigated by the correlation (using Spearman's rank correlation test) of inflammatory items (including GUIS) with clinical parameters of disease activity, and by correlating structural components with the HAQ (analysing the total cohort as well as patients in clinical remission separately in order to adjust for the activity related component of disability [20]).
Sensitivity to change was determined for the inflammatory components (including GUIS) only, as for structural elements the 6-months follow-up period was considered to be too short for the detection of any alterations. We compared the changes of the inflammatory subscore from baseline to the 6-months follow-up visit between patients with stable clinical disease activity (that is, active or inactive disease at both visits) and those being active at baseline and achieving remission (as determined by the evaluating physician) or MDA at follow up. We also calculated Cohen's d effect-size statistic and the standardized response means (SRMs) of the total cohort versus patients in whom clinical disease activity improved [21]. Changes of the inflammatory elements were additionally correlated with alterations of clinical composite scores and its components. Inter-rater reliability of the ultrasound composite scores was determined by serial assessments of 10% of patients by two investigators (CDe, RH) and using the intraclass correlation coefficient (ICC).

Clinical findings at baseline and follow-up visits
Patients' characteristics are summarized in Table 1. At baseline, 40 patients (48.2%) were in remission as judged by the evaluating physician and 28 patients (33.7%) fulfilled the MDA criteria. Of the 83 patients 13 (15.7%) did not complete 6-month follow up. At follow up, 41 patients (58.6%) had stable disease activity according to physician's evaluation (25 (35.7%) were inactive and 16 (22.9%) were active at both time points), 21 (30.0%) changed from active disease to remission and 3 (4.3%) from remission to active disease. Applying the MDA criteria, 15 of those (27.3%) being active at baseline reached MDA at follow up, 7 (25.0%) with MDA at baseline did not fulfill these criteria at the second visit, and 48 patients (68.6%) had stable disease (that is, were active (n =32) or had MDA (n =16) at both visits).

Synovial recesses and entheses selected for the ultrasound composite scores
Additional file 2 details the results of selection process for the ultrasound composite scores. We made a few manual selections to improve the feasibility of the scores: (1) we included the second H-PIP instead of the first H-PIP for the PsASon22 because H-PIP2 better fitted into the construct of MCP2, MCP3, H-PIP3, H-DIP2 and H-DIP3 (that was already gathered). Also, H-PIP1 emerged from the selection process because of a high prevalence of osteophytes; however, the sensitivity of H-PIP2 for the detection of osteophytes was only slightly lower than that of H-PIP1; (2) we omitted the insertion of the Achilles tendon from both the PsASon22 and the PsASon13 scores because this site was mainly relevant for the detection of enthesophytes; and (3) we omitted the distal insertion of the triceps into the olecranon from PsASon13 because this site was mainly relevant for the detection of enthesophytes. The combination of the insertion of the common extensor tendon at the lateral epicondyle and the patellar tendon insertion into the tibial anterior tuberosity already revealed a very high sensitivity to identify patients with enthesophytes. Table 2 summarizes the proportion of ultrasound abnormalities exclusively detected by dorsal or palmar/plantar scans of H-PIPs, H-DIPs, MTPs, F-PIPs, F-DIPs and wrists as well as by medial/lateral or suprapatellar assessments of knees. We observed that ≥20% of structural and inflammatory pathologies at the H-PIP and H-DIP level were identified by palmar or dorsal scans only. At the knees, suprapatellar, medial/lateral scans are required not to miss patients with PD-signals. At the wrists and MTPs, palmar/ plantar scans appeared to be less relevant given that only a minority of patients revealed ultrasound abnormalities at these sites. At F-PIPs and F-DIPs, the proportion of erosions detected by plantar scans was >20%; however, we decided to omit plantar scans from the composite scores due to the low absolute number of erosions at these sites.
Sensitivity of the ultrasound composite scores for the detection of ultrasound pathologies As detailed in Table 4, the bilateral composite score yielded sensitivity >80% for most lesions and sites using the 68-joints/14-entheses ultrasound score as the reference, whereas the unilateral score was less efficient revealing >80% sensitivity for GSS at small/large joints, GSE and osteophytes/enthesophytes only.
Both composite scores were more sensitive for the detection of PsA characteristic pathologies at small and large joints than at DIPs. Both composite scores were more efficient to identify osteophytes/enthesophytes than erosions, and among inflammatory lesions, GS-changes were more commonly detected than PD-signals.

Convergent construct validity of ultrasound composite scores
Tables 5 and 6 depict the association of inflammatory and structural components of the bilateral and unilateral

(100)
The table depicts the number of joints at which the indicated ultrasound lesion [that is, erosions, osteophytes, grey scale synovitis (GSS) or Power Doppler signals (PD-j)] and the respective grading (1 or 2 to 3) were exclusively identified by dorsal (Dorsal only) or palmar/plantar (Palmar/plantar only) scans. Total refers to the overall number of joints positive for the specific ultrasound abnormality (irrespective of whether it was identified by dorsal, palmar/plantar or both scans). We conducted separate analyses for small finger joints except metacarpophalangeal joints (H-PIP + H-DIP), small joints of feet (MTP or F-PIP + F-DIP) and wrists. At the knees, we compared medial plus lateral (Medial/lat.) versus suprapatellar (Suprapat.) scans. Data in parenthesis reflect the percentage of positive joints out of the total number of joints revealing the indicated ultrasound finding. F-DIP, distal interphalangeal joint of the feet; F-PIP, proximal interphalangeal joint of the feet, GSS, grey scale synovitis; H-DIP, distal interphalangeal joint of hands; H-PIP, proximal interphalangeal joint of hands; MTP, metatarsophalangeal joint; PD-j, power Doppler at joints; T, total number of patients with the specific lesion; hyphens indicate "not assessed". composite scores, and the 68-joint/14-entheses score with clinical parameters of inflammation and disability (convergent construct validity). We observed weak to moderate correlations of the GSS/GSE, PD-j/e, and the GUIS scores with clinical composite scores, global assessments of pain/disease activity and acute phase reactants (Table 5). Importantly, the ultrasound composite scores yielded similar associations with clinical measures of disease activity to the 68-joint/14-entheses score (except for the component PD-teno that weakly correlated with clinical composite scores only when the sonographic 68-joint/ 14-entheses score was applied). Joint/enthesal erosions but not osteophytes correlated with HAQ, particularly in patients judged to be in remission in whom the activity related (that is, reversible) component of disability is assumed to be low (Table 6) [20].
Further subanalyses indicated that GSS and PD-j scores were associated with the number of SJ, the PD-j score with the number of TJ, and the GSE with the MASEI + epi (see Additional file 4).

Sensitivity to change in ultrasound composite scores
As detailed in Figure 2 and Additional file 5 we observed that patients changing from a clinical status of active disease to remission/MDA yielded greater reductions in the GUIS of the bilateral, the unilateral or the 68-joint/14entheses scores compared to patients with unchanged clinical activity. The effect size (Cohen's d statistic) for the GUIS was low to moderate ranging from 0.40 (PsASon13) to 0.75 (68-joint/14-entheses score). SRM of patients changing from active disease to remission or MDA ranged from −1.04 to −0.09 (as compared to −0.53 to −0.04 in the entire cohort).
In subanalyses, we observed that the GSS/GSE was the most sensitive GUIS item to change (for details including effect size and SRM see Additional file 5). We also found weak to moderate correlations between ΔGUIS scores and ΔPASDAS, ΔDAPSA, ΔPtpain, ΔPGA and ΔEGA as depicted in Additional file 6.

Feasibility and inter-rater reliability
The median time to complete the bilateral and unilateral scores was 19 minutes (range 16 to 26) and 10 (9 to 13) minutes, respectively. Inter-rater reliability of components from bilateral and unilateral composite scores was moderate to good: ICCs for the GUIS were 0.84 and 0.54, respectively. ICCs of the GUIS components ranged from 0.42 (GS-teno) to 0.96 (PD-j) for the bilateral composite score and from 0.36 (GS-teno) to 0.71 (PD-j) for the unilateral score. The ICC for erosions was 0.75 and 0.64, respectively; and for the osteophyte/enthesophyte subscore, it was 0.92 and 0.41, respectively, for the bilateral and unilateral composite scores.

Discussion
In the present study, we propose two new ultrasound composite scores for the assessment of PsA specific inflammatory and structural lesions. Both composite scores had adequate convergent construct validity, yielded reliable results and the inflammatory elements revealed sufficient sensitivity to change. We found the bilateral score (PsA-Son22) more sensitive than the unilateral composite score (PsASon13) to detect PsA-specific pathologies whereas the latter was faster to accomplish.
The major strengths of our composite scores are the inclusion of all currently defined Ps-related ultrasound pathologies into the scores, and the data driven identification of joints, peri-articular structures and entheses, except for a few manual selections aimed at the improvement of the feasibility of the scores. The German U7 and the Italian five-targets scores, the two other ultrasound composite scores previously applied in PsA, were developed on a basis of clinical experience and focus on fewer sites and less PsA-specific pathologies than our instruments [5,6].
Our data demonstrate that palmar scans of wrists and plantar scans of feet are dispensable for the ultrasound composite scores, whereas the investigation of palmar and dorsal sites of fingers is required to achieve sufficient sensitivity to identify PsA-specific pathologies. Similar data were reported for the assessment of the rheumatoid hand, whereas a comparison between plantar and dorsal scans of feet has not been performed in other cohorts so far [22,23].
We did not include dactylitis into the composite scores, because of the lack of an established ultrasound definition. Earlier studies suggested that the combination of arthritis and tenosynovitis is the underlying pathology Table 3 Scans and anatomical sites of the bilateral (PsASon22) and unilateral ultrasound score (PsASon13) of dactylitis, whereas recent ultrasound and magnetic resonance imaging data indicate that isolated tenosynovitis, soft tissue oedema and/or collateral tendon enthesitis are frequently observed in dactylitic fingers and toes [24][25][26][27][28]. An outcome measures in rheumatology (OMERACT) project aimed at the agreement on a new ultrasound definition of dactylitis is currently underway, and a revision of our composite scores may be considered once such a definition has been validated [27]. MTP1 was included into the composite scores because we frequently found GSS and PD-j changes at this site. Earlier studies in PsA and RA also reported a high prevalence of inflammatory changes at MTPs; however, we recognize potential overestimation of PsA-related synovitis at that site as MTP1 is also frequently affected by osteoarthritis and because there were some cases of concomitant gout mimicking PsA flares [29][30][31][32][33][34]. We did not systematically record ultrasound signs of crystal arthropathies in our patients, but emphasise that such a study should be performed in the future [35]. Similarly, PIP and DIP joints may be inflamed either due to PsA or (secondary or concomitant) symptomatic osteoarthritis, and it is currently impossible to reliably identify the true driver of inflammation by means of imaging methods alone [36,37]. This uncertainty also relates to RA-specific ultrasound scores, and we do currently not know whether ultrasoundverified synovitis caused by the primary disease or (secondary/concomitant) osteoarthritis has a different impact on clinical and structural outcomes in RA and PsA [6,38].
We observed a higher sensitivity of the bilateral versus the unilateral composite score for the detection of inflammatory and structural changes, but at the cost of time. In situations, where a high sensitivity is less relevant (for example, use of sonography to monitor treatment response) the unilateral score may be preferred, whereas for remission assessment the bilateral ultrasound score may produce more valuable results. In RA, subclinical arthritis was of high prognostic value regarding clinical relapses and progression of erosions, and we are currently investigating the relevance of ultrasound-verified inflammation for patients' related outcomes in PsA [39][40][41]. A further refinement of the ultrasound scores based on the results of this study (for example, by inclusion of sites with a high significance for the prediction of structural progression) cannot be excluded at this stage.
To test convergent construct validity, we correlated the inflammatory and structural components of the ultrasound scores with clinical measures of disease activity and disability, respectively. We applied the PASDAS, CPDAI and DAPSA as markers of clinical activity recognizing that none of these instruments have yet been established as the gold standard. HAQ better correlated with erosions than with osteophytes, whereas in RA the loss of cartilage was the most important factor contributing to patients' disability [20].
Among individual components of the GUIS, the GSS/ GSE correlated best with clinical activity and revealed the highest sensitivity to change during follow up. This observation may be related to the facts that peripheral arthritis was present in the majority of patients (74.7% had active joint disease according to CPDAI at baseline), that joint disease highly impacts clinical composite scores and global measures of disease activity and that GSS/GEE had the largest variability among the GUIS components [4,42]. Also, we observed that both ultrasound scores were generally more sensitive to detect GS than PD changes at joints and entheses, possibly explained by a high prevalence of GS-abnormalities even in PsA patients with low or no clinical activity [4].
The major limitation of our study is the relatively low clinical disease activity of the cohort leading to a possible underestimation of the sensitivity to change of the inflammatory elements. Previously proposed ultrasound scores in PsA and RA were validated in patients with high disease activity at baseline and subsequent treatment with biological agents [5,19,43]. Consequently, dramatic changes in disease scores were observed resulting in better sensitivity to change in the ultrasound composite scores by statistical means. The low disease activity in our cohort may also explain the relatively weak association between inflammatory items of the ultrasound composite scores and clinical factors. We previously concluded from an RA study that the strength of association between clinical and ultrasound measures diminishes as patients are in or close to remission [15]. Nevertheless, our ultrasound scores resulted in sufficient sensitivity to change and convergent construct validity, given that the comprehensive assessment of 68-joint/14-entheses did not produce better results than the reduced scores.
Other limitations of this study are the moderate size of the cohort, the relatively long disease duration, the singlecentre design and the use of the as yet unvalidated 68joint/14-entheses score as a reference for the development Data indicate the correlation of the inflammatory components of the bilateral and unilateral ultrasound composite scores and the 68-joint/14-entheses score with clinical measures of disease activity. CPDAI, composite psoriatic disease activity index; CRP, C-reactive protein; DAPSA, disease activity index for psoriatic arthritis; EGA, evaluator's global assessment of disease activity; ESR, erythrocyte sedimentation rate; GS-Peri, grey scale perisynovitis; GS-Teno, grey scale tenosynovitis; GSS/ GSE, grey scale synovitis/grey scale enthesitis ultrasound score; GUIS, global ultrasound inflammation subscore; PASDAS, modified psoriatic arthritis disease activity score; PD-j/e, power Doppler scores at joints/entheses; PD-Peri, power Doppler Perisynovitis; PD-Teno, power Doppler Tenosynovitis; PGA, patients' global assessment of disease activity; Ptpain, patients' pain assessment. ***P <0.001; **P <0.01; *P <0.05; †P <0.1; NA, no significant association found. of the composite scores. Also, we did not test the responsiveness of structural components of the scores because of the short follow-up period and the absence of paired results from other imaging methods. Future studies with a long-term (preferentially multicentre) design, the inclusion of various patients' subgroups (early disease, different clinical manifestations and new treatment with biologics) and the possibility to compare ultrasound data with other imaging methods are needed for external validation of our data.
Another interesting finding of this study is the better reliability of bilateral versus unilateral composite scores. This observation can be explained by the fact that scores with high variability of results (resulting from a larger number of sites included in the bilateral score) yield better statistical associations than scores with a narrow Figure 2 Sensitivity to change of ultrasound composite scores. Change (Δscore 6-month visitscore baseline) of global ultrasound inflammatory subscores (global ultrasound inflammation score (GUI-score)) in patients without a change in clinical disease activity (that is, active or remission at both baseline and follow-up visits) (no change DA) and patients who were active at baseline and achieved remission according to the evaluating physician (A), active-remission) or minimal disease activity (B) (active-MDA) at 6 months follow-up. Whiskers box plots show the median and 50% of cases within the boxes and all data excluding mavericks between the end points of the whiskers. Differences were tested by the Mann-Whitney U-test.