Level of agreement between 2002 American–European Consensus Group and 2012 American College of Rheumatology classification criteria for Sjögren’s syndrome and reasons for discrepancies

Introduction: The aims of this study were to assess agreement between the currently used 2002 American–European Consensus Group (AECG) classification criteria and the new 2012 American College of Rheumatology (ACR) criteria for Sjögren’s syndrome (SS) and to identify potential sources of disagreement.
Methods: We studied 105 patients included between 2006 and 2013 in the Brittany cohort of patients with suspected SS. AECG criteria were applied using only Schirmer’s test and unstimulated whole salivary flow (UWSF) to assess objective ocular and oral involvement, since these are the tests most physicians use in clinical practice. Agreement between the two sets of criteria was assessed using Cohen’s κ coefficient.
Results: Of those studied, 42 patients fulfilled the AECG criteria and 35 the ACR criteria. Agreement between the two sets was moderate (κ = 0.53). Patients fulfilling the ACR but not the AECG criteria (n = 8) were significantly younger and had shorter symptom duration, but only three of them had SS in the opinion of the evaluating physician. Xerostomia and xerophthalmia (AECG set only) did not discriminate between patients with and without SS. The use of UWSF in the AECG but not the ACR criteria explained part of the disagreement. The serological item of the ACR set (positive rheumatoid factor plus antinuclear antibody titer ≥1:320, or anti-SSA/SSB positivity) did not change classification compared with anti-SSA/SSB antibodies alone (AECG set). Agreement between an ocular staining score ≥3 (ACR set) and Schirmer’s test ≤5 mm/5 min (AECG set) was very low (κ = 0.14).
Conclusions: Agreement between the ACR and AECG criteria was only moderate, suggesting that the two sets would not select comparable patient populations. An international consensus on which classification criteria should be used in clinical studies is needed.


Introduction
The classification criteria for Sjögren's syndrome (SS) issued in 2002 by the American-European Consensus Group (AECG) have been widely used in clinical studies over the last decade [1]. In 2012, the Sjögren's International Collaborative Clinical Alliance issued new classification criteria, which have been endorsed by the American College of Rheumatology (ACR) [2]. These new criteria are intended for use in patients referred to specialists because of signs or symptoms suggesting SS. They were developed by asking experts in rheumatology, ophthalmology and oral medicine to select the items they felt were most relevant.
The new criteria differ substantially from the 2002 AECG criteria in three ways: they include no subjective ocular and oral symptoms and no functional or morphological tests for the salivary glands; they use a new ocular staining score (OSS) [3] as the only criterion for ocular involvement; and they allow the use of an antinuclear antibody (ANA) titer ≥1:320 plus rheumatoid factor (RF) positivity as an alternative to anti-SSA/SSB antibody positivity for the assessment of systemic autoimmunity.
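The three-item structure of the ACR set can be made concrete with a short Python sketch. It is an illustration under explicit assumptions, not the published algorithm: the record layout and field names are hypothetical, the ANA titer is stored as its reciprocal (320 for 1:320), the "at least two of three objective items" rule is taken from our reading of the 2012 ACR publication [2], the focus score and OSS thresholds follow the definitions used in this study, and the ACR exclusion criteria are not modelled.

```python
from dataclasses import dataclass


@dataclass
class AcrFindings:
    """Hypothetical record of the three objective items of the 2012 ACR set."""
    anti_ssa_ssb: bool   # anti-SSA and/or anti-SSB antibodies detected
    rf_positive: bool    # rheumatoid factor positive
    ana_titer: int       # reciprocal ANA titer (320 means 1:320; 0 if negative)
    focus_score: float   # lymphocytic foci per 4 mm2 of labial gland tissue
    oss_max: int         # higher of the two per-eye ocular staining scores (0-12)


def acr_serology_positive(f: AcrFindings) -> bool:
    # Serological item: anti-SSA/SSB, or RF positivity combined with ANA >= 1:320.
    return f.anti_ssa_ssb or (f.rf_positive and f.ana_titer >= 320)


def fulfils_acr_2012(f: AcrFindings) -> bool:
    # At least two of the three objective items; exclusion criteria are not modelled.
    items = [
        acr_serology_positive(f),
        f.focus_score >= 1.0,
        f.oss_max >= 3,
    ]
    return sum(items) >= 2


# Example: RF-positive, ANA 1:640, anti-SSA/SSB negative, focus score 1.5, OSS 1.
patient = AcrFindings(anti_ssa_ssb=False, rf_positive=True, ana_titer=640,
                      focus_score=1.5, oss_max=1)
print(fulfils_acr_2012(patient))  # True: serological and biopsy items are met
```

Because the serological item is an OR of two alternatives, a patient without anti-SSA/SSB antibodies can still score it through RF plus a high-titer ANA; this is the alternative examined in the Serological criterion section.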
Here, our objectives were to evaluate agreement between the two criteria sets and to identify sources of disagreement.

Study population
We studied the single-centre Brittany cohort of patients with suspected SS, who were included between November 2006 and March 2013 in Brittany, France. The inclusion criteria are described elsewhere [4]. Briefly, patients were referred to our clinic by their family physician, rheumatologist, internist, dentist or ophthalmologist when SS was suspected because of sicca complaints, major salivary gland swelling, suggestive extraglandular features or suggestive autoantibodies. Written consent was obtained from all participants, and the study was approved by the local ethics committee (Brest University Hospital).
This study included the 105 patients of the cohort for whom all the tests needed to apply both the ACR and AECG criteria were available; 99 (94.3%) were women. Mean age was 57.2 ± 13.7 years and mean symptom duration was 6.7 ± 6.1 years.

Clinical evaluation and laboratory tests
Schirmer's test was considered abnormal if ≤5 mm in 5 minutes, and unstimulated whole salivary flow if <0.1 ml/minute. ANAs were assessed on HEp-2 cells, anti-SSA and anti-SSB antibodies using commercial enzyme-linked immunosorbent assays, and RF (IgM and IgA isotypes) using in-house enzyme-linked immunosorbent assays. Minor labial salivary gland biopsy was performed in all patients and graded according to the semi-quantitative score of Chisholm and Mason [5]. Salivary gland biopsy grades 3 and 4, indicating a focus score ≥1 per 4 mm², were considered abnormal.
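The AECG rule as applied in this study, with the ocular-sign item restricted to Schirmer's test and the salivary-gland item restricted to unstimulated whole salivary flow, can be sketched in the same way. This is a hedged illustration: the record layout and field names are hypothetical, the thresholds are those given above, the item numbering (I to VI) and the two decision rules for primary SS (any four of six items including histopathology or serology, or any three of the four objective items) are taken from our reading of the 2002 AECG publication [1], and the exclusion criteria and the rules for secondary SS are not modelled.

```python
from dataclasses import dataclass


@dataclass
class AecgFindings:
    """Hypothetical record of the six AECG 2002 items as assessed in this study."""
    ocular_symptoms: bool     # item I: subjective ocular dryness
    oral_symptoms: bool       # item II: subjective oral dryness
    schirmer_mm: float        # item III: Schirmer's test, mm of wetting in 5 minutes
    biopsy_grade: int         # item IV: Chisholm-Mason grade (0 to 4)
    uwsf_ml_per_min: float    # item V: unstimulated whole salivary flow, ml/minute
    anti_ssa_ssb: bool        # item VI: anti-SSA and/or anti-SSB antibodies


def aecg_items(f: AecgFindings) -> dict:
    # Item-level positivity using the thresholds of the study protocol.
    return {
        "I": f.ocular_symptoms,
        "II": f.oral_symptoms,
        "III": f.schirmer_mm <= 5,
        "IV": f.biopsy_grade >= 3,       # grades 3-4: focus score >= 1 per 4 mm2
        "V": f.uwsf_ml_per_min < 0.1,
        "VI": f.anti_ssa_ssb,
    }


def fulfils_aecg_2002(f: AecgFindings) -> bool:
    items = aecg_items(f)
    # Rule (a): any four of the six items, provided item IV or item VI is positive.
    rule_a = sum(items.values()) >= 4 and (items["IV"] or items["VI"])
    # Rule (b): any three of the four objective items (III, IV, V, VI).
    rule_b = sum(items[k] for k in ("III", "IV", "V", "VI")) >= 3
    return rule_a or rule_b
```

For example, a hypothetical symptomatic patient with normal Schirmer's test and salivary flow, an abnormal biopsy and no anti-SSA/SSB antibodies is positive for items I, II and IV only, and is therefore not classified by this sketch.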

Ophthalmologic evaluation
All patients underwent a slit-lamp examination by an ophthalmologist experienced in dry-eye diseases. The corneal fluorescein pattern was graded from 0 (no punctate epithelial erosions) to 3 (severe keratitis). A drop of lissamine green dye was then instilled into the inferior conjunctival fornix of each eye, the patient was asked to blink several times, and the nasal and temporal bulbar conjunctivae were then immediately graded semi-quantitatively from 0 (no staining) to 3 (diffuse dot staining or confluent staining).
The Sjögren's International Collaborative Clinical Alliance OSS [3] was published during the inclusion period of our cohort. We therefore computed the OSS retrospectively for each eye as a 0 to 12 sum: the lissamine green scores (0 to 3 each) for the nasal and temporal conjunctiva (range, 0 to 6) plus the corneal fluorescein score (0 to 3) multiplied by 2 (range, 0 to 6). The fluorescein score was doubled because the specific patterns of corneal fluorescein staining that give additional points in the OSS (central staining, filaments or confluent staining) were not part of our protocol. An OSS ≥3 in at least one eye was considered abnormal. Table 1 presents the rules for classification using the AECG and ACR criteria sets. AECG criteria were applied using only Schirmer's test and unstimulated whole salivary flow, which are the tests we use in clinical practice to assess objective ocular and salivary gland involvement.
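The adapted scoring described above can be written out as a small function. This is a sketch of this study's retrospective computation, not of the original SICCA OSS: the function and parameter names are hypothetical, and doubling the fluorescein grade matches the 0 to 6 range of the original corneal component but not its scoring rules. The `cutoff` parameter (default 3) is an illustrative addition so that the stricter ≥4 threshold reported by Rasmussen and colleagues [7] can also be tried.

```python
def study_oss(lissamine_nasal: int, lissamine_temporal: int, fluorescein: int) -> int:
    """Adapted ocular staining score for one eye (range 0 to 12).

    Sum of the nasal and temporal lissamine green grades (0 to 3 each) and twice
    the corneal fluorescein grade (0 to 3), as described in the text above.
    """
    for grade in (lissamine_nasal, lissamine_temporal, fluorescein):
        if not 0 <= grade <= 3:
            raise ValueError("each grade must lie between 0 and 3")
    return lissamine_nasal + lissamine_temporal + 2 * fluorescein


def oss_abnormal(right_eye: tuple, left_eye: tuple, cutoff: int = 3) -> bool:
    """True if the adapted OSS reaches `cutoff` in at least one eye.

    Each eye is given as (lissamine_nasal, lissamine_temporal, fluorescein).
    """
    return max(study_oss(*right_eye), study_oss(*left_eye)) >= cutoff


# Example: nasal 1, temporal 0, fluorescein 1 in the right eye; left eye unstained.
print(study_oss(1, 0, 1))                            # 3
print(oss_abnormal((1, 0, 1), (0, 0, 0)))            # True at the >= 3 cutoff
print(oss_abnormal((1, 0, 1), (0, 0, 0), cutoff=4))  # False at a >= 4 cutoff
```

A single fluorescein grade of 1 plus minimal conjunctival staining is already enough to reach the ≥3 cutoff, which illustrates how easily the adapted score becomes abnormal.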

Case ascertainment
The evaluating physician was also asked to record, for each patient, the most probable diagnosis in his or her opinion, without referring to specific classification criteria. All cases were reviewed by a panel of three experts (VD-P, AS and SJ-J) to reach consensus. The most probable diagnoses in the opinion of the experts were SS in 47 (44.8%) patients, idiopathic sicca syndrome in 37 (35.2%), other connective tissue diseases in 11 (10.5%) and drug-induced sicca syndrome in 10 (9.5%).
This clinical definition of SS cases was not used as a gold standard to compare the diagnostic performance (sensitivity and specificity) of the AECG and ACR criteria, because the physician most probably relied, even subconsciously, on the currently validated classification criteria (that is, the AECG criteria) to reach a diagnosis. The resulting circular reasoning would have overestimated the performance of the AECG criteria to the detriment of the ACR criteria.

Statistical analysis
Statistical tests were performed using the Statistical Package for the Social Sciences (SPSS 18.0, 2009; SPSS Inc., Chicago, IL, USA). Quantitative variables were described as mean ± standard deviation and qualitative variables as number (%). Agreement between classification criteria sets and between criteria was evaluated using Cohen's kappa coefficient (κ). To compare patient groups, we used the Mann-Whitney test, Fisher's exact test or the chi-square test as appropriate.
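For readers who want to reproduce the agreement statistics, Cohen's κ compares the observed proportion of agreement (po) with the agreement expected by chance from the marginal totals (pe): κ = (po − pe)/(1 − pe). The plain-Python function below is a minimal sketch of that formula for a square contingency table; the function name is ours, and the table in the example is hypothetical. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic from two vectors of labels.

```python
def cohens_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] is the number of subjects placed in category i by the first
    rater (or criteria set) and in category j by the second.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of (row total x column total) / n^2.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[c] * col_totals[c] for c in range(k)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 2 x 2 table: rows = criteria set A (yes, no), columns = set B (yes, no).
print(round(cohens_kappa([[30, 10], [5, 55]]), 2))  # 0.68
```

Because pe is built from the marginal totals, two criteria sets that each classify a similar proportion of patients as having SS can show high raw agreement and still only moderate κ.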

Classification criteria
Of the 105 patients, 42 (40.0%) fulfilled the AECG criteria and 35 (33.3%) fulfilled the ACR criteria (Table 2). Agreement between the two criteria sets was moderate (κ = 0.53). Patients fulfilling only the ACR criteria were significantly younger and had shorter symptom durations than did patients fulfilling the AECG criteria (mean age, 46.6 ± 15.8 vs. 60.0 ± 11.4 years; and mean symptom duration, 2.9 ± 2.4 vs. 7.7 ± 6.8 years; P < 0.001 for both comparisons). Table 3 details the features of the eight patients fulfilling ACR criteria but not AECG criteria and of the 15 patients fulfilling AECG criteria but not ACR criteria. All patients fulfilling only AECG criteria had sicca complaints, either abnormal unstimulated whole salivary flow or Schirmer's test, and either anti-SSA/SSB antibodies or abnormal salivary gland biopsy. Five of them had extraglandular involvement, and all of them had SS in the opinion of the evaluating physician.
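As a check on the reported value, the four cells of the agreement table can be derived from the counts above: 27 patients fulfilled both sets (42 − 15), 15 only the AECG set, 8 only the ACR set and 55 neither (105 − 27 − 15 − 8); 63 patients did not fulfil the AECG set and 70 did not fulfil the ACR set. Applying the κ formula to these derived counts gives:

```latex
\begin{aligned}
p_o &= \frac{27 + 55}{105} = \frac{82}{105} \approx 0.781 \\
p_e &= \frac{42}{105}\cdot\frac{35}{105} + \frac{63}{105}\cdot\frac{70}{105} = \frac{5880}{11025} \approx 0.533 \\
\kappa &= \frac{p_o - p_e}{1 - p_e} \approx \frac{0.781 - 0.533}{1 - 0.533} \approx 0.53
\end{aligned}
```

This matches the moderate agreement reported (κ = 0.53): the two sets agree on about 78% of patients, but more than half of that agreement would be expected by chance alone.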

Description of the patients with discordant classification
None of the eight patients fulfilling only the ACR criteria had abnormal unstimulated whole salivary flow or Schirmer's test, but seven had an abnormal salivary gland biopsy. Only three of them had SS in the physician's opinion; the most likely diagnoses for the other five were rheumatoid arthritis (n = 2), systemic lupus erythematosus (n = 1), undifferentiated connective tissue disease (n = 1) and idiopathic sicca syndrome (n = 1).

Subjective sicca complaints (AECG set only)
Subjective sicca symptoms (xerophthalmia and xerostomia) were reported by nearly all patients (92.4% and 94.3%, respectively), suggesting that these symptoms cannot discriminate between patients with and without SS in this population. The proportions of patients with xerophthalmia or xerostomia did not differ significantly between the group fulfilling both criteria sets and the group fulfilling neither criteria set (P = 0.35 and P = 0.2, respectively).

Functional salivary gland assessment (AECG set only)
Salivary flow was decreased in 66.7% of patients fulfilling both criteria sets and in 29.4% of those fulfilling neither criteria set (P = 0.001). No patient fulfilling only the ACR set had decreased salivary flow.

Serological criterion
Only three patients had positive RF plus ANA ≥1:320 but negative anti-SSA antibodies; that is, they met the ACR serological criterion but not the AECG serological criterion. These three patients fulfilled the AECG criteria, and they also fulfilled the ACR criteria even without taking RF and ANA into account, since all three had abnormal OSS and focus score results.

Ophthalmological criterion
Agreement between the OSS (ACR set) and Schirmer's test (AECG set) was very low (κ = 0.14). Agreement with the salivary gland biopsy was lower for the OSS than for Schirmer's test (κ = 0.14 vs. 0.35, respectively). Both the OSS and Schirmer's test showed poor agreement with anti-SSA/SSB positivity (κ = 0.21 and κ = 0.27, respectively).

Discussion
In this study, agreement was only moderate between the AECG and ACR criteria sets. The AECG criteria set classified more patients as having SS, whereas the ACR criteria set seemed to classify patients earlier in the course of the disease but included patients who did not have SS in the opinion of the physician. These discrepancies were chiefly ascribable to the absence in the ACR set of functional salivary gland testing (such as salivary flow measurement) and, above all, to the intrinsic differences between the OSS and Schirmer's test. These two ocular dryness tests had very low agreement. According to the latent class analysis method used to develop the ACR classification criteria [2], the OSS had 89.7% sensitivity but only 37.8% specificity for SS, whereas Schirmer's test had a lower sensitivity of 42.7% but a better specificity of 75.1%. In our study, in patients fulfilling neither criteria set (and who were therefore unlikely to be diagnosed with SS), an abnormal OSS was more common than an abnormal Schirmer's test, suggesting lower specificity of the OSS. To compare the diagnostic usefulness of these two tests, we did not use the physician's diagnosis as the reference, since this diagnosis relied chiefly on Schirmer's test and not on the OSS. Instead, we evaluated the agreement of the ocular dryness tests with the focus score and anti-SSA/SSB positivity, two major diagnostic features of SS [6] that served as external validation criteria. Compared with the OSS, Schirmer's test showed slightly better agreement with the focus score.
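The trade-off between the two ocular tests can be summarized with the positive likelihood ratio, LR+ = sensitivity / (1 − specificity), computed here from the latent class estimates quoted above [2]. This arithmetic is added for illustration only and was not part of the study's analyses:

```latex
\begin{aligned}
\text{OSS:} \quad \mathrm{LR}^{+} &= \frac{0.897}{1 - 0.378} = \frac{0.897}{0.622} \approx 1.44 \\
\text{Schirmer's test:} \quad \mathrm{LR}^{+} &= \frac{0.427}{1 - 0.751} = \frac{0.427}{0.249} \approx 1.71
\end{aligned}
```

On these estimates, a positive Schirmer's test raises the odds of SS somewhat more than a positive OSS does, consistent with its higher specificity.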
In the recently published study by Rasmussen and colleagues, the discordance between the AECG and ACR criteria was also mostly attributed to differences between the tests assessing the objective ocular component [7]. In their study, the OSS had poor specificity for SS (45 to 51%), which could be partially corrected by raising its positivity cutoff from ≥3 to ≥4 (out of 12).
A complete ophthalmological evaluation is mandatory in patients with suspected SS, in particular to assess eyelid diseases and the differential diagnoses of keratoconjunctivitis sicca. However, classification tools should rely on the most specific items. Advantages of Schirmer's test include ease of use, even at the bedside in any clinical ward where trained staff members are available, and good performance as a screening tool for SS [8].
The serological item of the AECG criteria could probably be improved, since roughly 40% of patients with primary SS do not have anti-SSA/SSB antibodies. However, we have shown here that adding ANA and RF, as proposed in the ACR criteria, did not modify the classification obtained with anti-SSA/SSB alone. Other tests, such as blood B-cell phenotyping [9] or other autoantibodies [10], should be evaluated when new classification criteria are developed.
Another intrinsic difference between the AECG criteria and the ACR criteria is that the latter include no items assessing the subjective component of the disease. Indeed, the ACR criteria target not the general population but only patients with suspected SS, who, as in our study, most often report sicca symptoms. Conversely, the AECG criteria may be applied to any patient because symptoms are included as items 1 and 2 (see Table 1). However, neither the preliminary European criteria nor the AECG criteria were designed to be used in the general population, since the control patients enrolled in their development study 'were to be selected from those subjects referred to an SS expert because of ocular or oral signs and symptoms that simulated the clinical manifestations of SS, and for whom a complete evaluation was justified in order to establish a differential diagnosis' [11]. The precise diagnostic value of several questionnaires assessing ocular and oral symptoms has been carefully evaluated in these patients with suspected SS [12]. Such questionnaires may be valid tools for SS screening in the general population [8], but in our study the subjective symptoms did not contribute to the discrepancy between the AECG and ACR criteria, since nearly all patients had sicca complaints.
An earlier study compared the AECG criteria, the ACR criteria and the Japanese classification criteria sets for SS [13]. Taking the physician's diagnosis as the reference standard, the AECG and ACR criteria sets had 78.6% and 77.5% sensitivity, respectively, and 90.4% and 83.5% specificity.
However, directly comparing the sensitivity and specificity of the AECG and ACR criteria sets against the physician's diagnosis as the reference standard may be inherently biased, as physicians probably rely heavily on the currently used AECG criteria set to diagnose SS. The resulting circular reasoning may overestimate the performance characteristics of the AECG criteria set.
Classification criteria for systemic diseases such as SS, rheumatoid arthritis, systemic lupus erythematosus, systemic sclerosis and inflammatory myopathies are designed to improve the homogeneity of populations enrolled in clinical studies, in order to allow valid comparisons across studies [14][15][16][17]. Since no specific reference standard is available for diagnosing these complex diseases, classification criteria are often used for diagnostic purposes, despite their limitations [18][19][20].
Whether classification criteria should have higher sensitivity or higher specificity is a matter of debate. First, classification criteria may be used to recruit patients into epidemiological studies; for example, to describe the whole spectrum of a disease, including its mildest forms, and to identify prognostic factors. For such studies, the criteria should be highly sensitive, even at the expense of specificity, since including patients whose diagnosis is not definite carries no risk for them. Conversely, in therapeutic trials of potentially harmful drugs such as immunosuppressants or biological therapies, investigators cannot risk including patients who do not have the disease; in that case, the classification criteria must be as specific as possible. All published classification criteria for SS, including the preliminary European and AECG criteria, have very high specificity (95 to 100%) but variable sensitivity (60 to 95%) [21]. The scientific community must reach the widest possible consensus on a new classification system offering the best combination of sensitivity and specificity, so that it can be used universally in both epidemiological and therapeutic studies. To achieve this goal, a large international study is warranted, and the new classification criteria should include recently validated diagnostic tools such as salivary gland ultrasonography [4,22-24].

Conclusions
In this study, agreement was only moderate between the AECG and ACR criteria sets. This discrepancy was mainly ascribable to the intrinsic differences between the tests assessing the ocular component of the criteria. Our results suggest that the ACR criteria may detect early forms of disease in specific SS subpopulations, such as patients without anti-SSA/SSB autoantibodies. On the other hand, the AECG criteria appear more specific but also more stringent. The existence of two different classification criteria sets for SS that select different patient populations may cause confusion [25,26]. An international study under the auspices of both the ACR and the European League Against Rheumatism is warranted to develop new universally recognized classification criteria, which would probably include new procedures such as major salivary gland ultrasonography.