DAS28 best reflects the physician's clinical judgment of response to infliximab therapy in rheumatoid arthritis patients: validation of the DAS28 score in patients under infliximab treatment

This study is based on an expanded access program in which 511 patients suffering from active refractory rheumatoid arthritis (RA) were treated with intravenous infusions of infliximab (3 mg/kg + methotrexate (MTX)) at weeks 0, 2, 6 and every 8 weeks thereafter. At week 22, 474 patients were still in follow-up, of whom 102 (21.5%), who were not optimally responding to treatment, received a dose increase from week 30 onward. We aimed to build a model to discriminate the decision to give a dose increase. This decision was based on the treating rheumatologist's clinical judgment and can therefore be considered a clinical measure of insufficient response. Different single and composite measures at weeks 0, 6, 14 and 22, and their differences over time, were taken into account for the model building. Ranking the continuous variables by the area under the receiver-operating characteristic (ROC) curve identified the momentary DAS28 (Disease Activity Score including a 28-joint count) as the most important discriminating variable. Subsequently, we showed that the response scores and the changes over time were less important than the momentary evaluations for discriminating the physician's decision. The final model thus obtained had only slightly better discriminative characteristics than the DAS28. Finally, we fitted a discriminant function using the single variables of the DAS28, which yielded scores and coefficients similar to those of the DAS28. In conclusion, we evaluated different variables and models to discriminate the treating rheumatologist's decision to increase the dose of infliximab (+MTX), which indicates an insufficient response to infliximab at 3 mg/kg in patients with RA. We showed that the momentary DAS28 score correlates best with this decision and demonstrated the robustness of the score and the coefficients of the DAS28 in a cohort of RA patients under infliximab therapy.


Introduction
Rheumatoid arthritis (RA) is a complex disease with a broad spectrum of manifestations that requires an early intensive therapy in order to avoid joint destruction and physical disability. In order to measure the effect of therapy in daily practice and in clinical trials, many variables are recorded and different composite indices have been proposed to measure the remaining disease activity or the response to treatment. Those variables may cover items such as patient self-reported questionnaires, physician's scores including different joint scores, and serum markers of systemic inflammation.
Infliximab, in combination with methotrexate (MTX), is a highly effective therapy for a majority of RA patients. After an induction scheme at weeks 0, 2 and 6, the indicated dose of this therapy is 3 mg/kg every 8 weeks, although the ATTRACT trial suggested that a higher dose of 10 mg/kg every 8 weeks or a shorter perfusion interval may add benefit [1][2][3].
The present study is based on an expanded-access program in which patients suffering from active refractory RA were treated with intravenous infusions of infliximab (3 mg/kg + MTX) at weeks 0, 2, 6 and every 8 weeks thereafter. At week 22, patients not optimally responding to treatment could receive a dose increase of 100 mg (1 vial) per infusion from week 30 onwards [4]. The effect of dose escalation for the patients of this cohort has been discussed previously [4]. The decision to increase the dose was based on the treating rheumatologist's clinical judgment and can be considered as a measure of insufficient response to infliximab. It might be questioned which variables can be measured to best evaluate the effect of therapy and remaining disease activity in daily practice (and in clinical trials). The aim of the present analyses was to evaluate whether the decision to increase the dose could be reflected by using single variables or composite indices, alone or together in a model. We also wanted to evaluate whether this decision was mainly based on differences over time or on momentary disease activity.

Study population
A total of 511 patients suffering from active refractory RA [5] were treated with intravenous infusions of infliximab (3 mg/kg) at weeks 0, 2, 6 and every 8 weeks thereafter, in combination with MTX (a minimal dose of 15 mg/week was recommended). Between week 0 and week 22, 37 patients dropped out for the following reasons: 16 patients stopped due to side effects (four infusion reactions, five infections, one malignancy, one pancytopenia, five disease-related complications), 12 patients withdrew consent and 9 patients stopped because of protocol violation. Of the remaining 474 patients, 102 (22%), who were not optimally responding to treatment in the treating rheumatologist's opinion, received a dose increase of 100 mg (1 vial) per infusion from week 30 onwards. Throughout the first 22 weeks, the dosage of MTX, steroids and non-steroidal anti-inflammatory drugs remained unchanged.

Statistics
We opted to use only statistical methods that are available in a classical statistical package (SPSS 12.0; SPSS, Inc, Chicago, IL, USA) or could be computed manually. When needed, the continuous variables were normalized (by taking the square root of the joint counts and the natural logarithm of CRP and ESR). Robustness of the discriminant analyses and logistic regressions was confirmed by the use of a random train and test set. Missing values were handled by pairwise complete case analysis. This means that a case with no missing values for a group of variables is included in the analysis of that group of variables. The case may have missing values for variables used in other analyses. Confidence intervals (95% CI) for sensitivity or specificity were calculated based on the method proposed by Harper [14]. The areas under the curves (AUCs) of receiver operating characteristic (ROC) curves were calculated. A higher AUC indicates that a single variable has better discriminative characteristics. A statistical test to compare AUCs of two variables tested on the same population has been described by Hanley [15]. Continuous and categorical variables were compared by adapting the cut-off of the continuous variables to the same specificity level as the categorical variable so that sensitivities could be evaluated and compared [16]. The selection and comparison of variables by curve analysis was performed since this method gives a valid ranking of variables and does not (in contrast to ranking methods based on p values) depend on the number of subjects available for that specific variable [17]. 
In order to find the true maximal model and to avoid becoming stuck at a local maximum, we used different strategies for the construction of the final model. Binary logistic regressions and discriminant analyses were performed with the default options of SPSS 12.0. Stepwise construction of models was performed by conditional forward and backward elimination for logistic regression and by Wilks' lambda for discriminant analysis, using the strategy described by Hosmer and Lemeshow [18].

Ethics
All patients signed informed consent. This study was approved by the local ethics committees.

Ranking of continuous variables
In order to select the most important variables that correlate with the decision to give a dose increase at week 22, we calculated the AUC of ROC curve analysis for all continuous variables and ranked them based on this AUC [17]. Since crossing over of ROC curves may affect the diagnostic properties of a variable without changing the AUC, we also ranked the variables based on sensitivity levels by adapting the cut-off to a given preset specificity level of 95% [16].
Both ranking methods showed that the DAS28 score at week 22 had the highest ability to discriminate the physician's decision to give a dose increase. Table 2 displays the 10 most important variables ranked by AUC of ROC curve analysis and by the sensitivity at the 95% specificity level. Using the method described by Hanley [15], we found a significant difference in AUC between the two highest-ranked parameters: DAS28 at week 22 and the 28 tender joint count at week 22 (AUC = 0.840 versus 0.797, p = 0.02). Additionally, most variables were ranked in such a way that each variable was represented first by its measure at week 22 before it was represented by a measure at another week.
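The two ranking criteria can be illustrated with a minimal sketch. This is not the SPSS procedure used in the study: it assumes that higher values indicate a dose increase, computes the AUC as the proportion of (dose increase, no dose increase) pairs that are correctly ordered, and takes the lowest cut-off that reaches the preset specificity. The function names and all data values are hypothetical and were invented for illustration only.

```python
def auc(positives, negatives):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as one half."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_at_specificity(positives, negatives, target_spec=0.95):
    """Return (cutoff, sensitivity, specificity) for the lowest cut-off whose
    specificity reaches target_spec. A case is called positive when its
    value is strictly above the cut-off."""
    for cutoff in sorted(set(positives) | set(negatives)):
        spec = sum(n <= cutoff for n in negatives) / len(negatives)
        if spec >= target_spec:
            sens = sum(p > cutoff for p in positives) / len(positives)
            return cutoff, sens, spec
    # Not reached: the largest observed value always gives specificity 1.0.


# Hypothetical week-22 values: (dose increase, no dose increase).
toy = {
    "DAS28": ([5.9, 6.4, 4.8, 5.2, 7.1, 3.9],
              [2.1, 3.0, 4.1, 2.8, 3.5, 5.0, 2.4, 3.3, 1.9, 4.4]),
    "TJC28": ([12, 9, 4, 7, 15, 3],
              [1, 2, 6, 0, 3, 8, 2, 4, 1, 5]),
}

# Rank the variables by AUC, highest first.
ranking = sorted(toy, key=lambda name: auc(*toy[name]), reverse=True)
for name in ranking:
    cutoff, sens, spec = sensitivity_at_specificity(*toy[name])
    print(name, round(auc(*toy[name]), 3), cutoff, round(sens, 3), spec)
```

With only ten cases in the comparison group, the first cut-off reaching 95% specificity is the one at which specificity equals 100%; in a cohort of several hundred patients the achievable specificity levels are much finer.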

Evaluation of the response scores
To evaluate categorical scores, we adapted the cut-off of the variable with the highest ranking (DAS28 at week 22) to the specificity of the categorical score and compared the sensitivities [16]. For the decision to give a dose increase, ACR response not reaching the ACR20 criterion ('no ACR response') had a sensitivity of 69.6% (95% CI: 65.  (Table 3).
Additionally, we fitted a logistic regression model with the decision to give a dose increase as a dependent variable and DAS28 at week 22, DAS28 response and ACR response as categorical covariates. These analyses retained DAS28 at week 22 as the only significant covariate in the model (data not shown).

Effects of change of scores over time on the physician's decision
To evaluate the effect of differences over time, we plotted the means of the most important normalized continuous variables over time (Fig. 1). The plot of the variable with the highest ranking (DAS28) shows that patients who get a dose increase have a (significantly) higher disease activity at baseline and, after an initial decrease of disease activity, regain disease activity from week 6 on.

Figure 1
Plot of the mean scores over time. DAS, disease activity score.

Evaluation of the discriminant score of the variables of DAS28
To validate the score and coefficients of the DAS28, we calculated a discriminant function using the (normalized) variables of the DAS28 score: 28 tender and swollen joint count, ESR and patient global VAS. After rescaling, we obtained the following discriminant coefficients: 0.52 for 28 tender joint count (28TJC), 0.28 for 28 swollen joint count (28SJC), 0.56 for ESR and 0.025 for patient disease activity. This discriminant score had an AUC of 0.844 (0.797-0.891) and a sensitivity at the 95% specificity level of 43.8% (95% CI: 38.1-49.2), which is equal to the DAS28 at week 22. The Pearson's correlation coefficient between this discriminant score and the DAS28 was 0.986 (Fig. 2). We also performed logistic regression with similar results (data not shown).
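To make the comparison of coefficients concrete, the following sketch applies both sets of weights to the same normalized components (square root of the joint counts, natural logarithm of ESR). The DAS28 weights are those of the standard ESR-based DAS28 formula; the second set are the rescaled discriminant coefficients reported above. Several points are assumptions: the patient VAS is taken on a 0-100 scale for both scores, no constant term is added to the discriminant score because none is reported, and the function name and the example patient are hypothetical.

```python
import math

# Weights of the standard DAS28 (ESR-based, four variables).
DAS28_COEF = (0.56, 0.28, 0.70, 0.014)
# Rescaled discriminant coefficients reported in the text.
DISCRIMINANT_COEF = (0.52, 0.28, 0.56, 0.025)


def weighted_score(tjc28, sjc28, esr, patient_vas, coef):
    """Weighted sum of the normalized DAS28 components.

    tjc28, sjc28: tender and swollen 28-joint counts (square-root transformed)
    esr: erythrocyte sedimentation rate in mm/h (natural-log transformed)
    patient_vas: patient global assessment, assumed here to be on a 0-100 scale
    """
    w_tjc, w_sjc, w_esr, w_vas = coef
    return (w_tjc * math.sqrt(tjc28)
            + w_sjc * math.sqrt(sjc28)
            + w_esr * math.log(esr)
            + w_vas * patient_vas)


# Hypothetical patient, for illustration only.
patient = dict(tjc28=9, sjc28=4, esr=30, patient_vas=50)
das28 = weighted_score(coef=DAS28_COEF, **patient)
discriminant = weighted_score(coef=DISCRIMINANT_COEF, **patient)
print(round(das28, 2), round(discriminant, 2))
```

Because the two weight vectors are close, the two scores move together across patients, which is consistent with the high correlation reported between the discriminant score and the DAS28.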

Comparison with the other DAS scores and SDAI/CDAI
Since different alternative methods are available to calculate the DAS scores (Table 1), we additionally evaluated the properties of those alternative scores. We also evaluated the SDAI and CDAI [8,9] after normalization by taking the square root. The Pearson's correlation coefficient of those alternative scores with the DAS28 at week 22 was 0.982 for the DAS28-3, 0.952 for the DAS28-CRP, 0.928 for the DAS28-CRP-3, 0.914 for the SDAI and 0.893 for the CDAI. The AUC and sensitivity at the 95% specificity level are shown in Table 1 and indicate that all those alternative scores perform similarly to or slightly worse than the original DAS28.

Detailed ROC curve analysis of the DAS28
We plotted the ROC curve of the DAS28 in Fig. 3 and listed sensitivities and specificities in Table 4. Predictive values and the accuracies of classification as a function of the different DAS28 cut-offs are also shown in Table 4. Below a cut-off of 3.2, we found a high predictive value for continuing the current dose as a measure of good response. The maximal accuracy of 84% was found at a cut-off of 5.5.

Discussion
The aim of the present analyses was to evaluate which single or composite variables, combined in a model, could discriminate the treating rheumatologist's decision to give a dose increase of infliximab to RA patients not optimally responding to the indicated dose of 3 mg/kg infliximab every 8 weeks. Since different variables at different time points were available, we started by ranking the continuous variables based on the AUC of ROC curves and on sensitivities at the 95% specificity level. This strategy has previously been proposed for microarray data [17]. The calculation of sensitivities at the 95% specificity level is important in order not to overlook variables with a relatively small AUC but a high specificity [16]. Both methods ranked the DAS28 at week 22 as the variable that best discriminates the decision to give a dose increase. In a second and third analysis, we looked at whether response scores and differences in disease activity over time could give additional information to discriminate the rheumatologist's decision. Those analyses indicated that variables, including differences over time, seem to be less important than the momentary remaining disease activity at week 22 for discriminating the rheumatologist's decision.

Figure 2
Validation of the DAS28 score and coefficients (see text). ESR, erythrocyte sedimentation rate; VAS, visual analogue scale.

After the prior selection of variables, based on the findings of the previous steps, we built the final model to discriminate the rheumatologist's decision, which was only slightly better than the DAS28. We think that the small gain in discriminative properties in comparison with the DAS28 is not enough to justify the increased complexity of this model. Moreover, in contrast to the DAS28, this model included the physician's global assessment of disease activity (VAS), which is investigator-dependent and has the drawback that it cannot be calculated by a study nurse. All four analyses together indicated that the DAS28 is an important variable for evaluating insufficient response to infliximab therapy (especially in daily practice) and that this variable can be improved only slightly by adding supplemental variables.
DAS was developed in the early 1990s [19,20] and later on, it was transformed into the DAS28 [7] in an era when therapy with biologicals was not yet available. In those initial studies, patients were scored by the same two independent nurses and the decision to change disease-modifying antirheumatic drug (DMARD) therapy during a follow-up period of up to 3 years was considered as a measure of insufficient response [20]. The present study is a multi-center study where patients were scored by the treating physician and the decision to give a dose increase of infliximab could happen only at one time point. This difference in study design and therapy may explain why in the present study the AUC of DAS28 is smaller than in other studies (AUC = 0.840 versus 0.933) [21]. Therefore, it is remarkable that despite those differences in study design, we could calculate a discriminant function (in the fifth analysis) that correlated so well with the DAS28 by using the 28SJC, 28TJC, ESR and patient disease activity VAS as independent variables and the physician's decision as a grouping variable. Not only the discriminant scores, but also the coefficients of this discriminant function were quite similar to the coefficients of the DAS28, indicating the robustness of the scores and coefficients of the DAS28 score.
In another, final analysis, we evaluated the alternative DAS scores and the square-root-transformed SDAI and CDAI. All those alternative scores have a slightly worse AUC than the original DAS28, but seem good enough to be useful when some other variables are not available. We think the use of the DAS28 is feasible and time-effective using a preprogrammed calculator, spreadsheet or web-based calculator [11]. The unique characteristics of the DAS score make it a useful measure in many applications. DAS28 as a continuous variable is a sensitive tool for measuring response to treatment in randomized controlled trials and facilitates the use of more complex statistical methods that can handle repeated measures over different time points [22][23][24].

Table 4 (legend)
DAS, disease activity score; PPV, positive predictive value (predictive value to give a dose increase as a measure of insufficient response); NPV, negative predictive value (predictive value to continue on the current dose as a measure of good response). PPV, NPV and accuracy were calculated using the following formulae:
a) $\text{PPV} = \dfrac{\text{sensitivity} \times \text{a\_priori\_chance}}{\text{sensitivity} \times \text{a\_priori\_chance} + (1 - \text{specificity}) \times (1 - \text{a\_priori\_chance})}$
b) $\text{NPV} = \dfrac{\text{specificity} \times (1 - \text{a\_priori\_chance})}{\text{specificity} \times (1 - \text{a\_priori\_chance}) + (1 - \text{sensitivity}) \times \text{a\_priori\_chance}}$
c) $\text{Accuracy} = \text{sensitivity} \times \text{a\_priori\_chance} + \text{specificity} \times (1 - \text{a\_priori\_chance})$
The a priori chance is given by the percentage of patients that need a dose increase as a measure of insufficient response.

Other studies demonstrated that a low DAS is an important prognostic factor of persistent remission and that DAS correlates with radiological progression [25,26]. DAS may also be a useful parameter in daily clinical practice as a treatment goal and to evaluate the actual disease activity (which cannot be assessed by the categorical response scores) [27][28][29][30][31]. Our finding that the physician's decision to give a dose increase can best be modeled by a combination of measurements of remaining/momentary disease activity, represented by the DAS28, does not reduce the value of response scores such as the ACR response or DAS response scores. Indeed, those scores are important for measuring differences over time as a measure of global treatment effects in clinical trials [12,13] but, as demonstrated by the present study, are not useful for evaluating the momentary disease activity in a single patient, which is important in daily practice. The continuous properties of the DAS28 score provide the additional opportunity to choose a cut-off as a function of the purpose. Interestingly, we found a high predictive value for continuing the current dose as a measure of good response below a cut-off of 3.2. It is noteworthy that a DAS score of 3.2 is an important threshold for a good DAS response according to the EULAR criteria [12]. In contrast, for classification purposes, a higher cut-off (5.5) is more appropriate since this level displayed the highest accuracy. One should be aware that the displayed predictive values and accuracies may be highly influenced by the prevalence of insufficient response, reflected by the need for a dose increase, which was 21.5% in the present study. A lower a priori chance of the need for a dose increase may increase the accuracy of DAS (given the fixed cut-off of 5.5) and vice versa. Indeed, at a cut-off with a high specificity, the accuracy will increase when the a priori chance decreases (applying formula c given in the legend to Table 4).
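The dependence of the predictive values and the accuracy on the a priori chance can be sketched as follows. The formulae are the standard Bayes expressions for PPV, NPV and accuracy in terms of sensitivity, specificity and prevalence. As an illustration, the sketch uses the sensitivity of 43.8% at the 95% specificity level reported in the Results together with the observed a priori chance of 21.5%; the alternative a priori chances of 10% and 40% are hypothetical, as is the function name.

```python
def predictive_values(sensitivity, specificity, a_priori_chance):
    """Return (PPV, NPV, accuracy) for a given sensitivity, specificity
    and a priori chance (prevalence) of needing a dose increase."""
    p = a_priori_chance
    true_pos = sensitivity * p                  # correctly flagged for dose increase
    false_pos = (1 - specificity) * (1 - p)     # wrongly flagged
    true_neg = specificity * (1 - p)            # correctly kept on current dose
    false_neg = (1 - sensitivity) * p           # missed insufficient responders
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    accuracy = true_pos + true_neg
    return ppv, npv, accuracy


# Sensitivity at the 95% specificity level, as reported in the Results.
SENS, SPEC = 0.438, 0.95

# 0.215 is the observed a priori chance; 0.10 and 0.40 are hypothetical.
results = {p: predictive_values(SENS, SPEC, p) for p in (0.10, 0.215, 0.40)}
for p, (ppv, npv, acc) in results.items():
    print(f"a priori chance {p:.3f}: PPV {ppv:.3f}, NPV {npv:.3f}, accuracy {acc:.3f}")
```

Since the accuracy is linear in the a priori chance with slope sensitivity minus specificity, it rises as the a priori chance falls whenever the specificity exceeds the sensitivity, which is the situation at a high-specificity cut-off.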

Conclusion
The results of the present analyses indicate that the momentary DAS28 as a continuous composite index correlates best with the decision to give a dose increase of infliximab, which is a measure of insufficient response. The discriminative characteristics of the DAS could be slightly improved by the use of supplemental variables, although this results in the disadvantage of a more complex model and calculations. This study also demonstrates the robustness of the scores and coefficients of the DAS28 in a cohort of RA patients under infliximab therapy and therefore validates the DAS28 as a measure of disease activity in patients under treatment with biologicals.

Figure 3
ROC curve analysis of the DAS28 at week 22 (plotting the 1-specificity versus the sensitivity). Also the accuracy, PPV and NPV are plotted. PPV, positive predictive value (predictive value to give a dose increase as a measure of insufficient response); NPV, negative predictive value (predictive value to continue on the current dose as a measure of good response).