Machine learning model for identifying important clinical features for predicting remission in patients with rheumatoid arthritis treated with biologics

Background We developed a model to predict remissions in patients treated with biologic disease-modifying anti-rheumatic drugs (bDMARDs) and to identify important clinical features associated with remission using explainable artificial intelligence (XAI). Methods We gathered the follow-up data of 1204 patients treated with bDMARDs (etanercept, adalimumab, golimumab, infliximab, abatacept, and tocilizumab) from the Korean College of Rheumatology Biologics and Targeted Therapy Registry. Remission was predicted at 1-year follow-up using baseline clinical data obtained at the time of enrollment. Machine learning methods (e.g., lasso, ridge, support vector machine, random forest, and XGBoost) were used for the predictions. The Shapley additive explanation (SHAP) value was used for interpretability of the predictions. Results The ranges for accuracy and area under the receiver operating characteristic of the newly developed machine learning model for predicting remission were 52.8–72.9% and 0.511–0.694, respectively. The Shapley plot in XAI showed that the impacts of the variables on predicting remission differed for each bDMARD. The most important features were age for adalimumab, rheumatoid factor for etanercept, erythrocyte sedimentation rate for infliximab and golimumab, disease duration for abatacept, and C-reactive protein for tocilizumab, with mean SHAP values of − 0.250, − 0.234, − 0.514, − 0.227, − 0.804, and 0.135, respectively. Conclusions Our proposed machine learning model successfully identified clinical features that were predictive of remission in each of the bDMARDs. This approach may be useful for improving treatment outcomes by identifying clinical information related to remissions in patients with rheumatoid arthritis. Supplementary Information The online version contains supplementary material available at 10.1186/s13075-021-02567-y.


Introduction
Rheumatoid arthritis (RA) is a chronic inflammatory disease that affects the synovial tissues in multiple joints. Biologics are often considered a promising line of treatment for patients who have high disease activities despite treatment with conventional disease-modifying antirheumatic drugs (cDMARDs) [1][2][3]. Biologics are generally prescribed by referring to clinical practice recommendations after considering factors such as disease activity, adverse events, and cost effectiveness [1][2][3]. However, treatment with biologics may be unsuccessful because of differences in the physiological and pathological characteristics among individuals. Indeed, clinical trials have frequently shown that approximately 30-40% of patients do not respond to treatment with biologics, and their response rates decrease with subsequent biologics [4,5]. Treatment failure due to ineffective biologics not only increases the pain experienced by the individual but also increases their cost of healthcare [6]. Therefore, it is necessary to develop good predictors that can identify the efficacies of different biologics for such individuals.
Several clinical, genetic, and proteomic studies have used statistical methods to identify biomarkers to predict responses to biologics in patients with RA [7][8][9]. Machine learning approaches that can complement statistical methods are able to incorporate such information for making accurate predictions. Furthermore, machine learning can be generalized across a broader array of data types and can produce results with complex situations as well [10,11]. However, it is often difficult for users to understand the processes by which machine learning predicts outcomes from relationships among numerous variables. Accordingly, several methods have been proposed to improve the interpretability of machine learning methods while maintaining the prediction accuracies of complex models. Explainable artificial intelligence (XAI), which presents the reasons for a prediction in a manner that can be understood, suggests the relationships between several variables necessary for predicting outcomes [12].
In the field of artificial intelligence, XAI was recently developed to help our understanding of the important features that are related to predicting the outcomes of machine learning models. In the study of RA with various clinical characteristics, it is possible to use the machine learning method to predict outcomes such as remission and show important clinical features that are associated with the desired outcomes. Therefore, predicting remission with a machine learning model by combining multiple variables in RA cohort data and identifying important features associated with remission by using XAI is an advanced approach that complements the traditional statistical methods for determining the relationship between remission and various variables.
In this study, we established a machine learning model using data from the Korean College of Rheumatology Biologics and Targeted Therapy Registry (KOBIO) [13] and show its application to identifying clinical variables predictive of remission in RA patients treated with biologics using the concept of XAI.

Study population
This study used data from the KOBIO registry, which is a nationwide multicenter cohort in Korea that was established to evaluate the effectiveness and side effects of biologic therapies in patients with RA [13]. Patients in the registry were recruited from 38 hospitals since 2012, and their demographics, medications, comorbidities, extra-articular manifestations, disease activities, radiographic findings, and laboratory findings performed within 4 weeks prior to the patient's visit were recorded with the date. The data from patients who were followed up annually were recorded on the KOBIO website (http://www.kobio.or.kr/kobio/), and these patients provided informed consent prior to registration. Ethical approval of the KOBIO-RA was obtained from the institutional review boards of all 38 participating institutions, including the Institutional Review Board of Inje University Seoul Paik Hospital (PAIK 2018-11-005). Figure 1 shows the flowchart for patient selection. From December 2012 to June 2019, a total of 2122 patients who were treated with biologic disease-modifying antirheumatic drugs (bDMARDs) and targeted synthetic disease-modifying anti-rheumatic drugs (tsDMARDs) were registered. The baseline data were obtained at the time of initial enrollment for prescription of the bDMARDs, and follow-up data were registered annually or when the bDMARDs were switched or stopped. Patients treated with tsDMARDs, such as tofacitinib (N = 33), were excluded from the analysis because the aim of the study was to predict patients who responded to bDMARDs. Among the bDMARDs, rituximab was excluded from the analysis owing to the small sample group (N = 2). After excluding data from subjects whose follow-up durations were less than 3 months from the baseline, a total of 1204 baseline data and 1397 followup data were obtained and used for the analysis in this study.

Machine learning methodology for predicting remission
The disease activity scores in 28 joints using the erythrocyte sedimentation rate (DAS28-ESR) were measured at baseline establishment and during follow-up. The outcome of the prediction model evaluating the responses to bDMARDs was "remission" at follow-up, which was defined as DAS28-ESR ≤ 2.6 at follow-up. To prevent overestimation of remission by prednisolone treatment, another prediction model was constructed with the outcome of "remission without increasing prednisolone dose". An overview of the study flow is presented in Fig. 2.
Five machine learning models were used to predict remission in subjects receiving bDMARDs, tumor necrosis factor (TNF) inhibitors, non-TNF inhibitors, and each bDMARD; the models included lasso and ridge based on linear relationships [14], support vector machine using kernel methods [15], tree-based random forest [16], and Xgboost [17]. To avoid overfitting problems, the training and test sets were divided in a 7:3 ratio, and the models were trained with the training set; then, the prediction results were verified using the test set. For the training dataset, a 5-fold cross validation was performed to tune the hyperparameters determined as outside models (Additional file: Table S1 and Table  S2). In this procedure, a grid search was conducted to evaluate all possible combinations of the hyperparameters. The grid search found optimal hyperparameters with the objective function of determining the area under the receiver operating characteristics (AUROC) in each model. Bootstrapping (random sampling with replacement) was also performed to obtain a median value for the AUROC curve and to determine the accuracy for reducing measurement variances caused by small samples when dividing between the training and test sets.
Dimension reduction was performed to avoid the "curse of dimensionality" caused by a large number of variables compared with the size of the data. Among the 64 variables, we selected variables that are frequently encountered in clinical practice for prescription of biologics and excluded variables that are not referenced when prescribing biologics. As a result, 15 variables known to be of clinical importance were preselected (i.e., sex, age, baseline DAS28-ESR, methotrexate dose, steroid dose, erythrocyte sedimentation rate [ESR], Creactive protein [CRP], rheumatoid factor [RF], anticyclic citrullinated peptide antibody [ACPA], antinuclear antibody [ANA], and five comorbidities). Subsequently, 20 variables that were highly correlated with the drug response (remission) of each bDMARD were selected. After selecting variables based on data, we created a prediction model by training with a fixed set of 35 variables.  Missing data for variables (Additional file: Table S3) were replaced with the median value for each variable. With a similar logic, binary variables such as comorbidities were coded as 1 if "yes" and 0 if "no" or "no test" because "no" was the most common value.

XAI for identifying important clinical features associated with responses to bDMARDs
Using the Xgboost machine learning model, the important variables and their impact on predicting remissions using XAI were determined. By focusing on improving the performance and accuracy, machine learning models have become complex, and their interpretability has declined. Although there are some feature importance measures in random forest and Xgboost, these models provide inconsistent results depending on the tree structure; in addition, they only show the overall importance and not the direction of the effect of the independent variables [12]. To overcome these issues, the Shapley additive explanations (SHAP) method was developed [18], which approximates a complex model to a linear model and interprets the feature importance in the linear model to demonstrate the amount by which a given feature changes the prediction. In addition, XAI provides a Shapley plot that can be easily explained visually and easily to understand the complex relationship between variables and outcomes compared to random forest. This methodology satisfies three conditions: (1) the approximated linear model has a similar accuracy to the original model in the local domain, (2) meaningless variables have no impact on the explanatory power of the model, and (3) feature importance is consistent in the model structure. Accordingly, the SHAP method demonstrates consistent feature importance regardless of model structure and direction of effect of the predictive variables, thereby allowing clinicians to acquire insights into achieving remission and to find potential variables affecting the selection of appropriate bDMARDs.

Statistical analysis
All data are shown as mean (standard deviation [SD]) or percentage values. To evaluate the machine learning performance, the accuracy and AUROC curve were analyzed. The no information rate, which is the largest proportion of the observed classes, was used as a baseline to determine the overall distribution of the classification and to compare with those of the machine learning models. Statistical analyses were performed using R software version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria), and model training was performed using the caret package and SHAPforxgboost package in R.

Results
Clinical characteristics of the patients Table 1 shows the clinical characteristics of the 1204 patients included in the study. The mean (SD) age at baseline was 54.0 (12.8) years and the majority (82.6%) of the patients were female. The mean (SD) disease duration was 7.1 (7.2) years. The rheumatoid factor (RF) and ACPA positivity were 83.2% and 73.4%, respectively, and the mean (SD) DAS28-ESR values at baseline and follow-up were 5.6 (1.0) and 4.3 (1.3), respectively. The mean (SD) duration from the initiation of biologics to the next visit was 0.97 (0.31) years. Of the 1397 followup data, 546 reached remission that is not more than 2.6 of DAS28-ESR; in those follow-up data, the mean (SD) number of follow-up visits and the duration of follow-up  Glucocorticoid dose (e.g., prednisolone, methylprednisolone, deflazacort, and dexamethasone) was converted to prednisolone doses cDMARDs conventional disease-modifying anti-rheumatic drugs, DAS28-ESR disease activity score in 28 joints using erythrocyte sedimentation rate, HBsAg hepatitis B surface antigen, HBsAb hepatitis B surface antibody, HBcAb hepatitis B core antibody, HCV hepatitis C virus until reaching remission was 2.2 times (0.59) and 1.1 years (0.35), respectively.

Prediction of remission from the five machine learning methods
In all machine learning methods for predicting remission, the accuracy and AUROC curve values were in the ranges of 52.8-72.9% and 0.512-0.694, respectively (

Important features for remission in XAI
The SHAP method for remission was used to determine the influences of the variables that contributed to remission in the prediction model. The interpretations of the feature importance with the Shapley plot are shown in Fig. 3 and Additional file: Table S4, Table 3 shows the order of the average of the SHAP value ranks from the Shapley plot in the bDMARDs. The degree and direction of contribution of the variables to remission were different for each bDMARD. Among the variables, the ESR had the highest average ranking (4.0) and had a negative association with remission for all bDMARDs. Hemoglobin had the second-highest average ranking (5.3) and had a positive association with remission for all bDMARDs except for golimumab (no association). The age and DAS28-ESR factors had the third and fourth highest average rankings (6.67 and 7.83, respectively) and had negative associations with remissions for adalimumab, etanercept, infliximab, golimumab, and tocilizumab. For abatacept, DAS28-ESR was positively associated with remission, and age had a nonlinear relationship with remission.

Rankings of important features in bDMARDs
The relationship between RF levels and remission was different for each bDMARD. RF was positively associated with remission in adalimumab, but negatively associated with remission in etanercept and tocilizumab. RF had a nonlinear relationship with remission in infliximab and abatacept, and no association with remission in golimumab. In terms of ACPA, low ACPA was associated with remission in all bDMARDs (negative association) except tocilizumab (positive association). The elevation of CRP was associated with remission in bDMARDs (positive association) except for adalimumab and golimumab (no association). In most bDMARDs, disease duration, platelet, cholesterol, and triglyceride showed negative associations with remission. ANA had a negative association with remission for infliximab and abatacept.

Predictive model for remission without increasing prednisolone dose
We built a machine learning model for remission without increasing prednisolone dose as another outcome (Additional file: Table S5). Among the 1397 follow-ups, 537 were classified as remission without increasing prednisolone dose. The ranges of accuracy and AUROC were 54.1-72.9% and 0.517-0.698, respectively, which were similar to the results for predicting remission. In the Shapley plots, compared with the outcomes for predicting remission, the SHAP scores had slight fluctuations while the order of the variables was similar (Additional file: Figure S1 and Table S6). In the ranking of important features, there was a slight change due to the difference in the SHAP value of each variable (Additional file: Table S7). However, important features such as ESR, DAS28-ESR, CRP, age, hemoglobin, RF, and ACPA retained high rankings for predicting remission.

Discussion
In the prediction models of remission with all machine learning methods, the ranges of accuracy and AUROC were 52.8-72.9% and 0.512-0.694, respectively. Notably, based on the machine learning models for predicting remission, we identified important clinical features that were associated with remission after treatment with bDMARDs. Although it is possible to estimate the importance of each variable by linear regression analysis, we determined that machine learning would complement statistical methods due to the complex relationships between the variables and remission. Some significant clinical features for remission prediction were commonly identified in the bDMARDs, albeit with slight differences in the impact between the bDMARDs. Guan et al. used machine learning to predict the responses to TNF inhibitors in patients with RA using clinical and genetic markers [10]; the authors created a Gaussian process regression model to predict the changes in DAS28 and classified the patients into responders and non-responders. However, Guan et al. used variables that are difficult to obtain in routine practice such as genetic variables, and their outcomes were different from our study. Norgeot et al. used a longitudinal deep learning model to predict controlled (remission or low activity) or uncontrolled state (moderate or high activity) with clinical disease activity index in the next clinical visit [11]; their study showed good performance, but it was not aimed at predicting the therapeutic response of the biologics, but rather predicting the clinical disease activity index at the next visit. Because the goal of our study was to predict the response of biologics, it is difficult to compare our prediction of remission with their outcome.
As opposed to how machine learning models, such as those using deep learning are "black-box" models to explain the reason for prediction, the XAI machine learning method provides reasons for prediction in a manner that lay users can easily understand. For interpretability, the SHAP method, which is based on the Shapley values, was recently presented [18]. The Shapley value employs a method based on game theory, which was introduced to suggest how to fairly distribute payout among the features [19]. This theory has influenced various fields and has recently been used in machine learning to improve the interpretability of complex models. The SHAP method is a unified framework that improves interpretability while maintaining predictability of complex models with machine learning. Using the SHAP method, we identified the characteristics of the important variables that contribute to prediction of remission in a complex cohort dataset. In the use of healthcare data, it is important to discern the relationships between the variables in clinical, genomic, and other types of healthcare data. This proposed method is expected to provide insights for finding the relationships among numerous variables in the integration of large-scale healthcare data.
Among various baseline variables, several laboratory findings were identified as important features that ranked highly for predicting remission. For all the bDMARDs, the most important feature was the ESR, which is a variable predictive of disease activity [20,21]. Hemoglobin was the second most important feature in our study; considering how anemia is associated with disease activity [22] and erosion progression [23], hemoglobin may be an important factor indicating disease states in patients with RA. The RF and ACPA were also important for prediction of remission and contributed to determination of the direction of treatment in A nonlinear relationship such as quadratic effect or mixed effect between drugs and variables. The average ranking was obtained by averaging the rankings of the 6 bDMARDs ESR erythrocyte sedimentation rate, DAS28-ESR disease activity scores in 28 joints using the erythrocyte sedimentation rate, CRP C-reactive protein, ALT alanine aminotransferase, BUN blood urea nitrogen, ANA anti-nuclear antibody patients with RA [24,25]. However, several studies have shown conflicting results on the relationship between RF and ACPA in response to TNF inhibitors [26,27]. In our study, the RF and ACPA were associated with remission in various degrees of impact and direction for each bDMARD. Moreover, RF was not linearly related to remission for infliximab and abatacept; as such, changes in the feature value were not significantly related to either remission or remission failure. This study has some obvious limitations. First, the dosage intervals and doses of biologics were not considered. Second, this study did not distinguish between the primary response failure (i.e., failure of clinical improvement) and secondary response failure (i.e., loss of response after clinical improvement) [27]. Third, this study did not provide evidence as to how the important features of each bDMARD were related to their mechanisms of action. Fourth, because this study focused on the important features of the variables needed for prediction in the machine learning model, the performance of machine learning was not confirmed using external test sets from other RA cohort data. Fifth, because this study included patients treated with first bDMARDs and excluded those who had been previously treated with bDMARDs, such patient characteristics should be considered in the interpretation of the Shapley plot. Sixth, the prediction of response to biologics is difficult because it can be affected by individual variances. In addition, it is difficult to enhance the predictive power because there is a limit in the amount of variables that can be obtained in routine clinical practice. Importantly, the problem of missing values in real-world data is one of the limitations of this study. Lastly, although the KOBIO registry is an inception cohort of consecutive patients treated with biologics, those who failed to meet the inclusion criteria or did not consent to participate in the registry were not included, which may lead to a selection bias.

Conclusion
We successfully developed machine learning models to predict remission as a response to different bDMARDs in active RA patients based on their clinical profiles. Using these models, the XAI was able to identify important clinical features associated with remission according to the biologics used. We noted that some important features were more strongly associated with remission, albeit the order of their relative importance was different for each biologic. Our results suggest that an advanced machine learning approach may be helpful for supporting clinical decisions to improve treatment outcomes with biologics in RA patients.
Additional contributions The registry was funded by the Korean College of Rheumatology, which had no involvement in the study design, collection, analysis, and interpretation of the data; in writing the manuscript; or in the decision to submit the manuscript for publication. We would like to thank all members of The Korean College of Rheumatology Biology (KOBIO) registry. We thank Dr. Joon Seo Lim from the Scientific Publications Team at Asan Medical Center for his editorial assistance with preparing this manuscript.
Additional file 1: Table S1. Hyperparameters for bDMARDs. Table S2. Hyperparameters for prediction of remission without increasing prednisolone dose. Table S3. Proportions of missing data in each variable. Table S4. SHAP values for the variables in Shapely plots. Table  S5. Prediction of remission without increasing prednisolone dose for bDMARD. Table S6. SHAP values for variables in Shapely plots. Table S7. Order of important features for prediction of remission without increasing prednisolone dose in each bDMARD. Figure S1. Shapley plots predicted.
Authors' contributions JSO had full access to all of the data used in the study and takes responsibility for the integrity of the data, study supervision, and accuracy of its analysis. BSK, SE, and JSO contributed to the study conception and design. All authors performed the acquisition, analysis, and interpretation of data. BSK, SE, and JSO wrote the manuscript. BSK, SE, and JSO were involved in the statistical modelling. JSO obtained funding. KS, HY, CH, and DHK contributed to the administrative, technical, or material support. YGK, CKL, BY, and JSO contributed to the supervision. All authors took part in revising the manuscript critically for important intellectual content, and gave their final approval of the version to be published, and agreed to be accountable for all aspects of the work.