Models solely using claims-based administrative data are poor predictors of rheumatoid arthritis disease activity

Background This study developed and validated a claims-based statistical model to predict rheumatoid arthritis (RA) disease activity, measured by the 28-joint count Disease Activity Score (DAS28). Method Veterans enrolled in the Veterans Affairs Rheumatoid Arthritis (VARA) registry with one year of data available for review before being assessed by the DAS28, were studied. Three models were developed based on initial selection of variables for analyses. The first model was based on clinically defined variables, the second leveraged grouping systems for high dimensional data and the third approach prescreened all possible predictors based on a significant bivariate association with the DAS28. The least absolute shrinkage and selection operator (LASSO) with fivefold cross-validation was used for variable selection and model development. Models were also compared for patients with <5 years to those ≥5 years of RA disease. Classification accuracy was examined for remission (DAS28 < 2.6) and for low (2.6–3.1), moderate (3.2–5.1) and high (>5.1) activity. Results There were 1582 Veterans who fulfilled inclusion criteria. The adjusted r-square for the three models tested ranged from 0.221 to 0.223. The models performed slightly better for patients with <5 years of RA disease than for patients with ≥5 years of RA disease. Correct classification of DAS28 categories ranged from 39.9% to 40.5% for the three models. Conclusion The multiple models tested showed weak overall predictive accuracy in measuring DAS28. The models performed poorly at predicting patients with remission and high disease activity. Future research should investigate components of disease activity measures directly from medical records and incorporate additional laboratory and other clinical data.


Background
The estimated prevalence of rheumatoid arthritis (RA) among US adults is approximately 0.6% (1.5 million people age ≥18 years) [1,2]. Administrative claims databases continue to serve as one of the largest sources of data to study RA treatment and patient outcomes. Nevertheless, the utility of these studies is limited as disease activity is not directly captured in claims databases and there is no method to directly measure disease activity in relation to treatment modification and patient outcomes that are captured in claims data. Researchers have attempted to overcome this limitation by developing claims-based indexes of disease severity. Studies have typically used a Delphi approach to identify variables thought to be associated with disease activity and tested correlation between their approach and clinical information obtained from the electronic health record [3] or existing measures of RA disease activity [4]. A recent study by Desai et al. (2015) [5] attempted to validate the Claims-based Index for Rheumatoid Arthritis Severity (CIRAS) developed by Ting et al. [3], who used the Delphi approach, against the C-reactiveprotein-based 28-joint Disease Activity Score (DAS28-CRP) in a relatively small population (n = 315) of patients enrolled in the Brigham Women's Hospital Rheumatoid Arthritis Sequential Study (BRASS) and Medicare [5]. Unfortunately, they found the correlation between CIRAS and DAS28-CRP was poor and attempts to improve the performance by adding additional claims-derived variables also performed poorly (R 2 = 0.23).
These findings highlight the need to develop an algorithm to measure disease activity/severity in observational studies, as claims data are regularly used to evaluate disease progression and response to RA therapies. The ability to measure disease activity is also important in understanding how clinicians make therapeutic decisions, determining whether patients who may benefit from more aggressive treatment are escalating treatment, and providing real-world evidence of adoption of treatment guidelines.
The goal of this study was to develop and validate claims-based statistical models to predict disease activity measured by DAS28 in a larger population of Veteran patients enrolled in the Veterans Affairs Rheumatoid Arthritis (VARA) registry.

Design and data
A cohort study design was used on historical data available in the VARA registry and the Veterans Administration (VA) Informatics and Computing Infrastructure (VINCI), which houses the VA Corporate Data Warehouse (CDW) data available for research. This study used VARA and VINCI data from 1 January 2006 until 31 December 2014 to predict disease activity using data typically available in administrative claims databases. The gold standard clinical DAS28 was measured in Veteran patients enrolled in the VARA registry as part of routine care.
The VARA registry is a prospective, multicenter, observational registry involving 11 VA medical centers (Birmingham, AL; Brooklyn, NY; Dallas, TX; Denver, CO; Jackson, MS; Iowa City, IA; Little Rock, AR; Omaha, NE; Portland, OR; Salt Lake City, UT; and Washington, DC). Clinical disease activity measures (i.e., DAS28 and duration of disease) were obtained from the VARA registry, which has been described elsewhere [6,7].
All administrative claims data were obtained from VINCI [8], which contains both CDW and Managerial Cost Accounting (MCA) system data. VINCI is a research and development environment jointly funded by Health Services Research & Development Service (HSR&D) and the Office of Information and Technology (OIT). VINCI contains rich clinical data in addition to administrative claims data. As the focus of this analysis was on developing a claims-based predictor of the DAS28 we limited our predictor variables to those available in patient tables, inpatient and outpatient procedure, diagnosis and pharmacy data domains.

Study population
Veteran patients had to be enrolled in the VARA registry to be included in the study, and had to be 18+ years of age, have a DAS28, and have at least 365 days of enrollment in the VA health care system prior to the DAS28 measurement. To ensure Veteran patients were actively using the system they were required to have had at least two encounters with the system during the 365-day baseline period. The first encounter with a DAS28 measurement that fulfilled other inclusion and exclusion criteria was eligible as the index date for training and testing the prediction model.
Veteran patients were excluded if they met any of the following criteria because their RA medications and treatment may be modified to treat cancer, transplant or other autoimmune disorders: The index date was defined as the first encounter with a DAS28 after 1 January 2006 and before 31 December 2014 that fulfilled inclusion and exclusion criteria. The baseline period was defined as the 365-day observational period prior to the index date, which is the time we allowed comorbidities to accrue and contribute to the prediction of patient-level DAS.

Study variables Dependent variables
The dependent variable was the continuous DAS28, which was collected at enrollment in the VARA registry and routinely thereafter. The DAS28 is a widely used disease activity assessment tool that affects treatment decisions by rheumatologists in daily clinical practice [10]. The DAS28 is a statistically derived composite index that takes into account the patient's number of swollen joints, number of tender joints, erythrocyte sedimentation rate (ESR), and general health using patient global assessment [11].

Predictor variables
Potential predictor variables for model development were established using three distinct but complementary approaches. The first approach identified a list of potential predictors from literature review and discussion with clinical domain experts (GWC, DC, and MT). The second approach utilized hierarchical grouping software to generate drug categories, medical conditions and procedures at a feasible level for model development -described as the "all predictor" model. The third approach started with all possible predictors (i.e., a priori clinically defined variables and potential predictors based on hierarchical grouping systems) then used a prescreening approach that preselected variables exhibiting significant correlation with the DAS28 (p value <0.05). An automatic variables selection technique, described in the "Statistical approach" section, was then used to identify final sets of predictor variables for the three approaches used to identify the initial pool of potential predictor variables. Table 4 in Appendix 1 lists the a priori clinically defined potential predictors used for model development. It is important to keep in mind that we restricted potential predictors to information available in administrative claims databases. For example, we recorded when specific laboratory measurements occurred but did not attempt to make use of the laboratory results. The potential baseline predictors involved patient demographics, whether specific laboratory tests or radiographs were performed, counts of clinic visits for primary, rheumatology, orthopedic, physical rehabilitation, occupational therapy, and emergency care. Hospital admissions and the number of unique drug classes served as proxies for healthcare utilization. Administrative codes were used to identify surgical procedures, hand surgery, orthopedic surgery and joint injections. Surgical procedures were identified using the Healthcare Cost and Utilization Project (HCUP) Surgical Flag Software [12,13]. We measured specific comorbidities and implemented the Rheumatic Disease Comorbidity Index (RDCI) to account for comorbidities [14]. Procedure codes were also used to measure the use of assistive devices, such as a cane, crutch, or wheelchair.
Careful attempts were made to accurately measure disease modifying anti-rheumatic drug (DMARD) use during the baseline period and up to 14 days following the index date to reflect treatment changes in response to the DAS28. Three variables were produced for each

Statistical approach
Least absolute shrinkage and selection operator (LASSO) is a widely used regularization technique used for developing high-dimensional prediction models without high variance [15]. LASSO selection arises from a form of ordinary least squares regression where the sum of the absolute value of the regression coefficients is constrained to be smaller than a specified parameter. Let X = (x 1 , x 2 , x 3 …x m ) denote the matrix of predictor variables (e.g., Table 1 with 250+ predictors) and let y denote the DAS28, (i.e., the response variable), where the x i s have been centered and scaled to have unit standard deviation and mean zero and y has a mean of zero. For a given tuning parameter t, the LASSO regression coefficients minimize: y−Xβ k k 2 subject to P m j¼1 β j ≤t Provided that the LASSO parameter t is small enough, some of the regression coefficients will be exactly zero. Therefore, the LASSO can be viewed as having a built-in variable selection technique. By increasing the LASSO parameter in discrete steps a sequence of regression coefficients is obtained, where the non-zero coefficients at each step correspond to selected parameters. The LASSO method produces a series of models, M 0 , M 1 …..M k , with each model being the solution for a unique tuning parameter value. In this series, M 0 can be thought of as the least complex model, for which the maximum penalty is imposed on the regression coefficient, and M k is the most complex model, for which no penalty is imposed. The prediction error is then computed for each model and the model that yields the minimum prediction error is chosen.
Cross-validation is important in guarding against overfitting the model to the data, meaning the model is fit to random error instead of true underlying relationships among the variables. Overfitting reduces model performance when applied to an independent dataset that was not involved in the training of the model. Fivefold "external" cross-validation was applied using randomly selected folds to train and test the statistical model. Every fold was used for both training and testing, e.g., when fold 0 was the testing set then folds 1-4 were used for training and when fold 4 was the testing set then folds 0-3 were used for training. SAS GLMSELECT (SAS version 9.4 with Enterprise Guide version 6.1 (Cary, NC, USA)) was used to implement the cross-validated LASSO procedure and the model that minimizes the crossvalidated external predicted sum of squares was used as the primary model selection criterion [16].

Classification analysis
As the secondary study endpoint, we attempted to use the predicted DAS28 to correctly classify patients who were categorized into typical clinical categories of remission and low, moderate, and high disease activity using the actual DAS28.
The correct classification rate (CCR) was used to determine how well the predicted values were classified into the four clinical groups based on the clinical DAS28. CCR is the percentage of correct observations (suitability with the expected value). CCR can be calculated using the following formula: The higher percentage of CCR shows higher accuracy [17]. Exact binomial 95% confidence internals were computed for the CCR [18].

Sensitivity analysis
We developed three models that varied by the initial pool of variables available for model development. As described above, the first model comprised the clinically defined variables identified from our comprehensive literature review and use of clinical domain experts. The second model comprised these same clinically defined variables, plus HCUP CCS condition and procedure codes, and Veterans Health Administration (VHA) drug class codes. The third model comprised variables from the second model; however, only those variables statistically associated with the DAS28 were included (prescreening). Additional sensitivity analysis was applied on duration of RA disease. Models were compared for RA patients with <5 years of disease and ≥5 years of disease.

Population
During the observation period from 1 January 2006 to 31 December 2014 there were 1582 VARA patients meeting all inclusion criteria for the model development phase. Study attrition is presented in Table 1. The average age at index for the population was 63 years (standard deviation (SD): 11) and 90% (95% CI: 88-91%) of the population were male. The DAS28 was normally distributed with an average DAS28 of 3.8 (SD: 1.54) and average Rheumatoid Disease Comorbidity Index (RDCI) of 2.2 (SD: 1.6).

Model development and testing
The clinical experts identified 253 a priori claims-based variables representing clinical concepts thought to be associated with the DAS28. After applying the restriction that variables must be present in >1% of the population we ended up with 175 potential predictor variables. After model development with the LASSO regularized regression, the final model contained 32 variables and the adjusted R-square was 0.221 ( Table 2). The "all predictors" and prescreening models started with different initial potential predictor variables but produced similar crossvalidated adjusted R-square values, 0.221 and 0.223, respectively.
Review of the scatter plots presented in Fig. 1a-c show that the predicted DAS28 did not contain the same range of values as the true DAS28. The predicted DAS28 overestimated those with low DAS28 and underestimated the predicted value for Veteran patients with high DAS28.

Model-retained predictor variables
Variables selected and their standardized estimates for each model are presented in Appendix 2: Tables 5-7. Variables are ordered by the magnitude of the standardized estimate. Healthcare utilization (total visit count, primary care visits and occupational health visits) were relatively strong predictors of higher DAS28. Starting a new DMARD (biologic or non-biologic) 14 days after the DAS28 measurement was also consistently associated with higher DAS28. Higher baseline proportion of days covered (PDC) for DMARDs during the prior year was associated with lower DAS28. Measures indicating higher comorbidity, such as the RDCI and number of distinct VA drug classes were associated with higher DAS28. In the two models that used grouping software, we found analgesics, antidepressants and other agents acting on the central nervous system to be associated with higher DAS28. Please review Appendix 2 for a complete list of variables retained in each model.

Secondary analysis: correct classification of disease activity categories
The overall CCR for correctly classifying patients with the predicted DAS28 into clinical categories based on the true DAS28 ranged from 39.9% to 40.5% (Table 3). The true positive rate (TPR), which essentially represents measurement sensitivity, was fairly low in the group classified as having high disease activity by the predicted DAS28 and ranged from 9.9% to 10.8% (Table 3), while the positive predicted value (PPV) ranged from 57.9% to 63.0%. The TPR for those classified as having moderate disease ranged from 84.6% to 87.6%, while the PPV ranged from 42.5% to 43.0%. The TPR for low disease acitivity ranged from 17.9% to 20.2%, while the PPV ranged from 21.0% to 21.6%. None of the prediction models developed and tested accurately classified patients with remission. The TPR ranged from 0.0% to 0.3%, while the PPV ranged from 100% to 0%. Attempts to statistically define the cut points for the predicted DAS28 to optimize correct classification did not meaningfully improve the CCR. The optimized CCR only increased by approximately 3% (not presented).

Discussion
This study developed and internally validated statistical models to predict RA disease activity using data available in administrative claims data but was not successful in establishing a high level of predictive value. The relatively low adjusted R-square value and the inability to accurately classify patients into categories of disease activity using the predicted DAS28, draw into question the ability of a claims-based predictor to successfully represent patient disease activity in RA.
The sensitivity analysis focused on development of multiple models based on the initial pool of variables and applying these models to subsets of patients with <5 years of RA disease and ≥5 years of disease. This cut point was chosen because it has been used in other studies (e.g., TICORA 2004 study) [19] to represent early disease and because the data did not support comparisons with earlier cut-points.
The initial pool of predictors did not impact predictive accuracy, as the three models had similar adjusted R-square values. The models performed slightly better for patients with <5 year of disease. Similar results were found for patients with <2 years of disease (not presented). Duration of RA disease was recorded at VARA enrollment and would not be available in claims data, but could be approximated based on duration of enrollment and indicators of disease in claims-based data.
Variables retained in the three models were intuitive and interpretable. Variables indicating increased healthcare utilization, comorbidities, the use of agents that act on the central nervous system, and new DMARD prescriptions after the DAS28 measurement were associated with higher DAS28. Variables indicating longer exposure and a higher proportion of days covered by DMARDs in the baseline period were associated with lower DAS28. Even though many variables were associated with DAS28 the models poorly explained the variation in DAS28.
The classification accuracy of predicted DAS28 was evaluated by categorizing the true DAS28 and predicted DAS28 into remission (<2.6), low disease activity (2.6-3.1), moderate disease activity (3.2-5.1), and high disease activity (>5.1). The overall classification accuracy across the three models was modest. The models were poor predictors of remission and high levels of disease activity, which is likely due to the relatively small number of Veterans in the lower and higher end of the DAS28 scale. They performed best at classifying patients with moderate disease activity, which is the range for the majority of the population. In an additional analysis (not presented), we identified the optimal cut points of the predicted DAS28 to correctly classify patients by true DAS28 category of disease activity but were only able to increase classification accuracy to 43% across the three models.
A recent study evaluating the accuracy of a claimsbased model to predict DAS28-CRP also performed poorly and found other claims-based measures of RA disease severity were not well correlated with true DAS28-CRP. Desai et al. (2015) [5] evaluated the Fig. 1 Predicted 28-joint disease activity score (DAS28) vs true DAS28 for test data. a Clinical predictors. b All predictors. c Pre-screened variables claims-based index for RA severity (CIRAS) developed by Ting et al. [3] using a Delphi panel and found poor correlation with the true DAS28-CRP. They added additional variables to CIRAS that included medical claims for rheumatoid lung involvement, hand surgery, tuberculin test ordered and anti-CCP test orders. Furthermore, they also added pharmacy claims for steroids, opioids, non-steroidal antiinflammatory drugs (NSAIDs), number of nonbiologic DMARDs, and number of biologic DMARDs, and found the model R-square value (0.23) was similar to ours, even though they had a relatively small sample of Medicare patients (n = 315). They concluded that CIRAS may not approximate RA disease in observational cohorts and its use for confounding adjustment should be carefully considered. They also concluded that claims-based algorithms developed for clinical disease activity should be rigorously tested in a population with actual measures of disease activity to establish their generalizability before implementing in research. Our study leveraged a substantially larger population and rigorously tested our models to reduce the likelihood of overfitting the models to the data, yet had similar results.

Strength and limitations
The primary strength of this study was the access to DAS28 in a relatively large number of patients. This is the largest study that attempted to train and validate a claims-based predictor of RA disease activity using the actual DAS28. In addition, we used statistical methods that supported high-dimensional variable selection and prevented overfitting the model to the actual data. Of note, we replicated model development and classification accuracy using the Clinical Disease Activity Index (CDAI) and found similar adjusted R-square values and classification accuracy (results not presented).
One weakness of this study is that the population studied may not represent average RA patients. Veterans tend to be older and more likely male than the general population. Another weakness of this study was the reliance on claims-based predictors. Even though observational comparative effectiveness and safety studies often rely on commercial and Federal claims databases there is a new paradigm for population health analytics that involves the integration of the electronic health record with claims data. Accurate prediction models of disease activity will likely require integration of clinical data, such as laboratory test results and information captured in structured electronic health data or in medical notes. The use of standardized templates would be required to extract components of disease activity scores, such as the number of swollen or tender joints, and used in models to predict future disease activity and to better understand what features of the disease activity score influence changes in treatment patterns. Our future work aims to develop natural language processing tools to extract components of the DAS28 from templated notes within the VA and use this information along with other clinical data in the development of future RA disease activity prediction models.

Conclusions
The prediction models developed and tested found a relatively low level of predictive accuracy, drawing into question their use for confounding adjustment or to evaluate treatment decisions in patients with predicted DAS28. Our findings are consistent with other recent studies that attempted to use claimsbased data to predict RA disease activity. Future work to predict disease activity in patient populations should incorporate clinical data from the electronic health record in addition to key variables available in administrative claims data.