RA cohort identification
We identified RA patients treated with conventional, biologic, or synthetic disease-modifying anti-rheumatic drugs (DMARDs) in Medicare data from 01/01/2006 to 12/31/2015, using ICD-9-CM codes from 01/01/2006 through 09/30/2015 and ICD-10-CM codes from 10/01/2015 through 12/31/2015. To identify RA patients, we required two or more claims for RA (ICD-9-CM 714.0 or 714.2) occurring between 7 and 365 days apart, with at least one from a rheumatologist. In addition, we required at least one prescription or infusion of an RA medication (hydroxychloroquine, sulfasalazine, methotrexate, leflunomide, infliximab, etanercept, adalimumab, golimumab, certolizumab pegol, abatacept, tocilizumab, rituximab, tofacitinib) [17]. To ensure an adequate baseline period that would allow the identification of incident ILD, we required at least 12 months of continuous Medicare A+B-C coverage (Parts A and B, without Part C) prior to the start of follow-up. The date of RA cohort eligibility (i.e., the RA Cohort Index date) was defined as the date on which the patient met all three of the above requirements (RA diagnosis, DMARD, and at least 12 months of continuous coverage). Because the project was initiated in 2017, the availability of Medicare claims data at that time led us to establish a cohort inclusion cutoff date of 12/31/2015. All available Medicare data prior to the date of RA cohort eligibility were included in the baseline period.
We excluded patients with a diagnosis of other autoimmune diseases (e.g., systemic lupus erythematosus, scleroderma, myositis), malignancy except for non-melanoma skin cancer, HIV, or history of organ transplantation using all available data prior to the index date. All study activities were conducted in accordance with institutional review board approval at each medical center, and the data use was governed by a data use agreement from CMS.
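As an illustration only, the two-claim RA requirement in this section could be sketched as follows. The function name, the tuple layout, and the reading that the rheumatologist claim must be one of the two claims in the qualifying pair are assumptions made for this sketch, not details taken from the study code.

```python
from datetime import date

# ICD-9-CM codes for RA named in the text.
RA_CODES = {"714.0", "714.2"}


def meets_ra_claims_rule(claims):
    """Return True if two RA claims fall 7 to 365 days apart,
    with at least one of the pair from a rheumatologist.

    claims: list of (service_date, icd9_code, is_rheumatologist) tuples
            (hypothetical layout for this sketch).
    """
    ra = sorted((d, rheum) for d, code, rheum in claims if code in RA_CODES)
    for i, (d1, r1) in enumerate(ra):
        for d2, r2 in ra[i + 1:]:
            gap = (d2 - d1).days
            # Assumption: the specialty condition applies to the pair itself.
            if 7 <= gap <= 365 and (r1 or r2):
                return True
    return False
```

In this sketch, a patient would additionally need a qualifying DMARD claim and 12 months of continuous coverage before the RA Cohort Index date is assigned.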
Case qualification definitions
Each case was classified by the type of inpatient or outpatient visit associated with the first ILD diagnosis. Specifically, each case was assigned a “case qualifying” status based on the place of service where the initial diagnosis appeared in the claims data: HospitalPrimary = inpatient primary diagnosis, HospitalNonPrimary = inpatient non-primary diagnosis, OutpatientCT = outpatient diagnosis preceded by CT within 90 days, and OutpatientHospital = outpatient diagnosis preceded by hospitalization within 90 days.
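A minimal sketch of how the four case qualifying labels might be assigned is shown below. The function and parameter names are hypothetical, and the choice to give CT precedence over hospitalization when both precede an outpatient diagnosis is an assumption; the text does not state a precedence rule.

```python
def case_qualifying_status(setting, primary_position=False,
                           ct_within_90d=False, hospitalized_within_90d=False):
    """Map the first ILD diagnosis to a case qualifying label.

    setting: "inpatient" or "outpatient" (hypothetical encoding).
    Returns one of the four labels from the text, or None if the
    diagnosis does not qualify under any of them.
    """
    if setting == "inpatient":
        return "HospitalPrimary" if primary_position else "HospitalNonPrimary"
    if setting == "outpatient":
        # Assumption: CT takes precedence when both conditions hold.
        if ct_within_90d:
            return "OutpatientCT"
        if hospitalized_within_90d:
            return "OutpatientHospital"
    return None
```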
Medical record confirmation of ILD
The RA cohort of patients with suspected prevalent and incident ILD was further restricted to patients at five participating academic medical centers where linked medical records were available: Duke University, The Medical University of South Carolina (MUSC), The University of Alabama at Birmingham (UAB), The University of North Carolina (UNC), and Vanderbilt University Medical Center (VUMC). Each center cross-referenced the information available from the Medicare administrative data with the corresponding information in its electronic medical records, using either a search tool run against a central data warehouse or repository (e.g., i2b2) or its own local ILD registry. Medical record reviewers at each site abstracted clinical data, including clinical notes, date of diagnosis, CT scan reports, chest x-ray reports, lung pathology reports, and pulmonary function tests (PFTs), into a case report form. Case report data from all sites were de-identified, aggregated, and adjudicated independently by two ILD experts (pulmonology and rheumatology). The possible adjudication outcomes for each ILD case were confirmed, not confirmed, insufficient information to determine, or not retrievable, with discordance in adjudication resolved by consensus (initial agreement as measured by kappa = 0.96, 95% CI 0.93–1.00). “Not confirmed” indicated that the medical record contained sufficient information to judge ILD case status and that the patient did not have ILD. “Insufficient information to determine” indicated suspected cases for which the medical record contained too little data to determine whether the patient had ILD (e.g., an ILD-related diagnosis was mentioned, but no primary data [e.g., HRCT results] were available). “Not retrievable” indicated that the medical record could not be linked or obtained for the patient.
Based on the entirety of the medical record, the adjudicators and site abstractors subsequently classified each confirmed case as incident ILD, prevalent ILD, or not able to be classified. The adjudicated results were compared with the results from the algorithm to determine the positive predictive value (PPV) of the algorithm for prevalent ILD and of a separate algorithm for incident ILD. Incident ILD cases were considered to be correctly classified if the ILD onset date per the medical record review was within ±6 months of the ILD case date as identified by the claims-based algorithm.
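The date-agreement rule for incident cases can be illustrated with a short check. This is a sketch under one stated assumption: "6 months" is approximated as 183 days, since the text does not specify how months were counted. The function name is hypothetical.

```python
from datetime import date

# Assumption: "6 months" is approximated as 183 days for this sketch.
WINDOW_DAYS = 183


def incident_date_agrees(record_onset, claims_case_date, window_days=WINDOW_DAYS):
    """True if the medical record onset date lies within the window
    (before or after) of the claims-based ILD case date."""
    return abs((record_onset - claims_case_date).days) <= window_days
```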
ILD algorithm definition
Drawing from the literature, the study team created a list of ICD-9-CM and ICD-10-CM codes that could potentially indicate the presence of an ILD diagnosis (Supplemental Table 1). The team further divided these codes into Specific (bolded) and Sensitive (non-bolded) conditions. Searching the Medicare data, we required either (1) an ICD-9-CM or ICD-10-CM diagnosis code for ILD from an inpatient hospitalization claim in any position (primary or non-primary), or (2) one or more outpatient diagnosis codes for ILD from a pulmonologist, rheumatologist, or internist, plus an outpatient chest computed tomography (CT) scan, an outpatient lung biopsy, or any hospitalization in the preceding 90 days. Similar algorithms have been used previously in other claims-based studies [8] and were recently validated in patients with RA in the VA health system [14]. The rationale for allowing a recent hospitalization to act as a surrogate for an outpatient CT scan is that many diagnostic tests (such as CT scans) performed on hospitalized patients are not separately recorded in the data.
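The base algorithm described above (before any incident-ILD exclusions) might be sketched as follows. The data layout, the specialty labels, and the treatment of a same-day supporting event as falling within the preceding 90 days are assumptions of this sketch rather than details confirmed by the text.

```python
from datetime import date, timedelta

# Hypothetical specialty labels for the three provider types in the text.
QUALIFYING_SPECIALTIES = {"pulmonology", "rheumatology", "internal medicine"}
LOOKBACK = timedelta(days=90)


def ild_case_date(inpatient_ild_dates, outpatient_ild_claims, supporting_event_dates):
    """Return the earliest qualifying ILD date, or None.

    inpatient_ild_dates: dates of inpatient claims with an ILD code in any position.
    outpatient_ild_claims: list of (date, specialty) for outpatient ILD codes.
    supporting_event_dates: dates of outpatient chest CT, outpatient lung
        biopsy, or any hospitalization.
    """
    candidates = list(inpatient_ild_dates)
    for dx_date, specialty in outpatient_ild_claims:
        if specialty not in QUALIFYING_SPECIALTIES:
            continue
        # Assumption: an event on the same day as the diagnosis counts.
        if any(timedelta(0) <= dx_date - ev <= LOOKBACK
               for ev in supporting_event_dates):
            candidates.append(dx_date)
    return min(candidates) if candidates else None
```

An outpatient code therefore contributes a case date only when both the specialty condition and the 90-day supporting-event condition hold, whereas an inpatient code qualifies on its own.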
To identify incident ILD, we applied further exclusion criteria to the aforementioned algorithm. Because we expected that a period of approximately 2 years free of any ILD or other pulmonary diagnoses would be required to appropriately classify incident ILD (Fig. 2a, b), we first used all available data prior to the index date, with a minimum requirement of 12 months. Second, we evaluated all data in the 12 months after the index date, so that we had a minimum two-year ascertainment period in which to identify and exclude prevalent ILD. During these time periods, we required the patient to have no ILD diagnosis codes, indicators of prevalent ILD (e.g., lung biopsy), or diagnosis codes for sarcoidosis. However, because ILD might require several months to evaluate and ultimately diagnose, we allowed evidence for ILD to accrue up to 6 months prior to the confirmed ILD event date. At least one ILD case qualifying event date must have occurred after the RA Cohort Index date, consistent with the goal of identifying ILD in a cohort of patients already classified as having RA according to validated approaches [17].
Statistical analysis
Descriptive statistics were generated for the cohort, stratified by ILD case status (confirmed, not confirmed, or insufficient information to determine). Imbalances in characteristics at p < 0.05 and with standardized mean differences (SMDs) > 0.10 were considered potentially clinically meaningful. We calculated positive predictive values (PPVs) of the ILD algorithm with 95% confidence intervals estimated using a binomial approximation. The PPV was calculated for the ILD algorithm for any case of ILD (prevalent or incident), compared to ILD classification by medical record review. We also assessed the PPV of the incident ILD algorithm criteria in finding incident ILD, both with and without conditioning on having prevalent ILD. Finally, we examined seropositive RA associated with ILD, using the ICD-10-CM diagnosis code M05 (rheumatoid arthritis with rheumatoid factor), which has previously been shown to be a reasonable proxy for positive rheumatoid factor and/or anti-CCP antibody [18].
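For readers who wish to reproduce this type of calculation, the sketch below computes a PPV with a 95% confidence interval and an SMD for a binary characteristic. It assumes the normal (Wald) approximation to the binomial for the interval and the common pooled-variance formula for the SMD; the text does not specify which interval method or SMD formula was used, and the function names are hypothetical.

```python
import math


def ppv_with_ci(confirmed, flagged, z=1.96):
    """PPV = confirmed / flagged, with a Wald interval
    (assumed method; the text says only 'binomial approximation')."""
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    p = confirmed / flagged
    half_width = z * math.sqrt(p * (1 - p) / flagged)
    # Clip the interval to the valid range for a proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def smd_binary(p1, p2):
    """Standardized mean difference between two proportions,
    using the average of the two group variances."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (p1 - p2) / pooled_sd
```

The Wald interval can perform poorly when the PPV is close to 0 or 1 or when the number of flagged cases is small; an exact or Wilson interval is a common alternative in those settings.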