We followed the general steps for early-CEA of medical tests set out in the framework developed by Buisman et al. [14], which offers guidance for researchers performing early-CEA of medical tests. Early-CEA evaluates medical tests in development by assessing how much these tests could improve health outcomes and healthcare efficiency. Moreover, early-CEA helps test developers decide on further development of medical tests, set realistic performance-price goals, and design and manage reimbursement strategies [14, 15].
Current diagnostic test strategy
IA patients at risk of having RA undergo a diagnostic workup to establish the presence of RA. This entails affected joint counts, blood testing, and radiographs ordered or established by the rheumatologist. If no other explanation for the symptoms is found (e.g., systemic lupus erythematosus (SLE), gout, or psoriatic arthritis), the patient is classified as having RA if they score at least 6 out of 10 points on the RA-2010 criteria. Patients who score fewer than 6 but more than 2 points are regarded as being at intermediate risk and do not fulfil the RA-2010 criteria. The RA-2010 criteria were the comparator in our early-CEA [6].
New diagnostic test strategies
We assessed the cost-effectiveness of four diagnostic tests that are currently being developed as part of the TRACER project [16]: B-cell related gene expression [17], IL-6 serum level test [18], MRI of the hands and feet [19–24] and genetic assays with susceptibility SNPs for RA [25]. For each of the tests (described subsequently), three different test strategies were modelled: add-on to the RA-2010 criteria for all patients with IA, add-on for intermediate-risk patients only, and replacement of all blood tests and radiographs used to classify patients according to the RA-2010 criteria. For the add-on test strategies, the performance of the new test strategy was estimated by combining the sensitivity and specificity of the RA-2010 criteria with those of the new tests.
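The text does not spell out how the two sets of accuracy estimates were combined. As a minimal sketch only, assuming that an add-on strategy counts a patient as positive when either the RA-2010 criteria or the new test is positive, and assuming that the two results are conditionally independent given disease status (both are assumptions of this sketch, not statements from the study), the combined accuracy could be computed as below. The RA-2010 accuracy values are placeholders, and the function name is hypothetical.

```python
def combine_either_positive(sens_a, spec_a, sens_b, spec_b):
    """Combined accuracy when a patient is positive if test A OR test B is positive.

    Assumes the two tests are conditionally independent given disease status.
    """
    sens = 1.0 - (1.0 - sens_a) * (1.0 - sens_b)  # a case is missed only if both tests miss it
    spec = spec_a * spec_b                        # a non-case is negative only if both are negative
    return sens, spec


# Placeholder RA-2010 accuracy (hypothetical, NOT taken from the study)
RA2010_SENS, RA2010_SPEC = 0.80, 0.80

# IL-6 serum level test accuracy as reported in the text (0.70 / 0.53)
sens, spec = combine_either_positive(RA2010_SENS, RA2010_SPEC, 0.70, 0.53)
print(f"combined sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under this rule sensitivity can only rise and specificity can only fall relative to the RA-2010 criteria alone, which is the usual trade-off of an "either positive" add-on test.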
B-cell related gene expression
During the development of arthritis, B-cell RNA expression decreases over time [16]. Although the exact mechanism is poorly understood, the marker is useful to predict early arthritis in patients with seropositive arthralgia [17]. Currently no data are available for patients with IA. Therefore, we used the data from the seropositive arthralgia cohort studied by Baarsen et al. [17]. After discussion with the developers of this test, a sensitivity of 0.60 and a specificity of 0.90 were used. The cost of the test was set at €150.
IL-6 serum level test
IL-6 is a cytokine that is present in inflammation. In a recent evaluation of IL-6 serum level test performance in detecting RA in patients with IA the sensitivity was 0.70 and specificity was 0.53 [18]. We used these sensitivity and specificity values and assumed a cost of €50 per test.
MRI of hands and feet
MRI may reclassify patients to different joint domains of the RA-2010 criteria if there are more swollen joints than are identified on physical examination. MRI also provides additional information on bone marrow edema [19] and tenosynovitis [20, 21]. Based on the literature [22–24] and discussions with test developers, we set the sensitivity of MRI at 0.90 and the specificity at 0.60. The costs of MRI were assumed to be equal to the unit costs currently used by the Dutch Healthcare Authority, of €189 per MRI examination (€756 for four MRI scans (both hands and both feet)) [26].
Genetic assay with susceptibility SNPs for RA
RA is a complex disease involving several genes. Heritability for RA is estimated to be around 50–60 % [25]. Expert review of the literature suggests that using genetic risk factors combined with current knowledge would result in a sensitivity of 0.40 and a specificity between 0.80 and 0.90 to identify patients with RA [25]. We used these estimates and applied a sensitivity of 0.40 and a specificity of 0.85, with an estimated cost of €750 per test based on expert opinion from test developers.
Treatment
In the current and new diagnostic test strategies, test-positive patients received methotrexate (MTX) at 25 mg/week orally [27]. Due to the side effects of MTX, patients could switch to other synthetic DMARDs (e.g., Sulfasalazine, Leflunomide). After failure of two synthetic DMARDs, patients could switch to biologic DMARDs (i.e., TNF-inhibitors, IL6-inhibitors, B cell depletion, or T cell inhibition) [28].
RA patients who were detected by the new diagnostic test strategies in addition to those detected by the RA-2010 criteria were assumed to receive early treatment. As a result, we modelled that they had an improvement of 0.2 in the disease activity score in 28 joints (DAS28) at 12 months as compared to patients in the current test strategy. The improvement of 0.2 in the DAS28 was informed by a sensitivity analysis in which we evaluated the effect of changing this value on the model results (see univariate sensitivity analysis below).
Model structure
In RA, the diagnosis and subsequent prognosis are complex processes in which various tests and measures of disease activity influence treatment decisions and subsequently outcomes in terms of both costs and effects. As the diagnosis is often reconsidered in the first year of disease, especially in those initially not classified as having RA, we decided to model the first year as a decision tree with chance nodes at 6 and 12 months to classify patients as true positive (TP), false positive (FP), true negative (TN) and false negative (FN) during the first year. Patients were classified as TP if they had a positive test result (≥6 points on the RA-2010 criteria or positive on the new test) at baseline and, at 12 months, used MTX or had stopped MTX due to side effects. Moreover, the symptoms should not be explained by another classified diagnosis [6]. Patients were considered TN if they had a negative test result (<6 points on the RA-2010 criteria or negative on the new test) at baseline, did not use MTX at 12 months, and had symptoms explained by another classified diagnosis. Patients were considered FP if they scored ≥6 points on the RA-2010 criteria or were positive on the new test at baseline but did not use MTX at 12 months, and had symptoms explained by another classified diagnosis. Patients were classified as FN if they scored <6 points on the RA-2010 criteria or were negative on the new test at baseline but used MTX or had stopped MTX due to side effects at 12 months, and had no symptoms explained by another classified diagnosis. Using a combination of initial RA-2010 criteria scores and the use of DMARDs not explained by any other disease is a common way of dealing with a disease for which no hallmark sign is available [6, 29].
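The four classification rules above can be sketched as a small function. This is an illustrative reading of the stated rules, not the study's own code; the function and argument names are hypothetical, and combinations that the text does not describe (for example, MTX use together with another classified diagnosis) are deliberately returned as unclassified.

```python
from typing import Optional


def classify(test_positive: bool, mtx_at_12m: bool, other_diagnosis: bool) -> Optional[str]:
    """Return "TP", "FP", "TN" or "FN" following the rules described in the text.

    test_positive:   >=6 points on the RA-2010 criteria or positive new test at baseline.
    mtx_at_12m:      used MTX, or stopped MTX because of side effects, at 12 months.
    other_diagnosis: symptoms explained by another classified diagnosis.
    """
    has_ra = mtx_at_12m and not other_diagnosis   # reference standard: RA
    no_ra = (not mtx_at_12m) and other_diagnosis  # reference standard: not RA
    if has_ra:
        return "TP" if test_positive else "FN"
    if no_ra:
        return "FP" if test_positive else "TN"
    return None  # combination not covered by the stated rules


print(classify(test_positive=False, mtx_at_12m=True, other_diagnosis=False))  # FN
```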
The first year is followed by a four-year individual-level Markov model (i.e., patient-level state transition model) that simulates the change in disease activity (DAS28) over time in 3-month cycles. The cycle length is 3 months because patients are commonly seen by their rheumatologist every 3 months. This 5-year model was used to simulate what would happen if a proportion of FN patients in the current strategy were diagnosed earlier with lower levels of disease activity in the new test strategy. The time horizon was 5 years because the long-term effects of biological drug use are unknown. Patients were categorised into three disease states: remission (DAS28 ≤ 2.6), low disease activity (DAS28 > 2.6 to ≤ 3.2) and moderate and severe disease activity (DAS28 > 3.2) (see Footnote 1). This categorisation by DAS28 score is common in the field of RA [30–32]. Resource use, costs and utilities were linked to these three categories. The patients who were classified as TP or FN at 12 months entered the patient-level state transition model. A proportion of patients with DAS28 > 3.2 were modelled to start a biologic DMARD in addition to MTX. They were assumed to stay on a biologic DMARD and could switch to another biologic DMARD for the remainder of the 4 years. The patients who were classified as TN or FP at 12 months entered a background model in which they stayed for the remaining 4 years, with no change in utilities, biologic DMARD costs for 10 % of FP patients in the first year after diagnosis, and otherwise no RA-related costs. Figure 1 shows the decision model comparing the current diagnostic test strategy with the new add-on diagnostic test strategy for intermediate-risk patients (described subsequently). This 5-year cost-effectiveness model is an extension of our 1-year model published elsewhere [33].
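For illustration, the mapping from a DAS28 score to the three disease states, and the number of 3-month cycles in the four-year state transition model, can be written as follows. The thresholds are those given in the text; the state labels and names are hypothetical.

```python
def das28_state(das28: float) -> str:
    """Map a DAS28 score to one of the three model states (thresholds from the text)."""
    if das28 <= 2.6:
        return "remission"
    if das28 <= 3.2:
        return "low"
    return "moderate_severe"


CYCLE_MONTHS = 3
N_CYCLES = 4 * 12 // CYCLE_MONTHS  # four years of 3-month cycles

print(das28_state(2.9), N_CYCLES)
```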
Data sources
To populate the model, we mainly used data from three different sources. Additional file 1: Table S1 shows the characteristics of the three sources. First, data from the REACH cohort (usual care) were used to populate the 1-year decision tree with 552 patients with IA who were suspected of having RA (details about the cohort can be found in Additional file 2: Table S2). Patients had to have at least one joint clinically diagnosed as affected by synovitis that could not be classified as another inflammatory joint disease. The prevalence of RA was 54 % at 12 months based on the RA-2010 criteria and MTX use.
Second, data from the tREACH trial were used in our model for RA patients after 1-year follow-up [34]. The tREACH trial included patients aged 18 years or older with arthritis in at least one joint and a symptom duration of less than 1 year. Patients were randomized into three initial treatment strategies: triple DMARD therapy (MTX, sulfasalazine and hydroxychloroquine) with intramuscular glucocorticoids, triple DMARD therapy with an oral glucocorticoid tapering scheme, and MTX monotherapy with an oral glucocorticoid tapering scheme. See Claessen et al. [35] for a detailed description of the tREACH trial.
Third, summary data from the DREAM registry as published by Vermeer et al. [36] were used to inform the start of biological drugs. This publication describes data from two cohorts, a treat-to-target cohort and a usual-care cohort of patients with clinical RA. The treat-to-target strategy used a standardized treatment step-up protocol [36]. In contrast, treatment switches in the usual-care cohort did not follow a protocol.
Model inputs
Additional file 3: Table S3 gives an overview of all model input parameters, their estimates and distributions for probabilistic sensitivity analysis, and data sources.
Estimation of transition probabilities
During the first 12 months of our model, the probabilities of patients being TP, FN, TN and FP were derived from the REACH cohort, in which patients were classified according to the RA-2010 criteria, use of MTX and use of other synthetic DMARDs at baseline, 6 and 12 months.
At the start of the patient-level state transition model (i.e., at 12 months), the DAS28 of TP and FN patients at 12 months in the REACH cohort determined which of the three disease states patients entered. Patients who entered the DAS28 > 3.2 state at the start or later in time could be eligible for starting biological drugs. To model this, summary data from the DREAM cohort on the start of biological drugs were used, in which the observed use of biologic DMARDs in clinical practice was 15 % at 24 months. We transformed this 15 % rate into a 3-month transition probability of 2 % to start biologic DMARDs in those patients with a DAS28 > 3.2. This 2 % was distributed over the three disease states in a 1–3–6 distribution (state 1–state 2–state 3) based on flare rates (DAS28 > 3.2) in the tREACH cohort.
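The conversion of the 24-month figure into a 3-month probability can be reproduced with the standard formula for a constant per-cycle probability, p = 1 − (1 − P)^(1/n), where P is the cumulative probability and n the number of cycles. The sketch below assumes a constant probability in each of the eight 3-month cycles, which the text does not state explicitly; the function name is hypothetical.

```python
def per_cycle_probability(cumulative: float, n_cycles: int) -> float:
    """Constant per-cycle probability that yields `cumulative` after `n_cycles` cycles."""
    return 1.0 - (1.0 - cumulative) ** (1.0 / n_cycles)


# 15 % cumulative use at 24 months, eight 3-month cycles
p_3m = per_cycle_probability(0.15, 24 // 3)
print(f"3-month probability: {p_3m:.4f}")  # about 0.02, i.e. the 2 % used in the model
```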
Estimates of resource use and costs
We distinguished two cost categories: direct medical and productivity costs. Direct medical costs include costs of visits to rheumatologists and other health professionals (e.g., physical therapist), laboratory tests including diagnostic tests and those to monitor side effects, and medication use. Productivity costs represent the number of days that a patient with a paid job was absent from work in the past 3 months. Resource use and productivity losses per disease state were obtained from the REACH study in the first year [29] and from the tREACH study [34, 35] in the second and third year. The latter was extrapolated to 5 years. In the background model TN patients were assumed to incur no RA-related costs, while 10 % of FP patients incurred biologic DMARD costs in the first year after diagnosis due to misdiagnosis, for which the frequency was based on the REACH study.
The unit costs of visits and productivity losses were based on reference prices published in the Dutch Manual of Costing in economic evaluations [37]. Diagnostic test costs were based on tariffs published by the Dutch Healthcare Authority [26], and medication costs were obtained from the National Health Care Institute [38]. All costs were adjusted to 2014 euros using the general price index from the Dutch Central Bureau of Statistics [39]. All cost parameters can be found in Additional file 3: Table S3.
Estimation of QALYs
When assessing the impact of a new test or treatment on quality of life over time, the health outcomes are usually measured in terms of quality-adjusted life years (QALYs). The QALY combines the number of life years with the level of health-related quality of life (i.e., utilities) in those years [40]. The EuroQol 5-dimension 3-level questionnaire (EQ-5D-3L) was used to estimate utilities. The baseline utilities of TP, FP, TN, and FN were obtained from the REACH study and were 0.60, 0.65, 0.65 and 0.60, respectively. Based on the literature we assigned an improvement of 0.10 over the first year to the TPs [41–45]. Based on the REACH study we assigned an improvement of 0.05 and 0.10 over the first year to the FPs and TNs, respectively. Based on the placebo group in the STIVEA trial, we assigned a 0.05 reduction over the first year for FNs, assuming that FN patients would receive little therapy [45].
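As a rough illustration of how these utilities translate into first-year QALYs, the sketch below takes the baseline utilities and first-year changes from the text and assumes that utility changes linearly over the year, so that QALYs equal the average of the start and end utility. The linear path is an assumption of this sketch; the text does not state how utilities were interpolated within the first year.

```python
# Baseline utilities and first-year changes as reported in the text
BASELINE = {"TP": 0.60, "FP": 0.65, "TN": 0.65, "FN": 0.60}
CHANGE = {"TP": 0.10, "FP": 0.05, "TN": 0.10, "FN": -0.05}


def first_year_qalys(group: str) -> float:
    """QALYs over one year, assuming a straight-line change in utility (assumption)."""
    start = BASELINE[group]
    end = start + CHANGE[group]
    return (start + end) / 2.0  # area under a straight line over 1 year


for g in ("TP", "FP", "TN", "FN"):
    print(g, round(first_year_qalys(g), 3))
```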
In the patient-level state transition model, patients were assigned EQ-5D values based on their DAS28 every 3 months, stratified for the start of biologic DMARDs. As observed in the tREACH study, the EQ-5D values for patients not using biologic DMARDs were higher. Furthermore, the EQ-5D values were not normally distributed. About 25 % of patients in the tREACH study had a decrease in EQ-5D at least once in 3 years, which led to a utility score lower than 0.50. Therefore, different distributions of utility values were estimated. One distribution was estimated for patients with at least one EQ-5D decrease below 0.50 over time and another distribution was estimated for patients who always had an EQ-5D higher than 0.50 over time. In the background model, patients were assumed to have an EQ-5D value of 0.75 that remained constant over time.
Analyses/modelling
We performed a base-case analysis with four diagnostic tests that were used in three diagnostic test strategies as described above. We calculated the incremental costs per QALY gained in each new test strategy compared with the current test strategy (i.e., incremental cost-effectiveness ratio (ICER)). Probabilistic sensitivity analyses were performed in which incremental costs and QALYs were calculated as the mean of 1,000 Monte Carlo simulations, where each simulation samples simultaneously from the appropriate distributions of the input parameters (see Additional file 3: Table S3 for the distributions). Cost-effectiveness planes and acceptability curves were constructed from the Monte Carlo simulation. In addition, we used the headroom (i.e., potential profit) method to assess the maximum additional cost for which each new diagnostic test was still likely to be cost-effective at a willingness-to-pay threshold of €20,000 per QALY gained [46, 47].
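The two summary measures can be sketched as follows. The ICER is the ratio of incremental costs to incremental QALYs. For the headroom, this sketch uses one common formulation, the maximum additional cost at which net monetary benefit is still non-negative at the €20,000 threshold; whether the study used exactly this formulation is an assumption. The input values are hypothetical and only show how the functions are called.

```python
WTP = 20_000.0  # willingness-to-pay threshold in euros per QALY gained (from the text)


def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio: incremental costs per QALY gained."""
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return delta_cost / delta_qaly


def headroom(delta_qaly: float, delta_cost_excl_test: float, wtp: float = WTP) -> float:
    """Maximum additional test cost at which the strategy is still cost-effective.

    delta_cost_excl_test: incremental costs excluding the price of the new test
    (negative values are savings). This formulation is an assumption of the sketch.
    """
    return wtp * delta_qaly - delta_cost_excl_test


# Hypothetical inputs, not study results
print(icer(delta_cost=150.0, delta_qaly=0.02))
print(headroom(delta_qaly=0.02, delta_cost_excl_test=-50.0))
```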
Furthermore, we explored the impact of our model parameters in univariate sensitivity analyses, varying the sensitivity, the specificity, the cost of the new test, the improvement in the DAS28 in patients who were TP in the new test strategy but FN in the current test strategy, and the costs of biologic DMARDs for FP patients in the first year after diagnosis. The ranges over which the model parameters were varied are shown between brackets in Fig. 4. We report these analyses for an add-on test for intermediate-risk patients. For each analysis, one model parameter was altered while the other parameters were held constant at the baseline value. In our analyses, differential discounting was applied in accordance with the Dutch guidelines for economic evaluation research, with an annual discount rate of 4.0 % for all costs and 1.5 % for health effects [48].
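Differential discounting can be illustrated with the standard discount factor 1/(1 + r)^t. In the sketch below the two rates come from the text, while discounting each 3-month cycle at the time of its start is an assumption of the sketch; the function names are hypothetical.

```python
COST_RATE = 0.04     # annual discount rate for costs (from the text)
EFFECT_RATE = 0.015  # annual discount rate for health effects (from the text)


def discount_factor(rate: float, years: float) -> float:
    """Discount factor for a value accrued `years` after the start of the model."""
    return 1.0 / (1.0 + rate) ** years


def discounted_total(per_cycle_values, rate: float, cycle_years: float = 0.25) -> float:
    """Sum per-cycle values, discounting each at the start time of its cycle (assumption)."""
    return sum(
        value * discount_factor(rate, i * cycle_years)
        for i, value in enumerate(per_cycle_values)
    )


# Hypothetical constant stream over twenty 3-month cycles (5 years)
stream = [1.0] * 20
print(discounted_total(stream, COST_RATE), discounted_total(stream, EFFECT_RATE))
```

Because the rate for health effects is lower than the rate for costs, the same stream retains more of its value when treated as effects than when treated as costs.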
Model validation
The model structure and input parameters were checked for clinical correctness by rheumatologists. We also verified the model for coding and logical correctness by running extreme value scenarios. Furthermore, an independent modeller internally validated our model by checking the model structure, all input parameters with their distributions, and the Visual Basic code used to programme the model in Excel.
Ethical approval and patient consent
Neither ethical approval nor patient consent was needed for this study.