Randomized controlled trial design in rheumatoid arthritis: the past decade

Much progress has occurred over the past decade in rheumatoid arthritis trial design. Recognized challenges have led to the establishment of a clear regulatory pathway to demonstrate efficacy of a new therapeutic. The use of pure placebo beyond 12 to 16 weeks has been demonstrated to be unethical and thus background therapy and/or early rescue has become regular practice. Goals of remission and 'treating to targets' may prove more relevant to identify real-world use of new and existing therapeutics. Identification of rare adverse events associated with new therapies has resulted in intensive safety evaluation during randomized controlled trials and emphasis on postmarketing surveillance and use of registries.


Introduction
Much has changed since methotrexate was approved for treatment of active rheumatoid arthritis (RA) in 1986 based on a total of 126 patients enrolled in two randomized controlled trials (RCTs) [1,2] and treated for a maximum of 24 weeks. Today, RCTs are expected to be 6 to 24 months in duration and employ composite outcomes by American College of Rheumatology (ACR) responses and/or Disease Activity Score (DAS), inhibition of radiographic progression at 6 and 12 months with continued benefit at 24 months, and improvement in physical function and health-related quality of life at 6 months with continued benefit over long-term treatment. Over the past decade, approval of etanercept [3,4] and leflunomide [5] in 1998 and infliximab in 1999 [6] established a firm regulatory precedent in RA, resulting in the introduction of three more disease-modifying antirheumatic drug (DMARD) therapies (Figure 1) with another three expected within the year.
This progress in clinical development was driven, in part, by the Guidance Document for Development of New Therapies for Treatment of RA, which was issued by the US Food and Drug Administration (FDA) and finalized in 1998 [7], followed by recommendations from the European Agency for the Evaluation of Medicinal Products in 2004 [8]. Together, these documents set a precedent for requiring longer-term RCTs, of 12 to 24 months in duration, evaluating radiographic progression and patient-reported physical function in addition to accepted outcomes assessing signs and symptoms of disease. This review will address difficulties in comparing clinical trials, including the importance of comparator groups, background therapy, and means to use placebo controls. Additionally, identification of rare adverse events in RCTs and confirmed in postmarketing surveillance as well as newer approaches designed to reflect clinical practice more realistically will be discussed.
The tremendous progress in clinical development in RA over the past decade has revolutionized rheumatology and significantly benefited our patients. It is hoped that this precedent will lead to similar advances in other rheumatologic diseases, although to date these remain more elusive. Hopefully, the next decade will bring new agents to address the large unmet needs in other rheumatic diseases.

Arthritis Research & Therapy Vol 11 No 1 Strand and Sokolove
Difficulties in comparing trial data: no two randomized controlled trials are the same
There have been few head-to-head trials of biologic agents in RA. It is not surprising that sponsors of regulatory trials have not pursued this study design, leaving clinicians only the option of comparing data across RCTs. To do so requires trials that enroll patient populations with similar demographics and disease characteristics and that use comparable treatment interventions and outcome measures, a tall order, especially in heterogeneous diseases such as RA (Table 1).
Across trials, it is clear that therapeutic responses are not consistent. This is perhaps best exemplified by the variability of ACR20/50 (ACR 20%/50% improvement criteria) responses with methotrexate, which range from 46% to 78% at 1 year and from 56% to 84% at 2 years (Table 2). These cannot be completely explained by differences in median methotrexate doses, use of folic acid supplementation [9], or enrollment of subjects with early versus well-established disease. Even in patients with early disease (duration of less than or equal to 1 year), ACR20/50 responses with methotrexate monotherapy ranged from 54%/32% (ASPIRE [Active Controlled Study of Patients Receiving Infliximab for Treatment of RA of Early Onset]) [10] to 63%/46% (PREMIER) [11] to 65%/42% (Etanercept in Early RA [ERA]) [12]. Similarly, in three-arm RCTs comparing either monotherapy with combination tumor necrosis factor inhibitor (TNF-I) + methotrexate, ACR20 responses for TNF-I monotherapy versus combination varied from 32% versus 50% (ASPIRE) and 41% versus 62% (PREMIER) in early disease to 48% versus 69% in a population with approximately 7 years of disease duration (Trial of Etanercept and Methotrexate with Radiographic Patient Outcomes [TEMPO]) [13]. Those naïve to methotrexate (ASPIRE and PREMIER) as well as those receiving successful therapy for not more than 6 months generally will have more favorable responses to this 'gold standard' DMARD.
Radiographic progression is also quite variable across protocol populations receiving methotrexate, ranging from 0.9 to 2.8 Total Sharp/Sharp van der Heijde score (TSS) points (range 0 to 448) at 12 months in populations with 6 to 7 years of disease duration (US301 and TEMPO) [5,13] to 1.3 to 5.7 TSS points in early disease trials (ERA, ASPIRE, and PREMIER) [10-12] (Figure 2). Differences in progression rates are best predicted by pre-existing damage (for example, TSS at baseline). Calculating estimated yearly progression (baseline TSS divided by mean disease duration) illustrates the broad differences in expected progression rates across protocols, ranging from 3.5 to 6.6 in established disease (US301 and TEMPO) to 8.4, 9.5, and 27.4 (ERA, ASPIRE, and PREMIER) in early disease (Figure 2). It is therefore important to interpret RCT data carefully in the context of demographic and baseline disease characteristics of each population, realizing that no two trials have enrolled truly similar populations, even with similar designs.
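The estimated yearly progression described above is a simple ratio, and a minimal Python sketch may make the calculation concrete. The function name and the two input populations are hypothetical illustrations and are not values taken from any of the trials cited.

```python
def estimated_yearly_progression(baseline_tss: float, mean_disease_duration_years: float) -> float:
    """Baseline TSS divided by mean disease duration, in TSS points per year."""
    if mean_disease_duration_years <= 0:
        raise ValueError("mean disease duration must be positive")
    return baseline_tss / mean_disease_duration_years


# Hypothetical protocol populations: (baseline TSS, mean disease duration in years).
# These numbers are illustrative assumptions, not trial data.
populations = {
    "established disease (hypothetical)": (35.0, 7.0),
    "early disease (hypothetical)": (10.0, 0.8),
}

rates = {
    name: estimated_yearly_progression(tss, years)
    for name, (tss, years) in populations.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} TSS points/year")
```

Because disease duration is the denominator, a population with short disease duration and modest baseline damage can have a much higher estimated yearly progression than one with long-standing disease, which is the pattern the early disease trials show.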

Active controlled trials
An active controlled trial demonstrating 'noninferiority' of a new therapy to an accepted one is a standard design to demonstrate efficacy and may avoid use of placebo. As a consequence of the variability in responses discussed above, it is a challenge to predict clinical outcomes in protocols and accurately calculate sample sizes, particularly when using an active comparator, even the gold standard methotrexate. This has prompted the FDA and the European Medicines Agency to require a placebo control to confirm that the active comparator was indeed efficacious, hence the three-arm design in US301 and inclusion of a short-term placebo substudy in the recent Actemra versus Methotrexate Double-Blind Investigative Trial In Monotherapy (AMBITION) [14] with tocilizumab.
Figure 1. Timeline of regulatory (US Food and Drug Administration) approvals for currently used disease-modifying antirheumatic drugs over the past 10 years. Major regulatory trials used in approval of each agent are listed below the agent.
Available online http://arthritis-research.com/content/11/1/205
If noninferiority is satisfied, efficacy is established and statistical superiority may then be queried and demonstrated. However, care must be taken to ensure that a protocol is not 'oversubscribed' (that is, enrollment of a number so large that small differences between therapies may be statistically significant but not clinically meaningful). This was illustrated by the comparison of methotrexate to leflunomide in MN302 [15]: differences of 1 in mean swollen joint count and 0.01 in mean Health Assessment Questionnaire-Disability Index (HAQ-DI) scores at 12 months. Thus, the requirement of two replicate trials for regulatory confirmation of statistical superiority has evolved [7].
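The 'oversubscription' problem can be illustrated with a short Python sketch. It assumes a simple two-sided z-test for a difference in means with equal arm sizes and a common standard deviation; the difference of 1 swollen joint echoes the MN302 comparison above, but the standard deviation of 6 joints and the arm sizes are hypothetical and are not taken from that trial.

```python
import math


def two_sided_p_value(mean_diff: float, sd: float, n_per_arm: int) -> float:
    """Two-sided p-value from a z-test for a difference in means.

    Assumes two equal-sized arms and a common, known standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    z = abs(mean_diff) / standard_error
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Fixed, clinically small difference; only the (hypothetical) enrollment changes.
difference_in_swollen_joints = 1.0
assumed_sd = 6.0  # hypothetical value chosen for illustration

p_values = {
    n: two_sided_p_value(difference_in_swollen_joints, assumed_sd, n)
    for n in (50, 200, 500)
}

for n, p in p_values.items():
    print(f"n per arm = {n}: p = {p:.3f}")
```

Under these assumptions the same one-joint difference is far from significance with 50 patients per arm yet crosses the conventional 0.05 threshold with 500 per arm, although its clinical meaning has not changed.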

Background disease-modifying antirheumatic drug trials
Early trials employed placebo controls. The last 'pure placebo' controlled RCTs in RA compared leflunomide and sulfasalazine with placebo for 6 months (1998) [16], leflunomide and methotrexate with placebo over 24 months with rescue of nonresponders on or after 4 months of treatment (1998) [5], and adalimumab monotherapy with placebo in DMARD failure patients with rescue at 8 weeks (2000) [17]. Subsequent trial designs have used placebo only superimposed upon background therapy, typically methotrexate. Only in ATTRACT (Anti-TNF Trial in RA with Concomitant Therapy) [6], a 24-month RCT, was blinded treatment continued for 11 months before rescue. Thereafter, rescue of placebo treatment has been offered at 12 to 16 weeks [18-23] or mandatorily for nonresponders at 16 weeks in RAPID (RA Prevention of Structural Damage) 1 and 2 trials with certolizumab [24,25].
Over the past decade, the paradigm of 'step-up' or 'add-on' therapy has been used in several landmark RCTs. In these trials, patients with active disease despite DMARD therapy (again typically methotrexate) are recruited as partial responders following an incomplete response or loss of therapeutic effect and then randomly assigned to the addition of study drug or placebo for 6 months. Although this trial design has been criticized [26], it offers several advantages, including the avoidance of exposure to pure placebo treatment and the fact that it does not require washout of prior DMARD therapy, thereby facilitating recruitment. A persistent concern has been whether patients enrolled in these add-on trials had previously responded to background therapy. As it would not be ethical to ask subjects who had never responded to, or were no longer deriving benefit from, background treatment to continue ineffective DMARD + placebo for an additional 6 months, it is unlikely that either patients or their treating physicians would have permitted their enrollment. Thus, equipoise, the principle that a subject be cognitively indifferent between two therapies, would not have been maintained.
Even with background therapy, the ethical issue of using placebo has prompted the use of a primary efficacy endpoint at 6 months, with demonstration of continued benefit in those 'successful responders' who continue on still-blinded or open-label treatment. FDA requirements have now been modified to 3 and 6 months for improvement in signs and symptoms with continued active treatment (open-label or blinded) and 6 to 12 months for assessment of structural damage and physical function, and 'maintenance of benefit' in those continuing active treatment over 12 to 24 months [27]. This allows placebo with or without background therapy to be 'rescued' on or after 2 to 3 months of treatment.
Following the Etanercept Phase 3 [3] and ATTRACT [6] trials, add-on therapy RCTs have comprised the majority of clinical development programs for adalimumab, abatacept, and rituximab. Despite differences in add-on treatment and timing of trials, demographics and disease characteristics of recruited patient populations have been remarkably similar: mean disease duration of 8 to 13 years, baseline DAS of 5.7 to 6.3, mean DMARDs failed of 2 to 3, mean prior methotrexate treatment of 2 to 4 years, and doses ranging from 15 to 19 mg/week. Two important criteria influence the outcome of this trial design: length of methotrexate treatment required at study entry and use of rescue therapy. Maximal responses to methotrexate (and other synthetic DMARDs,
Figure 2. Radiographic progression with methotrexate is also quite variable across protocol populations, best predicted by damage at baseline. Estimated yearly progression (baseline Total Sharp/Sharp van der Heijde score divided by mean disease duration) helps to illustrate differences in protocol populations and explains differences in change scores over the course of 12
Although there is a clear regulatory precedent for using this trial design to demonstrate efficacy in RA, it is hoped it will be used progressively earlier in clinical development programs. Once safety (and efficacy) become evident in patients with a long duration of disease who have failed multiple DMARDs, it is appropriate to study a promising therapeutic agent in earlier disease populations, even DMARD-naïve patients, such as in ASPIRE [10], PREMIER [11], and AMBITION [14], prior to its approval.

Randomized controlled trials in anti-tumor necrosis factor failure patients
The evaluation of novel agents in a more real-world setting, after failure of TNF-I use, has added to our knowledge base.

Proof-of-concept trials
Proof-of-concept trials in RA require at least 3 months of treatment to allow sufficient time for improvement in manifestations of active disease to be demonstrated and confirm that benefit continues. This necessity has been demonstrated repeatedly when early studies of promising agents of only 1 month of duration were not confirmed with longer-term treatment over 8 to 12 weeks, as reported with several p38 mitogen-activated protein kinase inhibitors [38] and a TNF-α converting enzyme inhibitor [39], although a mechanistic explanation for loss of response remains elusive. Requiring 3 months of treatment has several important implications, including the necessity for toxicology studies of sufficient duration to 'cover' 12 weeks of dosing of a new agent in the clinic. As use of a pure placebo control as a comparator is now considered unethical, new therapies are introduced into the clinic superimposed upon background therapy, typically methotrexate. For synthetic agents, this means that drug-drug interaction studies must precede combination use to ensure no meaningful effects on half-life or metabolism of the background treatment, including nonsteroidal anti-inflammatory drugs (NSAIDs) as well as other commonly prescribed medications. It also means that a new therapy must be able to demonstrate benefit in a population of patients with active disease despite DMARD treatment, in general a more refractory population. In clinical development, it is therefore important to progressively study patients with earlier disease who have failed fewer DMARDs and who are more likely to be responsive to treatment in order to fully characterize the efficacy of new therapies. Similarly, the observed safety profile of a promising therapeutic may differ in more robust patients with earlier RA and fewer comorbidities.

Methotrexate as an active comparator
Trials designed to show 'noninferiority' against an accepted efficacious therapy have long been used in rheumatology for iterative approvals with nonselective NSAIDs as well as cyclooxygenase-2 (COX-2)-selective agents [40]. Recent three-arm RCTs designed to compare monotherapy versus combination TNF-I + methotrexate treatment have demonstrated superiority of the combination versus either monotherapy as well as superiority of anti-TNF versus methotrexate monotherapy for inhibition of radiographic damage [13]. These three-arm RCTs have also helped to better define 'real-world' use of TNF-Is and firmly established the additional clinical benefit of combination therapy when initiated simultaneously with methotrexate (and thus before methotrexate 'failure'). The early RA RCTs, ERA [12], ASPIRE [10], and PREMIER [11], have confirmed the impressive benefit of combination therapy in early disease, and TEMPO [13] demonstrated that it is not too late to see dramatic improvement in patients with 7 years of disease duration. Notably, methotrexate responses were high in this trial as 40% of subjects had received this DMARD within 6 months, thereby enriching the population with 'successful patients' who could tolerate the studied therapy.
In addition to potential synergy as well as additive efficacy attributed to different mechanisms of action, there are other potential explanations for the impressive benefit of combination biologic agent plus methotrexate. Methotrexate (as well as azathioprine and leflunomide) decreases the immunogenicity [41] of biologic agents and prolongs the half-life of anti-cytokine monoclonal antibodies (other than certolizumab), which may contribute to improved responses and/or responses that are more sustained.

Other trial designs
Other trial designs have been used to minimize or avoid the use of placebo controls. A frequent design in juvenile idiopathic arthritis (JIA) is the randomized withdrawal study, popular in pediatric populations in which the use of placebo is not ethical. This design includes an open-label run-in period in which all subjects receive active medication; subsequently, those who respond to treatment are randomly assigned to blinded continuation or withdrawal of active medication. Flare in disease activity is measured as the primary outcome, and once it is documented, patients are eligible to receive open-label active therapy. This design was first used with etanercept [50] and has resulted in subsequent approvals for other biologic agents in JIA [51,52]. However, the use of randomized withdrawal studies in adult populations is more controversial, both from an ethical point of view and due to criticism that efficacy may not be definitively demonstrated.

Real-world randomized controlled trials 'treating to target'
Clearly, RCTs do not mimic the real-world use of therapies: subjects enrolled in trials are a selected population with few of the comorbidities generally present in RA patients. Studies have confirmed that most patients followed in practice and enrolled in RA registries would not be eligible for clinical trials [53,54]. There are several reasons for this, including the need to identify a responsive population to be able to demonstrate improvement successfully (that is, efficacy) and inclusion/exclusion criteria that limit eligible subjects to those without medical conditions that could confound assessment of the safety of the agent. Patients whose RA is successfully controlled on current therapy will tell us little about the benefit of a new agent, nor would it be ethical to remove an efficacious treatment for purposes of ascertaining effect in an RCT. With the addition of so many new agents to our therapeutic armamentarium, it is no surprise that it is hard to find patients for enrollment in RCTs in RA, especially those on background therapy yet with sufficiently active disease. Thus, criteria defining 'active disease' have become more lenient, yet the ranges of baseline joint counts and disease activity in subjects enrolled in most recent trials are still remarkably similar. The introduction of a newly approved therapy into the clinic means it will be used in a broader patient population with more comorbid conditions and concomitant therapies. Efficacy outcomes may thus be less impressive than those observed in an RCT. Furthermore, rare safety events that were not observed in trials may become evident in postmarketing surveillance and/or longitudinal observational studies.
Although RCTs are the gold standard for evaluation of therapeutic efficacy, enrolled patient populations [53,54] and therapeutic protocols do not mimic those seen in the real world. The lack of flexibility to adjust treatment limits extrapolation of their results to real-world use. The advent of 'treatment to target' studies, while not designed for regulatory approval, provides an opportunity to study therapeutic regimens with the flexibility to change treatments, including the effect on patient expectations when treatments are changed. Trials published to date have not been blinded and their designs pose significant challenges: balancing randomization, inability to blind patients or investigators to treatment, lack of an intent-to-treat analysis, and inclusion of relatively small sample sizes. Treatment designs have progressed from initially looking for ACR and/or DAS responses to current goals of achieving 'low disease activity' and 'remission' [55,56] as well as assessing productivity within the home and workplace. The ongoing TEAR (Treatment of Early Aggressive RA) trial in the US is a blinded RCT using a 'treatment to target' approach, with results expected in the near future.
The FinRaCo (Finnish RA Combination Therapy) [56] trial introduced the paradigm of 'treating to target': allowing therapeutic titration in those not achieving a prespecified goal such as 'low disease activity' defined by a DAS of less than 2.4. Together with COBRA (Combinatietherapie Bij Reumatoide Artritis) [57] and a large US combination trial [58], it was among the first trials to clearly demonstrate that early combination therapy was superior to monotherapy. Similarly, the TICORA (Tight Control for RA) [59] trial required aggressive escalation of traditional DMARDs with liberal use of intra-articular corticosteroid injections; 'remission' (defined as a DAS of less than 1.6) was achieved in 65% of subjects.
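The titration logic behind 'treating to target' can be sketched in a few lines of Python. The two DAS thresholds are those quoted above; the three-step treatment ladder, the visit values, and the function names are hypothetical and do not reproduce the protocol of any of the trials cited.

```python
LOW_DISEASE_ACTIVITY_DAS = 2.4  # threshold quoted in the text
REMISSION_DAS = 1.6             # threshold quoted in the text


def disease_state(das: float) -> str:
    """Classify a DAS value against the two thresholds above."""
    if das < REMISSION_DAS:
        return "remission"
    if das < LOW_DISEASE_ACTIVITY_DAS:
        return "low disease activity"
    return "active disease"


def next_step(current_step: int, das: float, n_steps: int,
              target: float = LOW_DISEASE_ACTIVITY_DAS) -> int:
    """Escalate one step when the target is not met; otherwise stay put."""
    if das < target:
        return current_step
    return min(current_step + 1, n_steps - 1)


# Hypothetical DAS values at successive visits for one patient.
visits = [4.1, 3.0, 2.2, 1.5]
step = 0       # index into a hypothetical three-step treatment ladder
history = []
for das in visits:
    step = next_step(step, das, n_steps=3)
    history.append((das, disease_state(das), step))

for das, state, step_after in history:
    print(f"DAS {das:.1f}: {state}, treatment step after visit = {step_after}")
```

The point of the sketch is only that treatment is driven by a prespecified disease activity goal at each visit rather than by a fixed regimen, which is what distinguishes these designs from conventional RCTs.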
The BeSt study was designed to demonstrate whether sequential DMARD monotherapy, step-up combination therapy, or an initial combination regimen including either prednisolone or anti-TNF therapy (infliximab) provided better and more sustained disease control in early RA. The opportunity to perform a trial in 'two dimensions' (using a disease target and a dynamic treatment strategy) led to several findings not previously observed in traditional RCTs. It confirmed that approximately 30% of the subjects receiving methotrexate monotherapy responded well but that further improvement (defined as a DAS of less than 1.4) can be achieved by an additional 40% of participants overall, higher than that achieved in most conventional RCTs. Additionally, the initial use of combination therapy with either TNF-I or DMARDs with high-dose steroids resulted in a more rapid onset of effect and more sustained control of disease activity, including structural benefit at 1 year compared with traditional DMARD monotherapy. Trials based on changes in treatment according to outcomes have demonstrated real-world benefit from aggressively targeting therapies as well as the superiority of biologic over nonbiologic DMARDs not observed in traditional RCTs.

Assessment of safety
Recent experiences with the selective COX-2 inhibitors [61] and other agents removed from the market due to documented liver toxicity [62] have underscored the importance of evaluating the safety of a new therapeutic prior to approval as well as of ensuring continued postmarketing surveillance. It is difficult to estimate adequate sample sizes in RCTs for assessment of safety, a lesson well learned when attempting to demonstrate that gastrointestinal safety of the COX-2s exceeded nonselective NSAIDs [61]. Furthermore, safety signals not evident in RCTs prior to approval may emerge in larger postmarketing trials or surveillance.
International Conference on Harmonisation guidelines for therapies of chronic diseases require that 1000 patients be exposed at the recommended dose, 300 patients for at least 6 months and 100 patients for at least 1 year [8]. Although the first two TNF-Is were approved, on the basis of limited databases, for use only in patients with active RA who had failed multiple DMARDs, their rapid acceptance and broader use prompted the FDA to require larger exposure populations prior to approval of adalimumab and abatacept. Ongoing postmarketing surveillance has further confirmed or identified safety 'signals' not observed in RCTs designed for regulatory approval. One or two cases of opportunistic infections (including tuberculosis) or lymphomas were evident in RCTs with etanercept and infliximab, but larger exposures in real-world use and trials in other clinical indications were required to identify signals for congestive heart failure [63], demyelinating disorders [64,65], and cytopenias [66]. However, there is still great difficulty in sorting agent-specific risk from background disease risk as exemplified by cohort studies demonstrating no increase in or even decreased risk for congestive heart failure with use of TNF-I in RA patients [67]. Although some RCTs of current TNF-I (in populations other than RA) [68] have identified a possible signal for increased risk of lung cancer, this was not observed in any RA trials. Although a meta-analysis [69] of RCTs did support this association, a longitudinal cohort study evaluating over 13,000 patients with RA treated with biologic therapy (>97% of whom were TNF-I users) found no evidence for increased risk for solid tumors over RA patients receiving traditional DMARDs [70]. More recently, the ASSURE trial, an RCT designed to evaluate safety of abatacept, again identified a small but statistically significant signal for lung cancer in those randomly assigned to abatacept [33].
It is generally believed that 2,500 to 3,000 patient years per treatment are required to identify very rare adverse events [8]. Natalizumab (Tysabri™), a monoclonal antibody that inhibits the α4β7 integrin and that is currently approved for treatment of multiple sclerosis and Crohn disease, provides a good example. Soon after approval, three cases of progressive multifocal leukoencephalopathy were reported [71], all occurring in 3,000 patients exposed to this agent in RCTs, an incidence of 0.1%. However, the incidence increased when examining subjects who received this agent in longer-term treatment or in combination with interferon-beta: 2 out of 2,000 treated more than 2 years (0.2%), 2 out of 589 receiving combination therapy (0.34%), and 1 out of fewer than 100 treated more than 3 years (more than 1.0%) [72]. An FDA-required detailed Risk Minimization Action Plan (RISKMAP) has allowed the reintroduction of this agent for the treatment of both clinical indications in the US, although new cases continue to accrue [73]. Such events may be due, in part, to the desire that efficacy be maximized in RCTs; often, biologic agents are administered in 'industrial strength' rather than pharmacologic or physiologic doses and/or at dosing intervals of less than the measured half-life of the agent, potentially resulting in accumulation.
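A short Python sketch shows why exposures on the order of 3,000 are needed before an event with an incidence of about 0.1% is likely to be seen at all. It assumes that each exposed patient independently has the same fixed probability of the event, which is a simplification; the function names are hypothetical.

```python
import math


def prob_at_least_one_event(incidence: float, n_exposed: int) -> float:
    """Probability of observing one or more events among n exposed patients.

    Assumes independent patients who share the same per-patient incidence.
    """
    return 1.0 - (1.0 - incidence) ** n_exposed


def exposures_needed(incidence: float, confidence: float = 0.95) -> int:
    """Smallest n for which at least one event is seen with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - incidence))


incidence = 0.001  # 0.1%, as in the 3 cases per 3,000 patients cited above

for n in (300, 1000, 3000):
    p = prob_at_least_one_event(incidence, n)
    print(f"{n} exposed: P(at least one event) = {p:.2f}")

print("exposures for 95% confidence:", exposures_needed(incidence))
```

Under these assumptions a database of a few hundred patients will usually miss such an event entirely, whereas roughly 3,000 exposures make it likely that at least one case is observed, consistent with the 'rule of three' often used for rare adverse events.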
As it has been difficult to identify relatively rare safety 'signals' of potential concern, large safety RCTs have been advocated. Two such RCTs, STAR [32] and ASSURE [33], superimposed use of the new therapeutic (adalimumab and abatacept, respectively) versus placebo on background DMARD therapy in RA. Although some have argued that such studies with a primary endpoint of safety cannot confirm efficacy of the test agent, they have identified the presence of certain safety concerns. Just as a pilot study and subsequent RCT showed that combination treatment with anakinra + etanercept resulted in less efficacy and more toxicity [46], the combination of abatacept + TNF-Is in ASSURE [33] revealed an increased incidence of serious infections as well as lung cancer.
Registries established to monitor biologic therapies in RA have contributed significantly to our ability to confirm and further quantify risks potentially associated with traditional and biologic DMARD therapies and promise to do so in other rheumatic diseases. Thus, the FDA now recommends that new treatments be studied in well-characterized populations with adequate exposure and recommends labeling limited to use in these types of patients. It is expected that broader real-world use and subsequent trials in other populations will allow expanded use of the agent.

Conclusions and future directions
Much progress has occurred over the past decade in trial design in RA, including the following:
• Establishment of a clear regulatory path to demonstrate efficacy of a new therapeutic.
• Demonstration that the use of 'pure' placebo beyond 12 to 16 weeks is unethical, so that background therapy and early rescue have become regular practice.
• Recognition that identification of rare adverse events associated with a new therapeutic requires large-exposure databases and continuing postmarketing surveillance, including establishment of registries.
• Postapproval trials, especially 'treating to target' designs, are more relevant to identify real-world use of new and existing therapeutics.
Not all DMARDs or biologic agents behave as expected, and thus far biomarkers have not permitted earlier prediction of therapeutic efficacy. Although RCTs remain the gold standard for demonstrating efficacy of a new therapeutic, it is expected that shorter duration trials with better 'early' outcomes will facilitate efficient clinical development. Also, trials in patients with early RA, even undifferentiated arthritis, will push the envelope of treatment with current therapies and promising agents to come. We have much to look forward to in the next decade of clinical development in rheumatology.