The ACR20 and defining a threshold for response in rheumatic diseases: too much of a good thing
© BioMed Central Ltd 2014
Published: 3 January 2014
In the past 20 years great progress has been made in the development of multidimensional outcome measures (such as the Disease Activity Score and ACR20) to evaluate treatments in rheumatoid arthritis, a process that has since spread throughout the rheumatic diseases. These outcome measures have standardized the assessment of outcomes in trials, making it possible to evaluate and compare the efficacy of treatments. The methodologic advances have included the selection of pre-existing outcome measures that detected change in a sensitive fashion (in rheumatoid arthritis, this was the Core Set Measures). These measures were then combined into a single multidimensional outcome measure, and such outcome measures have been widely adopted in trials and endorsed by the American College of Rheumatology (ACR), the European League Against Rheumatism (EULAR) and regulatory agencies. The secular improvement in treatment for patients with rheumatoid arthritis has been facilitated in part by these major methodologic advancements. The one element of this effort that has neither optimized measurement of outcomes nor made it easier to detect the effect of treatments is the dichotomization of continuous measures of response, creating responder and non-responder definitions (for example, ACR20 responders; EULAR good responders). Dichotomizing response sacrifices statistical power and hides variability in response. Future methodologic work will need to focus on improving multidimensional outcome measurement without arbitrarily characterizing some patients as responders while labeling others as non-responders.
Prior to 1990 in rheumatology and especially in rheumatoid arthritis (RA), trials tested the efficacy of treatments using outcome measures that varied from trial to trial. One trial might assess 12 outcomes related to symptoms and signs of disease (for example, joint counts, pain, erythrocyte sedimentation rate, morning stiffness), while another might include as many as 15, yet these outcomes might be different from the ones measured in the first trial. Because so many different outcomes were assessed with no primary outcome, the meaning of trial results when one or two of the outcomes showed efficacy for a treatment was unclear. Further, it was not possible to compare the efficacy of treatments across trials because each trial generally used its own set of outcome measures. In trial reports authors could report evidence that a treatment’s efficacy was superior to placebo if 1 of 12 outcome measures showed a significant effect of treatment, whereas in another trial report in the same journal, authors could suggest that the same treatment was not efficacious if only 2 or 3 of the outcomes showed significant efficacy over placebo. The lack of standardization across trials and the use of multiple comparisons made it impossible to identify which drugs were actually efficacious and how they compared with one another. In addition, many of the outcome measures used in these trials were not sensitive to change and would not have shown efficacy even if the treatment worked extremely well. Further, the same outcome measures were not always assessed using the same techniques, so that the sensitivity to change of a given measure might differ from one trial to another.
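The multiple-comparisons problem described above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes each outcome is tested independently at a two-sided 5% level, which is a simplification because RA outcomes are correlated with one another, and the function name is hypothetical rather than drawn from any trial software.

```python
def familywise_error_rate(n_outcomes: int, alpha: float = 0.05) -> float:
    """Chance of at least one false-positive result among n independent tests.

    Assumption: the tests are independent. Correlated outcomes, as in RA,
    would give a somewhat lower figure, but the qualitative point holds.
    """
    return 1.0 - (1.0 - alpha) ** n_outcomes


# Compare a single primary outcome with the 7-item core set and with the
# 12 to 15 outcomes that older trials might have assessed.
for k in (1, 7, 12, 15):
    print(f"{k:>2} outcomes: {familywise_error_rate(k):.3f}")
```

Under these assumptions, a trial of an inert drug that assesses a dozen outcomes has a substantial chance of showing at least one nominally significant result, which is why a report of efficacy on 1 of 12 outcomes was so hard to interpret.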
With that background, an international group of rheumatologists meeting under the auspices of the American College of Rheumatology (ACR) collected data from randomized trials of second-line drugs in RA and carried out a series of analyses that examined, among trials of known effective drugs, which of the outcome measures being used were likely to show efficacy. Among the commonly used outcome measures that were unlikely to show that effective treatments actually worked were proximal interphalangeal circumference, walk time, functional class (graded 1 through 4), hemoglobin, grip strength and morning stiffness. Morning stiffness was not sensitive to change because it was absent in many patients with RA, making it impossible for them to experience an improvement when treated with an effective drug. Among the outcome measures that were found to be most sensitive to change were the patient global assessment, tender joint count and, in trials of second-line drugs, swollen joint count and erythrocyte sedimentation rate.
Table 1. American College of Rheumatology disease activity measures for rheumatoid arthritis clinical trials: Core Set
Disease activity measure
Tender joint count
Swollen joint count
Patient’s assessment of pain
Patient’s global assessment of disease activity
Physician’s global assessment of disease activity
Patient’s assessment of physical function
Acute-phase reactant value
For trial duration ≥1 year and agent being tested as a ‘DMARD’, also perform:
Radiography or other imaging technique
With this list of seven measures, the committee had standardized RA outcome assessment and decreased the number of outcome measures. However, trials still assessed seven measures, often with all as primary outcomes, and there needed to be a single measure that reflected the breadth of RA activity, including both physician-measured assessments and patient-reported outcomes. With this in mind, an international committee again assembled and tested a variety of possible definitions of improvement. Using different thresholds and combinations of core set measures, the committee chose a definition that showed the greatest sensitivity to change. Other factors considered by the committee included ease of use and accord with rheumatologists’ impressions of improvement. The ACR definition of improvement (often called the ACR20 because it requires at least a 20% improvement in the core set measures for a patient to be classified as improved) was promulgated and has been widely adopted in RA trials. A little later, the European League Against Rheumatism (EULAR) also developed its own definition of response, which broke improvement into three categories and, unlike the ACR definition, required both a low level of disease and a certain degree of improvement for a patient to be characterized as having good improvement. Subsequent work has suggested that the ACR20 and the EULAR definition of improvement perform comparably, and many trials have included both, choosing one of the measures as a primary outcome and reporting the other as a secondary outcome. Importantly, the US Food and Drug Administration also recommended the ACR20 as a preferred outcome measure for testing the efficacy of new drugs for RA with respect to signs and symptoms of disease. Since most trials in RA are carried out by industry, this endorsement by the Food and Drug Administration was a critical element in the widespread dissemination and use of the ACR20. Even now, the ACR20 is probably the most widely used outcome measure in RA trials.
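To make the threshold idea concrete, here is a minimal sketch of an ACR20-style responder rule. It uses the commonly stated form of the ACR definition, which this article does not spell out: at least 20% improvement in both the tender and swollen joint counts, plus at least 20% improvement in at least three of the five remaining core set measures. The dictionary keys, the handling of a zero baseline and the patient values are hypothetical and chosen only for illustration.

```python
from typing import Dict

JOINT_COUNTS = ("tender_joint_count", "swollen_joint_count")
OTHER_MEASURES = (
    "patient_pain",
    "patient_global",
    "physician_global",
    "physical_function",
    "acute_phase_reactant",
)


def percent_improvement(baseline: float, follow_up: float) -> float:
    """Percentage decrease from baseline (lower scores mean less disease)."""
    if baseline <= 0:
        # Assumption: a measure that starts at zero cannot show improvement.
        return 0.0
    return 100.0 * (baseline - follow_up) / baseline


def is_acr_responder(baseline: Dict[str, float],
                     follow_up: Dict[str, float],
                     threshold: float = 20.0) -> bool:
    """Apply the joint-count rule plus the 3-of-5 rule at the given threshold."""
    improved = {
        name: percent_improvement(baseline[name], follow_up[name]) >= threshold
        for name in JOINT_COUNTS + OTHER_MEASURES
    }
    joints_ok = all(improved[name] for name in JOINT_COUNTS)
    others_ok = sum(improved[name] for name in OTHER_MEASURES) >= 3
    return joints_ok and others_ok


# Hypothetical patients sharing the same baseline values.
baseline = {"tender_joint_count": 20, "swollen_joint_count": 10,
            "patient_pain": 50, "patient_global": 50, "physician_global": 50,
            "physical_function": 2.0, "acute_phase_reactant": 40}
# Patient A improves by exactly 20% on both joint counts and three other measures.
patient_a = {"tender_joint_count": 16, "swollen_joint_count": 8,
             "patient_pain": 40, "patient_global": 40, "physician_global": 40,
             "physical_function": 2.0, "acute_phase_reactant": 40}
# Patient B has far less pain but only a 15% drop in tender joints.
patient_b = {"tender_joint_count": 17, "swollen_joint_count": 8,
             "patient_pain": 10, "patient_global": 40, "physician_global": 40,
             "physical_function": 2.0, "acute_phase_reactant": 40}

print(is_acr_responder(baseline, patient_a))
print(is_acr_responder(baseline, patient_b))
```

In this example patient A is classified as a responder and patient B is not, even though B differs by a single tender joint and reports much greater pain relief. Passing `threshold=50.0` or `threshold=70.0` gives the analogous ACR50 and ACR70 rules.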
With the success and widespread use of the ACR20 came the desire among rheumatologists studying other rheumatic diseases to have similar standardized definitions of response and improvement. In the few years after the ACR20 was published, similar efforts were undertaken for juvenile RA, osteoarthritis, low back pain, psoriatic arthritis, and spondyloarthropathies; more recently, efforts for myositis and vasculitis have paralleled earlier efforts with a focus on developing a uniform set of measures for trial outcomes and sometimes defining a threshold for improvement.
It is not surprising that the promulgation of a rationally selected core set of outcome measures and its consolidation into one multidimensional measure of response has occurred contemporaneously with the improvement of treatments in rheumatic disease. Making the measurement of response in rheumatic disease uniform and efficient has facilitated the comparison of new and conventional treatments. For example, the ACR20 and variations on this measurement tool have been used to assert that tumor necrosis factor inhibitors perform as well as or better than conventional treatments in RA, an argument that would have been difficult to make with the old chaotic scheme of multiple measurements. Also, meta-analyses have convincingly demonstrated that some new therapies for RA did not work as well as either conventional or new biologic agents [8–10]. These treatments shown to be less efficacious have then lost favor in the marketplace.
Table 2. Beneficial and detrimental effects of Core Set and ACR20 on trials in rheumatoid arthritis
Beneficial effects:
Selected outcome measures most likely to change with treatment
Made uniform trial outcome measures across studies, making comparisons possible
Decreased the number of outcomes from >10 to 7 and then to 1 (ACR20)
Detrimental effect:
Dichotomized a continuous measure of response
Unfortunately, one effect of this process has not been beneficial (Table 2). In developing a definition of response, the ACR Committee and other rheumatic disease study groups have used thresholds to define response. Often clinically based, these thresholds initially seemed like a wonderful way of communicating the effect of a new treatment: that a certain number of patients would experience improvement when treated. The problem is that taking a continuous measure and arbitrarily cutting it so as to create a dichotomous response/non-response measure, an approach called 'responder analysis', sacrifices statistical power and inflates the number of patients needed to evaluate the efficacy of treatments. Due mostly to their loss of power, responder analyses are discouraged in the clinical trials literature, and in a recent position paper, the Pharmaceutical Research and Manufacturers of America (PhRMA) has advised against use of these analyses. The loss of power in these analyses has been repeatedly shown in simulation studies and has been the subject of prominent editorials in clinical journals. As noted by Altman and Royston, responder analyses lead to several problems. First, statistical power is reduced; they estimate that it is equivalent to discarding one-third of the data collected. This is especially inadvisable when only small numbers of patients can be recruited, an especially acute problem in some rare rheumatic diseases like myositis, vasculitis and scleroderma. Generally speaking, the use of a dichotomized response/non-response measure should be discouraged in studies of these diseases and probably in other rheumatic disease trials too. Altman and Royston and the PhRMA position paper also note other problems introduced by responder analysis. One is that the degree of variation between groups is underestimated, because the variation among patients within each response category is hidden once the response is dichotomized. Another is that individuals close to each other, but on opposite sides of the response cut point, are characterized as being very different rather than similar.
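The loss of power can be illustrated with a small simulation, in the spirit of the simulation studies mentioned above but not reproducing any of them. This is a sketch under stated assumptions: improvement scores are drawn from normal distributions with unit standard deviation, the treatment shifts the mean by half a standard deviation, each arm has 50 patients, the response cut point sits midway between the two arm means, and both analyses use large-sample z-tests. All of these values and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

Z_CRIT = NormalDist().inv_cdf(0.975)  # critical value for a two-sided 5% test


def continuous_test_rejects(treated, control):
    """Large-sample z-test on the difference in mean improvement."""
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return abs(mean(treated) - mean(control)) / se > Z_CRIT


def responder_test_rejects(treated, control, cutoff):
    """Pooled two-proportion z-test after dichotomizing at the cutoff."""
    n1, n0 = len(treated), len(control)
    r1 = sum(x >= cutoff for x in treated)  # responders, treated arm
    r0 = sum(x >= cutoff for x in control)  # responders, control arm
    pooled = (r1 + r0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return False  # everyone or no one responded; no evidence either way
    return abs(r1 / n1 - r0 / n0) / se > Z_CRIT


def simulate_power(n_per_arm=50, effect=0.5, cutoff=0.25,
                   n_trials=2000, seed=1):
    """Fraction of simulated trials in which each analysis detects the effect."""
    rng = random.Random(seed)
    continuous_hits = responder_hits = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)]
        continuous_hits += continuous_test_rejects(treated, control)
        responder_hits += responder_test_rejects(treated, control, cutoff)
    return continuous_hits / n_trials, responder_hits / n_trials


power_continuous, power_responder = simulate_power()
print(f"power, continuous outcome: {power_continuous:.2f}")
print(f"power, responder analysis: {power_responder:.2f}")
```

The same simulated patients are analyzed both ways, so any gap between the two power estimates comes from the dichotomization alone. Changing `n_per_arm` shows how many extra patients the responder analysis needs to match the continuous analysis.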
With the enlarging armamentarium of effective treatments in RA, the need to compare the efficacy of treatments will intensify. Small differences would be expected and use of a dichotomous measure of response would demand very large sample sizes to compare treatments. This goal could be accomplished more efficiently with a continuous outcome measure. Further, if only small numbers of patients are available to test a treatment in a subgroup of persons with RA (or among those with other rheumatic disorders), a continuous outcome measure will facilitate the testing of treatment without demanding impractically large sample sizes. Given these anticipated needs, an ACR committee once again assembled and created a new outcome measure based on the ACR20 called the ACRHybrid. With the ACRHybrid a patient’s response is based mostly on their average percentage improvement in the core set measures, with the caveat that average improvement is adjusted based on whether it satisfies the ACR20, 50 or 70. While endorsed by the ACR, the ACRHybrid has yet to be used as a primary outcome measure in any large-scale RA trial. This measure or another continuous measure would permit the definitive evaluation of the comparative efficacy of RA treatments and would facilitate evaluation of how regimens compare in terms of efficacy. The continued use of dichotomous measures to evaluate these issues has made the evaluation of therapeutic uncertainties more challenging at a time when it is increasingly necessary to determine which of our new agents is more efficacious.
While dichotomous measures sacrifice statistical power and can hide valuable information about treatment response, this does not mean that clinical investigators should avoid defining important dichotomous outcomes like the minimally important clinical improvement or disease activity low enough to be acceptable to patients. It just means, especially for trials of treatments of uncommon rheumatic diseases, comparative RA trials and other similar situations, that these dichotomous measures should not be used as primary outcomes. Recommendations on how to define these dichotomous outcomes can be found elsewhere.
Beyond RA, the continued development and use of dichotomous measures of response in rheumatic diseases may be sacrificing our ability to detect whether treatments are efficacious. While core set measures need to be developed for rheumatic disease trials and these ought to follow the process used for RA, the final step of that process should be to identify a single multidimensional outcome on a continuous scale.
The past 20 years have witnessed huge advances not just in the armamentarium of treatments available for RA but in the use of valid and responsive measurement tools to assess their effectiveness. Selecting outcome measures sensitive to change, consolidating these into single measures and standardizing measurement across trials have facilitated the assessment of treatments. The dichotomization of treatment response has unfortunately not produced major benefits and should be jettisoned in favor of a primary assessment of treatment efficacy that utilizes continuous response measures.
David T Felson MD MPH is Professor of Medicine and Epidemiology at Boston University Schools of Medicine and Public Health. He chaired the ACR committee that defined a core set of outcome measures for use in RA trials and that developed the preliminary definition of improvement in RA (also called the ACR20). More recently, he co-chaired an ACR/EULAR effort to define remission in rheumatoid arthritis. Professor Felson also has an active research program in osteoarthritis. He has received the Henry Kunkel Young Investigator Award and the Clinical Research Award from the ACR and the Howley Prize from the Arthritis Foundation for his research.
Dr LaValley has a PhD in Statistics from the Pennsylvania State University and completed a post-doctoral fellowship in Biostatistics at the Harvard University School of Public Health. In 1995 he was hired as a biostatistician for the Boston University Arthritis Center and as an Assistant Professor of Biostatistics at the Boston University School of Public Health. In 2008 he became Professor of Biostatistics, and since 2010 has served as the Research Director of the Center for Enhancing Activity and Participation among Persons with Arthritis (ENACT) at Boston University. His main areas of interest are meta-analysis, clinical trial methods, longitudinal data analysis, logistic regression and survival analysis.
ACR: American College of Rheumatology
EULAR: European League Against Rheumatism
PhRMA: Pharmaceutical Research and Manufacturers of America
The authors appreciate the technical assistance of Anne Plunkett.