Endoscopic ulcers as a surrogate marker of NSAID-induced mucosal damage
© BioMed Central Ltd 2013
Published: 24 July 2013
The characteristic of a biomarker that makes it a useful surrogate is its ability to identify a high risk of clinically important benefits or harms occurring in the future. A number of definitions or descriptions of surrogacy have been put forward. Most recently, the Institute of Medicine of the National Academy of Sciences in the USA has put forward an evaluation scheme for biomarkers, looking at validation (assay performance), qualification (assessment of evidence), and utilisation (the context in which the surrogate is to be used). This paper examines endoscopy as a surrogate marker of NSAID-induced mucosal damage using the Institute of Medicine criteria. The article finds extensive evidence that the detection of endoscopic ulcers is a valid marker. The process of qualification documents abundant evidence that endoscopic ulcers and serious upper gastrointestinal damage are influenced in the same direction, and to much the same magnitude, by a variety of risk factors and interventions. Criticisms of validation and qualification for endoscopic ulcers have been examined and dismissed. Context is the key: in the context of serious NSAID-induced upper gastrointestinal harm, endoscopic ulcers represent a useful surrogate. Generalisability beyond this context is not considered.
Establishing a biomarker as an effective surrogate, something measurable now but indicative of some later important clinical event, is both important and difficult. The often-quoted ideal of a surrogate marker is the blood cholesterol level. We know that if this level is elevated, there is an increased risk of future serious cardiovascular harm, including death, and that reducing cholesterol levels reduces that risk. For example, the Scandinavian Simvastatin Survival Study randomised patients with clinically established coronary heart disease to 5 years of simvastatin or placebo. The statin produced large and significant reductions in blood lipids, as well as a 24% reduction in coronary mortality over 10 years, but with a number-needed-to-treat of about 50. In absolute terms there were about 2% fewer coronary deaths with a statin, or one death avoided for every 50 patients treated.
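The link between an absolute risk reduction and a number-needed-to-treat is simple reciprocal arithmetic. The minimal sketch below illustrates it. The function names are illustrative only. The baseline risks in the loop are hypothetical values chosen to show why the same relative reduction can give very different numbers-needed-to-treat; they are not data from the Scandinavian Simvastatin Survival Study.

```python
def number_needed_to_treat(absolute_risk_reduction: float) -> float:
    """Patients treated for one fewer event: the reciprocal of the absolute risk reduction."""
    if absolute_risk_reduction <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1.0 / absolute_risk_reduction


def nnt_from_relative_reduction(baseline_risk: float, relative_risk_reduction: float) -> float:
    """NNT implied by applying a relative risk reduction to a given baseline (control) risk."""
    return number_needed_to_treat(baseline_risk * relative_risk_reduction)


# From the text: about 2% fewer coronary deaths (absolute reduction 0.02) gives an NNT of about 50.
print(round(number_needed_to_treat(0.02)))  # 50

# Hypothetical baseline risks: the same 24% relative reduction gives very different NNTs.
for baseline in (0.10, 0.02):
    print(baseline, round(nnt_from_relative_reduction(baseline, 0.24)))
```

The lower the baseline risk, the more patients must be treated to prevent one event, even when the relative effect is unchanged.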
In 2001 a National Institutes of Health working group defined surrogacy based on prediction of a more serious outcome from epidemiologic, therapeutic, pathophysiologic or other scientific evidence:
A biological marker (biomarker) is a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes or pharmacologic responses to a therapeutic intervention.
A clinical endpoint is a characteristic or variable that reflects how a patient feels, functions or survives.
A surrogate endpoint is a biomarker that is intended to substitute for a clinical endpoint. A surrogate endpoint is expected to predict clinical benefit (or harm, or lack of benefit or harm) based on epidemiologic, therapeutic, pathophysiologic or other scientific evidence.
The Institute of Medicine scheme evaluates a biomarker in three steps:
Analytical validation: analyses of available evidence on the analytical performance of an assay.
Qualification: assessment of available evidence on associations between the biomarker and disease states, including data showing effects of interventions on both the biomarker and clinical outcomes.
Utilisation: contextual analysis based on the specific use proposed and the applicability of available evidence to this use. This analysis includes a determination of whether the validation and qualification conducted provide sufficient support for the use proposed.
The Institute of Medicine report is mainly aimed at determining how the US Food and Drug Administration should consider evidence around the use of biomarkers or surrogates. The report tested the methodology on several possible surrogates: tumour size for cancer, C-reactive protein, troponin, β-carotene, and low-density and high-density lipoprotein for cardiovascular risk. The message from each case study is that context is the key; utility for one purpose may well not mean utility for all.
For low-density lipoprotein cholesterol, the report pointed to 'the high probability that lowering LDL [low-density lipoprotein] for several interventions decreases risk of cardiovascular disease, and LDL, although not perfect, is one of the best biomarkers for cardiovascular disease'. What is important here is that the panel considered low-density lipoprotein cholesterol a useful surrogate even though favourable changes in the surrogate occur in most treated patients, while benefits in terms of clinical events occur in only a few.
Determining the true value of a surrogate is not easy. Some evaluations have taken a distinctly statistical approach [5, 6]; others are more philosophical [7–9]. The message that context is the key can be read into all of these approaches to defining what a surrogate is, and to evaluating whether a putative surrogate really is one.
The need for evaluation is clear from a short search of the literature, which shows the extent of interest in surrogate endpoints. Of almost 3,600 papers with surrogate in the title found using PubMed, fewer than 100 also mention validation or validity in the title or abstract. Papers that did examine the validity of potential surrogate markers, or of markers actually used as surrogates, often found the evidence lacking. For example, surrogate endpoints used in liver surgery trials were generally not validated, and a simple walking test used in hypertension trials did not explain the treatment effect.
A wide-ranging systematic review evaluated the evidence that endoscopic ulcer may be a useful surrogate for more serious clinical harm from NSAIDs, and concluded that it was a strong candidate. Other researchers disagreed. This paper revisits the evidence in light of the Institute of Medicine report.
It is widely believed that there is a biological progression from lesser to more severe gastrointestinal damage with NSAIDs: from dyspepsia and other gastrointestinal symptoms, through endoscopic erosion and asymptomatic ulcers detected endoscopically, to ulcer complications (bleeding and perforation), and even to death [14, 15]. Asymptomatic ulcers can also bleed.
Endoscopic ulcers may be an early step in a biological progression from mucosal injury to symptomatic ulcer and ulcer complication. These complications include the following:
Obstruction complicating peptic ulcers: this depends on the ulcer's anatomical location; lesions involving the pylorus are more likely to present with obstruction than those in the gastric corpus.
Perforation complicating peptic ulcers: like obstruction, this also depends on anatomical location; most perforating ulcers occur in the duodenum.
Bleeding: this is less predictable, but is increasingly seen in association with anti-thrombotic agents.
Death: with improved resuscitation and endoscopic therapy techniques, this complication largely depends on comorbidity. But while mortality from upper gastrointestinal bleeding has fallen substantially over recent years, bleeding associated with NSAIDs retains a higher mortality, above 10%.
The three parts of the process involve validation, qualification, and utilisation. There are broad general requirements in each of these three sections; different candidate markers will have different characteristics, but the aims can briefly be stated as follows:
Analytical validation is an assessment of assays for the biomarker and their measurement performance characteristics, determining the range of conditions under which the assays will give reproducible and accurate data. Put simply for our purposes: is upper gastrointestinal endoscopy an accurate and reliable test for the development of serious upper gastrointestinal events?
Qualification is an assessment of the available evidence on associations between the biomarker and the clinical endpoint. Here it is examined against the Bradford Hill criteria:
Strength of association: a large relative risk or odds ratio.
Consistency: the relationship is seen in different populations or circumstances.
Specificity: the exposure causes only a specific effect.
Temporal relationship: the exposure precedes the event.
Biological gradient: a dose-response relationship.
Plausibility: biological plausibility.
Coherence: the cause-and-effect interpretation of data should not seriously conflict with the generally known facts of the natural history and biology of the disease.
Experiment: does removing the exposure lessen the effect?
Analogy: comparison between weaker and stronger evidence, or strong evidence of causality between another exposure and a similar effect.
Utilisation is a contextual analysis of the available evidence about a biomarker with regard to its proposed use. This part considers how the surrogate marker will be used in very specific circumstances. If the circumstances change, so might the evaluation of the biomarker. In other words, a marker that is a useful surrogate in one circumstance may not be a useful surrogate in another. Generalisation always has to be justified.
Perhaps because endoscopy is a simple test, with the operator seeing a lesion, these methods have not been subjected to the same intense scrutiny that might have accompanied a new blood test, for instance. The common definition used has been that an endoscopic ulcer has to be a gastric or duodenal lesion ≥3 mm (sometimes ≥5 mm) with significant depth, although the depth is not defined.
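The working definition rests on a size threshold and a qualitative judgement of depth, so it can be written down as a simple rule. The sketch below is a hypothetical illustration. The `Lesion` record and the `is_endoscopic_ulcer` function are invented for this example, and depth is reduced to a yes/no judgement because, as noted, it is not formally defined.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    site: str                # "gastric", "duodenal", or another location
    diameter_mm: float       # greatest dimension in millimetres
    perceptible_depth: bool  # endoscopist's yes/no judgement; depth itself is undefined


def is_endoscopic_ulcer(lesion: Lesion, min_diameter_mm: float = 3.0) -> bool:
    """Apply the common working definition: gastric or duodenal, at or above threshold, with depth."""
    return (
        lesion.site in {"gastric", "duodenal"}
        and lesion.diameter_mm >= min_diameter_mm
        and lesion.perceptible_depth
    )


# Hypothetical findings from one examination.
findings = [
    Lesion("gastric", 4.0, True),
    Lesion("duodenal", 6.0, True),
    Lesion("gastric", 8.0, False),     # no perceptible depth
    Lesion("oesophageal", 7.0, True),  # outside the gastroduodenal definition
]
for threshold in (3.0, 5.0):
    count = sum(is_endoscopic_ulcer(lesion, threshold) for lesion in findings)
    print(f">= {threshold:.0f} mm: {count} endoscopic ulcer(s)")
```

Raising the threshold from 3 mm to 5 mm cuts the count from two to one. Counts from studies that use different thresholds therefore cannot be compared directly.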
Commentators have cast doubt on the reliability of methods for detecting endoscopic ulcers [13, 18], in part because of the paucity of data demonstrating inter-observer accuracy and precision. Studies of gastroduodenal ulcer scars examined endoscopically, which report differences between operators, might be seen as supporting the view that inter-observer disagreement is a problem. There are two main lines of criticism. The first is that experienced endoscopists disagree about whether endoscopic ulcers are real ulcers. The second is that the prevalence of endoscopic ulcers has shifted over time because of a lack of training and consistency between endoscopists.
The first criticism comes from a short abstract describing three experienced endoscopists viewing a training tape in a blinded fashion. There was 100% agreement on obvious ulcers and on trivial lesions. The endoscopists considered that only one-third of endoscopic ulcers (≥3 mm in greatest dimension, with depth) were actual ulcers. That is a fair criticism at one level, but in a sense it misses the point: the definition of an endoscopic ulcer for the purposes of acting as a surrogate need not be the same as that of a real ulcer.
The second criticism was that, in a systematic review of endoscopic ulcers in placebo arms of NSAID trials, early studies had no endoscopic ulcers while later studies had a significant rate of endoscopic ulcers, pointing to a failure of training. The authors of the review have made their own eloquent defence, but there are other powerful arguments against the criticism.
The picture is one of a low prevalence of 1.1% in 559 healthy subjects given placebo in short-duration studies, and a 3.8% prevalence in 2,368 patients with osteoarthritis or rheumatoid arthritis given placebo in long-duration studies, typically six times longer than those in healthy subjects. The variability in Figure 1 reflects chance effects in small populations. Given the small size of many of these patient groups, the overall picture is one of consistency. The result for healthy subjects is similar to the 1% recorded in 619 healthy controls in northern Norway more than 20 years ago.
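The binomial distribution makes the point about chance in small groups concrete. The sketch below assumes independent subjects and a fixed underlying prevalence. It gives the probability that a placebo group shows no ulcers at all when the true prevalence is the 1.1% seen in healthy subjects, and a Wilson score interval for each pooled prevalence. The group size of 30 is hypothetical. The event counts are back-calculated approximately from the quoted percentages.

```python
import math


def prob_no_events(prevalence: float, n: int) -> float:
    """Binomial probability that none of n independent subjects has the event."""
    return (1.0 - prevalence) ** n


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1.0 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# A hypothetical placebo group of 30 healthy subjects, true prevalence 1.1%.
print(f"P(no ulcers in 30 subjects) = {prob_no_events(0.011, 30):.2f}")

# Approximate counts back-calculated from the pooled percentages in the text.
healthy = (round(0.011 * 559), 559)     # about 6 ulcers
patients = (round(0.038 * 2368), 2368)  # about 90 ulcers
for label, (k, n) in (("healthy subjects", healthy), ("patients", patients)):
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n}, 95% interval {lo:.3f} to {hi:.3f}")
```

Even when the true prevalence is 1.1%, a group of 30 has about a seven-in-ten chance of showing no ulcers at all. Zero counts in small early studies are therefore unremarkable.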
Despite the lack of formal tests of accuracy and precision for the endoscopic detection of ulcers, the weight of evidence from large amounts of data suggests that there is no cause for concern about the test.
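A formal test of the kind noted as missing would typically include a chance-corrected measure of inter-observer agreement, such as Cohen's kappa. The minimal sketch below assumes two observers each classify the same set of lesions as ulcer (1) or not ulcer (0). The ratings are hypothetical and are not taken from any study discussed here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must classify the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal category frequencies.
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls by two endoscopists on ten lesions (1 = ulcer, 0 = not an ulcer).
observer_1 = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
observer_2 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1]
print(f"kappa = {cohens_kappa(observer_1, observer_2):.2f}")
```

In this example the observers agree on 8 of 10 lesions. About half that agreement would be expected by chance alone, so kappa is only about 0.58.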
Summary of the consistent effects of various risk factors and ulcer prevention strategies with NSAIDs
Influence on outcomes | Effect on endoscopic ulcers | Effect on upper gastrointestinal bleeding
Age | RCTs show the incidence of ulcers increasing with age, more than threefold over five decades | Patients over 75 years old have a 2.5-fold increased risk of upper gastrointestinal complications
History of previous ulcer or bleeding | Previous history increases the risk of ulcer fourfold in RCTs | Previous history increases the risk of bleeding fivefold in RCTs; observational studies support this
Helicobacter pylori eradication | Meta-analysis of RCTs shows 60% decrease in ulcers | Meta-analysis of RCTs shows 80% decrease in ulcers
Aspirin | Dose-related increase in ulcers over the range of 81 to 325 mg daily | Low-dose aspirin is associated with increased risk of bleeding events
Aspirin plus coxibs or NSAIDs | Low-dose aspirin increases ulcer rates with placebo, coxib, and NSAID in RCTs | Low-dose aspirin increases bleeding rates when added to coxibs or NSAIDs
Ulcer prevention strategies used with NSAIDs
Misoprostol | Misoprostol reduced ulcers by 70% in a meta-analysis of RCTs | Misoprostol reduced bleeds by 40% in a meta-analysis of RCTs
Histamine-2 receptor antagonists | Histamine-2 receptor antagonist therapy reduced ulcers by 60% in a meta-analysis of RCTs | Histamine-2 receptor antagonist therapy reduced bleeds by 30 to 40% in two observational studies
Proton pump inhibitors | Proton pump inhibitor therapy reduced ulcers by 60% in a meta-analysis of RCTs | Observational studies support reduced risk of upper gastrointestinal complications and bleeding with proton pump inhibitors
Coxibs (compared with NSAIDs) | Pooled analysis across RCTs indicates that coxibs reduce ulcers by about 70% compared with NSAIDs | Pooled analysis across RCTs indicates about a 40 to 50% reduction in ulcer complications with coxibs; observational studies show a consistent 50% reduction or more in upper gastrointestinal bleeding events
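A simple check shows whether these influences point the same way for both outcomes. The sketch below uses only the prevention strategies for which the table gives a percentage for both outcomes. Ranges are replaced by their midpoints. Proton pump inhibitors are left out because the table does not quantify their effect on bleeding. The bleeding figures mix pooled trial data with observational studies, so this is a rough illustration rather than an analysis.

```python
# Approximate percentage reductions from the table (ulcers, bleeding) with NSAIDs.
# Where the table gives a range, its midpoint is used; this is a simplification.
reported_reductions = {
    "misoprostol": (70.0, 40.0),
    "histamine-2 receptor antagonist": (60.0, (30.0 + 40.0) / 2),
    "coxib versus NSAID": (70.0, (40.0 + 50.0) / 2),
}


def same_direction(ulcer_reduction: float, bleeding_reduction: float) -> bool:
    """True when both outcomes move the same way (both reduced or both increased)."""
    return (ulcer_reduction > 0) == (bleeding_reduction > 0)


for strategy, (ulcers, bleeds) in reported_reductions.items():
    print(
        f"{strategy}: ulcers -{ulcers:.0f}%, bleeding -{bleeds:.0f}%, "
        f"same direction: {same_direction(ulcers, bleeds)}, "
        f"bleeding/ulcer ratio: {bleeds / ulcers:.2f}"
    )
```

In these figures the proportional reduction in bleeding is somewhat smaller than the reduction in ulcers. In each case, however, it is in the same direction and between a little over half and nearly two-thirds as large.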
Increased risk of upper gastrointestinal bleeding with NSAIDs in two meta-analyses of observational studies
Relative risk or odds ratio
NSAID (dose) | Hernández-Díaz and Rodríguez (1990s, ≥80,000 patients) | Massó González and colleagues (2000 to 2008, ≥40,000 patients)
Ibuprofen ≤2,400 mg | 2.1 (1.6 to 2.7) | 2.7 (2.4 to 3.0)
Diclofenac ≤100 mg | 3.1 (2.0 to 4.7) | 4.0 (3.5 to 4.4)
Naproxen ≤1,000 mg | 3.5 (2.8 to 4.3) | 5.2 (4.3 to 6.2)
Piroxicam ≤20 mg | 5.6 (4.7 to 6.7) | 9.3 (7.5 to 11)
Current NSAID use | 4.2 (3.9 to 4.6) | 4.6 (4.3 to 4.9)
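One way to check consistency between the two meta-analyses is to ask whether they rank the individual NSAIDs in the same order of risk. The minimal sketch below applies Spearman's rank correlation to the point estimates in the table, ignoring their intervals. It uses the simple formula, which is valid only when there are no tied values.

```python
# Point estimates (relative risk or odds ratio) from the table, in the same drug order.
drugs = ["ibuprofen", "diclofenac", "naproxen", "piroxicam"]
hernandez_diaz = [2.1, 3.1, 3.5, 5.6]
masso_gonzalez = [2.7, 4.0, 5.2, 9.3]


def ranks(values: list[float]) -> list[int]:
    """Rank positions (1 = lowest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)); valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))


print(sorted(drugs, key=lambda d: hernandez_diaz[drugs.index(d)]))
print(spearman_rho(hernandez_diaz, masso_gonzalez))  # 1.0
```

Both analyses rank ibuprofen lowest and piroxicam highest, with diclofenac and naproxen in between. A rank correlation of 1 shows only that the ordering agrees, not that the magnitudes do: the later analysis gives higher estimates throughout.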
Figure 5 shows consistency in 6-month incidence rates in this population of patients in different studies; three studies had consistent gastrointestinal bleeding rates of 4 to 5% for celecoxib. Recurrent bleeding rates for other therapies varied from 19% for naproxen in the absence of any effective gastroprotective strategy to about 6% for diclofenac plus omeprazole, and 0% for celecoxib plus omeprazole. This observation is interesting, of course, because it shows how different strategies influence potentially serious harm.
This observation is also interesting because one of these trials measured both bleeding events and endoscopic ulcers in the same study. The definition of bleeding was prespecified. The endoscopic evaluations were carefully done: a single operator performed all endoscopic examinations in a treatment-blinded fashion to avoid between-observer variation. An ulcer was defined as a circumscribed mucosal break ≥5 mm in diameter with a perceptible depth. With diclofenac plus omeprazole, endoscopic ulcer incidence was 1.4 times higher than for celecoxib; the incidence of recurrent bleeds was 1.3 times higher. This again argues for consistent effects of different therapies on both endoscopic ulcers and gastrointestinal bleeding events. This study is important for three key reasons:
The study is the only one in which both the clinical outcome and surrogate marker were measured together.
Upper gastrointestinal bleeding was determined against prespecified criteria.
A single operator determined the presence of gastroduodenal ulcers using a larger size threshold (≥5 mm) than is the norm, meaning that these endoscopic ulcers were not trivial.
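The comparison in this trial comes down to two ratios computed in the same patients. The sketch below shows the arithmetic. `relative_risk` is a generic illustrative helper, and the event counts in its example are hypothetical; only the ratios of 1.4 and 1.3 come from the text.

```python
import math


def relative_risk(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Risk in group A divided by risk in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Hypothetical counts, for illustration only: 14/100 versus 10/100.
print(round(relative_risk(14, 100, 10, 100), 2))  # 1.4

# Ratios reported in the text: diclofenac plus omeprazole relative to celecoxib.
ulcer_ratio, bleeding_ratio = 1.4, 1.3
both_point_same_way = (ulcer_ratio > 1) == (bleeding_ratio > 1)
# On the log scale the two effects differ by ln(1.4) - ln(1.3).
log_difference = math.log(ulcer_ratio) - math.log(bleeding_ratio)
print(both_point_same_way, round(log_difference, 2))
```

No confidence intervals are attached to the ratios here. The sketch therefore shows agreement in direction and approximate size, not a formal test of surrogacy.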
In populations where risk of bleeding is lower, the number of bleeding events is so small that it is impractical to measure them in the same study. High-risk populations such as this offer an ethical approach to confirming links between a putative surrogate and a clinical endpoint.
There do not appear to be any black swans, that is, evidence contradicting the general finding that influences on bleeding events and endoscopic ulcers are coherent in direction and magnitude. A suggestion that sulindac, a nonsteroidal anti-inflammatory prodrug, elevates bleeding events but not endoscopic ulcers is only weakly supported. There is good evidence of sulindac elevating bleeding events. The evidence for a lack of effect on endoscopic ulcers, however, derives from one study lacking data and from another on 15 healthy subjects given sulindac for 7 days.
Overview of the evidence for each of the Hill criteria
Strength of association: There is a strong association between use of aspirin or NSAIDs and the development of both endoscopic ulcers and clinical bleeding events. Protective strategies have a large effect in preventing both events.
Consistency: There is a consistent effect across a range of different risk factors and interventions.
Specificity: Exposure to aspirin or NSAIDs causes a spectrum of gastrointestinal harms, but these harms are also found without exposure. The link between aspirin or NSAIDs and these harms is specific only in that exposure elevates their incidence rates.
Temporal relationship: Exposure to aspirin or NSAIDs precedes harm.
Biological gradient: There is a consistent dose response with aspirin and NSAIDs, with higher doses and longer use increasing the incidence rates of the harms.
Plausibility: There is a biological underpinning for upper gastrointestinal harm with aspirin and NSAIDs.
Coherence: The consistent effect of aspirin or NSAIDs on a broad spectrum of upper gastrointestinal harms, from symptoms, to endoscopic findings, to serious bleeding events, is evidence of coherence.
Experiment: A broad range of preventative therapies (misoprostol, histamine antagonists, proton pump inhibitors, coxibs) with different mechanisms of action all demonstrate significant reduction of harm with aspirin or NSAIDs.
Analogy: Detection of ulcers endoscopically in circumstances where aspirin or NSAIDs are not causative (for example where there may be infection with Helicobacter pylori) would be regarded as a marker of high risk for developing more serious ulcer disease with bleeding.
All of the evidence put forward to justify the surrogate nature of gastroduodenal endoscopic ulcers is in the context of harm from the use of aspirin or NSAIDs. The use of endoscopic ulcers as a surrogate would be justifiable in the context, for example, of new preventative measures being used with established NSAIDs, especially those known already to be associated with either serious upper gastrointestinal clinical events, or endoscopic ulcer, or both. Examples would be any of the new combination products of traditional NSAID plus proton pump inhibitor, as in naproxen plus esomeprazole, or NSAID plus histamine-2 receptor antagonist, as in ibuprofen plus famotidine.
With increasing use of gastroprotection in the community, and guidance that gastroprotection with proton pump inhibitors should be used even with coxibs, the incidence of bleeding events may fall to the point where clinical trials without gastroprotection become unethical. In Japan, a large increase in the use of proton pump inhibitors has resulted in a precipitous fall in bleeding rates, and particularly in deaths from bleeding events.
Whether gastroduodenal endoscopic ulcers could justify a surrogate status in any other context could only be considered on a case-by-case basis. Each of the various stages of validation, qualification, and utilisation would need to be revisited for that specific context, and the Hill criteria also revisited.
This paper has sought to re-examine whether the evidence we have justifies using gastroduodenal endoscopic ulcers as a surrogate for serious upper gastrointestinal bleeding events within the context of the use of aspirin or NSAIDs. The article has followed a pathway for the evaluation of biomarkers and surrogate endpoints in chronic disease, building on a previous review that predated this pathway. Two important conclusions stand out.
There is a strong case for considering endoscopic ulcers as a surrogate for NSAID-induced gastrointestinal harm. These are valid measurements, supported by a wealth of evidence linking endoscopic ulcers with more serious upper gastrointestinal harm within the context of the use of aspirin or NSAIDs. Criticisms of the original findings have been considered and rejected. The weakness originally identified, the absence of an observation of direct progression from endoscopic ulcers to ulcer complications, remains.
The structure suggested in the Institute of Medicine report has provided a constructive and focused way of examining this particular example of putative surrogacy. Separating the validity of the measurement, the qualification of the evidence of association, and the context or contexts in which the assumption of surrogacy is valid represents an important methodological statement that has worked well in this case, as it did in case studies in the report.
The Institute of Medicine evaluation process considered that if the biomarker-clinical endpoint relationship persisted over multiple interventions, it might be thought to be more generalisable. The evidence of links with age, previous ulcer history, and H. pylori infection does not involve NSAIDs, and offers the prospect of generalisability to other contexts. This evaluation examined links with a number of risk factors and interventions, but within the context of NSAID use. Context is the key, and those other contexts require their own separate evaluations.
In the end, and despite efforts to the contrary, decisions on whether any marker is a useful or justifiable surrogate retain an element of subjectivity or even bias. Even Austin Bradford Hill, in his influential 1965 address to the Royal Society of Medicine that examined differences between association and causation, admitted that: 'In asking for very strong evidence I would, however, repeat emphatically that this does not imply crossing every "t", and swords with every critic, before we act'. The Institute of Medicine advice offers a mechanism to be systematic in assessing the strength and nature of evidence before the swords come out.
In the context of NSAID-induced upper gastrointestinal harm, endoscopic ulcers appear to be a valid surrogate marker. The Institute of Medicine evaluation schema has proved useful in giving direction and focus to the evaluation of a surrogate marker.
Determining whether a biomarker is a useful, acceptable, or valid surrogate for a future beneficial or harmful event is complex, has been the subject of a number of approaches, and retains a degree of subjectivity.
Surrogates are useful when they are relatively common or early in a biological pathway, but the clinical event is rare and/or late.
The Institute of Medicine of the National Academy of Sciences in the USA has put forward an evaluation scheme for biomarkers, looking at validation (assay performance), qualification (assessment of evidence of association), and utilisation (a description of the context in which the surrogate has utility).
This evaluation scheme has been applied to the example of endoscopy (particularly gastroduodenal ulcers) in the context of NSAID-induced gastrointestinal harm.
Considerable evidence indicated that endoscopic ulcers were a valid measure.
Considerable evidence indicated that there was a strong association between endoscopic ulcers and serious gastrointestinal harm, with a variety of risk factors and interventions influencing them in the same direction and with a similar magnitude.
In the context of NSAID-induced upper gastrointestinal harm, endoscopic ulcers appear to be a valid surrogate marker.
The Institute of Medicine evaluation schema proved useful in giving direction and focus to the evaluation of a surrogate marker.
NSAID: nonsteroidal anti-inflammatory drug.
This review has drawn heavily on previous work with other authors, and the author recognises their contribution.
This article has been published as part of Arthritis Research & Therapy Volume 15 Suppl 3, 2013: 'Gastroprotective NSAIDS'. The full contents of the supplement are available online at http://arthritis-research.com/supplements/15/S3. The supplement was proposed by the journal and developed by the journal in collaboration with the Guest Editor. The Guest Editor assisted the journal in preparing the outline of the project but did not have oversight of the peer review process. The Guest Editor serves as a clinical and regulatory consultant in drug development and has served as such consultant for companies which manufacture and market NSAIDs including Pfizer, Pozen, Horizon Pharma, Logical Therapeutics, Nuvo Research, Iroko, Imprimis, JRX Pharma, Nuvon, Medarx, Asahi. The articles have been through the journal's standard peer review process. Publication of this supplement has been supported by Horizon Pharma Inc. Duexis (ibuprofen and famotidine) is a product marketed by the sponsor.