Tolerability and adverse events in clinical trials of celecoxib in osteoarthritis and rheumatoid arthritis: systematic review and meta-analysis of information from company clinical trial reports

The objective was to improve understanding of adverse events occurring with celecoxib in the treatment of osteoarthritis and rheumatoid arthritis. Data were extracted from company clinical trial reports of randomised trials of celecoxib in osteoarthritis or rheumatoid arthritis lasting 2 weeks or more. Outcomes were discontinuations (all cause, lack of efficacy, adverse event, gastrointestinal adverse event), endoscopically detected ulcers, gastrointestinal or cardio-renal events, and major changes in haematological parameters. The main comparisons were celecoxib (all doses) versus placebo, paracetamol (acetaminophen) 4,000 mg daily, rofecoxib 25 mg daily, or nonsteroidal anti-inflammatory drugs (NSAIDs) (naproxen, diclofenac, ibuprofen, and loxoprofen). For NSAIDs, celecoxib was compared both at all doses and at licensed doses (200 to 400 mg daily). Thirty-one trials included 39,605 randomised patients. Most patients had osteoarthritis and were women of average age 60 years or above. Most trials lasted 12 weeks or more. Doses of celecoxib were 50 to 800 mg/day. Compared with placebo, celecoxib had fewer discontinuations for any cause or for lack of efficacy, fewer serious adverse events, and less nausea. It had more patients with dyspepsia, diarrhoea, oedema, more adverse events that were gastrointestinal or treatment related, and more patients experiencing an adverse event. There were no differences for hypertension, gastrointestinal tolerability, or discontinuations for adverse events. Compared with paracetamol, celecoxib had fewer discontinuations for any cause, for lack of efficacy, or diarrhoea, but no other differences. Compared with rofecoxib, celecoxib had fewer patients with abdominal pain and oedema, but no other differences. Compared with NSAIDs, celecoxib had fewer symptomatic ulcers and bleeds, endoscopically detected ulcers, and discontinuations for adverse events or gastrointestinal adverse events. 
Fewer patients had any, or a gastrointestinal, or a treatment-related adverse event, or vomiting, abdominal pain, dyspepsia, or reduced haemoglobin or haematocrit. Discontinuations for lack of efficacy were higher. No differences were found for all-cause discontinuations, serious adverse events, hypertension, diarrhoea, nausea, oedema, myocardial infarction, cardiac failure, or raised creatinine. Company clinical trial reports present much more information than published papers. Adverse event information is clearly presented in company clinical trial reports, which are an ideal source of information for systematic review and meta-analysis.


Introduction
Arthritis is a common, progressive condition, which is associated with considerable pain and inflammation, and has a strong impact on quality of life. It is the major reason for hip or knee replacements [1].
It is more prevalent in women than men, and in older people. One community-based study [2] conducted in Scotland showed that 25% of patients had arthritis by age 65. Of these, a quarter had pain that was highly disabling and at least moderately limiting. A further quarter had pain that was more severe. In a UK general practice survey of patients' perspectives in osteoarthritis [3], a quarter of responders reported some dissatisfaction with their treatment and another quarter stated that their pain control was poor. High levels of negative impact were associated with inability to walk, bathe, dress, or
This systematic review and meta-analysis of celecoxib in osteoarthritis and rheumatoid arthritis was conducted using information from company clinical trial reports, supplied by Pfizer Ltd, of completed randomised, double-blind trials from the celecoxib clinical trials programme. The objectives were to examine tolerability, minor and major adverse events, and endoscopically detected ulceration associated with celecoxib in arthritis.

Materials and methods
Randomised, double-blind, controlled trials, of 2 weeks' duration or longer with any dose of celecoxib and any comparator, in osteoarthritis or rheumatoid arthritis, were supplied as company clinical trial reports by Pfizer Ltd. Open-label extension studies were not included. A declaration was signed by Pfizer that all completed (by December 2003) trials of relevance from the celecoxib clinical trial programme had been made available. A protocol for the review and analysis, including definitions of outcomes, was agreed beforehand.
Financial support was provided by Pfizer Ltd, with the provision that all relevant trial reports completed by December 2003 were made available, and that the authors were free to publish their findings whatever the outcome of the review. Other funding was from Pain Research funds of the Oxford Pain Relief Trust. No funding source had any role in deciding what to publish, when to publish, or where to publish it.

Trials
Thirty-one Phase II, III, and IV clinical trial reports of celecoxib in osteoarthritis or rheumatoid arthritis were provided for evaluation. All compared celecoxib in various dosing regimens with placebo, paracetamol (acetaminophen) 4,000 mg/day, rofecoxib 25 mg/day, or an NSAID commonly used in the treatment of arthritis. Comparator NSAIDs were given at the maximum licensed dose; these were naproxen 1,000 mg, ibuprofen 2,400 mg, diclofenac 100 to 150 mg, and loxoprofen 180 mg daily. Details of the included trials are in Table 1.

Trial inclusion and exclusion criteria
Patients were adults who had a clinical diagnosis of osteoarthritis or rheumatoid arthritis that was symptomatic, usually of 3 months' duration or longer, and required long-term treatment with anti-inflammatory drugs or other analgesics for the control of pain. Further details of inclusion and exclusion criteria for both osteoarthritis and rheumatoid arthritis can be found in Additional file 1.

Trial methods
Eligible patients typically entered a pretreatment period of up to 14 days, during which baseline observations were conducted. Nonstudy NSAIDs and other analgesics were discontinued, with the exception of aspirin (up to 325 mg daily) and paracetamol (up to 2 g per day for a maximum of 3 days but not within 48 hours of arthritis assessments), which were permitted for reasons other than control of arthritis pain. Other drugs specifically excluded were antibiotics for Helicobacter pylori eradication, metronidazole, anticoagulants, lithium, and anti-ulcer drugs including proton pump inhibitors, H2 antagonists, antacids, sucralfate, and misoprostol.
Patients were randomised under double-blind conditions to receive oral celecoxib, paracetamol, rofecoxib, an NSAID, or placebo. Several studies had both an active and a placebo comparator, and several compared different fixed dose regimens of celecoxib. Table 1 shows the study treatments, dosing, and number and baseline characteristics of patients for the individual trials. All trials conformed to good clinical practice guidelines.

Table 1
Included studies of tolerability, adverse events, and endoscopically detected ulceration associated with celecoxib in arthritis

Information collected on adverse events
In all studies, information was collected on patients who experienced any adverse event, serious adverse events, adverse events relating to body systems, and discontinuations. Information was collected on the occurrence of endoscopically detected ulcers and erosions from those trials in which all patients were scheduled to have endoscopy before and at various times during treatment. Definitions used in the trials were those of the World Health Organization (Adverse Reaction Terminology). The definitions used in this review are in Additional file 2.

Meta-analysis
Outcomes chosen for the meta-analysis
Outcomes chosen related to adverse events and tolerability. These included discontinuation (all-cause, lack of efficacy, adverse event, and gastrointestinal adverse event), patients with any adverse event, patients with any treatment-related adverse event, and patients with any serious adverse event.
For gastrointestinal adverse events, we included an overall measure of gastrointestinal tolerability as well as individual gastrointestinal adverse events of nausea, vomiting, abdominal pain, dyspepsia, diarrhoea, and ulcers or bleeds. Treatment-emergent ulcers and bleeds were analysed together because of their important sequelae. Endoscopically detected ulcers were taken from reports in which all patients in the trial had endoscopy with the specific intent of measuring endoscopic lesions, and where this was a prime outcome in the trial. They were additionally analysed according to the concomitant use of low-dose aspirin.
Specific cardio-renal adverse events included cardiac failure, hypertension, raised creatinine, and oedema at any body site. Analysis of oedema by body site, or hypertension by subcategory, was not carried out, as event numbers were too low for practicable analysis.

Trial quality and validity
Three authors independently read each clinical trial report and scored the reports for reporting quality and validity. Disagreements were discussed and consensus achieved. Trials were scored for quality using a three-item, 1- to 5-point scale [12], and at least two points, one each for randomisation and double blinding, were required for inclusion. Trials were scored for validity using an eight-item, 16-point scale [13]; there was no minimum requirement for inclusion in the systematic review.

Analysis
Guidelines for quality of reporting of meta-analyses were followed where appropriate [14].
The prior intention was to pool data where there was clinical homogeneity, with similarity in terms of patients, dose, duration, outcomes, and comparators. It was recognised, however, that this could lead to a large number of comparisons, with small numbers of events, where random chance could dominate effects of treatment on adverse events [15].
The main issues were the comparator treatments in trials and the dose of celecoxib. Pooling of data was therefore restricted to comparison between celecoxib and placebo, paracetamol, rofecoxib, and NSAIDs, because each comparator had a different mechanism of action from any other. In addition, analysis of celecoxib against all active comparators combined was carried out. For active comparisons, most of the information was likely to reside in those between celecoxib and NSAIDs, and we chose to perform two analyses: comparisons of all doses of celecoxib with all doses of NSAIDs, and between licensed daily doses of celecoxib and licensed doses of NSAIDs. NSAIDs were used at licensed doses, usually at maximum daily dose, and rofecoxib was used at 25 mg daily.
Information for osteoarthritis and rheumatoid arthritis was combined because the number of patients in trials with rheumatoid arthritis was small. Though there are differences between the conditions, notably age of onset, there are no clear reasons why treatment-emergent adverse events should differ between conditions. Analysis of celecoxib dose, and of duration of studies, was restricted to discontinuations due to lack of efficacy or to adverse events, where there were more than 20 events, and where the outcome had direct clinical relevance.
Analysis of data could potentially be performed in two ways. The simplest method would be to combine the absolute proportions of patients experiencing an adverse event, using the intention-to-treat population (randomised, at least one dose of drug) as the denominator. This method has a potential disadvantage of not taking into account different durations of studies, and possible different exposures between treatments because of different withdrawal rates. An alternative method would be to calculate adverse events as the rate of events occurring per year of exposure, theoretically taking both different durations and differential exposure into account.
This second method was impractical for several reasons. Trial reports generally did not have information to allow calculation of median duration of use. For instance, they reported neither average days of use nor individual days of use, so that an average could not be calculated. The reports generally had information on compliance, and generally there was no significant difference between celecoxib and its comparators. The two largest trials, with over half the patients, gave patient years of exposure in the trial reports, and these were identical for celecoxib and NSAID. In a separate analysis of cardiovascular events in celecoxib trials, which included 30,000 of the 40,000 patients in this review, there were negligible differences between treatment durations [16].
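The difference between the two approaches can be illustrated with a minimal Python sketch. The function names and the numbers are hypothetical and are not taken from the trial reports; the exposure figure assumes, for illustration only, that every patient completed the full trial duration, which is exactly the information the reports generally did not allow us to verify.

```python
def crude_proportion(events, n_patients):
    """Proportion of intention-to-treat patients experiencing the event."""
    return events / n_patients


def rate_per_patient_year(events, patient_years):
    """Events per patient-year of exposure."""
    return events / patient_years


# Hypothetical arm: 30 events among 1,000 patients in a 12-week trial.
events, n_patients, weeks = 30, 1000, 12

# Assumption: all patients stayed on treatment for the full 12 weeks.
patient_years = n_patients * weeks / 52

proportion = crude_proportion(events, n_patients)
rate = rate_per_patient_year(events, patient_years)
print(f"crude proportion: {proportion:.3f}")
print(f"rate per patient-year: {rate:.3f}")
```

With differential withdrawal between arms, the denominators for the exposure-based rate would differ between treatments even when the numbers randomised are equal, which is why the second method needs per-patient duration data.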
Outcomes were pooled in an intention-to-treat (number of patients randomised and receiving at least one dose of trial drug) analysis. Homogeneity tests and funnel plots, though commonly used in meta-analysis, were not used here because they have been found to be unreliable [17][18][19]. Instead clinical homogeneity was examined graphically [20]. Relative benefit (or risk) and number-needed-to-treat (or harm) were calculated with 95% confidence intervals. Relative risk was calculated using a fixed effects model [21], with no statistically significant difference between treatments assumed when the 95% confidence intervals included unity. We added 0.5 to celecoxib and comparator arms of trials in which at least one arm had no events. Number-needed-to-treat (or harm) was calculated by the method of Cook and Sackett [22], using the pooled number of observations.
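As an illustration of the relative risk calculation, the following Python sketch pools trials with an inverse-variance fixed effects model on the log relative risk scale. This is an assumption made for illustration and may differ from the specific fixed effects method of reference [21]. The continuity correction adds 0.5 to the event count of both arms, and here also 1 to each denominator (that is, 0.5 to every cell of the 2x2 table), which is likewise an assumption. All function names and numbers are hypothetical.

```python
import math


def log_rr_and_variance(e1, n1, e2, n2):
    """Log relative risk (arm 1 vs arm 2) and its approximate variance.

    If either arm has no events, 0.5 is added to the events of both arms
    (and 1 to both denominators; an assumption of this sketch).
    """
    if e1 == 0 or e2 == 0:
        e1, e2 = e1 + 0.5, e2 + 0.5
        n1, n2 = n1 + 1, n2 + 1
    log_rr = math.log((e1 / n1) / (e2 / n2))
    variance = 1 / e1 - 1 / n1 + 1 / e2 - 1 / n2
    return log_rr, variance


def pooled_relative_risk(trials, z=1.96):
    """Inverse-variance fixed effects pooled RR with a 95% confidence interval.

    `trials` is a list of (events_1, n_1, events_2, n_2) tuples.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for e1, n1, e2, n2 in trials:
        log_rr, variance = log_rr_and_variance(e1, n1, e2, n2)
        weight = 1 / variance
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


def is_significant(lower, upper):
    """No significant difference is assumed when the interval includes unity."""
    return not (lower <= 1.0 <= upper)


# Hypothetical trials: (celecoxib events, celecoxib n, comparator events, comparator n)
example_trials = [(10, 100, 20, 100), (0, 50, 3, 50)]
rr, lower, upper = pooled_relative_risk(example_trials)
print(f"RR {rr:.2f} ({lower:.2f} to {upper:.2f}), significant: {is_significant(lower, upper)}")
```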
Adverse outcomes were described in terms of harm or prevention of harm, as follows. When significantly fewer adverse events occurred with celecoxib than with a control substance (placebo or active), we used the term 'the number-needed-to-treat to prevent one event' (NNTp). When significantly more adverse events occurred with celecoxib than with an active comparator (paracetamol, rofecoxib, NSAID) we used the term 'number-needed-to-treat to harm one patient' (NNH).
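The labelling convention can be sketched in Python as follows. The sketch takes the number-needed-to-treat as the reciprocal of the absolute risk difference computed from pooled counts, with confidence limits taken as reciprocals of the limits of the risk difference, which is our reading of the Cook and Sackett approach [22]. The function name, the normal approximation for the standard error, and the example counts are assumptions for illustration only.

```python
import math


def nnt_with_ci(events_celecoxib, n_celecoxib, events_comparator, n_comparator, z=1.96):
    """Return (label, nnt, lower, upper) from pooled counts.

    label is 'NNTp' when significantly fewer events occur with celecoxib,
    'NNH' when significantly more occur, and 'no significant difference'
    (with None for the numbers) when the risk difference interval includes zero.
    """
    p_cel = events_celecoxib / n_celecoxib
    p_cmp = events_comparator / n_comparator
    diff = p_cmp - p_cel  # positive when celecoxib has fewer events
    se = math.sqrt(p_cel * (1 - p_cel) / n_celecoxib
                   + p_cmp * (1 - p_cmp) / n_comparator)
    lower_diff, upper_diff = diff - z * se, diff + z * se
    if lower_diff <= 0 <= upper_diff:
        return "no significant difference", None, None, None
    label = "NNTp" if diff > 0 else "NNH"
    bounds = sorted(abs(1 / b) for b in (lower_diff, upper_diff))
    return label, abs(1 / diff), bounds[0], bounds[1]


# Hypothetical pooled counts: 100/1,000 events with celecoxib, 200/1,000 with comparator.
label, nnt, lower, upper = nnt_with_ci(100, 1000, 200, 1000)
print(f"{label} {nnt:.0f} ({lower:.1f} to {upper:.1f})")
```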

Trials
Clinical reports of 31 randomised trials (21 in osteoarthritis, 4 in rheumatoid arthritis, and 6 in mixed osteoarthritis or rheumatoid arthritis) were provided for the analysis. Full company study reports for 23 trials contained 180,000 pages. These were comprehensive documents including detailed methods and results sections, tables, and figures. Appendices provided descriptions of the outcome measurement tools used, individual patient outcomes, compliance, case report forms, detailed statistical analyses, and protocol amendments. Full clinical trial reports were not available for eight trials, but extensive clinical trial summaries were provided. Information was extracted directly from the clinical trial reports or summaries.
All trials scored the maximum of five points for quality (Table 1), since they clearly described withdrawals in addition to the methods of randomisation and double blinding. All studies also scored the maximum of 16 points on the validity scale.

Patients and adverse events
Details of the patients included in the trials are in Table 1. In most trials, the majority of patients were women whose average age was 60 years or above (range 17 to 96 years). The relevant medical history, notably about NSAID intolerance or gastrointestinal symptoms after use of NSAIDs and about use of prophylactic low-dose aspirin, was usually reported. Three trials (002, 149, 181) specifically recruited patients with stable, treated hypertension in addition to arthritis. Patients were predominantly Caucasian, but several studies specifically recruited only Asian participants, or those of mixed Asian, Afro-Caribbean, or Hispanic descent.
The adverse event outcomes measured in each trial are detailed in Additional file 3. All of the adverse events were those reported by trial investigators, and none was reported after independent, blinded adjudication.
Adverse events were measured by recording treatment-emergent events, clinical laboratory test results, or changes from baseline in vital signs found by physical examination. At each follow-up visit, patients were asked if they had experienced any symptoms not associated with their arthritis. Patients and study personnel were blinded to the identification of medication throughout the study, and if randomisation blind was broken, the patient was removed from the study.

Discontinuation
Details of discontinuations are shown in Table 2. All-cause and lack-of-efficacy discontinuations were less frequent with celecoxib than with placebo or paracetamol. Adverse-event and gastrointestinal-adverse-event discontinuation (Fig. 1) was less frequent with celecoxib than with NSAIDs (licensed dose or any dose) or any active comparator. All-cause discontinuations were also less frequent with any dose of celecoxib compared with NSAID or any active comparator. Licensed doses of celecoxib were not significantly different. Celecoxib did not differ from rofecoxib. The NNTp for discontinuation due to lack of efficacy was 9 (8 to 11) compared with placebo, and 27 (14 to 390) compared with paracetamol. Licensed doses of celecoxib had an NNTp of 74 (47 to 180) for discontinuations due to an adverse event, and an NNTp of 58 (42 to 98) for discontinuations due to a gastrointestinal adverse event, compared with NSAIDs.

Figure 1
Scatter plot of trials comparing celecoxib with NSAID for discontinuations due to gastrointestinal adverse events. Celecoxib at any dose is represented. The red symbol represents the longest trial, at 52 weeks. Axes: percent GI discontinuations with celecoxib against percent GI discontinuations with NSAID. GI, gastrointestinal; NSAID, nonsteroidal anti-inflammatory drug.

Table 3
Discontinuations of treatment in arthritis because of lack of efficacy or adverse events

Proportions discontinuing because of lack of efficacy or adverse events varied according to drug, dose, and duration. Regarding duration, for instance, discontinuation because of gastrointestinal adverse events was higher for NSAIDs than celecoxib in the one 52-week trial and in trials of shorter duration (Fig. 1).
The details for all 39,605 patients in all trials are shown in Table 3. Discontinuation because of lack of efficacy was high with placebo, 18% over 2 to 6 weeks and 46% by 12 weeks. Effective treatment with licensed doses of celecoxib or NSAIDs reduced discontinuations due to lack of efficacy, with evidence of a dose-response for celecoxib over the range of 100 to 400 mg daily.
There was considerable variation between individual trials regarding discontinuations due to lack of efficacy at 12 weeks, for celecoxib and naproxen. The variability seemed unrelated to condition, and no sensible reason presented itself.
Discontinuations due to adverse events were low with placebo (6% at 12 weeks), little different with celecoxib, and somewhat higher with NSAIDs (Tables 2 and 3). In trials of 24 weeks or longer, discontinuations due to adverse events with 800 mg celecoxib, 100/150 mg diclofenac, and 2,400 mg ibuprofen were between 22% and 26%.

Any adverse event
The proportion of patients reporting any adverse event was of the order of 50% (Table 4). Patients taking celecoxib reported adverse events more frequently than those taking placebo (NNH 15; 11 to 21), and less frequently than with NSAIDs (NNTp 18; 14 to 23 for licensed doses) or any active comparator. There was no difference between celecoxib and either paracetamol or rofecoxib.

Treatment-related adverse events
About one-third of all reported adverse events were considered to be treatment related (Table 4). There was no difference between celecoxib and paracetamol or rofecoxib. More patients taking celecoxib than placebo had a treatment-related adverse event (NNH 71; 39 to 450). Fewer patients experienced a treatment-related adverse event with celecoxib than with NSAID (NNTp 24; 19 to 31 for licensed doses) or any active comparator.

Serious adverse events
The proportion of patients with a serious adverse event was low, averaging 1 to 3% (Table 4). Fewer patients taking celecoxib than placebo had serious adverse events (NNTp 280; 120 to 790). There was no difference in serious adverse event rates for celecoxib compared with paracetamol, rofecoxib, NSAID (Fig. 2), or any active comparator (Table 4). Serious adverse events occurred more often, at 6%, in the single 52-week trial than in trials of shorter duration (Fig. 2), but not more often than with NSAID.

Any gastrointestinal adverse event
The proportion of patients reporting any gastrointestinal adverse event was of the order of 25% (Table 4). More patients taking celecoxib than placebo reported a gastrointestinal adverse event (NNH 14; 12 to 19). There was no difference between celecoxib and either paracetamol or rofecoxib. Celecoxib had fewer patients reporting any gastrointestinal adverse event than either NSAID (NNTp 12; 10 to 13 for licensed doses) or any active comparator.

Gastrointestinal tolerability
Gastrointestinal tolerability (the proportion of patients having moderate or severe nausea, dyspepsia, or abdominal pain) was about 5% with celecoxib (Table 5). There was no difference between celecoxib and placebo, paracetamol, or rofecoxib. Celecoxib had less gastrointestinal intolerance than NSAIDs (NNTp 28; 24 to 36 for licensed doses of celecoxib) or any active comparator.

Nausea
The proportion of patients reporting nausea was about 3% with celecoxib (Table 5a). Nausea was significantly lower with celecoxib than placebo (NNTp 155; 71 to 840), and for celecoxib at any dose compared with NSAID or any active comparator. There was no difference between celecoxib and paracetamol, or rofecoxib, or between licensed doses of celecoxib and NSAIDs.

Figure 2
Scatter plot of trials comparing any dose of celecoxib with NSAID for serious adverse events. The red symbol represents the longest trial, at 52 weeks. AE, adverse events; NSAID, nonsteroidal anti-inflammatory drug.

Vomiting
Results for vomiting are in Table 5a. There was no difference between celecoxib and placebo, paracetamol, or rofecoxib. Celecoxib at both licensed dose and any dose had fewer patients with vomiting than NSAID (NNTp 173; 115 to 350 for licensed doses) or any active comparator.

Abdominal pain
The proportion of patients reporting abdominal pain was about 5% with celecoxib (Table 5b). There was no difference between celecoxib and placebo, or paracetamol. Celecoxib (any dose) produced less abdominal pain than rofecoxib 25 mg (NNTp 67; 35 to 920). Celecoxib at both licensed dose and any dose had fewer patients reporting abdominal pain than NSAID (NNTp 41; 32 to 57 for licensed doses) or any active comparator.

Dyspepsia
The proportion of patients reporting dyspepsia was about 7% with celecoxib (Table 5b). Celecoxib (any dose) produced more dyspepsia than placebo (NNH 46; 32 to 84). There was no difference between celecoxib and paracetamol, or rofecoxib. Celecoxib at both licensed and any dose had fewer patients reporting dyspepsia than NSAID (NNTp 61; 43 to 100 for licensed doses) or any active comparator.

Diarrhoea
The proportion of patients experiencing diarrhoea was about 6% with celecoxib (Table 5b). Celecoxib (any dose) produced more diarrhoea than placebo (NNH 53; 37 to 97). Celecoxib (any dose) produced less diarrhoea than paracetamol 4,000 mg (NNTp 41; 22 to 450). There was no difference between celecoxib and rofecoxib, or between celecoxib (at the licensed dose or any dose) and NSAID, or any active comparator.

Clinical ulcers and bleeds
Clinical ulcers and bleeds in the company clinical trial reports were as reported by investigators, and were not subjected to independent, blinded adjudication in trials where this was not a primary outcome. The proportion of patients having a clinical ulcer or bleed was under 0.5% with celecoxib (Table 5b). No analysis was possible for clinical ulcers and bleeds for the comparisons between celecoxib and placebo, paracetamol, and rofecoxib, as there were only three events, no events, and one event, respectively. Celecoxib at both the licensed dose and any dose had fewer patients with clinical ulcers and bleeds than NSAID (NNTp 250; 170 to 450 for licensed doses) or any active comparator.

Table 6
Cardio-renal adverse events reported in studies of patients treated for arthritis

Myocardial infarction
Myocardial infarction in the company clinical trial reports was as reported by investigators, and was not subjected to independent, blinded adjudication. The numbers of reported myocardial infarctions in each arm of each trial are given in Additional file 3.
The proportion of patients having a myocardial infarction was under 0.3% with celecoxib (Table 6). No analysis was possible for myocardial infarction for the comparisons between celecoxib and placebo, paracetamol, and rofecoxib, as there were only 10 events, no events, and 1 event, respectively. Proportions for celecoxib at both the licensed dose and any dose were not significantly different from NSAID, any active comparator, any active comparator excluding rofecoxib, or any comparator, including both rofecoxib and placebo.
The numbers of events were small, with fewer than 60 cases of myocardial infarction in the largest comparison. Most trials had either no cases of myocardial infarction, or a single case in one of the treatment arms. No analysis demonstrated a statistical difference between celecoxib and any comparator (Table 6).

Cardiac failure
The proportion of patients with cardiac failure was under 0.2% with celecoxib (Table 6). No analysis was possible for the comparisons between celecoxib and placebo, paracetamol, and rofecoxib, as there were only 5 events, no events, and 10 events, respectively. Proportions for celecoxib at both the licensed dose and any dose were not significantly different from NSAID or any active comparator.

Raised creatinine
For the incidence of creatinine raised to 1.3 times the upper limit of normal or more, data were available only for the comparisons between celecoxib and placebo, celecoxib at licensed doses and NSAID, and celecoxib compared with any active comparator. There were no significant differences (Table 6). The proportion of patients with raised creatinine was up to 1% with celecoxib.

Hypertension and aggravated hypertension
This outcome combined a new diagnosis of hypertension with aggravated hypertension in patients who had an existing diagnosis of hypertension and needed changed or additional treatment to control it. The proportion of patients with hypertension or aggravated hypertension was 1 to 2% with celecoxib (Table 6). There was no significant difference between celecoxib and any comparator, placebo, rofecoxib, or NSAIDs. For paracetamol there were only four events.

Oedema at any site
Oedema was reported in various ways in the trials, occasionally just as oedema, sometimes broken down by body site. The proportion of patients with oedema was usually about 3% (   5 to 10% in another trial (002) in patients with osteoarthritis, diabetes, and hypertension, also with oedema as a predefined end point.
Celecoxib was associated with significantly more oedema than placebo (NNH 79; 54 to 145). Celecoxib was no different from paracetamol. Celecoxib (200 mg daily) had significantly less oedema than rofecoxib (25 mg daily), with an NNTp of 14 (10 to 25). Celecoxib at licensed doses or at any dose was no different from NSAID for oedema (Fig. 3), but was significantly better than any active comparator (NNTp 62; 48 to 87).

Haemoglobin fall of 20 g/L or more
This parameter was not reported in studies comparing celecoxib with paracetamol or rofecoxib. The incidence of a haemoglobin fall of 20 g/L or more was about 1% with celecoxib (Table 7). There was no difference between celecoxib and placebo. Celecoxib at both the licensed dose and any dose had a lower incidence than NSAID (NNTp 92; 66 to 150 for licensed doses) or any active comparator.

Figure 3
Scatter plot of trials comparing any dose of celecoxib with NSAID for oedema. The red symbol represents the longest trial, at 52 weeks. NSAID, nonsteroidal anti-inflammatory drug.

Table 7
Analysis of changes to haematological parameters in patients treated for arthritis

Haematocrit fall of 5% or more
This parameter was not reported in studies comparing celecoxib with paracetamol. The incidence of a haematocrit fall of 5% or more was about 10% with celecoxib (Table 7). There was no difference between celecoxib and placebo or rofecoxib. Celecoxib at both the licensed dose and any dose had a lower incidence than NSAID (NNTp 18; 14 to 25 for licensed doses) or any active comparator.

Endoscopically detected ulcers
Seven trials were designed to ascertain the presence of endoscopically detectable ulcers of 3 mm or more; in these, celecoxib was compared with placebo and/or NSAID (Additional file 4). Six reported at 12 weeks, and one at 24 weeks. Five trials also reported results according to the use of low-dose aspirin of 325 mg or less daily. These results are shown in Table 8 and Fig. 4, analysed across all patients and according to aspirin use. In no comparison was there any significant difference between celecoxib and placebo. For both celecoxib and NSAID, there was the same 6% absolute increase in endoscopically detected ulcers with aspirin use. Celecoxib, at both the licensed dose and any dose, always produced fewer endoscopically detected ulcers than NSAID. The NNTp was the same at 7 to 8 both with and without concomitant aspirin use.

Deaths
There were 28 deaths during the trials or within 28 days of stopping medication, of which 21 were cardio/cerebrovascular, 1 was of unknown cause, and 6 were due to other causes.

Figure 4
Endoscopically identified ulcers in patients taking celecoxib and NSAID, with and without prophylactic low-dose aspirin. NSAID, nonsteroidal anti-inflammatory drug.

Discussion
There have been a number of systematic reviews of published papers of coxibs in arthritis, and several have examined specific adverse events. Serious upper gastrointestinal events in phase II and III studies were reported for rofecoxib [23] and celecoxib [24]. Others have looked at renal [25] or cardiac adverse events [26]. Cochrane reviews of cyclooxygenase inhibitors in rheumatoid arthritis have limited information to date on efficacy and safety of rofecoxib [27], and only five trials with 5,400 patients taking celecoxib [28]. Two previous systematic reviews of coxibs used company clinical trial reports. Deeks and co-workers [29] examined 15,000 patients in nine of the earlier trials of celecoxib, and Edwards and co-workers [30] examined some 5,700 patients in nine trials of valdecoxib.
Reviews looking at adverse events generally [29,30] have analysed adverse events by combining the absolute proportions of patients experiencing an adverse event, using the intention-to-treat population (randomised, at least one dose of drug) as the denominator. Those examining particular, rare adverse events (gastrointestinal bleeding, cardiovascular events) have tended to use exposure correction, together with independent blinded adjudication of the event [16,25,26].
This systematic review greatly increases the quantity and quality of information available on adverse events with celecoxib in arthritis. We had data from 31 trials, with almost 40,000 patients. The individual trials all scored the maximum on two systems for scoring reporting quality and validity in pain trials. Use of similar methods for collecting and reporting adverse events ensured data of uniform nature and quality.
The average age in the trials was about 60 years (Table 1), but there was a wide range (17 to 96 years). Several studies recruited special groups, for instance, patients with diabetes or hypertension, or patients who were solely Asian, or of mixed Asian, Afro-Caribbean, or Hispanic descent. Most trials documented relevant medical history, such as previous NSAID use or intolerance, or use of prophylactic low-dose aspirin. While non-Caucasians were under-represented, and many patients with significant comorbidities were excluded from the trials, this population is probably as representative as possible in clinical trials.
This gives credibility to the review in terms of size, quality, and validity, allowing us to make sense of all but the rarest adverse events. At the same time, there are limitations.
Multiple comparisons could be made, including condition treated, duration of study, comparator drug, and dose. Ideally all these would be tested by sensitivity analysis. We limited our analyses to comparator and dose to avoid excessive subdivision and proliferation of statistical testing, which can lead to spurious statistical significance [31]. Analysis by condition or duration was avoided because few patients (8%) were in trials with rheumatoid arthritis only, and few observations (23%) were made in trials lasting less than 12 weeks. Instead we concentrated on analysis by comparator, where there was the possibility of major differences based on large amounts of high-quality experimental evidence, and on dose. Most doses were in the licensed range, but for completeness we chose to perform analyses of celecoxib versus NSAID by all doses, and those within the licensed range.
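The concern about proliferating statistical tests can be illustrated with a short calculation. The sketch below is not part of the review's methods; it assumes independent tests, each at a significance level of 0.05, and the function name is ours. Under those assumptions, the chance of at least one spurious significant result is already about 64% with 20 tests.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false-positive result among n_tests
    independent tests, each carried out at significance level alpha."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative numbers of tests only; these are not counts from the review.
for k in (1, 5, 20, 50):
    print(f"{k:>3} tests: P(at least one false positive) = "
          f"{familywise_error_rate(k):.2f}")
```

Real subgroup analyses are rarely independent, so this is an upper-level illustration of the problem rather than an exact error rate for any particular set of comparisons.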
Generally, trial reports indicate that World Health Organization Adverse Reaction Terminology criteria were used to define adverse events, but these are not immediately accessible. For any particular treatment-emergent adverse event, we have had to assume that the same criteria were used consistently both within and between trials. Although adequate clinical trial monitoring makes this highly probable, we have no positive evidence that this was the case. Definitions are problematical for reporting adverse events [32,33].
The statistical direction of the results for each adverse event outcome and each comparison is shown in Fig. 5.
In comparison with placebo (10,000 patients), celecoxib had fewer all-cause and lack-of-efficacy discontinuations, but more adverse events. Lower discontinuations result from greater efficacy, but an active drug at an effective therapeutic dose is likely to produce some adverse events. Importantly, there was no difference in gastrointestinal tolerability or endoscopically detected ulceration.
Only two trials (1,056 patients) compared celecoxib (200 mg/day) with paracetamol 4,000 mg/day. There were fewer all-cause and lack-of-efficacy discontinuations with celecoxib, and almost identical adverse event profiles, indicating better efficacy with no excess harm. It is worth noting that recent large randomised comparisons of paracetamol with placebo over 12 weeks have failed to show any better efficacy for paracetamol than placebo [34].
Five trials (2,671 patients) compared celecoxib (200 mg/day) with rofecoxib (25 mg/day). Celecoxib had less abdominal pain and oedema. Rofecoxib is another cyclooxygenase-2 selective inhibitor, and similarity between their adverse event profiles is to be expected.
In the comparisons with NSAIDs, the better adverse event profile of celecoxib was marked, both at licensed doses (23,000 patients) and any dose (31,000 patients). There were more discontinuations for lack of efficacy with celecoxib at licensed doses than with NSAIDs, balanced by fewer adverse-event discontinuations or gastrointestinal-adverse-event discontinuations. There were fewer adverse events overall, fewer treatment-related adverse events, and fewer combined and individual gastrointestinal adverse events (with the exception of diarrhoea), including gastrointestinal tolerability and endoscopically detected ulcers. There were also possible benefits relating to loss of blood in the lower gastrointestinal tract, with fewer patients having falls in haemoglobin or haematocrit. These results again are expected, and are similar to results for celecoxib, valdecoxib, and rofecoxib in recent analyses and a trial [35,36,37].
Cyclooxygenase-2 selective inhibitors are known to produce fewer upper gastrointestinal ulcers and bleeds [38,39,40,41,42], and less gastrointestinal upset [43], than NSAIDs. The results here confirm this for celecoxib. For gastrointestinal tolerability (moderate or severe nausea, dyspepsia, or abdominal pain), one fewer patient would be affected for every 28 treated with celecoxib rather than with NSAID. For every 17 patients treated with celecoxib rather than with NSAID, one fewer would have a haematocrit fall of 5% or more.
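For readers less familiar with the number-needed-to-treat (NNT), it is the reciprocal of the absolute risk reduction (ARR). The following minimal sketch, with function names of our own choosing, shows the conversion in both directions; the only figures taken from the text are the NNTs of 28 and 17, and the event rates in the final example are hypothetical.

```python
import math


def number_needed_to_treat(risk_comparator: float, risk_treatment: float) -> float:
    """NNT = 1 / ARR, where ARR is the comparator event rate minus the
    treatment event rate. Returns math.inf when there is no difference;
    a negative value indicates a number-needed-to-harm."""
    arr = risk_comparator - risk_treatment
    if arr == 0:
        return math.inf
    return 1.0 / arr


def absolute_risk_reduction_from_nnt(nnt: float) -> float:
    """Absolute risk difference implied by a given NNT."""
    return 1.0 / nnt


# NNTs quoted in the text and the absolute risk differences they imply.
for label, nnt in (("gastrointestinal tolerability", 28),
                   ("haematocrit fall of 5% or more", 17)):
    arr = absolute_risk_reduction_from_nnt(nnt)
    print(f"{label}: NNT {nnt} corresponds to an ARR of about {arr:.1%}")

# Hypothetical event rates (not from the review): 10% versus 5%.
print(number_needed_to_treat(0.10, 0.05))
```

An NNT of 28 therefore corresponds to an absolute difference of about 3.6 percentage points, and an NNT of 17 to about 5.9 percentage points.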
The lack of difference between celecoxib and NSAIDs with regard to cardio-renal adverse events is not unexpected. There are no known benefits for cyclooxygenase-2 selective inhibitors over nonspecific inhibitors relating to cardiac or renal function, and the known associations between NSAID use and renal failure [8] and heart failure [10,11] are likely to apply to cyclooxygenase-2 selective inhibitors.
Endoscopically detected ulcers were affected both by whether celecoxib or NSAID was used, and by whether or not prophylactic low-dose aspirin was used (Table 8). The number-needed-to-treat to prevent one endoscopically detected ulcer was about 7, with or without aspirin. The protective effect of celecoxib was the same whether aspirin was present or not, and use of aspirin increased endoscopically detected ulcers by the same absolute incidence of 6%. This was nearly identical to results found in a systematic review of studies of valdecoxib in arthritis [30], but different comparisons make it difficult to know whether rofecoxib is different [37] (Fig. 6). The much lower incidence of endoscopically detected ulceration with celecoxib compared with NSAID reflected a similar result for rofecoxib [44,45], though the rofecoxib studies had no patients using aspirin. What is clear is that celecoxib plus low-dose aspirin produces no more endoscopically detected ulcers than NSAID without aspirin, and fewer than NSAID plus aspirin.
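The pattern described above behaves like a simple additive model: a fixed absolute excess with NSAID (about 1 in 7, from the NNT) and a fixed absolute excess with aspirin (6%). The sketch below is our own illustration of that reading, not an analysis from the review; the baseline rate for celecoxib without aspirin is a hypothetical placeholder, because the observed rates are reported in Table 8 and are not reproduced here.

```python
def ulcer_incidence(base_celecoxib: float,
                    nsaid_excess: float,
                    aspirin_excess: float) -> dict:
    """Four-cell additive model of endoscopic ulcer incidence.
    Keys are (drug, aspirin_used); values are proportions of patients."""
    return {
        ("celecoxib", False): base_celecoxib,
        ("celecoxib", True): base_celecoxib + aspirin_excess,
        ("nsaid", False): base_celecoxib + nsaid_excess,
        ("nsaid", True): base_celecoxib + nsaid_excess + aspirin_excess,
    }


# base_celecoxib=0.05 is HYPOTHETICAL; 1/7 and 0.06 come from the text.
cells = ulcer_incidence(base_celecoxib=0.05,
                        nsaid_excess=1 / 7,
                        aspirin_excess=0.06)

for (drug, aspirin), rate in cells.items():
    label = f"{drug} {'with' if aspirin else 'without'} aspirin"
    print(f"{label:<28} {rate:.1%}")
```

Whatever baseline is assumed, celecoxib plus aspirin stays below NSAID alone under this model because the aspirin excess (6%) is smaller than the NSAID excess (about 14%).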
On maximum-dose NSAID, or celecoxib, or paracetamol, up to 30% of patients withdrew from treatment. The main reasons were lack of efficacy or adverse events. Withdrawals increased with duration of study, as would be expected (Table 2). They were also influenced by drug and dose (Table 2), though small numbers of events hindered comparisons. The tendency for fewer withdrawals with celecoxib than NSAID mirrors what has been found in clinical practice [46], though not in clinical trials of valdecoxib [30], based on many fewer patients than in this review. Overall medical costs of cyclooxygenase-2 selective inhibitors are not different from those of NSAIDs [46,47], because higher acquisition costs of cyclooxygenase-2 selective inhibitors appear to be balanced by higher costs of treating or preventing adverse events with NSAIDs.
Even with as large a data set as here, some rare but serious adverse events occur in so few people that it is difficult to determine whether apparent differences (significant or nonsignificant) between treatments are real or meaningful. Examples are cardiac failure, myocardial infarction, and death, with total maximum numbers of 55, 59, and 28 respectively. The incidence of these events was of the order of 0.3 per 1,000 patients to 2 per 1,000 patients. Cardiac failure and death with celecoxib were lower than with NSAIDs (but not significantly), while myocardial infarction rates were higher (but not significantly). Incidence may be additionally affected by exposure bias, that is, different durations of exposure with different treatments (Table 2). Analysis correcting for exposure bias may then be more appropriate [16], even though there appears to be little exposure bias between celecoxib and NSAIDs in arthritis trials.
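The difference between a crude incidence and an exposure-corrected rate can be shown with a small worked example. This is a sketch under stated assumptions: the two arms and all of their numbers are hypothetical, chosen only so that the crude incidence is identical while follow-up differs, and the function names are ours.

```python
def incidence_per_1000(events: int, patients: int) -> float:
    """Crude incidence: events per 1,000 patients, ignoring follow-up time."""
    return 1000.0 * events / patients


def rate_per_100_patient_years(events: int, patient_years: float) -> float:
    """Exposure-corrected rate: events per 100 patient-years of follow-up."""
    return 100.0 * events / patient_years


# HYPOTHETICAL arms: same patients and events, different total exposure.
arm_a = {"events": 10, "patients": 5000, "patient_years": 1000.0}
arm_b = {"events": 10, "patients": 5000, "patient_years": 2000.0}

for name, arm in (("arm A", arm_a), ("arm B", arm_b)):
    crude = incidence_per_1000(arm["events"], arm["patients"])
    adjusted = rate_per_100_patient_years(arm["events"], arm["patient_years"])
    print(f"{name}: {crude:.1f} per 1,000 patients, "
          f"{adjusted:.1f} per 100 patient-years")
```

Both hypothetical arms have 2 events per 1,000 patients, but the arm with half the exposure has twice the rate per patient-year, which is the kind of distortion that exposure correction is meant to remove.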
Where adverse events are rare, even large numbers of patients in randomised, controlled trials will accumulate few events. If such trials are of relatively short duration, then there is even less opportunity to accumulate these rare events. In the 31 trials in this review, the longest duration of exposure was an average of about 7 months, and most were less than 3 months. The consequence is a residual uncertainty, as here for attributing higher risk of myocardial infarction with celecoxib than with other non-coxib comparators. Limitations in the number of events and the duration of constituent trials mean that any possible relationship between celecoxib and myocardial infarction cannot be completely dispelled by these data alone, despite the lack of a statistically significant difference.

Conclusion
This review of a large number of randomised trials and patients provides more accurate estimates of frequency and more confidence in the adverse event pattern. These are likely to be the minimum expected in clinical practice, where the population may be sicker, or take more medications, than in clinical trials.
Using company clinical trial reports removes some of the problems of selective reporting in published papers due to strict word limitations. Here the company clinical trial reports and extensive trial summaries provided about five pages of information per patient. While efficacy in published studies is poorly presented [48], it is available in clinical reports [49]. Information about adverse events is even more poorly presented in published papers [50], but it is clearly presented in company clinical trial reports. Company clinical trial reports represent an ideal source of information for systematic review and meta-analysis.