Performance of gout definitions for genetic epidemiological studies: analysis of UK Biobank
Arthritis Research & Therapy volume 19, Article number: 181 (2017)
Many different combinations of available data have been used to identify gout cases in large genetic studies. The aim of this study was to determine the performance of case definitions of gout using the limited items available in multipurpose cohorts for population-based genetic studies.
This research was conducted using the UK Biobank Resource. Data, including genome-wide genotypes, were available for 105,421 European participants aged 40–69 years without kidney disease. Gout definitions and combinations of these definitions were identified from previous epidemiological studies. These definitions were tested for association with 30 urate-associated single-nucleotide polymorphisms (SNPs) by logistic regression, adjusted for age, sex, waist circumference, and ratio of waist circumference to height. Heritability estimates under an additive model were generated using GCTA version 1.26.0 and PLINK version 1.90b3.32 by partitioning the genome.
There were 2066 (1.96%) cases defined by self-report of gout, 1652 (1.57%) defined by urate-lowering therapy (ULT) use, 382 (0.36%) defined by hospital diagnosis, 1861 (1.76%) defined by hospital diagnosis or gout-specific medications and 2295 (2.18%) defined by self-report of gout or ULT use. Association with gout at experiment-wide significance (P < 0.0017) was observed for 13 SNPs using the self-report of gout or ULT use definition, 12 SNPs using the self-report of gout definition, 11 SNPs using the hospital diagnosis or gout-specific medication definition, 10 SNPs using the ULT use definition and 3 SNPs using the hospital diagnosis definition. Heritability estimates ranged from 0.282 to 0.308 for all definitions except hospital diagnosis (0.236).
Of the limited items available in multipurpose cohorts, the case definition of self-report of gout or ULT use has high sensitivity and precision for detecting association in genetic epidemiological studies of gout.
Accurate case definition is important for epidemiological studies. However, in multipurpose cohort studies frequently used for genetic epidemiological studies of gout, limited information is usually available for case definition. Many different combinations of available data have been used to identify gout cases in large genetic studies. For example, in the Global Urate Genetics Consortium study, the largest genome-wide association study (GWAS) of hyperuricaemia and gout reported to date, 15 different definitions of gout were used.
Population genetic studies frequently require large numbers of participants to achieve adequate statistical power, because common variants typically exert small effects on risk of disease. Within a study population, accurate case definition improves study power by maximising the number of true cases and minimising the number of falsely attributed disease-free control participants. Consistent case definition is important for analyses that pool genetic data from different studies, as well as for those analyses that aim to replicate reported genetic associations.
Authors of a recent analysis of the Study for Updated Gout Classification Criteria (SUGAR), using synovial fluid confirmation of monosodium urate crystals as the gold standard for gout definition, reported that the definition of self-report of gout or urate-lowering therapy (ULT) use had the best test performance characteristics of existing definitions used in epidemiological studies. The aim of the present study was to determine the performance of case definitions of gout using the limited items available in multipurpose cohorts, including self-report of gout or ULT use, for population-based genetic studies.
This research was conducted using the UK Biobank Resource (approval number 12611). The first tranche of UK Biobank genotyping and imputation data (made publicly available in May 2015) was used for this analysis. Inclusion criteria were European ethnicity, age 40–69 years and genome-wide genotypes available. Exclusion criteria were mismatch between self-reported sex and genetic sex, genotyping quality control failure, relatedness to other participants, age 70 years or over, and either a primary or secondary hospital diagnosis of kidney disease (International Classification of Diseases, Tenth Revision (ICD-10), codes I12, I13, N00–N05, N07, N11, N14, N17–N19, Q61, N25.0, Z49, Z94.0, Z99.2). Older age and kidney disease were exclusion criteria because they are risk factors for secondary gout.
Gout definitions and combinations of these definitions were identified from previous epidemiological studies [1, 3, 5]. Self-report of gout was defined by reporting of gout by the participant at the time of the study interview. Hospital diagnosis of gout was defined by either primary or secondary hospital discharge coding for gout (ICD-10 code M10, including sub-codes). Use of ULT required self-report of being on any of allopurinol, febuxostat or sulphinpyrazone and not having a hospital diagnosis of leukaemia or lymphoma (ICD-10 codes C81–C96). Winnard-defined gout was hospital diagnosis of gout or gout-specific medication (ULT or colchicine) as reported by Winnard et al. For participants who did not meet any gout definitions, further exclusion criteria were corticosteroid use, non-steroidal anti-inflammatory drug use or probenecid use.
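The case definitions above can be written as simple predicates over per-participant flags. The following is a minimal sketch, not the authors' code: the `Participant` fields and function names are hypothetical, medication names are assumed to be lower-case strings, and it assumes (the text does not say) that the ULT component of the Winnard definition carries the same leukaemia/lymphoma exclusion as the ULT use definition.

```python
from dataclasses import dataclass

# ULT drugs named in the text
ULT_DRUGS = frozenset({"allopurinol", "febuxostat", "sulphinpyrazone"})


@dataclass(frozen=True)
class Participant:
    """Hypothetical record layout; field names are illustrative only."""
    self_report_gout: bool
    hospital_gout: bool                 # ICD-10 M10, primary or secondary coding
    medications: frozenset = frozenset()  # self-reported drug names, lower case
    leukaemia_or_lymphoma: bool = False   # hospital diagnosis, ICD-10 C81-C96
    corticosteroid_use: bool = False
    nsaid_use: bool = False


def ult_use(p: Participant) -> bool:
    # On any ULT drug and no hospital diagnosis of leukaemia or lymphoma
    return bool(p.medications & ULT_DRUGS) and not p.leukaemia_or_lymphoma


def winnard(p: Participant) -> bool:
    # Hospital diagnosis or gout-specific medication (ULT or colchicine).
    # Assumption: ULT here reuses the ult_use() exclusion.
    return p.hospital_gout or ult_use(p) or "colchicine" in p.medications


def self_report_or_ult(p: Participant) -> bool:
    return p.self_report_gout or ult_use(p)


def eligible_control(p: Participant) -> bool:
    # Meets no gout definition and none of the further exclusion criteria
    meets_any = p.self_report_gout or p.hospital_gout or ult_use(p) or winnard(p)
    excluded = p.corticosteroid_use or p.nsaid_use or "probenecid" in p.medications
    return not meets_any and not excluded
```

A participant with only a hospital code for gout meets the Winnard definition but not the self-report of gout or ULT use definition, which is the situation the definitions are designed to distinguish.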
UK Biobank samples had been genotyped using an Axiom array (820,967 markers; Affymetrix, Santa Clara, CA, USA) and imputed to approximately 73.3 million single-nucleotide polymorphisms (SNPs) using SHAPEIT3 and IMPUTE2 with a combined UK10K and 1000 Genomes reference panel. Logistic regression of SNPs against gout as the outcome was performed, adjusting for age, sex, waist circumference, and ratio of waist circumference to height. We analysed 30 urate-associated SNPs reported by Köttgen et al. in the large (>140,000 European participants) Global Urate Genetics Consortium GWAS. Results were reported as the number of SNPs detected at genome-wide significance (P < 5 × 10⁻⁸) and at experiment-wide significance (P < 0.0017). CIs for proportions were calculated using the Wilson score method with OpenEpi (www.openepi.com). Heritability estimates were compared using the difference h₁ − h₂, with standard error √(se₁² + se₂²).
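The two significance thresholds can be illustrated with a short sketch. It assumes that the experiment-wide threshold of 0.0017 is a Bonferroni correction of 0.05 for the 30 SNPs tested (0.05/30 ≈ 0.00167); the text reports the threshold but does not state its derivation, and the function and constant names are hypothetical.

```python
GENOME_WIDE = 5e-8          # genome-wide threshold stated in the text
N_SNPS = 30                 # urate-associated SNPs analysed
# Assumption: experiment-wide threshold is 0.05 Bonferroni-corrected for 30 tests
EXPERIMENT_WIDE = 0.05 / N_SNPS


def significance_tier(p_value: float) -> str:
    """Return the strictest threshold that a P value passes."""
    if p_value < GENOME_WIDE:
        return "genome-wide"
    if p_value < EXPERIMENT_WIDE:
        return "experiment-wide"
    return "not significant"


def count_experiment_wide(p_values) -> int:
    # Counts every SNP below the experiment-wide threshold,
    # which includes those that are also genome-wide significant.
    return sum(p < EXPERIMENT_WIDE for p in p_values)
```

Because any SNP passing the genome-wide threshold also passes the experiment-wide one, counts at experiment-wide significance include the genome-wide hits.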
Heritability estimates under an additive model were generated using GCTA version 1.26.0 and PLINK version 1.90b3.32 by partitioning the genome. To reduce computational time, a smaller control cohort of 10,000 individuals was randomly generated from the UK Biobank and used for each set of cases. SNPs were retained if they showed no deviation from Hardy-Weinberg equilibrium (P > 1 × 10⁻⁶) and had a minor allele frequency >0.01. A genetic relationship matrix was created for each chromosome, which was then used to calculate heritability assuming a prevalence of gout of 2% in the general population.
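Specifying a population prevalence matters because case-control samples over-represent cases, so an estimate on the observed 0/1 scale has to be rescaled to the liability scale. The sketch below shows the standard liability-threshold conversion factor; it is offered as an illustration of that rescaling under stated assumptions, not as a reproduction of the GCTA run described above, and the function name and the example case fraction are hypothetical.

```python
from statistics import NormalDist


def liability_scale_factor(prevalence: float, case_fraction: float) -> float:
    """Multiplier converting observed-scale heritability to the liability scale.

    prevalence    : assumed population prevalence K (0.02 in the text)
    case_fraction : proportion of cases in the analysed sample, P
    Factor = [K(1-K)]^2 / (z^2 * P(1-P)), where z is the standard normal
    density at the liability threshold that cuts off the top K of the population.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Illustrative only: 2295 cases analysed against 10,000 random controls.
# The actual post-QC sample sizes are not given in the text.
example_factor = liability_scale_factor(0.02, 2295 / (2295 + 10000))
```

The factor shrinks as cases become more over-represented relative to the population prevalence, which is why the same assumed prevalence was applied to every case definition.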
Data including genome-wide genotypes were available for 105,421 participants. Demographic and clinical data for the entire study group are shown in Table 1. Mean age was 56.87 years; 49.18% of participants were male; and mean body mass index was 27.36 kg/m².
Figure 1 shows the number of cases identified by each gout definition. There was substantial overlap between most definitions. However, for those who met the hospital diagnosis criteria, 126 (33.0%) of 382 did not meet the self-report of gout or ULT use definition.
Table 2 shows the prevalence of gout identified by each gout definition in the entire study population and in men and women. The hospital diagnosis definition detected the fewest cases (n = 382, study population prevalence 0.36%). Definitions including self-report of gout detected significantly more cases than other definitions, with the definition of self-report of gout or ULT use detecting the highest number of cases (n = 2295, study population prevalence 2.18%).
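The Wilson score interval used for these proportions can be computed directly. This is a minimal sketch of the standard formula applied to the counts quoted above; the function name is hypothetical and the study itself used OpenEpi rather than this code.

```python
from math import sqrt


def wilson_interval(cases: int, n: int, z: float = 1.959964):
    """Wilson score confidence interval for a proportion (default 95%)."""
    p = cases / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Self-report of gout or ULT use: 2295 cases among 105,421 participants
low, high = wilson_interval(2295, 105421)
# Hospital diagnosis: 382 cases among 105,421 participants
low_hosp, high_hosp = wilson_interval(382, 105421)
```

With a sample this large the intervals are narrow, so the intervals for these two definitions do not overlap.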
Analysis of the urate-associated SNPs described by Köttgen et al. showed similar ORs for all gout definitions (Fig. 2, Table 3). However, the number of SNPs associated with gout at genome-wide or experiment-wide significance differed depending on gout case definition. Association with gout at genome-wide significance (P < 5 × 10⁻⁸) was observed for five SNPs (ABCG2, SLC2A9, GCKR, SLC17A3 and SLC22A12) with gout defined by self-report of gout or ULT use, five SNPs (ABCG2, SLC2A9, GCKR, SLC17A3 and SLC22A12) with gout defined by self-report of gout, four SNPs (ABCG2, SLC2A9, GCKR and SLC17A3) with gout defined by the Winnard definition, three SNPs (ABCG2, SLC2A9 and GCKR) with gout defined by ULT use and two SNPs (ABCG2 and SLC2A9) with gout defined by hospital diagnosis.
Association with gout at experiment-wide significance (P < 0.0017) was observed for 13 SNPs with gout defined by self-report of gout or ULT use, for 12 SNPs with gout defined by self-report, for 11 SNPs with gout defined by the Winnard definition, for 10 SNPs with gout defined by ULT use, and for 3 SNPs with gout defined by hospital diagnosis (Table 3). The heritability estimates (i.e., proportion of variance in gout explained by common inherited genetic variants under an additive model of inheritance) were 0.289 (0.034) for the self-report of gout or ULT use definition, 0.283 (0.036) for the self-report of gout definition, 0.282 (0.040) for the Winnard definition, 0.308 (0.044) for the ULT use definition and 0.236 (0.160) for the hospital diagnosis definition. There were no significant differences between the heritability estimates.
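The comparison of heritability estimates can be sketched using the difference and combined standard error described in the methods. The conversion of the difference to a two-sided P value via a normal approximation is an assumption (the text gives only the difference and its standard error), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_heritability(h1: float, se1: float, h2: float, se2: float):
    """Difference of two estimates with SE = sqrt(se1^2 + se2^2).

    Assumption: a two-sided z-test on the difference; this treats the two
    estimates as independent.
    """
    diff = h1 - h2
    se = sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, se, z, p_value


# ULT use (0.308, SE 0.044) versus self-report of gout or ULT use (0.289, SE 0.034)
diff, se, z, p_value = compare_heritability(0.308, 0.044, 0.289, 0.034)
# Hospital diagnosis (0.236, SE 0.160) versus self-report of gout or ULT use
diff_h, se_h, z_h, p_h = compare_heritability(0.236, 0.160, 0.289, 0.034)
```

The large standard error of the hospital diagnosis estimate dominates its combined standard error, which is why even the lowest point estimate is not distinguishable from the others.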
Accurate and consistent phenotyping of cases and disease-free control participants is important to maximise study power and reduce the risk of misclassification bias in genetic association studies. Consistent definitions of disease phenotypes are also important for replication of genetic associations in different cohorts. In this analysis of UK Biobank data, the definition of self-report of gout or ULT use detected the highest number of gout cases and had greatest precision for genetic association analysis.
Our findings are consistent with a recent analysis of the SUGAR cohort that used synovial fluid confirmation of monosodium urate crystals as the gold standard for gout definition. The SUGAR analysis reported that the definition of self-report of gout or ULT use had the best test performance characteristics of existing definitions, with sensitivity of 82% and specificity of 72%. Collectively, these data support use of the self-report of gout or ULT use definition in epidemiological studies when more detailed gout-specific clinical data are not available.
The different definitions of gout used in this study may reflect different disease presentations or patient populations. Although not all patients were captured by any definition, there was substantial overlap between most definitions. The definition of hospital diagnosis is very restrictive and is unlikely to capture most people with gout. Of note, 126 (33.0%) of 382 of those who met the hospital diagnosis criteria did not meet the self-report of gout or ULT use definition. There may be several reasons for this. First, the hospitalised population may have a different disease presentation from that of those identified in the community through self-report or ULT use. Furthermore, a diagnosis of gout made during a hospital admission may subsequently be revised to a different diagnosis, and the ascertainment methodology does not take this into account. Compared with the case definition of self-report of gout or ULT use, the Winnard definition led to a lower estimated prevalence of gout and also had lower precision for genetic association analysis. Therefore, when self-report information is available, we recommend the definition of self-report of gout or ULT use.
For all definitions tested, ABCG2 and SLC2A9 were associated with gout at genome-wide significance. These genes encode proteins that regulate uric acid transport within the gut and proximal renal tubule, respectively. The large effect sizes observed in this study are reminiscent of their dominant effect sizes in GWAS of control of serum urate levels, consistent with the central role of these two genes in regulating serum urate and gout risk. As part of evaluating the various definitions, we also calculated heritability estimates of gout; the proportion of age-, sex- and body composition-adjusted variance explained by all common SNPs was estimated to be 0.282–0.308 (excluding the hospital diagnosis definition). Previously, Köttgen et al., also using GCTA software, had estimated a range of genome-wide heritability estimates of 0.27–0.41 for age- and sex-adjusted serum urate levels, depending on the individual sample sets analysed. The estimates of variance explained in serum urate and gout by common genetic variants in the European sample sets are comparable, suggesting that the common genetic variant-mediated heritabilities of serum urate levels and gout are similar. Clearly, environmental factors such as dietary exposures and medications also contribute to the risk of gout. The heritability estimates use information from common SNPs under the assumption of additive contributions. Therefore, the estimates will not include the contribution of non-additive gene-by-gene and gene-by-environment interactions, rare genetic variants and copy number variations.
We acknowledge that our study has limitations. The analysis was restricted to European participants, and our genetic association results may not be generalisable to non-European populations. Furthermore, a definition that includes ULT may be less specific for gout if the study population is recruited from countries in which ULT is recommended for treatment of asymptomatic hyperuricaemia. A diagnostic gold standard was not available in this study, and therefore it is not possible to determine the false-positive or false-negative rates using this dataset. Disease validation was based on the genotype data available in this cohort, and gout was inferred on the basis of known genetic associations with hyperuricaemia and gout. The strength of association observed in this study population may not reflect findings in the general UK population; risk factors for secondary gout (age ≥70 years and kidney disease) were exclusion criteria. The study findings also are not applicable to studies in which researchers do not collect information about self-report of gout or gout medication use. Our study’s strengths include the large sample size with consistent data collection. The comprehensive data collection, including patient interviews, hospitalisation records and medication information, allowed us to compare a number of different case definitions within a single study.
The case definition of self-report of gout or ULT use has high precision for detecting association in genetic epidemiological studies of gout. When these variables are available within multi-purpose cohorts, the consistent use of this case definition should reduce the risk of misclassification bias and improve study power.
GWAS: Genome-wide association study
ICD-10: International Classification of Diseases, Tenth Revision
SUGAR: Study for Updated Gout Classification Criteria
1. Köttgen A, Albrecht E, Teumer A, Vitart V, Krumsiek J, Hundertmark C, et al. Genome-wide association analyses identify 18 new loci associated with serum urate concentrations. Nat Genet. 2013;45(2):145–54.
2. Colhoun HM, McKeigue PM, Davey SG. Problems of reporting genetic associations with complex outcomes. Lancet. 2003;361(9360):865–72.
3. Dalbeth N, Schumacher HR, Fransen J, Neogi T, Jansen TL, Brown M, et al. Survey definitions of gout for epidemiological studies: comparison with crystal identification as the gold standard. Arthritis Care Res (Hoboken). 2016;68(12):1894–8.
4. Ollier W, Sprosen T, Peakman T. UK Biobank: from concept to reality. Pharmacogenomics. 2005;6(6):639–46.
5. Winnard D, Wright C, Taylor WJ, Jackson G, Te Karu L, Gow PJ, et al. National prevalence of gout derived from administrative health data in Aotearoa New Zealand. Rheumatology (Oxford). 2012;51(5):901–9.
6. Dean AG, Sullivan KM, Soe MM. OpenEpi: open source epidemiologic statistics for public health. Version 3.01 [updated 6 Apr 2013]. http://www.OpenEpi.com.
7. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.
8. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
9. Zondervan KT, Cardon LR. Designing candidate gene and genome-wide case-control association studies. Nat Protoc. 2007;2(10):2492–501.
This research was conducted using the UK Biobank Resource (approval number 12611).
This work involving use of the UK Biobank Resource was supported by the Health Research Council of New Zealand (grant number 14-527).
Availability of data and materials
The data in this study are owned by a third party, UK Biobank (www.ukbiobank.ac.uk), and legal constraints do not permit public sharing of the data. UK Biobank, however, is open to all bona fide researchers anywhere in the world. Thus, the data reported in this article can be accessed directly by applying through the UK Biobank Access Management System (www.ukbiobank.ac.uk/register-apply).
Ethics approval and consent to participate
UK Biobank has approval from the North West Multi-Centre Research Ethics Committee (11/NW/0382), and obtained written informed consent from all participants prior to the study.
Consent for publication
TRM has received consulting fees or grants from Ardea Biosciences and AstraZeneca. ND has received consulting fees, speaker fees or grants from Takeda, Teijin, Menarini, Pfizer, Ardea Biosciences, AstraZeneca, Fonterra, Crealta and Cymabay. MC declares that he has no competing interests.
Cadzow, M., Merriman, T.R. & Dalbeth, N. Performance of gout definitions for genetic epidemiological studies: analysis of UK Biobank. Arthritis Res Ther 19, 181 (2017). https://doi.org/10.1186/s13075-017-1390-1