Association of HLA-DRB1 amino acid residues with giant cell arteritis: genetic association study, meta-analysis and geo-epidemiological investigation

Introduction
Giant cell arteritis (GCA) is an autoimmune disease commonest in Northern Europe and Scandinavia. Previous studies report various associations with HLA-DRB1*04 and HLA-DRB1*01; HLA-DRB1 alleles show a gradient in population prevalence within Europe. Our aims were (1) to determine which amino acid residues within HLA-DRB1 best explained HLA-DRB1 allele susceptibility and protective effects in GCA, seen in UK data combined in meta-analysis with previously published data, and (2) to determine whether the incidence of GCA in different countries is associated with the population prevalence of the HLA-DRB1 alleles that we identified in our meta-analysis.

Methods
GCA patients from the UK GCA Consortium were genotyped by using single-strand oligonucleotide polymerization, allele-specific polymerase chain reaction, and direct sequencing. Meta-analysis was used to compare and combine our results with published data, and public databases were used to identify amino acid residues that may explain observed susceptibility/protective effects. Finally, we determined the relationship of HLA-DRB1*04 population carrier frequency and latitude to GCA incidence reported in different countries.

Results
In our UK data (225 cases and 1378 controls), HLA-DRB1*04 carriage was associated with GCA susceptibility (odds ratio (OR) = 2.69, P = 1.5×10^−11), but HLA-DRB1*01 was protective (adjusted OR = 0.55, P = 0.0046). In meta-analysis combined with 14 published studies (an additional 691 cases and 4038 controls), protective effects were seen from HLA-DR2, which comprises HLA-DRB1*15 and HLA-DRB1*16 (OR = 0.65, P = 8.2×10^−6) and possibly from HLA-DRB1*01 (OR = 0.73, P = 0.037). GCA incidence (n = 17 countries) was associated with population HLA-DRB1*04 allele frequency (P = 0.008; adjusted R^2 = 0.51 on univariable analysis, adjusted R^2 = 0.62 after also including latitude); latitude also made an independent contribution.

Conclusions
We confirm that HLA-DRB1*04 is a GCA susceptibility allele. The susceptibility data are best explained by amino acid risk residues V, H, and H at positions 11, 13, and 33, contrary to previous suggestions of amino acids in the second hypervariable region. Worldwide, GCA incidence was independently associated both with population frequency of HLA-DRB1*04 and with latitude itself. We conclude that variation in population HLA-DRB1*04 frequency may partly explain variations in GCA incidence and that HLA-DRB1*04 may warrant investigation as a potential prognostic or predictive biomarker.

Electronic supplementary material
The online version of this article (doi:10.1186/s13075-015-0692-4) contains supplementary material, which is available to authorized users.


Introduction
Genetic association studies are a well-established method for investigating genetic contributions to disease. In rheumatoid arthritis (RA) [1] and small-vessel vasculitis [2], genetically distinct subsets have been identified that have different associations with the major histocompatibility complex (MHC) region that encodes the HLA-DRB1 alleles. Comparison of HLA-DRB1 associations with RA in different ethnic groups helped to support the original "shared epitope" hypothesis of RA susceptibility [3] based on an amino acid risk motif at positions 67-74 in the third hypervariable region (HVR3) of the class II MHC molecule, encoded by HLA-DRB1. The group of RA "shared epitope" alleles now includes HLA-DRB1*01:01, HLA-DRB1*04:01, HLA-DRB1*04:04, HLA-DRB1*04:08, and HLA-DRB1*10:01; other alleles provide weaker protective effects, additional to the risk effects of the "shared epitope" [4]. Recently, it has been demonstrated that amino acid residues 11 and 13 in the first hypervariable region (HVR1) of class II MHC display the strongest associations with RA susceptibility [5].
Giant cell arteritis (GCA) incidence is highest in populations with Scandinavian ancestry [6][7][8] and this has led to suggestions that this might be due to genetic factors [9,10]. Susceptibility to GCA has been reported to be associated with carriage of HLA-DRB1*04, but not all studies have shown an association and there are conflicting data as to whether there is an association with specific HLA-DRB1*04 alleles [11]. Underrepresentation of HLA-DRB1*01 in GCA patients from Rochester, Minnesota, led to a suggestion that the risk of GCA may be due to a DRYF motif at positions 28-31 in the second hypervariable region (HVR2) of MHC class II [12], but a Spanish study failed to replicate this finding [13]. To date, however, formal meta-analysis has not yet been performed to determine the major susceptibility and protective HLA-DRB1 alleles. The relative contribution of genetic and environmental factors as an explanation for geographical differences in GCA incidence also remains disputed [6,7]. When genetic diversity within Europe is subjected to principal component analysis, the MHC is one of several genetic regions that are strongly associated with a component that runs along a north-south gradient from Norway/Sweden to Spain [14]. We therefore hypothesised that variations in the frequency of HLA-DRB1 GCA susceptibility alleles may partly explain geographical variations in GCA incidence.
Here, we report new GCA susceptibility data, combine these with the published data using meta-analysis, and propose a new hypothesis regarding a possible amino acid GCA 11-13-33 risk motif in HVR1 and HVR2 of class II MHC. This hypothesis fits the observed data better than previously proposed models.

Patients
The UK GCA Consortium was designed to support genetic association studies. Investigators, all experienced rheumatologists, recruited cases with a firm clinical diagnosis of GCA, based on all available information. Recruitment was retrospective. A positive temporal artery biopsy was not required as it was not always undertaken in classic presentations or could not be performed within an optimal time window or both. In some centres, the erythrocyte sedimentation rate was unavailable and so fulfilment of the 1990 American College of Rheumatology (ACR) criteria [15], which should not be used for clinical diagnosis of GCA [16], was not a requirement for inclusion. Clinical data on a subset of this cohort have already been published [17]. In this analysis, we included all patients who agreed to give a blood sample for genetic studies up to 2012 and where a sample was available. Written informed consent was provided by all patients, and the study was approved by the York Research Ethics Committee (reference 05/Q1108/28).

DNA extraction and genotyping
DNA was extracted from peripheral blood. HLA-DRB1 genotyping was performed by either single-stranded oligonucleotide polymerisation [18] or allele-specific polymerase chain reactions, using standard primer sequences (HLA DRBplus Typing Kit, Amersham Biosciences, now part of GE Healthcare, Little Chalfont, UK) except for the forward primer of HLA-DRB1*10, which was redesigned as 5'-GCG GTT GCT GGA AAG ACG CG-3'. Direct sequencing was also performed to enable four-digit genotyping of HLA-DRB1*04 subtypes [18] because of previous reports of an HLA-DRB1*04 association of GCA at the two-digit level. The HaplotypeViewer program was developed to facilitate rapid four-digit genotyping from sequence electropherograms and is freely available [19].

Analysis of genotyping data
Control data from the UK Rheumatoid Arthritis Genetics (UKRAG) Consortium were used for this analysis. Initial logistic regression analyses were undertaken by assuming additive genetic models to estimate the effect of each potential susceptibility/protective allele. Adjustments for genetic effects already proposed in the literature (HLA-DRB1*04) were also performed.

Meta-analysis of giant cell arteritis susceptibility data
To identify case-control studies of HLA-DRB1 association with GCA susceptibility, a literature search was conducted in PubMed, without language restriction, by using the terms "HLA" and "(giant cell arteritis) OR (temporal arteritis)". Reference lists of studies identified were also scanned. Publications were included if they provided sufficient detail on cases and controls to perform a meta-analysis. Where there were multiple publications with overlapping datasets, the report with the most complete dataset was chosen. Meta-analysis of the published summary carrier frequency data was performed (i.e., assuming a dominant mode of inheritance) because allele frequency and individual-level patient data were mostly unavailable from the authors of the studies. A random-effects model was used; the overall estimate was calculated by using as weights 1/(v_i + τ), where v_i is the variance of the estimated effect from the ith study and τ is the estimated between-study variance [20].
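The weighting scheme described above can be illustrated with a minimal Python sketch. It assumes that each study contributes a log odds ratio computed from carrier counts (with the usual large-sample variance) and that the between-study variance is estimated with the DerSimonian-Laird moment estimator; the text does not state which estimator was used, so this choice, the function names, and the continuity correction are all assumptions made for illustration.

```python
import math


def log_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Log odds ratio and its variance from carrier counts (dominant model)."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    # Assumption: add 0.5 to every cell when any cell is empty.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def random_effects_pool(effects, variances):
    """Pooled effect using weights 1 / (v_i + tau), tau = between-study variance.

    tau is estimated here with the DerSimonian-Laird method (an assumption).
    Returns (pooled effect, standard error, tau).
    """
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau = max(0.0, (q - (k - 1)) / scale) if scale > 0 else 0.0
    w_star = [1 / (v + tau) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau
```

The pooled value is on the log scale, so exponentiating it gives the pooled odds ratio. When the studies agree closely, the estimated between-study variance is zero and the result reduces to the fixed-effect estimate.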
Worldwide giant cell arteritis incidence in relation to HLA-DRB1*04 population carrier frequencies
To identify reports of the incidence of GCA in different countries, a second literature search in PubMed was conducted with combinations of the medical subject heading terms "giant cell arteritis", "temporal arteritis", and "epidemiology". Hand-searching was also performed in the reference lists of retrieved articles, review articles, and textbooks. Studies were included if they were available in full-text and included an estimate of the annual incidence, time period of the study, method of case definition, population studied, and geographical location of the study. Where necessary, the incidence figure was recalculated as number of new cases per 100,000 of the over-50 population per year. Studies that appeared to report duplicate or overlapping populations were excluded. Studies completing recruitment before 1980 were excluded in case of time trends in the incidence of GCA and because the quality of the reporting was generally lower for the older studies. Where more than one report existed for a single country (unless in ethnically distinct populations), the one with a later average period of recruitment was preferred. Where a single report included two separate sub-studies (regions or time periods), a weighted mean of the two sub-studies was used to arrive at an overall incidence figure.
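The recalculation of incidence and the combination of sub-studies can be sketched as follows. This is an illustrative Python example only: the text does not specify the weights used for the weighted mean, so weighting by person-years is an assumption, and the function names are hypothetical.

```python
def incidence_per_100k(new_cases, population_over_50, years):
    """Annual incidence per 100,000 people aged over 50."""
    person_years = population_over_50 * years
    if person_years <= 0:
        raise ValueError("person-years must be positive")
    return 100_000 * new_cases / person_years


def pooled_incidence(substudies):
    """Weighted mean incidence across sub-studies of one report.

    substudies: iterable of (new_cases, population_over_50, years).
    Assumption: each sub-study is weighted by its person-years.
    """
    substudies = list(substudies)
    weights = [pop * years for _, pop, years in substudies]
    rates = [incidence_per_100k(c, pop, y) for c, pop, y in substudies]
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)
```

With person-year weights, the weighted mean equals the total number of cases divided by the total person-years, which is one reason this weighting is a natural default.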
We then sought data on ethnically matched HLA-DRB1 population allele frequencies at the two- and four-digit levels for each geographical region identified in the second literature search. We considered HLA-DRB1*04 alleles and those identified as being potential susceptibility/protective alleles in our own UK dataset. Methods have been reported elsewhere [21]; briefly, we first consulted the Allele Frequency Net Database [22] and then, if necessary, Ovid Medline and Embase. Carrier frequencies for control populations were converted to estimated allele frequencies by using the Hardy-Weinberg equation. Finally, the following predetermined rule was used to generate an estimate of population HLA-DRB1 allele frequencies: reports with over 500 (four-digit typing) or 1000 (two-digit typing) controls were identified and a weighted mean calculated. In the absence of large studies, studies with more than 100 (four-digit typing) or more than 200 (two-digit typing) were identified and a weighted mean calculated. Determination of geographical latitude and linear regression analysis were performed as previously described [21].
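The conversion and selection rule above can be made concrete with a short Python sketch. Under Hardy-Weinberg equilibrium, an allele of frequency p is carried by a fraction 1 − (1 − p)^2 of the population, so p = 1 − sqrt(1 − carrier frequency). The thresholds below come from the text; weighting each report by its number of controls is an assumption, and the function names are hypothetical.

```python
import math


def carrier_to_allele_frequency(carrier_freq):
    """Allele frequency implied by a carrier frequency under Hardy-Weinberg."""
    if not 0.0 <= carrier_freq <= 1.0:
        raise ValueError("carrier frequency must lie between 0 and 1")
    return 1.0 - math.sqrt(1.0 - carrier_freq)


def population_allele_frequency(reports, four_digit):
    """Apply the predetermined size rule, then take a weighted mean.

    reports: list of (allele_frequency, n_controls).
    Assumption: reports are weighted by their number of controls.
    Returns None when no report is large enough.
    """
    large, small = (500, 100) if four_digit else (1000, 200)
    chosen = [r for r in reports if r[1] > large]
    if not chosen:
        # Fall back to mid-sized studies only when no large study exists.
        chosen = [r for r in reports if r[1] > small]
    if not chosen:
        return None
    total = sum(n for _, n in chosen)
    return sum(freq * n for freq, n in chosen) / total
```

For example, a carrier frequency of 36 % corresponds to an allele frequency of 20 %, because 1 − 0.8^2 = 0.36.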

Development of amino acid risk motif model
After determination of susceptibility and protective HLA-DRB1 alleles, amino acid residues in HVR1, HVR2, and HVR3 were obtained from the IMGT (International ImMunoGeneTics Information System) database [23] (accessed 31 January 2012). For samples with only two-digit typing, we estimated amino acid residues on the basis of geographically relevant population frequencies of the four-digit subtypes from the Allele Frequency Net Database [22], assigning a probability to residues when they varied within the four-digit subtypes (only necessary at positions 28, 32, 37, 67, 70, 71, and 74). For each of these, the expected misclassification rate when assigned by using population frequencies is less than 1 %, apart from positions 67 (2 %) and 71 (3 %). For each polymorphic position, samples were assigned a dosage (i.e., the expected number of copies) for each residue. Logistic regression was then used to test for association at each position separately, and degrees of freedom were equal to one less than the number of distinct residues. For the most significant positions, forward stepwise regression was used to identify the residues associated with disease risk at that position.
We used population HLA-DRB1 frequencies for inferring amino acid residues in both cases and controls. Under the null hypothesis of no association, the frequencies would be the same, and any bias introduced by using population frequencies for cases would be toward the null. By contrast, using HLA-DRB1 four-digit frequencies that have been observed in patients with GCA to infer the amino acid residues in the GCA cases could lead to a biased analysis with an inflated false-positive rate.
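The dosage assignment for samples with only two-digit typing can be sketched in Python as below. This is a simplified illustration, not the analysis code used in the study: the data structures and function name are hypothetical, and the subtype labels and residues in any usage example would need to be replaced with real values from the IMGT and Allele Frequency Net databases.

```python
from collections import defaultdict


def residue_dosage(genotype, subtype_freqs, residue_at_position):
    """Expected number of copies of each residue at one amino acid position.

    genotype: the two two-digit allele groups carried, e.g. ("04", "01").
    subtype_freqs: {group: {four_digit_subtype: population frequency}}.
    residue_at_position: {four_digit_subtype: residue letter}.
    """
    dosage = defaultdict(float)
    for group in genotype:
        subtypes = subtype_freqs[group]
        total = sum(subtypes.values())
        for subtype, freq in subtypes.items():
            # Each chromosome contributes one copy, split across residues
            # in proportion to the subtype frequencies within its group.
            dosage[residue_at_position[subtype]] += freq / total
    return dict(dosage)
```

The dosages for one individual always sum to two, one copy per chromosome. When every subtype in a group carries the same residue, that group contributes a full copy to that residue, so imputation only matters at positions that vary within a two-digit group.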
Results that reach a nominal significance level of 0.05 are highlighted. For the exploratory hypotheses, these should be interpreted in the light of multiple testing. Analyses were performed in SPSS 15 (IBM Corporation, Armonk, NY, USA) and Stata SE (StataCorp LP, College Station, TX, USA).

Patients
Two hundred twenty-five patients with GCA from 7 UK centres consented to analysis of genetic material for this study (125 from Leeds hospitals, including 38 from Otley; 33 from Harrogate; 23 from Southend; 17 from York; 16 from Dewsbury; 10 from Pontefract; and one from Ipswich). Their demographics and disease characteristics, including fulfilment of 1990 ACR criteria, are shown in Table 1. Of the 183 temporal artery biopsies performed, 140 (77 %) were positive. Patients were all European Caucasian.

Analysis of genotyping data
Allele frequencies in cases and 1378 UKRAG controls are shown in Table 2 with per-allele odds ratios with and without adjustment for HLA-DRB1*04, the previously proposed susceptibility allele. Initial analysis was performed at the two-digit level. Four-digit analysis was also performed for the common *04 subtypes, but statistical analysis was not performed on the rarer *04 subtypes.
The effect sizes for HLA-DRB1*04 carriage were similar when restricting analyses to biopsy-positive GCA cases (OR = 2.83, 95 % CI 1.99 to 4.03, P = 7.7×10^−9); the number of biopsy-negative GCA cases was too small for a separate analysis.
Worldwide giant cell arteritis incidence in relation to HLA-DRB1*04 population carrier frequencies
Reliable population HLA-DRB1 data were not always available, notably for small, native tribes of Alaska and Saskatoon, where extrapolation from small and physically or genetically (or both) isolated communities was felt to be unwarranted. Table 4 summarises the GCA incidence articles included, together with the estimated population allele frequencies and the number of individuals on which these estimates are based. Substantial clinical heterogeneity was identified in the GCA incidence studies, including variations in the methods used to identify GCA cases and confirm GCA diagnosis. At the two-digit level (Table 4 and Fig. 1a and b), 17 studies were included in the analysis of worldwide GCA incidence in relation to population HLA-DRB1 allele frequencies. In view of our meta-analysis findings, we extracted data for HLA-DRB1*04, HLA-DRB1*01, and HLA-DRB1*15 population frequencies (Table 4). The majority of these were from Europe and the Mediterranean area. Within this small dataset, HLA-DRB1*15 was more common in the general population at more northerly latitudes (r = 0.52, P = 0.038), whereas no significant association with latitude was seen for population HLA-DRB1*04 or HLA-DRB1*01 (r = 0.47, P = 0.057; r = 0.39, P = 0.133).

Table 1 footnote: Owing to the retrospective recruitment of cases, data on features of giant cell arteritis (GCA) at presentation were not always recorded contemporaneously in the medical notes. For this reason, in 24 cases, fulfilment of American College of Rheumatology (ACR) criteria could not be documented. Six of these were biopsy-proven. In some recruiting centres, plasma viscosity or C-reactive protein was measured instead of erythrocyte sedimentation rate.

Development of model with 11-13-33 amino acid risk motif
Tests for association showed that the most significant position was at 13 (P = 1.2×10^−9), followed by 11, 33, 37, and 9 (Fig. 1c). At position 13, the most significant residue was H (OR = 2.11, 95 % CI 1.61 to 2.77, P = 5.5×10^−8, equivalent to H at 33 and also to the *04 allele). However, stepwise regression found additional contributions from residues S (OR = 1.38, 95 % CI 1.07 to 1.77, P = 0.014) and F (OR = 0.66, 95 % CI 0.44 to 0.99, P = 0.038) (Table 5). Similarly, multiple residues were found at positions 11 and 37. There is very strong linkage disequilibrium in this region, and so many of these residues at different positions almost always occur together, as illustrated in Additional file 1. For example, we did not have the power to distinguish between the risk effects of V, H, and H at positions 11, 13, and 33, respectively, since the *10 allele, which differs at residue 13, is very rare. The previously proposed DYF motif (positions 28, 30, and 31) in HVR2 (OR = 1.54, 95 % CI 1.21 to 1.96, P = 0.00038) did not explain the observed data as well as simple HLA-DRB1*04 carriage. Similarly, variation in amino acid residues within HVR3 is unlikely to explain the observed GCA susceptibility data (Fig. 1c), especially since the other alleles comprising the "RA shared epitope" were not overrepresented in GCA.

Discussion
In this study, which includes both new UK data and the first formal meta-analysis of published data on HLA-DRB1 associations of GCA, we not only confirm a strong association of GCA with HLA-DRB1*04 allele carriage, including within our own UK data, but also identify possible protective effects of HLA-DRB1*01 and HLA-DRB1*15, supported by the meta-analysis of previous studies. We were able to impute amino acid residues quite reliably from published allele frequencies, enabling us to analyse amino acid residues even though four-digit typing was not available for every HLA-DRB1 allele. Based on this, it was the amino acid residues 11, 13, and 33 in the first and second hypervariable regions that best explained the observed HLA-DRB1 susceptibility and protective effects, rather than the previously proposed DRYF amino acid motif in the second hypervariable region [12]. We also observed that some non-HLA-DRB1*04 amino acid residues had additional effects (individual amino acid residues that were retained by a multivariable regression model for each separate amino acid position are shown in the last two columns in Table 5), suggesting additional genetic complexities that we did not have the power to investigate in depth.

Abbreviation: ACR, American College of Rheumatology.
Table 4 footnote: Giant cell arteritis (GCA) incidence in Finland was taken as the weighted mean of the two sub-studies reported in the article cited. Allele frequencies are the weighted mean of the allele frequencies reported in the studies cited (see Methods for details of how these were selected). For brevity, where the data were taken from the allelefrequencies.net website, the main citation for the website is given [22]; searching allelefrequencies.net on the country, sample size, and human leukocyte antigen (HLA) allele in question will give the allele frequencies used in this table.
We then systematically extracted data on population prevalence of the identified susceptibility and protective HLA-DRB1 alleles and compared this with reports of GCA incidence in different countries. We found a significant and independent relationship of GCA incidence both with HLA-DRB1*04 and with latitude. Conversely, HLA-DRB1*15 was, if anything, protective and did not contribute to incidence of GCA in the geo-epidemiological study. Strengths of this work include the presentation of the first UK HLA data in GCA, its presentation in the context of the international literature, the first meta-analysis of HLA-DRB1*04 GCA susceptibility studies, and the novel approach combining a traditional genetic association study with a geo-epidemiology approach. Using logistic regression for the UK, we could control for already-known HLA-DRB1 susceptibility effects in the per-allele analysis, which also has not been performed in other datasets, which mostly reported only carrier frequencies not allele frequencies. Based on this, we were able to suggest HLA-DRB1 amino acid residues that best fit the observed susceptibility/protective allele effects. This is the first synthesis of the literature on reported GCA incidence in relation to population HLA-DRB1 allele frequency and geographical latitude.
Our analysis is based on certain assumptions. Firstly, because many clinicians in the UK do not always request temporal artery biopsy except in cases of diagnostic doubt [24], we had prespecified in the analysis that GCA would be defined clinically rather than limiting inclusion to biopsy-positive cases only. The clinically diagnosed patients, however, had to be firmly diagnosed by an experienced consultant, and there had to be unequivocal clinical features and no alternative explanation for the symptoms after follow-up. Temporal artery biopsy is not 100 % sensitive for GCA; possible reasons for false-negative biopsies in our cohort included delays in obtaining biopsies resulting in resolution of inflammation, suboptimal biopsy length, and biopsy reporting based on the classic pathologic criteria, which may be overly stringent [25]; sometimes the temporal artery is spared in patients with GCA, particularly in those with predominant disease of the aorta and its proximal branches [26]. We conducted a sub-analysis of the biopsy-positive subset and found no difference in the observed effect size for HLA-DRB1*04 carriage; with such a small number of biopsy-negative cases, no meaningful statement can be made about the effect size in that group. Our meta-analysis showed that the effect size in the cohort overall was also comparable to that observed in previous reports, some of which included only biopsy-positive cases. If not all the cases truly had GCA, this would have reduced the power of the study ("diluted out" the genetic association) but would have been highly unlikely to introduce artefactual genetic associations because the differential diagnosis of GCA is so wide. Similar pragmatic approaches to case definition for genetics studies, accepting a small, finite rate of misclassification in order to maximise recruitment, have been successfully used in other genetic association studies [27].
We also did not have the power to study whether there are differences in the effect size between regions of the UK, but regional variations in the incidence of diagnosed GCA have been described [28]; it remains unclear how far this is influenced by regional variations in population HLA-DRB1 frequency [29].
In regard to the HLA-DRB1 typing, it is recognised that HLA-DRB1 represents only a small part of the whole MHC and also that not having complete sequence-based four-digit typing may have resulted in some important information being missed. This study focused on HLA-DRB1 and we did not set out to analyse variation elsewhere in the MHC [30]. However, our finding that non-HLA-DRB1*04 residues also contributed significantly to GCA susceptibility/protection (Table 5) suggests that other alleles may also be involved. The MHC is a complex locus with extensive linkage disequilibrium, and an MHC-wide analysis requires larger datasets and specialised analysis methods. A concurrent international, collaborative large-scale genetic analysis of GCA (including samples from this study), using a different genotyping platform (Immunochip) with more extensive coverage of the HLA region [31], shows evidence of wider involvement of the MHC region while confirming the strong association with DRB1*04. Lastly, the literature reviews and meta-analyses are limited by the small number of studies in the literature, many of which were published some years ago, with corresponding variations in case ascertainment and in genotyping assays. Larger datasets using modern genotyping and statistical analysis methods will reveal further GCA susceptibility alleles within the whole HLA locus and allow their pattern of linkage disequilibrium to be analysed.

Table 5 footnote: Forward stepwise regression was used to determine whether there are independent effects of more than one amino acid residue at the same position. Only amino acids that were retained by the stepwise regression model are shown in the last two columns. Odds ratio (OR) of less than 1.0 indicates a protective effect, whereas OR of more than 1.0 indicates a susceptibility effect. Note the high level of linkage disequilibrium at this locus; Additional file 1 shows the amino acid residues within the three hypervariable regions of HLA-DRB1.
The P values reported here should be considered in the light of multiple testing, but owing to the a priori suggestion of HLA-DRB1*04 association and lack of consensus as to how to adjust for multiple testing at a multi-allelic locus where the different alleles are not independent of each other, we did not consider a Bonferroni correction to be appropriate here. Nevertheless, model over-fitting is a possibility, and it is essential that our findings be replicated in an independent dataset.

Conclusions
In summary, we report a novel approach to studying genetic influences on disease by combining traditional genetic association studies with geo-epidemiology methods that capitalise on publicly available data. Our new UK data and a synthesis of the published literature suggest that HLA-DRB1*04 might explain part of the observed geographical variation in GCA incidence. This is consistent with an autoimmune aetiology for GCA [32]. However, we found additional variation in susceptibility (Table 5) and incidence (Fig. 1a, b) that is not fully explained by HLA-DRB1*04 and is likely to relate to additional, unknown genetic and environmental factors. Previous studies of GCA have also demonstrated an association between HLA-DRB1*04 and visual loss [33] and also with glucocorticoid resistance [34]. Of interest, in Japan (where HLA-DRB1*04 population frequency is low), large-vessel vasculitis (Takayasu arteritis) is relatively more common than GCA. Takayasu arteritis was associated with alleles containing the 11-13-33 V-H-H motif (HLA-DRB1*0405) in a Turkish population but was not associated with another allele also containing V-H-H (HLA-DRB1*0401) in a European-American population; HLA-DRB1*1502, which was associated with Takayasu arteritis in both populations [35], does not contain the V-H-H motif. Very few patients in our dataset had large-vessel imaging, but genetic characterisation of the subset of GCA patients who have large-vessel involvement or temporal artery sparing or both [26] would be of interest in future studies. From a clinical perspective, further study of well-phenotyped cohorts is required to determine whether HLA-DRB1*04 may serve as a biomarker of pathophysiologically relevant phenotypic disease subsets in order to develop better risk stratification, prediction of response to glucocorticoids, and ultimately targeted therapies.

Additional files
Additional file 1: Appendices I and II (Membership of consortia). Description of contents: Names and institutional affiliations of members of the UK GCA Consortium (Appendix I) and the UKRAG Consortium (Appendix II).
Additional file 2: Amino acids at hypervariable regions (HVR) in susceptibility, protective, and neutral GCA alleles. Table showing the amino acid residues in the hypervariable regions of HLA-DRB1, for each of the HLA-DRB1 alleles. Shaded columns denote the proposed 11-13-33 GCA risk motif.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
SLM designed the study and wrote the case record form, recruited patients, and validated clinical data, including diagnoses, DNA extractions, and genotyping assays, and performed the searches and data extraction from the published literature and online databases for HLA-DRB1 data (GCA susceptibility, amino acid sequences, and population allele frequencies), participated in design of geo-epidemiology analyses and carried out the geo-epidemiology statistical analyses, and drafted the manuscript. JCT carried out all the other statistical analyses, including amino acid analyses, and helped to draft the manuscript. LH-R and SM participated in design and validation of the genotyping assays, including sequencing and analysis of electropherograms, including use of HaplotypeViewer. BD, AG, MG, LH, SJ, and CTP recruited patients and validated clinical data, including diagnoses. UKRAG Consortium provided control HLA-DRB1 data. JHB supervised the statistical analyses, including amino acid analyses, and helped to draft the manuscript. RW supervised the design and conduct of the geo-epidemiology analyses, performed the literature search for incidence of GCA in different countries, and helped to draft the manuscript. AWM conceived of the study, supervised the design and conduct of the genetic studies, including amino acid analyses, and helped to draft the manuscript. All authors revised the draft for important intellectual content and read and approved the final manuscript.