Validation of the reshaped shared epitope HLA-DRB1 classification in rheumatoid arthritis

Recently, we proposed a classification of HLA-DRB1 alleles that reshapes the shared epitope hypothesis in rheumatoid arthritis (RA); according to this model, RA is associated with the RAA shared epitope sequence (positions 72–74) and the association is modulated by the amino acids at positions 70 and 71, resulting in six genotypes with different RA risks. This was the first model to take into account both the association between the HLA-DRB1 gene and RA and the linkage data for that gene. In the present study we tested this classification for validity in an independent sample. A new sample of the same size and population (100 RA French Caucasian families) was genotyped for the HLA-DRB1 gene. The alleles were grouped as proposed in the new classification: S1 alleles for the sequences A-RAA or E-RAA; S2 for Q or D-K-RAA; S3D for D-R-RAA; S3P for Q or R-R-RAA; and X alleles for no RAA sequence. Transmission of the alleles was investigated. Genotype odds ratio (OR) calculations were performed through conditional logistic regression, and we tested the homogeneity of these ORs with those of the 100 first trio families (one case and both parents) previously reported. As previously observed, the S2 and S3P alleles were significantly over-transmitted and the S1, S3D and X alleles were under-transmitted. The latter were grouped as L alleles, resulting in the same three-allele classification. The risk hierarchy of the six derived genotypes was the same: (by decreasing OR and with L/L being the reference genotype) S2/S3P, S2/S2, S3P/S3P, S2/L and S3P/L. The homogeneity test between the ORs of the initial and the replication samples revealed no significant differences. The new classification was therefore considered validated, and both samples were pooled to provide improved estimates of RA risk genotypes from the highest (S2/S3P [OR 22.2, 95% confidence interval 9.9–49.7]) to the lowest (S3P/L [OR 4.4, 95% confidence interval 2.3–8.4]).


Introduction
The pathogenesis of rheumatoid arthritis (RA) is multifactorial, involving both genetic and environmental factors. Although associations between some HLA-DRB1 alleles and RA were reported nearly three decades ago, the biological mechanism underlying this association remains unknown. The presence of the RAA sequence at positions 72-74 of the HLA-DR β-chain molecule for all HLA-DRB1 alleles known to be associated with RA led to the shared epitope (SE) hypothesis [1]. This hypothesis received support from numerous case-control association studies in both Caucasian and non-Caucasian populations. However, studies testing the SE hypothesis have rejected this simple model, which stipulates that each SE allele confers the same risk [2][3][4][5].
Recently, Tezenas du Montcel and coworkers proposed a model of the SE component in RA [6]. Those investigators reconsidered the SE hypothesis and generated a new classification of HLA-DRB1 alleles, based on their investigation using the MASC (Marker Association Segregation Chi Square) method [7], which was conducted in 100 trio families (one case and both parents) and 132 index cases from affected sibling pair families, all from the French Caucasian population. They proposed that the risk for developing RA depends on whether the RAA sequence occupies positions 72-74 and, if this is the case, on the amino acids at positions 71 and 70. For those RAA alleles, lysine (K) at position 71 conferred the highest risk, arginine (R) an intermediate risk, and alanine (A) or glutamic acid (E) the lowest risk. Glutamine (Q) or arginine (R) at position 70 conferred greater risk than did aspartic acid (D). This resulted initially in five allele groups, which were simplified to three allele groups defining six genotypes with different RA risks. This study was the first to model the HLA component in RA taking into account both association and linkage data, resulting in a reshaped SE hypothesis.
Here, we tested this classification for validity by replication in a new, independent sample of 100 French Caucasian trio families, evaluating the risk hierarchy of the proposed classification for homogeneity with that of the initial sample.

Study design and study population
An association study using conditional logistic regression was performed to investigate the hierarchy of risks associated with HLA-DRB1 genotypes in an independent sample of trio families. The new independent sample (sample B), similar to that used to generate the new classification (sample A), included 100 trio families (one RA patient and both parents) of French Caucasian origin (criteria fulfilled for each of the four grandparents). DNA from all of the trio families included in samples A and B was collected between 1994 and 1998, as were initial clinical characteristics of the RA index patients. RA diagnosis met the 1987 American College of Rheumatology (formerly, the American Rheumatism Association) criteria [8]. All individuals provided written informed consent, and the study was approved by the Hospital Bicêtre ethics committee (Kremlin-Bicêtre, Assistance Publique-Hôpitaux de Paris).
Clinical characteristics were updated in 2001 and 2002 for sample A and in 2004 for sample B. Four RA index patients in sample A and two RA index patients in sample B died between the time of DNA collection and the present study. The updated clinical characteristics of sample B were similar to those of sample A (the initial sample): 90% of RA patients in sample B were female versus 87% in sample A; the mean (± standard error) age at RA onset was 31 ± 9 years versus 32 ± 10 years; the mean (± standard error) disease duration was 16 ± 8 years versus 18 ± 7 years; erosions were present in 79% versus 90%; 76% were positive for serum rheumatoid factor versus 81%; and nodules were present in 19% versus 31%. Rheumatoid factor was considered positive when there was at least one positive rheumatoid factor finding during the course of the disease, as determined using latex fixation, Waaler Rose assay, or laser nephelometry.

HLA-DRB1 genotyping
Blood samples were collected for DNA extraction and genotyping. HLA-DRB1 typing was performed using the polymerase chain reaction-sequence specific primer (SSP) method using Dynal Classic SSP DR low resolution and the Dynal Classic high resolution SSP (Dynal Biotech, Lake Success, NY, USA) for subtyping of HLA-DRB1*01, *04, *11, *13 and *15 alleles. Sequencing of exon 2 of HLA-DRB1 was performed for all four HLA-DRB1*04 alleles, ambiguous with the Dynal Classic method. HLA-DRB1 allele frequencies of control genotypes (obtained by combining untransmitted parental alleles for each family) were similar between samples and were comparable to the allele frequencies reported for the French population in the 11th Histocompatibility Workshop [9].

HLA-DRB1 allele classification
HLA-DRB1 alleles were divided into two groups according to the presence or absence of the RAA sequence at positions 72-74, defining S and X alleles (Table 1). The S alleles were then subdivided into three categories, according to the amino acid at position 71, as follows: S1 when an alanine or a glutamic acid was present at position 71 (A-RAA or E-RAA sequences; A-RAA alleles were too infrequent not to be pooled, as described previously [6]); S2 when a lysine was present (K-RAA sequence); and S3 when an arginine was present at position 71 (R-RAA sequence). The S3 alleles were then subdivided according to the amino acid at position 70: S3D alleles encoding the D-R-RAA sequence and S3P alleles encoding the Q or R-R-RAA sequence. Because the S2 alleles had either Q or D at position 70, they had (by this '70-71-72/74' nomenclature) the Q or D-K-RAA sequence.
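The grouping rules above can be sketched as a small lookup function. This is only an illustrative sketch: the function name `classify_allele` and the five-letter motif string (one-letter amino acid codes for positions 70 to 74) are assumptions introduced here, and RAA motifs that the classification does not describe are rejected rather than guessed.

```python
def classify_allele(motif: str) -> str:
    """Classify an HLA-DRB1 allele from its residues at positions 70-74.

    `motif` is a hypothetical input format: five one-letter amino acid
    codes, e.g. "QKRAA" for position 70 = Q, 71 = K, 72-74 = RAA.
    """
    motif = motif.upper()
    if len(motif) != 5:
        raise ValueError("expected five residues (positions 70-74)")
    pos70, pos71, pos72_74 = motif[0], motif[1], motif[2:]

    # No RAA sequence at positions 72-74: X allele.
    if pos72_74 != "RAA":
        return "X"
    # RAA present: subdivide on position 71.
    if pos71 in ("A", "E"):
        return "S1"
    if pos71 == "K":
        return "S2"
    if pos71 == "R":
        # S3 alleles are further subdivided on position 70.
        if pos70 == "D":
            return "S3D"
        if pos70 in ("Q", "R"):
            return "S3P"
    # Assumption: any other RAA motif is outside the described scheme.
    raise ValueError(f"motif {motif!r} is not covered by the classification")
```

For example, `classify_allele("QKRAA")` and `classify_allele("DKRAA")` both fall in the S2 group, whereas `classify_allele("DRRAA")` falls in S3D.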

Statistical analysis
We first investigated transmission of the five alleles (S1, S2, S3D, S3P and X) using a χ² test with one degree of freedom for each allele. Alleles with significant over-transmission from heterozygous parents to RA patients (>50%) are linked to and associated with RA. Alleles with significant under-transmission (<50%) exhibit no RA association and could be pooled for further analysis.
Then, for each genotype 'i', the odds ratio (ORi) relative to a reference genotype and its 95% confidence interval (CI) were calculated by conditional logistic regression. In this analysis, the genotypes observed for the RA patients were conditioned on the parents' genotypes [10,11]. The RA patient genotypes were compared with the pseudocontrols (i.e. the three other genotypes that could be formed by parental gametes) using a likelihood ratio test. Given a reference genotype with baseline risk termed β0, each βi (i = 1 ... n) was estimated by maximization of the conditional log likelihood (L), where Xi is an indicator taking value 1 for genotype 'i' and 0 for the other genotypes, and βi = log ORi. Likelihood computations and estimation were performed using the program developed by Clayton [12]. All the results were produced using STATA software (David Clayton, Cambridge, UK).
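As an illustration of the case/pseudocontrol design, the following sketch builds the three pseudocontrol genotypes for one trio and evaluates a conditional log likelihood for a given set of βi. It assumes the standard conditional logistic form, in which each family contributes the log of exp(β_case) divided by the sum of exp(β) over the case and its pseudocontrols; the exact expression used by the authors is not reproduced in this text. The names `genotype`, `family_set` and `log_likelihood` are hypothetical, and the sketch only evaluates the likelihood, it does not maximize it.

```python
import math
from itertools import product

# Display order for alleles so genotypes read as in the text (e.g. "S2/L").
ALLELE_ORDER = {"S2": 0, "S3P": 1, "L": 2}


def genotype(a: str, b: str) -> str:
    """Return an unordered genotype label such as 'S2/S3P'."""
    first, second = sorted((a, b), key=ALLELE_ORDER.__getitem__)
    return f"{first}/{second}"


def family_set(father, mother, child):
    """Return the case genotype and its three pseudocontrol genotypes.

    Each argument is a pair of alleles. The four genotypes that the
    parental gametes can form are enumerated; the one carried by the
    affected child is the case, the remaining three are pseudocontrols.
    """
    possible = [genotype(f, m) for f, m in product(father, mother)]
    case = genotype(*child)
    possible.remove(case)  # ValueError if the child is incompatible
    return case, possible


def log_likelihood(families, beta):
    """Conditional log likelihood for a dict of beta_i = log(OR_i).

    Genotypes absent from `beta` (including the reference L/L) have
    beta = 0, i.e. an odds ratio of 1.
    """
    total = 0.0
    for case, pseudocontrols in families:
        denom = sum(math.exp(beta.get(g, 0.0)) for g in [case] + pseudocontrols)
        total += beta.get(case, 0.0) - math.log(denom)
    return total
```

With all βi set to 0, every family contributes log(1/4), which is the value against which fitted models are compared in a likelihood ratio test.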
In case of replication of the genotype risk hierarchy, a homogeneity test on genotypic ORs was performed between the two trio family samples. In this test, we considered that, if homogeneity was present, then $Q = -2\,[\ln(\max L_{AB}) - (\ln(\max L_A) + \ln(\max L_B))]$ would follow a χ² distribution with n degrees of freedom (n being the number of βi estimated). $\max L_A$, $\max L_B$ and $\max L_{AB}$ were the maximum likelihoods over βi in sample A, sample B and pooled samples A and B, respectively.
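The homogeneity statistic can be computed directly from the three maximized log likelihoods. The sketch below uses made-up log likelihood values purely for illustration (they are not taken from the study), and compares Q with 11.070, the 5% critical value of a χ² distribution with five degrees of freedom; the function name and the choice of a fixed critical value are assumptions of this sketch.

```python
# 5% critical value of the chi-square distribution with 5 degrees of freedom.
CHI2_CRIT_5DF_05 = 11.070


def homogeneity_statistic(lnL_A: float, lnL_B: float, lnL_AB: float) -> float:
    """Q = -2 [ln(max L_AB) - (ln(max L_A) + ln(max L_B))]."""
    return -2.0 * (lnL_AB - (lnL_A + lnL_B))


# Hypothetical maximized log likelihoods, NOT values from the study.
lnL_A, lnL_B, lnL_AB = -100.0, -105.0, -206.0

Q = homogeneity_statistic(lnL_A, lnL_B, lnL_AB)
# Homogeneity is not rejected when Q stays below the critical value.
homogeneous = Q < CHI2_CRIT_5DF_05
```

The pooled fit can never have a higher likelihood than the two separate fits combined, so Q is non-negative; a small Q means that forcing both samples to share the same βi costs little in fit.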
If homogeneity between the two samples was confirmed, then the classification was considered validated, and OR (95% CI) were estimated by conditional logistic regression on the entire sample (samples A and B combined).
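The link between an estimated βi, its standard error and the reported OR with 95% CI can be sketched as follows. This assumes Wald-type intervals, symmetric on the log scale, which the text does not state explicitly; the helper names are hypothetical. Under that assumption the OR is the geometric midpoint of its interval, which is consistent with the pooled S2/S3P estimate (OR 22.2, 95% CI 9.9 to 49.7).

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_ci(beta: float, se: float, z: float = Z95):
    """Return the (lower, upper) CI of OR = exp(beta) from a Wald interval."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = Z95) -> float:
    """Recover the standard error of beta from a reported CI of the OR."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def or_from_ci(lower: float, upper: float) -> float:
    """Geometric midpoint of the interval, i.e. the OR it is centred on."""
    return math.sqrt(lower * upper)


# Pooled S2/S3P interval as reported in the text.
midpoint = or_from_ci(9.9, 49.7)
se = se_from_ci(9.9, 49.7)
lower, upper = wald_ci(math.log(22.2), se)
```

Because the interval is built on the log scale, it is wide and asymmetric around the OR when the standard error is large, as is the case for the rarer genotypes.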

Test of the shared epitope allele classification in the new independent sample
We first observed significant over-transmission of S2 alleles (53 S2 alleles transmitted versus 33.5 alleles expected; P = 1.9 × 10⁻⁶) and of S3P alleles (47 S3P alleles transmitted versus 33.5; P = 0.001), as was previously reported [6]. S1, S3D and X alleles were under-transmitted: 28 S1 alleles were transmitted versus 40 expected (P = 0.007), 11 S3D alleles were transmitted versus 18 expected (P = 0.02), and 30 X alleles were transmitted versus 44 expected (P = 0.003). These three low-risk alleles (S1, S3D and X) were pooled as L alleles, as reported previously. Thus, in subsequent analyses we considered only three alleles (S2, S3P and L alleles), with six corresponding genotypes.
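The transmission counts above can be checked with a short calculation. The sketch assumes that the one-degree-of-freedom χ² test takes the usual transmission disequilibrium form, comparing transmitted with untransmitted alleles from heterozygous parents, with the expected count being half of the informative transmissions; the function name is hypothetical. Under this assumption the P values come out close to those reported.

```python
import math


def transmission_test(transmitted: float, expected: float):
    """Chi-square test (1 df) of transmitted versus expected allele counts.

    `expected` is half of the informative transmissions, so the number of
    untransmitted alleles is 2 * expected - transmitted.
    """
    informative = 2.0 * expected
    untransmitted = informative - transmitted
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# (transmitted, expected) counts as reported for the replication sample.
counts = {
    "S2": (53, 33.5),
    "S3P": (47, 33.5),
    "S1": (28, 40),
    "S3D": (11, 18),
    "X": (30, 44),
}

results = {allele: transmission_test(t, e) for allele, (t, e) in counts.items()}
over_transmitted = [a for a, (t, e) in counts.items() if t > e]
```

Alleles listed in `over_transmitted` are S2 and S3P; the remaining three are the ones pooled as L alleles.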
The conditional logistic regression analysis provided the following hierarchy of genotype risks: S2/S3P and S2/S2 genotypes were associated with greatest risk for RA, with ORs of 19.5 and 18.0, respectively; these were followed by S3P/S3P, S2/L and S3P/L genotypes, with ORs of 8.7, 5.3 and 3.1, respectively (with the reference genotype being L/L; Table 2). This hierarchy was precisely the same as observed previously [6].

Results of the homogeneity test
The homogeneity test on genotypic ORs between the new sample and the initial one resulted in a χ² with five degrees of freedom of 1.3 (P = 0.80). Because this test was not statistically significant, we considered the two samples to be homogeneous and the new classification to be valid.

Odds ratio estimation on the pooled sample of 200 trio families
Because the two samples were homogeneous, ORs were estimated by conditional logistic regression for the pooled sample of 200 trio families (Table 3).

Table 1. Classification of HLA-DRB1 alleles. This classification of alleles, observed in our samples, is according to the amino acid sequence at positions 70-74, as described by Tezenas du Montcel and coworkers [6].

Discussion
In the present study we validated the classification of HLA-DRB1 SE alleles in RA proposed by Tezenas du Montcel and coworkers [6]. This is the first study to validate a model of the HLA-DRB1 component of RA based on the SE hypothesis [1], with detailed investigation of the SE through the contribution of SE single amino acids to RA susceptibility, taking into account both linkage and association data. This work results in a risk genotype hierarchy, for which we provide OR estimates. The ORs were obtained exclusively from trio families, providing unbiased estimates for the sample investigated; this contrasts with estimations derived from case-control studies, for which the population matching between cases and controls can be questioned.
Further studies in other Caucasian and non-Caucasian populations are required to validate this new classification fully and investigate population-specific effects. The ORs reported here relate to relatively early onset RA, as is found in trio families. Because the mean duration of RA in both samples was long (18 years in sample A and 16 years in sample B), selection (survivor) bias would be possible even if we had considered those RA index patients who died between the time of DNA collection and the present study. Investigation of a population with common, sporadic RA is needed to assess the potential clinical relevance of this new classification. Studies with larger sample size would be able to refine the 95% CI of the OR. In the present study non-overlapping 95% CIs were observed only between the S2/S3P highest risk genotype (OR 22.2, 95% CI 9.9-49.7) and the S3P/L lowest risk genotype (OR 4.4, 95% CI 2.3-8.4). A significant difference between other associated genotypes remains to be established. This would provide major clues that may help in deciphering the genetic component of RA, if significant differences could be correlated with distinct pathophysiological mechanisms. It was recently reported that the SE-RA association was confined to rheumatoid factor positive patients [13] or to anti-citrullinated peptide antibody positive RA patients [14]. The precise relationship between the HLA risk genotypes and rheumatoid factor or anti-citrullinated peptide antibodies should therefore also be determined. The interaction between HLA-DRB1 genotypes and any new RA gene established by association and linkage, such as PTPN22 [15,16], could be investigated taking this new classification into account. Ultimately, this could help in identifying other RA genetic factors that may specifically interact with only one of the HLA-DRB1 genotypes. Several previous studies indicated that other genes within HLA, such as the HLA class III region, probably contribute to RA risk [17,18]. The search for interactions between additional HLA class III genetic variants, not considered in the present study, and HLA-DRB1 genotypes taking this new classification into account would be of great interest.

Table footnote: Alleles S2, S3P and L are defined according to amino acid sequence at positions 70-74, as described in Table 1. The L alleles are the low-risk alleles S1, S3D and X. The reference genotype is L/L. (a) Pseudo-controls are the three other genotypes that could be formed by the parental genotypes.
Large sample size studies could refine the classification for infrequent alleles. In the present study we were unable to examine rigorously the amino acid at position 71 or at position 70, particularly for the S1 allele group, in which small sample size prevented study of the role played by the different alleles encoding the D-E-RAA motif. This D-E-RAA motif has been reported to be protective in the literature and constitutes an alternative SE hypothesis, although we obtained no support for it during our initial study [19]. The different S2 sequences Q-K-RAA (*0401) and D-K-RAA (*1303) should be evaluated separately, because the presence of an aspartic acid at position 70 has been reported to influence susceptibility to RA [20]. Similarly, the S3P sequences Q-R-RAA (*0101, *0102, *0404, *0405, *0408) and R-R-RAA (*1001) should be differentiated. The investigation of other amino acid positions from the third hypervariable region of the HLA-DR β-chain would be interesting, especially for positions 67 (for which the presence of an isoleucine might be important [21]) and 86 (as proposed by Gao and coworkers [22]).
Because numerous association studies have suggested that the primary role played by the SE might lie in the development of severe RA [23], the relevance of this classification should be evaluated for RA prognosis in prospective cohorts. A first investigation with the new classification already provides some support for a correlation with progression of radiographic damage [24]. Indeed, it would be of great help to be able to identify those RA patients at risk for development of more severe disease, who may require more aggressive therapeutic management than patients with better prognosis.

Conclusion
In the present study we validated a first model of the effect of HLA-DRB1 on RA, reshaping the SE hypothesis and providing initial estimates for the resulting risk genotypes. Building on this new HLA genotype classification could lead to improvement in our understanding of the genetics, pathophysiology and potential clinical use in management of RA.