Analysis of TNFAIP3, a feedback inhibitor of nuclear factor-κB and the neighbor intergenic 6q23 region in rheumatoid arthritis susceptibility

Introduction Genome-wide association studies of rheumatoid arthritis (RA) have identified an association of the disease with a 6q23 region devoid of genes. TNFAIP3, an RA candidate gene, flanks this region, and polymorphisms in both the TNFAIP3 gene and the intergenic region are associated with systemic lupus erythematosus. We hypothesized that there is a similar association with RA, including polymorphisms in TNFAIP3 and the intergenic region. Methods To test this hypothesis, we selected tag-single nucleotide polymorphisms (SNPs) in both loci. They were analyzed in 1,651 patients with RA and 1,619 control individuals of Spanish ancestry. Results Weak evidence of association was found both in the 6q23 intergenic region and in the TNFAIP3 locus. The rs582757 SNP and a common haplotype in the TNFAIP3 locus exhibited association with RA. In the intergenic region, two SNPs were associated, namely rs609438 and rs13207033. The latter was only associated in patients with anti-citrullinated peptide antibodies. Overall, statistical association was best explained by the interdependent contribution of SNPs from the two loci TNFAIP3 and the 6q23 intergenic region. Conclusions Our data are consistent with the hypothesis that several RA genetic factors exist in the 6q23 region, including polymorphisms in the TNFAIP3 gene, like that previously described for systemic lupus erythematosus.


Introduction
The etiology of rheumatoid arthritis (RA) includes a genetic component that has become amenable to investigation in recent years. A major development has been the availability of large-scale genome-wide association (GWA) studies. The first GWA studies in RA were readily able to confirm the two clearest RA genetic factors, located in the human leukocyte antigen region and in the PTPN22 gene [1][2][3]. In addition, such studies have found other significant associations. Some of these associations have already been confirmed in additional studies, such as the TRAF1-C5 locus and the intergenic region in chromosome 6q23 [2][3][4][5].
Two single nucleotide polymorphisms (SNPs) in 6q23, namely rs6920220 and rs13207033 (or its perfect surrogate rs10499194), have exhibited peak association with RA in an independent manner [2]. This finding has been interpreted as indicating the involvement of multiple genetic variants in RA susceptibility [2]. The associated region does not contain any known protein-coding sequence and lacks any evident functional consequence [2,4], but a strong RA candidate gene, the tumor necrosis factor-α-induced protein 3 (TNFAIP3) gene (also known as A20), lies about 185 kilobases away (kb; Figure 1 shows the positions of these two loci). In addition, the rs6920220 SNP, together with SNPs in the TNFAIP3 gene, has been found to be reproducibly associated with systemic lupus erythematosus (SLE) susceptibility [6,7]. Therefore, we have hypothesized that genetic variation in TNFAIP3 could also be involved in susceptibility to RA. This gene is an excellent candidate for such an effect because it is a negative feedback regulator of tumor necrosis factor signaling through nuclear factor-κB (NF-κB) [8][9][10].
To test our hypothesis and to relate the TNFAIP3 locus to the intergenic 6q23 region, we have genotyped tagSNPs at both loci. Analysis in 1,651 RA patients and 1,619 control individuals revealed significant but weak associations at each locus. Each of these signals was statistically reinforced when signals in the other locus were accounted for. These results are consistent with multiple RA genetic factors in chromosome 6q23 that include polymorphisms in the TNFAIP3 gene and that interact with one another.

DNA samples
Recruitment of samples included in this study has already been described [11]. Samples were obtained from Caucasian Spanish patients with RA (cases; n = 1,651) and control individuals (controls; n = 1,619). Cases and controls were recruited in different hospitals, and an attempt was made to match them by place of origin [see Table S1 in Additional data file 1]. All patients were classified in accordance with the 1987 American College of Rheumatology criteria [12]. Participants gave their informed consent and the ethical committees of participating centers approved the study.

Single nucleotide polymorphism selection
Two separate regions of linkage disequilibrium (LD) in 6q23 were selected for analysis (Figure 1). The first, of 56 kb, includes the TNFAIP3 gene and the region of high LD with polymorphisms in the gene. The second, of 65 kb, is the LD region that includes the two association peaks from previous studies and that is limited by the recombination hot spots at about 138.002 and 138.067 megabases described in the HapMap CEU data (corresponding to samples with European ancestry) [13]. These two regions were 144 kb apart. TagSNPs [see Table S2 in Additional data file 1] were selected from the HapMap CEU data using the Haploview software [14] to provide coverage with a pair-wise r² ≥ 0.8 of all of the SNPs with minor allele frequency over 0.05 (TNFAIP3 locus) or 0.1 (intergenic locus). TagSNPs of the intergenic region included the peak SNPs in previous studies [1] and rs13207033, which is a perfect proxy of rs10499194 [2].
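The tagging criterion can be illustrated with a minimal sketch. It assumes phased chromosomes coded 0/1 per SNP and uses a simple greedy rule (repeatedly pick the SNP that captures the most still-untagged SNPs at the r² threshold). This is a simplification for illustration only, not a reproduction of Haploview's implementation; the function names and the input layout are hypothetical.

```python
def r_squared(h1, h2):
    """Pairwise r^2 between two SNPs given 0/1 alleles on the same phased chromosomes."""
    n = len(h1)
    p1 = sum(h1) / n
    p2 = sum(h2) / n
    p12 = sum(a * b for a, b in zip(h1, h2)) / n
    d = p12 - p1 * p2                      # LD coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    return d * d / denom if denom > 0 else 0.0


def greedy_tags(haps, min_r2=0.8, min_maf=0.05):
    """Greedy tag selection (assumed rule, not Haploview's code).

    haps: dict mapping SNP name -> list of 0/1 alleles across chromosomes.
    Only SNPs with minor allele frequency over min_maf need to be captured.
    """
    def maf(h):
        f = sum(h) / len(h)
        return min(f, 1 - f)

    untagged = {s for s, h in haps.items() if maf(h) > min_maf}

    def captured(s):
        # SNPs still untagged that s would capture at the r^2 threshold
        return {t for t in untagged
                if t == s or r_squared(haps[s], haps[t]) >= min_r2}

    tags = []
    while untagged:
        best = max(sorted(untagged), key=lambda s: len(captured(s)))
        tags.append(best)
        untagged -= captured(best)
    return tags
```

With the thresholds stated above, `min_maf` would be 0.05 for the TNFAIP3 locus and 0.1 for the intergenic locus.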

Single nucleotide polymorphism genotyping
PCRs were done with the Qiagen Multiplex PCR kit (Qiagen, Valencia, CA, USA) on 30 ng genomic DNA. PCR products were purified by Exo-SAP digestion with Exonuclease I (Epicentre, Madison, WI, USA) and Shrimp Alkaline Phosphatase (GE Healthcare, Barcelona, Spain). Single-base extension reactions were done using the SNaPshot Multiplex Kit (Applied Biosystems, Foster City, CA, USA). Oligonucleotide sequences are presented in the additional materials [see Table S2 in Additional data file 1].

Figure 1
Map of the studied region in chromosome 6q23. Recombination hot-spots from the HapMap CEU data are represented as black squares below the rule showing physical distances along the chromosome in kilobases. The positions of the two rheumatoid arthritis-associated peak single nucleotide polymorphisms from previous studies (rs13207033 and rs6920220) are marked by arrowheads. The position of the TNFAIP3 gene and its structure are shown.

Statistical analysis
Hardy-Weinberg equilibrium (HWE) concordance was tested in control samples. LD was analyzed using Haploview [14]. χ² tests on 2 × 2 contingency tables were used to compare allele frequencies. The minor allele of each SNP was taken as reference for all comparisons, and minor allele frequencies are reported in the tables. Allele frequencies of each SNP were compared between controls from each center or region of origin, as a way to detect population heterogeneity. In addition, results were combined after stratification by individuals' origin following the Mantel-Haenszel approach, and heterogeneity of effect sizes was explored using the Breslow-Day test. Likelihood ratio tests for the additive, dominant and recessive genetic models were obtained relative to the co-dominant model. Multivariate logistic regression analysis was used to evaluate the conditional effect of the SNPs. For conditioning on haplotype #5, a new genotype for this haplotype was created with codes 0 (non-carrier), 1 (heterozygote), and 2 (homozygote) for each individual. No problem of collinearity was detected with the inclusion in the model of haplotype #5 and SNP rs582757, which contributes to defining the haplotype, because the same allele of this SNP is present in three other common haplotypes. Stepwise logistic regression with all SNPs was conducted to detect the best multi-SNP models. Haplotypes were estimated using the Phase 2.1 software [15]. Odds ratios (ORs) for each haplotype were calculated taking as reference all chromosomes not bearing the haplotype. A customized version of Statistica 7.0 (Statsoft, Tulsa, OK, USA) was used for analyses except for statistical power, which was estimated using the 'Power and sample size calculations' software [16].
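As an illustration of the allele frequency comparison, the following sketch computes a Pearson χ² statistic (1 degree of freedom) and the OR for a 2 × 2 table of minor versus major allele counts in cases and controls. It assumes no continuity correction, which the text does not specify, and the function name and argument layout are hypothetical; the analyses themselves were run in Statistica.

```python
import math


def allelic_test(case_minor, case_total, ctrl_minor, ctrl_total):
    """Chi-square (1 df, no continuity correction) and OR for a 2x2 allele table.

    case_total and ctrl_total are numbers of alleles (two per individual).
    Returns (chi2, p_value, odds_ratio, (ci_low, ci_high)).
    """
    a, b = case_minor, case_total - case_minor   # cases: minor, major
    c, d = ctrl_minor, ctrl_total - ctrl_minor   # controls: minor, major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    # Woolf (log-based) 95% confidence interval, an assumed choice
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
          math.exp(math.log(odds_ratio) + 1.96 * se))
    return chi2, p_value, odds_ratio, ci
```

Because the minor allele is taken as reference, an OR below 1 means that the minor allele is less frequent in cases than in controls.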

TNFAIP3 locus
Six tagSNPs were sufficient to cover the TNFAIP3 gene and 20 kb of flanking sequences to either side [see Table S2 in Additional data file 1]. Genotypes of these tagSNPs were obtained with a 99.1% call rate, and they were in HWE except for the rs629953 SNP, which was excluded from further analysis.
No significant differences in allele frequencies were found between samples stratified by center of recruitment or region of origin. Analysis of allele frequencies revealed that the minor allele of the rs582757 SNP was significantly less frequent in patients with RA than in control individuals (OR = 0.89; Table 1). Genotype frequency comparisons yielded similar results (not shown). Estimation of the haplotype frequency distribution showed that only six haplotypes accounted for more than 98% of all chromosomes in patients with RA and controls (Table 2). The rs582757 alleles were distributed in several haplotypes and only the most common haplotype, which was defined by the major alleles of the five tagSNPs, was significantly different between patients with RA and control individuals (haplotype #5 in Table 2). Multivariate analysis combining haplotype #5 and rs582757 SNP genotypes revealed that either of them could account for the association with RA and that the two association signals were not independent (that is, association with the haplotype genotypes was not significant when conditioned on the rs582757 SNP, and vice versa) [see Table S3 in Additional data file 1]. This finding suggests that association is due to a causal polymorphism that is tagged by some haplotypes containing the T allele of rs582757 (Table 2). No significant change was detected by stratifying by sex (data not shown) or by the presence of rheumatoid factor (RF) or anti-citrullinated peptide antibodies (ACPAs) [see Table S4 in Additional data file 1].
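The HWE check applied to control genotypes can be sketched as a χ² goodness-of-fit test with 1 degree of freedom. This is one common way to test HWE and is an assumption here, since the text does not state which HWE test was used; the function name is hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions from genotype counts.

    Returns (chi2, p_value). Raises ValueError for a monomorphic SNP,
    where expected counts of zero make the statistic undefined.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic SNP: HWE test undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival, 1 df
    return chi2, p_value
```

A SNP such as rs629953 would be flagged by a small `p_value` in controls; the exclusion threshold is not given in the text.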

6q23 intergenic locus
We have included 10 tagSNPs in addition to the two SNPs that have shown peak association in previous studies in the 6q23 intergenic locus [see Table S2 in Additional data file 1]. Genotypes of these 12 SNPs were in HWE and showed a high call rate (99.0%) in our samples.
Comparison of allele frequencies did not reveal clear association of any of these SNPs with RA (Table 3). Association in the peak rs13207033 SNP was completely absent. The rs6920220 SNP exhibited a trend (P = 0.07) in the same direction that has previously been reported [1,2,4]. Only the Genotype analyses of all of the other SNPs yielded findings similar to those of the allele frequency comparisons, and no differences were found by sex stratification in any of them (not shown). Two of the SNPs in this locus exhibited significant allele frequency differences between recruitment centers (rs6920220, P = 0.01; and rs675520, P = 0.004), and one of them was also different between regions of origin of the samples (rs6920220, P = 0.04). However, these differences did not introduce detectable artefacts in the global results, as shown by the similar results obtained above (Table 3) and with the Mantel-Haenszel approach (OR = 1.12 versus OR M-H = 1.12 for rs6920220; and OR = 0.96 versus OR M-H = 0.97 for rs675520) and lack of significant heterogeneity of the ORs as assessed with the Breslow-Day test (rs6920220, P = 0.5; and rs675520, P = 0.054).
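The stratified estimate reported as OR M-H can be illustrated with the standard Mantel-Haenszel pooled odds ratio. In this sketch each stratum (a recruitment center or region of origin) contributes one 2 × 2 allele table; the function name and the tuple layout are hypothetical, and the Breslow-Day heterogeneity test is not included.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    strata: iterable of (a, b, c, d) tuples, one per stratum, where
      a = minor alleles in cases,    b = major alleles in cases,
      c = minor alleles in controls, d = major alleles in controls.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den
```

When the pooled value is close to the crude OR, as reported above for rs6920220 and rs675520, stratification by origin has little effect on the estimate.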
Association of the rs6920220 SNP with RA has previously been reported to be stronger in patients with ACPAs or RF than in the ACPA-negative or RF-negative subgroup [4]. However, we did not detect any significant difference between these patient subgroups [see Table S4 in Additional data file 1]. In contrast, we found that the rs13207033 SNP was associated with RA only in the patients with ACPAs. Three other SNPs also exhibited significant association exclusively in ACPA-positive patients (Table 4). In all of these SNPs, the risk allele was the most common. Conditional logistic regression of these four SNPs, taken two by two, was unable to distinguish between them (data not shown). No association was found when the patient subgroups stratified by RF status were compared with control individuals (data not shown).
Haplotype analysis of the 12 tagSNPs in this locus did not reveal any significant association [see Table S5A in Additional data file 1]. Also, no significant difference was detected in haplotype frequencies between patients with ACPAs and control individuals [see Table S5B in Additional data file 1].

Statistical interaction between the 6q23 intergenic locus and TNFAIP3
We found weak evidence of association with RA both in TNFAIP3 and in the 6q23 intergenic locus. SNPs from the two loci were not in LD (all values of r² < 0.07 in our samples). However, lack of pair-wise correlation does not exclude complex interdependence between the loci.
To explore more complex relationships, we used stepwise logistic regression with the 17 SNPs. This unsupervised multivariate process was run both in a forward and in a backward mode. That is, it was run starting with the most associated SNP and adding a SNP in each step until the model was no longer improved, or starting with a model incorporating all SNPs and eliminating the least associated in each step until the model deteriorated. The forward process yielded a best model that combined the rs6920220 SNP from the intergenic region and the rs582757 SNP from the TNFAIP3 locus (P = 0.009). The two SNPs were significantly associated when each was considered conditional upon the other (P = 0.027 and P = 0.013, respectively). The backward stepwise procedure yielded a best model with three SNPs (P = 0.009): two of the intergenic region, namely rs13207033 and rs609438, and the rs582757 SNP from the TNFAIP3 gene. Each made a significant contribution to RA when assessed conditional upon the other two (P = 0.046, P = 0.019, and P = 0.007, respectively). Therefore, the two procedures showed that the best models differentiating cases and controls include SNPs from the two loci. In addition, they suggested interactions between them because the P values of association for each SNP were lower (more significant) in the multivariate analysis than when taken individually. For example, the rs6920220 and the rs13207033 SNPs were not associated with RA in isolation (P = 0.07 and P = 0.8, respectively), but they were associated in the multivariate models. If these results are confirmed, then they amount to epistasis between the two loci.
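The forward mode of the stepwise procedure can be sketched as follows. The sketch assumes additive genotype coding (0, 1 or 2 minor alleles), a likelihood ratio test with 1 degree of freedom as the entry criterion, and an entry threshold of 0.05; none of these settings is stated in the text, and the function names are hypothetical. It is meant for small, well-behaved data sets (no perfect separation, no collinear SNPs).

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_loglik(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return the log-likelihood."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, r))
            p = 1 / (1 + math.exp(-eta))
            w = p * (1 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += w * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    ll = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll


def forward_stepwise(genotypes, y, alpha=0.05):
    """Forward selection: add the SNP with the smallest LRT p-value below alpha.

    genotypes: dict SNP name -> list of 0/1/2 codes; y: list of 0/1 case status.
    """
    n = len(y)
    selected = []
    ll_current = fit_loglik([[] for _ in range(n)], y)   # intercept-only model
    while True:
        best = None
        for snp in genotypes:
            if snp in selected:
                continue
            cols = selected + [snp]
            X = [[genotypes[s][i] for s in cols] for i in range(n)]
            ll = fit_loglik(X, y)
            # LRT with 1 df: chi2 = 2 * (ll - ll_current), p = erfc(sqrt(chi2 / 2))
            p = math.erfc(math.sqrt(max(ll - ll_current, 0.0)))
            if p < alpha and (best is None or p < best[1]):
                best = (snp, p, ll)
        if best is None:
            return selected
        selected.append(best[0])
        ll_current = best[2]
```

The backward mode would start from the model with all SNPs and remove, at each step, the SNP whose removal changes the likelihood least, using the same test in the opposite direction.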

Discussion
Association of RA susceptibility to SNPs in the intergenic region of chromosome 6q23 has attracted strong interest in a locus that, because of its lack of coding sequences, is especially difficult to investigate [1,2,4]. This was our motivation to study the clearest candidate among the genes flanking this locus, namely TNFAIP3. The recently reported coincident association in SLE increased our interest and suggested that the genetics of the region could be complex and include the TNFAIP3 gene [6,7].
Our findings regarding the TNFAIP3 locus revealed weak association with RA that could be explained either by the rs582757 tagSNP or by the commonest haplotype. Because of the weakness of the association and the multiple SNPs tested, these findings should be considered tentative. However, confidence in this association is increased by considering the summary statistics from the Wellcome Trust Case Control Consortium (WTCCC) GWA study [1], which included 1,860 patients with RA and 2,938 healthy control individuals. That study yielded very similar results at the rs582757 SNP (25.0% versus 27.1% in RA patients and control individuals, respectively; OR = 0.90; P = 0.02). Additional preliminary data from a larger study are also concordant with association with RA of SNPs in the TNFAIP3 gene [17]. It is also of interest that a TNFAIP3 SNP with strong correlation (r² > 0.9) with the rs582757 SNP is associated with reduced expression of TNFAIP3 and with coronary artery disease in patients with type 2 diabetes [18].
Regarding the 6q23 intergenic region, we found weak association with a previously unreported SNP, rs609438, and replication of association with the rs13207033 SNP. The latter was observed only in patients positive for ACPAs. Previous reports could be interpreted as supporting this preferential association of the rs13207033 SNP, because association with this SNP was much clearer in the study conducted by Plenge and coworkers [2] (P = 4 × 10⁻⁷), in which all patients were ACPA positive, than in the WTCCC GWA study [1] (P = 0.01), in which the patients were unselected. There are already antecedents of this type of preferential association in relation to the ACPA status of patients with RA, including the shared epitope, the PTPN22 nonsynonymous SNP and IRF5 [19][20][21]. However, specific analysis in other sample collections will be required to confirm this specificity of the rs13207033 association.
An unexpected result was the lack of association with RA of the rs6920220 SNP. Association of this SNP with RA has previously been demonstrated in several studies with an overall OR of 1.23 [1,2,4]. Our study had enough power to detect this effect (power = 0.8 for P = 0.01). However, we found a weaker effect (OR = 1.12) than in previous studies. Such a difference in effect size between studies is common, and recent examples have been found in confirmed RA genetic factors such as STAT4 and TRAF1-C5 [5].
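The OR quoted for the WTCCC data follows directly from the two minor allele frequencies. As a small worked check (the function name is hypothetical):

```python
def odds_ratio_from_freqs(freq_cases, freq_controls):
    """Allelic odds ratio from minor allele frequencies in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# 25.0% in RA patients versus 27.1% in controls, as quoted from the WTCCC study
wtccc_or = odds_ratio_from_freqs(0.250, 0.271)
print(round(wtccc_or, 2))
```

The result rounds to 0.90, in line with the OR of 0.89 observed for the same SNP in the Spanish samples.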
In these examples, and in many others, the observed effect sizes were weaker in replication studies than in the discovery study. This phenomenon has been characterized as the 'winner's curse'. It means that findings of discovery studies often overestimate the true association because they are conditional on those studies being the first to detect the association [22]. An alternative explanation for the lack of association in our study could be related to the differences in rs6920220 allele frequencies that were observed between recruitment centers. However, we consider this to be unlikely because the global comparison between patients with RA and control individuals gave identical results whether the samples were taken as a whole or stratified by place of origin.
Allele frequency differences in tag-single nucleotide polymorphisms (SNPs) of the intergenic 6q23 region between patients with rheumatoid arthritis positive for anti-citrullinated peptide antibodies (ACPA + cases) and control individuals (controls). MAF, minor allele frequency; n, number of minor alleles; N, total number of alleles.
The two large and comprehensive SLE studies of the 6q23 region have yielded multiple independent association signals, including polymorphisms in the TNFAIP3 gene [6,7]. At least three independent signals were detected in each of the two studies, although they were not completely concordant with each other or with the peak associations previously reported in RA. Our multivariate logistic analyses produced similar results, suggesting that polymorphisms in the two loci contribute to RA susceptibility. It therefore seems possible that this region contains multiple genetic factors shared by RA and SLE. Our data do not allow us to be more conclusive.
TNFAIP3 is a clear candidate for a role in RA by virtue of the anti-inflammatory effects of its encoded protein. It is involved in many regulatory feedback loops through the cooperative activity of its two ubiquitin-editing domains [9]. TNFAIP3 protein levels are drastically increased upon NF-κB stimulation by various factors, including tumor necrosis factor and interleukin-1 [9,10]. Once upregulated, TNFAIP3 inhibits NF-κB activity at multiple levels. Therefore, it seems likely that polymorphisms that reduce expression or function of TNFAIP3 will favor exaggerated inflammatory responses that may contribute to RA development and expression. However, our study has only provided suggestive evidence of the involvement of this gene in RA susceptibility. This evidence is reinforced by data from the WTCCC GWA study [1] and preliminary data presented at the 2008 American College of Rheumatology meeting [17]. Investigation of the 6q23 region should proceed with increased enthusiasm, given its likely involvement in multiple immune-mediated diseases and the possible involvement of TNFAIP3, an important regulator of the NF-κB pathway.

Conclusions
We have found evidence of multiple RA genetic factors in the 6q23 region including polymorphisms in the TNFAIP3 gene. These factors appear to be shared with SLE susceptibility. Involvement of TNFAIP3 is of practical interest, given its inhibitory effect on the NF-κB pathway. Nevertheless, there remain many aspects that require further analysis: confirmation of our results, delineation of genetic influences on specific RA subphenotypes, and identification of the functional variants in this locus and their effects.
The following Additional files are available online:

Additional file 1
A Microsoft Word document that contains the following tables: Table S1 (distribution of samples by recruitment hospital), Table S2 (details of the SNPs that were studied and the oligonucleotides that were used), Table S3 (conditional analysis between the rs582757 SNP and the most common haplotype in the TNFAIP3 locus), Table S4 (results for each SNP stratified by ACPA or RF status), and Table S5 (haplotype analysis of the SNPs in the intergenic 6q23 region). See http://www.biomedcentral.com/content/supplementary/ar2650-S1.doc