Polymorphisms in peptidylarginine deiminase associate with rheumatoid arthritis in diverse Asian populations: evidence from MyEIRA study and meta-analysis

Introduction Most of our knowledge of the disease-related mechanisms of uncontrolled citrullination and anti-citrullinated protein antibody development in rheumatoid arthritis (RA) comes from studies in Caucasian populations. However, peptidylarginine deiminase (PADI) type 4 gene polymorphisms are associated with RA in East Asian populations, whereas weak or no association has been found in Caucasian populations. This study explores the association between PADI4 polymorphisms and RA risk in a multiethnic population residing in South East Asia, with the goal of elucidating the generalizability of the association in non-Caucasian populations.
Methods A total of 320 SNPs from the PADI locus (including PADI1, PADI2, PADI3, PADI4 and PADI6 genes) were genotyped in 1,238 RA cases and 1,571 control subjects from the Malaysian Epidemiological Investigation of Rheumatoid Arthritis (MyEIRA) case-control study. Additionally, we conducted a meta-analysis of our data together with previously published studies of RA from East Asian populations.
Results The overall odds ratio (ORoverall) for the PADI4 (rs2240340) allelic model was 1.11 (95% confidence interval (CI) = 1.00 to 1.23, P = 0.04) and for the genotypic model was 1.20 (95% CI = 1.01 to 1.44, P = 0.04). Haplotype analysis for four selected PADI4 SNPs revealed a significant association of one haplotype with susceptibility (P = 0.001) and of another with a protective effect (P = 0.02). The RA susceptibility was further confirmed when a combined meta-analysis was performed using these data together with data from five previously published studies from Asia comprising 5,192 RA cases and 4,317 control subjects (ORoverall = 1.23 (95% CI = 1.16 to 1.31, Pheterogeneity = 0.08) and 1.31 (95% CI = 1.20 to 1.44, Pheterogeneity = 0.32) in allele and genotype-based models, respectively). In addition, we detected a novel association of the PADI2 genetic variant rs1005753 with RA (ORoverall = 0.87 (95% CI = 0.77 to 0.99)).
Conclusion Our study demonstrates an association between PADI4 and RA in the multiethnic population from South East Asia and suggests additional association with a PADI2 gene. The study thus provides further support for the notion that polymorphisms in genes for enzymes responsible for citrullination contribute to RA development in multiple populations of Asian descent.


Introduction
Most studies of the association between rheumatoid arthritis (RA) and genetic factors have focused on a group of human leukocyte antigens in the major histocompatibility complex [1], and a detailed account of the contributions from different major histocompatibility complex genes and their structural correlates was recently published for a number of Caucasian populations [2]. Additional contributions from more than 30 different non-HLA loci have been demonstrated, mainly in populations of Caucasian origin [3]. Important differences in RA susceptibility genes have, however, been described between Caucasian and non-Caucasian populations, as seen both from which HLA alleles are associated with disease [4][5][6][7] and from associations with non-HLA genes. A particularly interesting difference between Caucasian and Asian populations has been demonstrated in genes from a peptidylarginine deiminase (PADI) locus, where an association with a polymorphism was first demonstrated in a Japanese population [8] and later confirmed in additional Japanese and Korean populations [9][10][11][12]. These polymorphisms are of particular interest for the pathogenesis of RA since PADI4 and other PADI enzymes catalyze the conversion of peptidylarginine to peptidylcitrulline, a target of anti-citrullinated protein antibody (ACPA), through a post-translational modification process referred to as citrullination [13,14].
Studies of the association between PADI4 polymorphisms and RA have so far focused on the Japanese and Korean populations [9,11,15]. The effect of PADI4 polymorphisms on RA risk, however, remains unclear in the Han Chinese population [16,17]. An association between PADI4 and RA has also been observed in German and North American populations [18,19], while such an association has not been replicated in other Caucasian populations (for example, British, Spanish, Swedish and Hungarian) [19][20][21][22][23], despite a comparable allele frequency between these Asian and Caucasian populations. The largest study, performed in the UK population with over 19,000 subjects, found no evidence for association between the PADI4_94 SNP (rs2240340) and RA. In a meta-analysis of previously published European studies together with this UK study, the association between the PADI4_94 genotype and RA was weak and not statistically significant (odds ratio (OR) 1.06, 95% confidence interval (CI) = 0.99 to 1.13, P = 0.12) [23]. There is thus a need to further investigate the impact of genetic variations in PADI-associated genes in additional populations. Since ethnic differences are likely to go hand in hand with different environmental exposures, it will be helpful to study the association in more divergent populations with some similarities in genetic background.
The PADI gene region is located at chromosome 1p36. This locus contains the cluster of all the PADI genes (PADI1 to PADI4 and PADI6). Of the five isotypes of PAD protein, PAD2 and PAD4 have been reported to be expressed as active enzymes in RA synovium, where citrullination of matrix proteins could potentially create antigenic peptides [24,25] and where increased citrullination has been demonstrated in inflamed synovial tissues [26,27].
In the present study, which was performed in the South East Asia region, the aim was to determine whether the association between the PADI4 polymorphisms and RA risk could be generalized to the Malaysian populations with Malay, Chinese and Indian ethnicity. We also investigated multiple SNPs from the locus (PADI1 to PADI4 and PADI6) in this multiethnic case-control study involving early RA.

Study population
The source of data for our investigation was the multicenter Malaysian Epidemiological Investigation of Rheumatoid Arthritis (MyEIRA) case-control study, comprising 1,238 cases of RA and 1,571 control subjects. The demographic characteristics of RA cases and controls are shown in Table 1. Of the 1,238 cases, 516 (41.7%) were Malays, 255 (20.6%) were Chinese, 379 (30.6%) were Indians and 88 (7.1%) were of other or mixed ethnicities from South East Asia. The details of the MyEIRA study have been described elsewhere [6]. Briefly, all RA cases were diagnosed by rheumatologists according to the 1987 revised American College of Rheumatology criteria [28]. The disease duration for the RA cases was on average 1 year (interquartile range = 2 years). For each potential case, a control subject was randomly selected from the population, taking into consideration the subject's age, sex and residential area. Of the 1,571 controls, 986 (62.8%) were Malays, 206 (13.1%) were Chinese, 285 (18.1%) were Indians and 94 (6.0%) were of other sub-ethnicities. We performed case-control association analyses with regard to the influence from different genetic factors for each ethnic group separately as well as for the meta-analysis where all four ethnic groups were included. The study was approved by the Medical Research and Ethics Committee, Ministry of Health, Malaysia, and written informed consent was obtained from all participants.

DNA extraction, selection of markers and genotyping
The genomic DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). All DNA samples were stored at -20°C until testing. We investigated 320 SNPs selected from the PADI locus on Immunochip and from other studies [8,19,20,21,29]. The list of the genotyped PADI SNPs is presented in Table S1 in Additional file 1. To assess genotyping robustness, comparisons were made between the TaqMan assay and the Illumina iSELECT HD custom genotyping array (Immunochip) assay regarding PADI4_94 (rs2240340), which resulted in a 100% match of the genotyping calls for both RA cases and controls.

Statistical analysis
Genotype, allele and haplotype frequencies were assessed with Yates' chi-square and/or Fisher's exact test when appropriate by means of IBM SPSS Statistics 20.0 software (SPSS Inc., Chicago, USA). The frequencies of the alleles and genotypes of PADI SNPs were compared between RA cases and control subjects, and ORs with 95% CIs were calculated. Haplotype analysis was carried out with Haploview [30]. Power calculation was performed for a one-tailed test at a significance level of 0.05. For meta-analysis, the Mantel-Haenszel method was employed with a fixed-effects model and 95% CI for the cumulative or overall odds ratio [31]. The significance of the cumulative OR was determined by the Z-test. The between-study heterogeneity was assessed using the Cochran Q-statistic (P <0.10 considered significant). In addition, the I² metric (I² = (Q - df)/Q, where df is the degrees of freedom of Q) was used to describe the percentage of variation across the studies due to heterogeneity. I² values of 25%, 50% and 75% were assigned as low, moderate and high estimates, respectively.
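To make the meta-analytic quantities named above concrete, the following is a minimal Python sketch of a fixed-effects Mantel-Haenszel pooled OR together with Cochran's Q and I². It is an illustration, not the software used in the study: the function names and the 2x2 counts are hypothetical, the per-table CI uses the Woolf (log-based) approximation, and Q is computed here with inverse-variance weights on the log OR, which is one common convention and an assumption on our part.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and Woolf (log-based) CI for one 2x2 table.
    a, b: risk/other allele counts in cases; c, d: the same in controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

def mantel_haenszel(tables):
    """Fixed-effects pooled OR: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def cochran_q_i2(tables):
    """Cochran's Q (inverse-variance weights on log OR) and I2 = (Q - df)/Q,
    truncated at zero. Returns (Q, df, I2 as a fraction)."""
    logs = [math.log(a * d / (b * c)) for a, b, c, d in tables]
    weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logs))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Hypothetical allele counts for two strata (not data from the study).
tables = [(120, 380, 100, 400), (90, 210, 80, 220)]
pooled_or = mantel_haenszel(tables)
q, df, i2 = cochran_q_i2(tables)
print(f"pooled OR = {pooled_or:.2f}, Q = {q:.2f} on {df} df, I2 = {100 * i2:.0f}%")
```

With a single stratum the pooled estimate reduces to the ordinary OR, and identical strata give Q close to zero and I² = 0%, which is a quick sanity check on any implementation.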

Characteristics of RA cases and controls
Among the RA cases, the overall mean ± standard deviation age was 48 ± 11.6 years, 86.3% were female, 64.5% were ACPA-positive, 40.1% were HLA-DRB1 SE-positive, and 51.5% were rheumatoid factor-positive. The mean ± standard deviation age of control subjects was 47 ± 11.4 years and 87.5% were female. The distribution of ethnic groups (Malay, Chinese, Indian and other or mixed ethnicities) is shown in Table 1.

Evaluation of the PADI gene polymorphisms and RA association in the MyEIRA study
Collectively, our study has 88% statistical power to detect genotype/minor allele frequency differences of the magnitude reported in the initial positive study in a Japanese population [8].
Overall, the single-point analyses of variations in the PADI1, PADI2, PADI3, PADI4 and PADI6 genes revealed a modest genetic effect size in the RA population and showed that the peak association varies between different ethnic groups. For example, the rs2526839 variant showed the peak association signal in Malay RA patients (OR = 1.24, 95% CI = 1.04 to 1.43, P = 0.0126), the rs3003444 variant in Chinese RA patients (OR = 1.60, 95% CI = 1.20 to 2.13, P = 0.0013) and the rs113475583 variant in Indian RA patients (OR = 1.91, 95% CI = 1.19 to 3.07, P = 0.0064) (see Figure S1 in Additional File 1).

PADI2 gene polymorphism as a risk factor in RA development
The PADI2 gene encodes the PAD2 enzyme, which is the most widely expressed member of the PAD family. An association between RA and a PADI2 variant was found in a Korean population [32]. We therefore investigated whether PADI2 polymorphisms are also associated with risk of RA development in three independent Asian populations. Interestingly, the PADI2 variant meta-analysis demonstrated a possible novel association of the PADI2 rs1005753 variant with RA, both in the allelic model (ORoverall = 0.88, 95% CI = 0.77 to 0.99, P = 0.04) and in the dominant/recessive genotype model (ORoverall = 0.84, 95% CI = 0.72 to 0.99, P = 0.04) (Figure 1). In this multiethnic case-control study, no evidence of heterogeneity was found for the PADI2 rs1005753 variant (Pheterogeneity = 0.43, I² = 0%).

Generalizability of PADI4 gene polymorphisms as a risk factor in RA development
A summary of the MyEIRA meta-analysis for the PADI4 polymorphism with RA is presented in Figure 2. The MyEIRA meta-analysis suggests that the association of the PADI4 rs2240340 variant with RA is also found in the Malaysian population (ORoverall = 1.11, 95% CI = 1.00 to 1.23, P = 0.04). The meta-analysis also revealed a significantly increased OR (ORoverall = 1.20, 95% CI = 1.01 to 1.44, P = 0.04) when applying a dominant/recessive genotype model for the PADI4 rs2240340 variant, suggesting that PADI4 is associated with RA in the Malaysian population of Asian descent.
To clarify the role of the PADI4 gene as a possible susceptibility factor for RA development in different Asian ethnic populations, we performed a combined meta-analysis using our current data together with previously published studies from Asia. We searched the PubMed database, ISI Web of Knowledge and Google. Six published original case-control studies related to PADI4 polymorphisms in RA in three different Asian populations were identified: three Japanese studies including the first positive report, one Korean study and two Chinese studies [8][9][10][11][16,17]. As in our study, all of these studies used the American College of Rheumatology criteria [28] for the diagnosis of RA. The strongest evidence of association reported in the first study was given by PADI4_94 (rs2240340); we therefore restricted the current combined meta-analysis to this genetic variant. Of the six published papers, the Korean study did not genotype PADI4 rs2240340. Instead, we used genotype data for PADI4_89 (rs11203366). This choice was made because these SNPs are in strong linkage disequilibrium (r² = 1.00) and their minor allele frequencies are nearly equal according to International HapMap data for the Chinese and Japanese populations (CHB + JPT). Five studies were included in the combined meta-analysis together with our new data for PADI4 rs2240340 [8][9][10][11][16]. The PADI4 SNPs investigated in Fan and colleagues' study were not analyzed because they exhibited significant deviation from Hardy-Weinberg equilibrium (P <0.001) [17], which calls the validity of the genotyping data into question.
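The Hardy-Weinberg exclusion criterion mentioned above can be illustrated with a short Python sketch of the standard one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The function names and the counts are hypothetical and are not taken from any of the cited studies; the P value uses the identity that the upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x/2)).

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.
    Inputs are observed counts of the three genotypes at a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_p_value_1df(x):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical control genotype counts (illustration only).
stat = hwe_chi_square(300, 480, 220)
print(f"chi-square = {stat:.3f}, P = {chi2_p_value_1df(stat):.3f}")
```

A data set would be flagged under the criterion used in the text when the resulting P value falls below 0.001. Exact tests are often preferred when genotype counts are small; the chi-square form is shown here only because it is the simplest to follow.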

Stratified analysis
Stratified analyses were performed to detect association between the PADI gene variants and RA risk within specific population subgroups as well as in relation to more homogeneous subsets of RA cases. Stratifying by sex, carriage of SE alleles, presence of ACPA or presence of rheumatoid factor among the cases revealed no evidence of different associations between the compared subgroups (data not shown). Nevertheless, it is noteworthy that different sets of SNP variants were associated with risk of developing ACPA-positive and ACPA-negative RA within specific subpopulations, although the results were not significant after correction for multiple comparisons (Table 2). It would be interesting to investigate the association between RA and these SNP variants in other populations with more data.

Haplotype analysis and association test
We first performed univariate single-population analyses of the association of PADI SNPs with RA, and SNPs with statistically significant effects were then selected for haplotype analysis. However, we excluded SNPs with inconsistent effects across the four ethnic groups, as well as SNPs that did not reach the significance threshold (P <0.05) in one or more ethnic groups. This analysis strategy led us to four PADI4 SNPs for haplotype analysis (rs79907974, rs2240340, rs1748021 and rs2240337). When haplotypes were constructed using these four PADI4 SNPs, seven haplotypes with frequency >1% out of 16 expected haplotypes were found in the study populations (Table 3). Haplotype GCGG was omitted from the association test because it was only found in the Indian ethnic group, with a frequency of 5%. When meta-analyses across the ethnic groups in our study were conducted, ATAA and ATAG showed a significant difference between RA cases and controls (P = 0.02 and P = 0.001, respectively; Table 3) with no significant heterogeneity (Pheterogeneity = 0.78, I² = 0%). These haplotypes represent either protective (ATAA) or susceptibility (ATAG) variants. We further tested these haplotype associations in different RA subsets defined by ACPA status. The results showed consistent effects in the RA subsets, and the ORoverall was comparable between ACPA-positive and ACPA-negative RA subsets (see Table S2 in Additional File 1).
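The figure of 16 expected haplotypes follows from four biallelic SNPs (2 to the power 4). The Python sketch below enumerates them; the allele pairs at each position are an assumption inferred from the haplotype labels quoted in the text (ATAA, ATAG, GCGG), and the frequencies in the filter step are hypothetical placeholders rather than values from Table 3.

```python
from itertools import product

# Assumed alleles per SNP position, inferred from the labels ATAA and GCGG.
alleles = [("A", "G"), ("T", "C"), ("A", "G"), ("A", "G")]

# All possible four-SNP haplotypes: 2 * 2 * 2 * 2 = 16.
haplotypes = ["".join(h) for h in product(*alleles)]
print(len(haplotypes), haplotypes)

# Hypothetical estimated frequencies; unlisted haplotypes are treated as absent.
freqs = {"ATAA": 0.05, "ATAG": 0.30, "GCGG": 0.004}
common = [h for h in haplotypes if freqs.get(h, 0.0) > 0.01]
print(common)
```

In practice haplotype frequencies are estimated from unphased genotypes by software such as Haploview, so only a subset of the 16 combinations is usually observed above a chosen frequency cut-off.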

Discussion
In this study, we used a multiethnic population of Asian descent in Malaysia to determine the association between PADI4 polymorphisms and risk of RA, and convincingly validated this association. Our current data, together with the previously published data on Asian populations, strongly support PADI4 as an RA susceptibility gene in different ethnic populations of Asian descent. Collectively, our data extend previous results on PADI4 and RA, obtained mainly in Japanese and Korean populations [8][9][10][11][15,33], to another population of Asian origin (that is, Malay, Chinese and Indian ethnicity). Notably, a possible novel association between the PADI2 genetic variant and RA risk was also found in the multiethnic Malaysian population.
The combined meta-analysis was performed regardless of the ACPA status, since such data were not available in the previous published papers. However, our present study showed a comparable minor allele frequency between the RA subsets defined by ACPA status, thus suggesting that the risk from PADI locus is likely to be common for the two main subgroups of RA.
Previously, association studies of the PADI4 polymorphism and RA in Asian populations focused predominantly on Japanese and Korean populations [8][9][10][11][15], which are geographically and historically closely related and genetically quite similar to each other [34]. The results, however, were inconsistent across two independent studies in Han Chinese populations [16,17]. Malaysia is a multiethnic country in South East Asia representing genetic diversity across multiple large ethnic populations of Asian origin (that is, Malays, Chinese and Indians). In our study, we were able to address the question of whether PADI4 polymorphisms confer a risk of RA in this diverse population. We found that the PADI4_94 polymorphism is associated with an increased risk of developing RA in this population. Interestingly, as can be seen from Figure 3, the effect for the Han Chinese population is probably lower than for other Asian populations, which may explain the previous inconsistency in published data.
Haplotype analysis did not reveal a higher effect size in comparison with univariate SNP analyses in our study. Nevertheless, meta-analysis of PADI4 haplotypes with the current data implied that the association with RA was probably driven by the rs2240337 variant. This genetic variant, however, has a low minor allele frequency in our material (that is, minor allele frequency in Malay = 0.059, Chinese = 0.066, Indian = 0.032 and others = 0.042). Notably, however, this SNP is in complete linkage disequilibrium (D' = 1.00) with the PADI4 SNP rs766499, which was recently reported to be associated with RA in the Japanese population at a genome-wide level of significance in a meta-analysis [12].
An additional novel finding in our study is the discovery of a possible association between the PADI2 genetic variant (rs1005753) and RA, which was observed in both allele and genotype models. The genetic effect observed could be due to an indirect association, where the identified genetic variant (that is, in PADI2) is not itself functional but is in linkage disequilibrium with a causal polymorphism, for example one in the PADI4 gene. However, this is unlikely, as the PADI2 genetic variant (rs1005753) did not show any linkage disequilibrium with the numerous investigated SNPs spanning PADI1 to PADI6 in the present study. We observed r² <0.01 for all ethnic groups studied. A previous study by Freudenberg and colleagues reported an association between the PADI2 genetic variant (rs2075696) and RA in a Korean population [32]. Interestingly, the rs1005753 variant associated in our study and the rs2075696 variant are located in two different linkage disequilibrium blocks in the PADI2 gene locus. The International HapMap CHB+JPT data showed no linkage disequilibrium between these two SNPs (D' = 0.0050, r² = 0.0). Together, our data may suggest a possible risk effect between the rs1005753 variant and RA in the Malaysian population.
It is worth mentioning that PADI2 and PADI4 are the only two PADI genes that are highly expressed in hematopoietic cells [35]. In RA, the expression levels of PAD2 and PAD4 were correlated with the intensity of inflammation, and both enzymes were demonstrable within or in the vicinity of citrullinated fibrin deposits [36]. The PADI2 gene, which encodes the PAD2 enzyme, may play a role in RA pathogenesis on its own, independently of PADI4. Nevertheless, it is important to extend the study to gain better statistical power to investigate this association in a single-population analysis, and to replicate the association in independent cohorts of Asian origin. Since the difference between Caucasian and Asian populations may represent both genetic and environmental heterogeneity, it is logical to propose a gene-environment interaction study for PADI genes in the development of RA.
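Because the argument above rests on the two linkage disequilibrium measures D' and r², a minimal Python sketch of their standard definitions for two biallelic SNPs may help. The function name and all frequencies are hypothetical; the inputs are the frequency of the haplotype carrying allele A at the first SNP and allele B at the second, plus the two allele frequencies.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic SNPs.
    p_ab: frequency of the A-B haplotype; p_a, p_b: allele frequencies."""
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical example: a rare allele (5%) always found on the background
# of a more common allele (30%) at the second SNP.
d, d_prime, r2 = ld_measures(p_ab=0.05, p_a=0.05, p_b=0.30)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.3f}")
```

The example shows why the two measures can disagree: D' can reach 1 whenever one of the four possible haplotypes is absent, while r² stays low if the allele frequencies differ substantially. When both D' and r² are near zero, as reported for rs1005753 and rs2075696, the two SNPs are essentially uncorrelated.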

Conclusion
This study demonstrates an association between PADI4 polymorphisms and RA in the Malaysian population.
The currently updated combined meta-analysis of Asian populations further supports the hypothesis that the PADI locus contributes to the development of RA in different Asian populations and that this genetic effect is generalized to multiple ethnic populations of Asian descent.

Additional material
Additional file 1: Table S1 presenting a list of PADI SNPs investigated in the MyEIRA study population. A complete set of 320 SNPs selected from the PADI locus on Immunochip and from other studies. The SNPs were genotyped either using the TaqMan SNP genotyping assay (Applied Biosystems, USA) or by Illumina iSELECT HD custom genotyping array (Immunochip). Table S2 presenting the haplotype frequencies and meta-analysis of PADI4 polymorphisms in the MyEIRA study by ACPA status. The haplotype analysis and meta-analysis of PADI4 polymorphisms in the MyEIRA study were performed in different subsets of RA defined by ACPA status. Bold results indicate significant association between the PADI haplotypes and subsets of RA. Figure S1 showing the regional association plots with recombination rate on the PADI genes for the three major ethnic groups from the MyEIRA study. Regional association plots on the PADI genes including PADI1, PADI2, PADI3, PADI4 and PADI6 for the three major ethnic groups from MyEIRA study showing the peak association in each ethnic group. Graphs centered on the most significant SNP in each ethnic group. The r 2 values (linkage disequilibrium between the most significant SNP and the rest of SNPs in the region) are calculated on the MyEIRA data and the recombination rates are based on the International HapMap CHB+JPT data.