Meta-analysis of genome-wide association study identifies FBN2 as a novel locus associated with systemic lupus erythematosus in Thai population

Background Differences in the expression of variants across ethnic groups in the systemic lupus erythematosus (SLE) patients have been well documented. However, the genetic architecture in the Thai population has not been thoroughly examined. In this study, we carried out genome-wide association study (GWAS) in the Thai population. Methods Two GWAS cohorts were independently collected and genotyped: discovery dataset (487 SLE cases and 1606 healthy controls) and replication dataset (405 SLE cases and 1590 unrelated disease controls). Data were imputed to the density of the 1000 Genomes Project Phase 3. Association studies were performed based on different genetic models, and pathway enrichment analysis was further examined. In addition, the performance of disease risk estimation for individuals in Thai GWAS was assessed based on the polygenic risk score (PRS) model trained by other Asian populations. Results Previous findings on SLE susceptible alleles were well replicated in the two GWAS. The SNPs on HLA class II (rs9270970, A>G, OR = 1.82, p value = 3.61E−26), STAT4 (rs7582694, C>G, OR = 1.57, p value = 8.21E−16), GTF2I (rs73366469, A>G, OR = 1.73, p value = 2.42E−11), and FAM167A-BLK allele (rs13277113, A>G, OR = 0.68, p value = 1.58E−09) were significantly associated with SLE in Thai population. Meta-analysis of the two GWAS identified a novel locus at the FBN2 that was specifically associated with SLE in the Thai population (rs74989671, A>G, OR = 1.54, p value = 1.61E−08). Functional analysis showed that rs74989671 resided in a peak of H3K36me3 derived from CD14+ monocytes and H3K4me1 from T lymphocytes. In addition, we showed that the PRS model trained from the Chinese population could be applied in individuals of Thai ancestry, with the area under the receiver-operator curve (AUC) achieving 0.76 for this predictor. Conclusions We demonstrated the genetic architecture of SLE in the Thai population and identified a novel locus associated with SLE. Also, our study suggested a potential use of the PRS model from the Chinese population to estimate the disease risk for individuals of Thai ancestry.


Background
The systemic lupus erythematosus (SLE) is a systemic autoimmune disease characterized by loss of tolerance to self-antigens, inappropriate immune activation, and inflammation [1]. The severity is various depending on affected organs [2]. Genetic susceptibility has been widely accepted as one of the critical factors driving disease development [2]. Recently, the genetic architecture of SLE has been examined worldwide [3]. Using GWAS, more than 90 loci have been found associated with SLE across at least four ethnic groups, including Han Chinese, European, North America, and Africa [4,5]. The strongest signal was identified at the HLA class II allele, which replicated in all of the different populations [4]. These findings indicate critical biological mechanisms underlying the disease, which will be the candidate in further functional studies [6].
However, heterogeneity of disease between different ethnicities drives a question of whether genetic background in different ancestries could affect the clinical manifestations. It is known that Asian SLE patients have higher disease severity compared to Europeans [2]. However, only a few studies on SLE associations that were based on candidate genes were performed in the Thai population [7][8][9][10][11]. In this study, we conducted GWAS using the SLE samples collected from two tertiary referral hospitals in Thailand. We aim to replicate known SLE-associated variants in the Thai population and identify novel SNPs associated with SLE.

Sample collection and preparation
We calculated the power of our study by using an online tool called Genetic association study (GAS) Power Calculator [12]. With 800 cases and 1600 controls at 5E−08 significant level, we obtained 0.934 expected power for the study. The EDTA blood samples from SLE patients (n = 487) were collected at King Chulalongkorn Memorial Hospital as cases for the discovery phase. All procedures were approved by the ethical committee from the Faculty of Medicine, Chulalongkorn University (COA no. 923/2017). For the replication cohort, the samples (n = 405) were collected from the Rheumatology clinic, Ramathibodi Hospital, with ethical approval from the Faculty of Medicine, Mahidol University (MURA no. 2015/731, EC no. 590223, Protocol-ID 12-58-12). All patients were carefully recruited regarding the criteria from the American College of Rheumatology (ACR) [13]. Patients' demographic data from both datasets have been summarized in Table 1. For healthy controls (n = 1606) and unrelated disease controls including breast cancer, periodontitis, tuberculosis, drug-induced liver injury, epileptic encephalopathy, dengue hemorrhagic fever, thalassemia, and cardiomyopathy (n = 1590), data were provided from the Department of Medical Science, Ministry of Public Health, Thailand.

DNA extraction
Buffy coats were extracted using the QIAGEN® EZ1® DNA blood kit (QIAGEN GmbH, Hilden, Germany). We used 200 μl of a buffy coat as recommended by the manufacturer's instruction. Buffy coat samples were transferred into tube or sample cartridge for EZ1 Advanced XL (QIAGEN GmbH, Hilden, Germany) and extracted using EZ1® Advanced XL DNA Buffy coat protocol. From this protocol, DNA was eluted at 200 μl. DNA was diluted and quantitated using Qubit™ dsDNA BR Assay Kit according to the manufacturing protocol (Invitrogen, Thermo Fisher Scientific, MA, USA).

Genotyping and quality control
Genotyping was performed using Infinium Asian Screening Array-24 v1.0 BeadChip with 659,184 SNPs (Illumina, San Diego, CA, USA) at the Department of Medical Sciences (DMSC, Ministry of Public Health, Thailand) based on the protocol recommended by the manufacturer. The Genome Studio data analysis software v2011.1 (Illumina, San Diego, CA, USA) was used for calling genotypes. Samples and SNP markers were tested for quality control (QC) using PLINK genomic analysis software (v1.90b5.4) [14]. Individuals with autosomal genotype call rate ≤ 0.98, gender inconsistency based on heterozygosity rate of X chromosome (mal-eTh = 0.8, femaleTh = 0.2), and genome-wide estimates of identity-by-descent (pihat) ≥ 0.185 (3rd generation) were excluded from analysis. SNPs with more than 5% missing genotyping rate or significant deviation of Hardy-Weinberg equilibrium (p value ≤ 1 × 10 −8 ) were also removed. After quality control (QC), we obtained a dataset of 2041 individuals with 421,909 variants for the discovery phase and 1955 individuals with 446,139 variants for replication. The flow diagram of the analysis process is shown in Fig. 1a.

GWAS data imputation
Pre-phasing was performed using SHAPEIT [16]. After that, genotype data for individuals was imputed to the density of the 1000 Genomes Project Phase 3 reference using IMPUTE2 [17]. After all the QC processing, 6,657, 806 were left for further analysis. The processed data were publicly available at http://2anp.2.vu/GWAS_SLE_ Thailand.

Association study, meta-analysis, and statistical analysis
The association studies were conducted by using SNPT EST [18], and the factored spectrally transformed linear mixed models (FaST-LMM v.0.2.32) program [19]. The results from FaST-LMM were analyzed and visualized by RStudio to obtain genomic inflation factor (λ), quantile-quantile plot, and Manhattan plot [20]. The SNPs with p value ≤ 1 × 10 −5 were plotted to obtain the regional plot by using LocusZoom [21]. Haplotype block and linkage disequilibrium structure were analyzed by Haploview software version 4.2 [22]. The characterized SLE susceptible loci were downloaded from a previous study [23] and GWAS catalogue (the NHGRI-EBI catalogue of published genome-wide association studies). Meta-analysis was studied based on the inverse variant strategy in the METAL program [24]. The genetic inheritance pattern was calculated from the frequency of different genotyping on risk alleles using R-Bioconductor. Simultaneously, functional annotation was predicted by using SNPnexus, which applied data from the Reactome database [25]. The histone markers and transcription factor binding sites were predicted from an online tool called HaploReg V4.1 [26].

Polygenic risk score calculation
Lassosum [27] was used to calculate PRS for individuals. The summary statistics for SLE association in East Asians [28], involving 2618 cases and 7446 controls with Chinese ancestry, were used to train the model. The area under the ROC curve (AUC) was calculated using R package pROC [29].

Known SLE associations found in the Thai population
In the discovery dataset, the association studies were initially performed using healthy controls (n = 1606) and SLE patients (n = 487) collected from King Chulalongkorn Memorial Hospital. Regarding the result, we found that variants at the HLA class II regions were strongly associated with SLE (p value < 5E−08). Similarly, GWAS from 405 SLE cases and 1590 non-immune-mediated disease controls found variants at the HLA class II regions reached the genome-wide significant threshold (p value < 5E−08). Our findings were consistent with previous reports in other ethnic groups [30]. Inflation factors from both datasets were calculated as reported in Supplementary figure 1. Subsequently, a meta-analysis of the two Thai GWAS was carried out, and we systematically examined associations across the 90 known SLE-associated loci, which were collected from the GWAS catalogue (https://www.ebi.ac. uk/gwas/) and previous review articles [23]. Of these loci, the HLA-DQA1, HLA-DRB1, STAT4, FAM167A-BLK, and GTF2I loci have reached the genome-wide significant threshold (p value < 5E−08; Fig. 1b, Table 2) in Thai population, and the variants at the PROS1C1, NOTCH4, HCP5, C6orf10, TAP2, TNFSF4, RasGRP3, TERT, TNPO3-IRF5, CXCR5, GPR19, SLC15A4, and ITGAM loci showed suggestive evidence of associations with SLE (p value < 5E −05, Supplementary Table 1). These loci have been found in several ancestries, including Han Chinese, Korean, North American, European, African, and Hispanic populations [31,32].

Identification of novel loci associated with SLE
Excluding the variants at the known SLE-associated loci, we discovered a novel variant on FNB2 (rs74989671, OR = 1.54, p value = 1.61E−08) specifically associated with SLE in Thai population (Figs. 1b and 2a, Table 2) when comparing the association in Europeans (OR = 0.998, p value = 0.979) and in Chinese populations (OR = 0.982, p value = 0.692) [28]. Further analyses based on different genetic inheritance models suggested that the disease risk was associated with the copy number of risk alleles that the individuals carried (additive model) ( Table 4). Three SNPs on FBN2 loci (rs74989671, rs35983844, rs6595836) showed linkage disequilibrium (LD r 2 = 0.82) (Fig. 2b, Supplementary Table 1). Of these variants, rs74989671 was found to locate within the peak of H3K36me3 derived from CD14-positive monocytes and H3K4me1 (associated with active enhancers) derived from the primary T cells (Fig. 2c).
In addition, we found variants at the ATP6V1B1,  Table 1). Though these polymorphisms are likely to associate with Thai SLE patients, an independent GWAS dataset of SLE patients with Thai background is needed for further validation.

In silico functional annotation of SLE-associated variants in Thai population
To understand the biological meaning underlying the SLE-associated loci in the Thai population, we performed the pathway analysis using the SNPnexus program [25]. Variants with p value < 5E−05 were involved in this study. Notably, we found that 50% of all variants were located within the coding region, by which 10% is nonsynonymous polymorphisms. Pathway analysis results revealed that SLE-associated variants were highly enriched in the regulation of interferon signaling, PD-1 signaling, MHC-class II antigen presentation, TCR/BCR signaling, cytokine signaling, TNF signaling, NOTCH4 signaling, calcium- Risk alleles f p value of heterogeneity activated potassium channels, and cell-cell junction organization pathways. Furthermore, we found that extracellular matrix organization was significant in our results (Fig. 3). It indicated that Thai SLE patients might have a higher risk of fibrosis-associated inflammation.

Polygenic risk score prediction for the individuals
To apply the GWAS result to predict the Thai SLE outcome, we also tested the hypothesis of whether the PRS models trained by individuals with Chinese ancestry could be applied for Thai SLE patients. We calculated PRS for individuals in the Thai GWAS, based on the training data from the Chinese population (2618 cases and 7446 controls) [28]. Significantly, the PRS for SLE cases were higher than controls (mean difference = 0.89; p value = 2.2E−16; Fig. 4a), and the area under the receiver-operator curve (AUC) achieved 0.76 for this predictor. This analysis indicated the potential application for the PRS in the Thai population, based on the results from other Asian populations. Regarding the analysis, this might be a clue for predicting an outcome of SLE clinical characteristics in Thai SLE patients, and it is a good source for further genetic analysis to identify actual SLE pathogenesis in the different ancestry.

Discussion
The present study is the first largest GWAS cohort conducted among Thai SLE patients. The highest significant association in the region of HLA class II was consistent with previous reports from other ethnic groups [30].
Since there are vast differences of HLA class II allele frequency among populations and sophisticated genetic structure, the study of the specific HLA class II haplotype is needed. Currently, there were two publications reported about specific HLA haplotypes in Thai SLE patients. First, the fine genetic mapping of the HLA allele from SLE patients in the northern part of Thailand has identified the association of HLA-DRB5*01:01 and HLA-DRB1*16:02 [34]. Secondly, HLA haplotype analysis found HLA-DRB1*15:02 and HLA-DQ*05:01 associated with Thai SLE patients [35]. Further study on the HLA class II allele using whole-genome sequencing or exome sequencing might be helpful to specify the impact of the HLA class II allele on Thai SLE patients. Apart from the HLA class II alleles, our study found variants in STAT4, GTF2I, and BLK regions. For BLK locus, this gene encoded for Src-tyrosine kinase, which is an important signaling molecule under B cell development [36]. This gene showed protein-protein interaction with BANK1 (B cell-specific cytoplasmic protein involved in B cell receptor signaling) and might plausibly involve in dysregulation of the B cell receptor, which is a common feature found in SLE patients [37]. For STAT4 and GTF2I alleles, these genes are encoded for the transcription factors that mediate many immune-related genes and inflammatory cytokine transcription machinery. Both BLK and STAT4 loci have been reported as SLE susceptible alleles in Thai SLE patients recently [7], whereas GTF2I locus has firstly identified in our study. Interestingly, the variants on STAT4 and GTF2I loci were correlated with lupus nephritis (LN) in the various SLE ancestries [32]. The GTF2I allele was likely to be specific in Asian background, mainly Han Chinese [38].
Our analysis found several LN-susceptible loci such as IRF5 [39,40], ITGAM [9,41], IKZF1 [42], and TNFSF4 [43]. While IKZF1 is a co-transcription factor with STAT-4 mediated Th1 lymphocyte differentiation and interferon pathways [44], the TNFSF4 locus, also called OX40L, encoded for the TNF superfamily ligand, which actively stimulates CD4+ T cell activation [43]. Study in the Finnish and Swedish SLE patients found the correlation of ITGAM with cutaneous discoid lupus erythematosus (DLE) and LN as well as Ro/SSA auto-antibody positive [45]. Not only LN, but we also found several loci that have been verified in the specific sub-phenotype of SLE patients. For example, our result found a variant on ETS1, which previously showed association with juvenile SLE, as well as a variant on RasGRP3, which was involved in malar rash or discoid rash [42]. The recent SLE susceptible loci identified in the cardiac manifestation of neonatal lupus, NOTCH4, was found in our results [46].
Note that we found some of the known SNPs which are nonsynonymous variants such as NCF2 [47], IFIH1 [48], TNFAIP3 [49], UHRF1BP1 [50], ATG16L2 [51], and PLD4 [52]. A few pieces of evidence have revealed the impact of those variants on various pathways including neutrophil extracellular traps (NETs) formation [53], sensor molecule to detect viral genome inside cells [54], a negative regulator for NF-kB transcription factors [55], and a negative regulator of cell growth [56]. These pathways resembled with our functional enrichment pathways analysis. Interestingly, our results found extracellular matrix organization (ECM) pathways associated with Thai SLE patients. Previously, single-cell transcriptome analysis in non-responder LN patients highlighted the upregulated genes in the ECM pathway correlated with treatment failure [57]. The ECM reflected the active fibrotic process, which was a marker of poor prognosis LN [58]. Remarkably, the prevalence of severe LN was high in the South East Asian ethnic Fig. 4 The graph shows the polygenic risk score calculation and the mean difference between SLE and healthy controls (a). The circular plot showed loci which identified in this study at individual chromosomes using package Rcircos [33] (b) included Thai [59]. As regards our SLE patients' demographic data, we found that the frequency of clinical phenotype was roughly similar to other ethnic [60]. The LN has the highest abundance found among Thai SLE patients. Thus, our results supported that genetic background was a pivotal factor driving a severe LN among Thai SLE patients. Taken together, these pieces of evidence could justify the link between genetic variants and clinical involvement in Thai SLE patients.
The study of known SNPs showed most of the polymorphisms resembled with previous reports in Thais, such as ARID5B, TNFSF4, BANK1, TNFAIP3, CXCR5 SLC15A, ITGAM, WDFY4, ETS1, and BLK [7][8][9][10][11]. It confirmed that our analysis processes were reliable. Noticeably, the allele frequency of ITGAM was higher among Thai SLE when compared to Chinese Hong Kong [9], but has no association with Japanese and Korean background [61]. Thus, this implies the specificity of these variants to the Thai SLE patients. Although we did not recognize polymorphisms on chromosome 11q23.3 (rs11603023 on PHLDB1 and rs638893 on DDX6), which has been identified in the Thais' SLE, our metaanalysis enhanced signal from rs10845606 on GPR19 allele which does not correlate with Thai SLE patients previously [8].
It is noteworthy that meta-analysis in the Thai population discovered novel SLE susceptible variants on FBN2. The FBN2 allele is located on a chromosome 5 encoded protein called fibrillin-2 [62]. Fibrillins-2 is one of the glycoprotein components incorporated extracellularly on microfibrils and is essential in bone, muscle, and extracellular matrix formation [63]. It is well known that mutation of FBN2 leads to dominant heritable connective tissue disorders [64]. Importantly, a recent review article has gained insight on fibrillin-2 as a critical mediator that binds to transforming growth factor-beta (TGF-β) during extracellular matrix formation [65]. The TLR9/ TGF-β1/PDGF-β pathway was excessively activated in peripheral mononuclear cells isolated from LN patients [66]. Besides, the upregulation of FBN2 correlated with fibrosis prevalence in the spontaneous LN developed mouse model (SWR X NZB1 F1) [67]. Although the function of FBN2 in SLE is unclear, collective evidence led us to hypothesize that this variant might drive either fibrosis-associated inflammation or inflammatory induction during disease pathogenesis. Due to whole-genome sequencing data in the Thai population is lacking, further study using FBN2 target sequencing, whole-genome sequencing, and variant functional characterization in a large cohort is needed. This knowledge could be useful to identify rare coding variants and genetic propensity eliciting SLE pathogenesis in Thais.
Note that some of the variants were previously characterized in other autoimmune diseases, including rheumatoid arthritis and primary Sjögren syndrome (pSS). It, therefore, indicates the sharing of underlying genetic factors between autoimmune disease. However, predisposing factors which could affect clinical manifestation driving different autoimmune disease outcome has not been elucidated yet. Recently, the GRS (genetic risk score) has been widely adopted to predict disease outcomes from genetic variants [68]. The previous studies in SLE showed that overall mortality was higher in the striking GRS SLE patients; also, the high cumulative genetic risk could predict the specific organ damages such as proliferative nephritis and cardiovascular disease [69]. Our study showed a high sensitivity for using polygenic risk scored as a marker for SLE disease development in the Thai population. It is exciting for further study to calculate the genetic risk score and specific clinical manifestation among Thai SLE patients.

Conclusions
In conclusion, our study reported susceptible loci of SLE patients in Thai ancestry, which were variants on the HLA class II allele, STAT4, GTF2I, and BLK. Additionally, we confirmed those variants which had been reported previously in the Thai populations, which were ARID5B, TNFSF4, BANK1, TNFAIP3, CXCR5 SLC15A, ITGAM, WDFY4, and ETS1. Interestingly, we identified novel variants associated with the Thai SLE patients, which were on the FNB2 allele. Summary loci associated with the Thai SLE were seen in Fig. 4b. Functional annotation analysis highlighted extracellular matrix organization pathways specific to the Thai population. The PRS using GWAS data is useful for SLE prediction with sensitivity and specificity of more than 70%. Further whole-genome sequencing study with a large sample size might help to validate our results. Finally, our finding provides the necessary genetic background susceptible to SLE disease, expanding the number of molecular targets for treatment options.