Genome-wide association study meta-analysis identifies five new loci for systemic lupus erythematosus

Background Systemic lupus erythematosus (SLE) is a common systemic autoimmune disease with a complex genetic inheritance. Genome-wide association studies (GWAS) have significantly increased the number of significant loci associated with SLE risk. To date, however, established loci account for less than 30% of the disease heritability and additional risk variants have yet to be identified. Here we performed a GWAS followed by a meta-analysis to identify new genome-wide significant loci for SLE. Methods We genotyped a cohort of 907 patients with SLE (cases) and 1524 healthy controls from Spain and performed imputation using the 1000 Genomes reference data. We tested for association using logistic regression with correction for the principal components of variation. Meta-analysis of the association results was subsequently performed on 7,110,321 variants using genetic data from a large cohort of 4036 patients with SLE and 6959 controls of Northern European ancestry. Genetic association was also tested at the pathway level after removing the effect of known risk loci using PASCAL software. Results We identified five new loci associated with SLE at the genome-wide level of significance (p < 5 × 10− 8): GRB2, SMYD3, ST8SIA4, LAT2 and ARHGAP27. Pathway analysis revealed several biological processes significantly associated with SLE risk: B cell receptor signaling (p = 5.28 × 10− 6), CTLA4 co-stimulation during T cell activation (p = 3.06 × 10− 5), interleukin-4 signaling (p = 3.97 × 10− 5) and cell surface interactions at the vascular wall (p = 4.63 × 10− 5). Conclusions Our results identify five novel loci for SLE susceptibility, and biologic pathways associated via multiple low-effect-size loci. Electronic supplementary material The online version of this article (10.1186/s13075-018-1604-1) contains supplementary material, which is available to authorized users.


Background
Systemic lupus erythematosus (SLE [MIM: 152700]) is a common systemic autoimmune disease characterized by the production of autoantibodies and a complex genetic inheritance. The prevalence of the disease varies according to the population ancestry, with European populations ranging between 30 and 90 cases per 100,000 individuals [1]. SLE afflicts women at a rate nine times higher than men, and most often appears during childbearing ages. Concordance rate studies in monozygotic and dizygotic twins and recurrence risk estimates in siblings of probands (λ s ), have clearly shown the importance of genetic factors in the development of the disease [2].
Despite the evidence for a strong genetic contribution, until recently, very few loci were convincingly associated with SLE risk [3]. With the concurrent identification of common genome variation and the development of genome-wide genotyping technologies, genome-wide association studies (GWAS) have dramatically changed the ability to identify risk variants. In SLE, GWAS have allowed the identification of more than 50 risk loci at a genome-wide significance level (p value <5 × 10 − 8 ) [4][5][6]. These findings are of great relevance since they pinpoint specific biological mechanisms that are relevant for the disease and that otherwise would not have been prioritized for research [7]. In a severe disease like SLE that is lacking efficacious treatments, genetic studies provide a unique way to expand the number of molecular targets for drug discovery [8].
To date, the explanation for the inherited risk of SLE is largely unresolved. Including all known risk variants, less than 30% of disease heritability is currently accounted for [9], In order to identify additional risk variants, GWAS meta-analyses from different countries have proven to be a highly successful approach [9]. Currently, most Southern European populations have been underrepresented in GWAS of SLE. In Spain, epidemiological studies have shown that there is an increased prevalence of the disease compared to other European regions [1]. Consequently, the analysis of the genetic variation in this population could be highly useful to identify new genetic variation for SLE risk.
Biological pathways integrate the function of multiple genes and, therefore, provide a higher level of detection of the relevant genetic risk [10,11]. To date, different statistical methods have been developed that exploit the biological knowledge in order to leverage the power of GWAS. These analysis methods are designed to aggregate the genetic evidence from multiple risk loci into a single association statistic. The use of cumulative evidence can be a powerful way to detect genetic associations and biological mechanisms that otherwise would have been missed due to low effect size at the single-marker level. Using this complementary GWAS approach, relevant biological insights have been gained in different complex diseases, including autoimmune diseases [12].
The aim of the current work was to identify new genetic risk loci for SLE by a GWAS meta-analysis using a case-control cohort from a previously untargeted population. After excluding known risk genes, pathway meta-analysis was also performed to identify biologic pathways for SLE risk associated by risk loci as yet unaccounted for.

Study cohorts
Patients and controls from the Spanish population were recruited through the Immune-Mediated Inflammatory Disease (IMID) Consortium [13]. Patients with SLE were recruited via the rheumatology departments of 17 university hospitals in Spain. All included patients fulfilled the 1982 revised American College for Rheumatology diagnosis criteria for SLE [14]. All patients were > 16 years old at the time of recruitment, although disease could have started earlier. A minimal disease evolution period of 3 years since diagnosis was also required for inclusion in this study. All Patients with SLE were Caucasian with all four grandparents born in Spain. Patients with an additional rheumatologic disease (e.g. rheumatoid arthritis, systemic sclerosis or mixed connective tissue disease) except antiphospholipid syndrome or Sjögren's syndrome were excluded from the study. Also, patients with concomitant psoriasis or inflammatory disease (Crohn's disease or ulcerative colitis) were also excluded from the study. A total of 907 patients with SLE were recruited for the GWAS. Additional file 1: Table S1 summarizes the main features of the Spanish GWAS cohort.
Healthy control individuals were also recruited through the IMID Consortium as described previously [15]. All controls were Caucasian and > 18 years old at the time of recruitment. Individuals with one or more grandparents born outside of Spain were excluded. Controls with an autoimmune disease or with a family history of autoimmune disease were also excluded from this cohort. A total of 1524 healthy control individuals were finally included in the present GWAS. All the procedures were followed in compliance with the principles of the Declaration of Helsinki and informed consent was obtained from all participants. The study and the consent procedure were reviewed and approved by the local institutional review boards.
GWAS data from European-ancestry cohorts were obtained from a recent meta-analysis [4]. For the present study, GWAS association data were obtained from 4036 patients with SLE and 6959 controls of Caucasian European ancestry. The details on the data quality control, imputation and statistical association analyses have been previously described [4]. Association data on a total of 37,577,690 markers from the 22 autosomal chromosomes were available for meta-analysis.

Genotyping, quality control and imputation
In the Spain cohort, genome-wide genotyping was performed using the Illumina Quad610 Beadchips (Illumina, San Diego, CA, USA) at the National Genotyping Center (CeGen, Madrid, Spain). This array genotyping platform includes information on > 550,000 single nucleotide polymorphisms (SNPs). A whole blood sample (5 mL) was collected from all patients and genomic DNA extracted using the Chemagic Magnetic Separation Module I (PerkinElmer, Waltham, MA, USA). Genotyping was performed following the protocol recommended by Illumina. Genotype calling was performed using the GenomeStudio data analysis software v2011.1 (Illumina, San Diego, CA, USA). Genotyping quality control was performed using PLINK genomic analysis software [16]. Principal components of variation were estimated using Eigensoft (v4.2) software [17]. The genomic inflation factor was λ GC = 1.16 in the European-ancestry GWAS, and λ GC = 1.06 in the Spain GWAS (Additional file 1: Figure S2). After quality control analysis, 864 patients with SLE and 1513 controls were available for imputation.
Genome-wide imputation was performed using GUID-ANCE, an integrated framework for haplotype phasing and genotype imputation of genotypes [18]. Markers and samples were first tested for quality control. SNPs with a genotyping call rate < 95% or a significant deviation from Hardy-Weinberg equilibrium in controls (p value ≤1 × 10 − 6 ) were removed. Individuals with a genotype call rate < 95% or outlier genetic background (i.e. > 6 SD in any of the 10 principal components of variation), were also excluded. After quality control, pre-phasing of genotypes was performed using SHA-PEIT2 [19] and genotype imputation using IMPUTE2. The 1000G Phase1 integrated haplotypes was used as the reference panel [20,21].
A total of 30,038,143 markers were finally imputed from the Spain GWAS cohort. From these, after filtering for high imputation quality (info score > 0.8, n = 9,168,673) and minor allele frequency (MAF) > 1%), 7,195,283 markers were available for GWAS. Association testing was performed using the logistic regression model implemented in SNPTEST v2 software adjusting for the first two principal components of variation [22].
Meta-analysis of the common markers between the two GWAS datasets was performed using METAL [23]. In this approach, z values are computed to summarize both the direction of effect and the significance level for each genetic marker. These z values are then combined in a weighted sum that incorporates the sample size of each cohort. The complete results from the Spain GWAS and from the GWAS meta-analysis are available for download at http://urr.cat/data/GWAS_SLE_summary-Stats.zip. Association plots for each of the associated loci were prepared using LocusZoom (http://locuszoom.org/).

Genome-wide pathway analysis
Several approaches are actually available to perform pathway-based GWAS. However, most of these methods do not account for the linkage disequilibrium (LD) structure in the genome. The variable structure of LD, particularly of highly correlated chromosomal regions containing multiple genes, can negatively impact the results from genome-wide pathway analysis [24]. In order to integrate this information into the pathway association analysis, we used the method implemented in PASCAL [25]. In this approach, genetic markers are first mapped to genes in each pathway (here, all markers inside the gene ± 20 flanking kb). Correlated markers are then identified using the LD structure estimated from a reference population (in this study, from the Caucasian European population from the 1000 Genomes Project (1KG)). Combining the single-marker association values with the LD structure, association scores are then computed for all genes in the pathway. In those cases where genes from the same pathway are located close in a chromosome and in strong LD, a joint score is calculated. Finally, the scores of all genes within a pathway are normalized, transformed and integrated to generate a single association statistic that can be used to determine the statistical significance of the association between the pathway and the trait of interest. In this study, the default parameter values were used, including maximum number of SNPs per gene (n = 3000). The SNP p value to gene score estimation was performed using the sum gene score approach, and gene score transforming into the pathway score was performed using the chi-squared approach. The pathway analysis method implemented in PASCAL has been shown to perform better than other methods, particularly since it has better control of type I error.
Pathways and their corresponding gene annotation was obtained from the MSigDB molecular signatures repository (http://software.broadinstitute.org/gsea/msigdb). A total of 1077 biological pathways from the Reactome (n = 674), Kyoto Expression of Genes and Genomes (n = 186) and BioCarta (n = 217) databases were selected. The association p values obtained using PASCAL in the two GWAS cohorts were combined using Fisher's method, and the significance was corrected for multiple testing using Bonferroni's adjustment.
In order to capture biologic pathways associated with SLE through as yet unaccounted for genetic risk variants, all regions previously associated with SLE risk were removed from this analysis. For this objective, we filtered out all markers within ± 250 kb distance from an established SLE risk SNP and with an LD r 2 > 0.2. Given the strong association between the HLA region and SLE risk, we excluded this region from the analysis (chr6: (bp 28,500,000-33,500,000).

GWAS meta-analysis
After quality control, a total of 7,195,283 autosomal markers with MAF > 0.01 were available for association testing in the Spain case-control cohort. From these, 7,110,321 variants were also present in the European ancestry GWAS: 51 of the 52 previously known SLE risk loci were in the same effect-size direction as originally described. From these, 31 had nominal evidence of replication (p < 0.05, Additional file 1 Meta-analysis of the two GWAS cohorts identified five new risk loci for SLE (Fig. 1, Table 1). None of the new genome-wide significant loci showed evidence of heterogeneity of effect between the two cohorts (p > 0.05). Three of the associated markers are SNPs in introns of the genes encoding for growth factor receptor bound protein 2 (GRB2, rs36023980), SET and MYND domain containing 3 protein (SMYD3, rs1780813), and ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 4 (ST8SIA4, rs55849330). Associated SNPs rs150518861 and rs114038709 are located in the flanking regions of linker for activation of T cells family member 2 (LAT2) and Rho GTPase activating protein 2 (ARHGAP27) genes, respectively. Figure 2 shows the detailed association results for each of the five new SLE risk loci.

Genetic pathway association study in SLE
After excluding the association signal from known risk loci, pathway analysis identified 100 and 157 pathways associated with SLE in the Spain and European-ancestry cohorts, respectively, at the nominal level (p < 0.05). From these, 30 pathways (3% of total) were found to be associated in both cohorts, which is more than would be expected by chance (p < 5 × 10 − 4 ). After adjustment for multiple testing, four biologic pathways were significantly associated with SLE risk ( Table 2).

Conditional and sex-specific association
In order to explore the presence of secondary signals at each associated locus, we performed conditional analysis in the Spain cohort. From all five loci, we only identified one independent signal within 1 Mb of the most strongly associated SNP that continued to show evidence of association (conditional p < 1 × 10 − 4 ). This independent association was identified at the GRB2 locus and maps to RNF157 gene (SNP rs9891273, p = 4.99 × 10 − 5 , Additional file 1: Figure S3).
The presence of sex-specific associations was tested by comparing the coefficients for SNP association estimated independently in women and men. We identified a significant difference only for LAT2 SNP rs150518861 (p = 0.032). This risk variant was found to be more strongly associated in the male cohort compared to the female cohort.

Discussion
In the present study we have identified five new risk loci for systemic lupus erythematosus. Performing a metaanalysis on 4943 patients with SLE and 8483 controls from different European ancestries, we have identified variants at GRB2, SMYD3, ST8SIA4, LAT2, and ARH-GAP27 loci associated with SLE susceptibility. At the pathway level, we have also found four biological pathways associated with SLE risk independently of previously known risk genes.
In the present meta-analysis, we found an association between an intronic SNP in the gene encoding for the growth factor receptor-bound protein GRB2 and SLE (rs36023980, p = 4.7 × 10 − 9 ). Analysis of the tissuespecific epigenetic data from the NIH Roadmap Epigenomics Project [26] for rs36023980 SNP showed a strong regulatory activity in different immune cells, including enhancer evidence in both T and B lymphocytes (Additional file 1: Table S2). GRB2 encodes for a receptor tyrosine-kinase (RTK) adaptor protein composed of a single SH2 domain and two SH3 domains [27]. SLE is a disease characterized by the activation of B cells that recognize self-antigens via their B cell receptors (BCR). In B cells, GRB2 functions as an expression adaptor molecule, attenuating the signals that are transduced by the BCR [28]. Together with Dok-3 and SHIP1, GRB2 forms a trimer protein complex that binds directly to the BCR and prevents downstream signaling by inhibiting PI3K signaling [29]. Gene expression at different stages of B cell differentiation shows that GRB2 expression increases in more mature forms, particularly on antigen-experienced memory B cells (Additional file 1: Figure S4). Inadequate control of memory B cell differentiation into plasma cells has been proposed as a trigger for autoimmunity in SLE [30]. Our results therefore are in line with the relevance of this causal disease mechanism.
In a close functional relationship with GRB2, we also found a significant association between linker for activation of T cells family member 2 gene (LAT2) locus and SLE (rs150518861, p = 4.1 × 10 − 8 ). LAT2 encodes for an adaptor molecule that binds GRB2 and, therefore, is also involved in BCR signaling [31]. The association at the genetic level between SLE and two directly interacting proteins strongly supports the implication of this biological mechanism in SLE risk. B cell dysfunction is a hallmark of SLE pathology [9], and our study supports downstream regulation after antigen binding as a crucial event in the disease etiology. In the evaluation of sexspecific effects, we found this locus to be differentially associated with SLE risk. The risk variant was associated with SLE in men (p = 0.0074, β (95% CI) = 1.3 (0.25-2.2)), and it was non-significant in women (p = 0.58, β (95% CI) = 0.13 (− 0.33 to 0.62)). Previous studies have shown that men require a higher genetic load to develop the disease [32]. If replicated in an independent cohort, this result would be in line with these findings, confirming the importance of sex in mediating the effect of some genetic risk factors in SLE. SMYD3 encodes for an H3-Hk histone methyltransferase that has been associated with increased cell proliferation in cancer [33]. Altered epigenetic patterns have been strongly associated with SLE, mostly at the DNA level [34]. More recently, however, methylated histones have also been identified as targets of autoantibodies expressed in patients with SLE [35]. Similar to other frequent nuclear autoantigens in SLE, like double-stranded DNA or ribonucleoproteins, methylated H3-Hk histones are able to trigger autoreactive B cells after antigenic-exposure processes like apoptosis. According to the Roadmap Epigenomics Project data, the associated SNP rs1780813 lies in a site that is DNAse hypersensitive for > 30 different tissues, supporting its role in gene regulation.
To date little is known about the functional role of SLE-associated genes STS8IA4 and ARHGAP27. In order to infer the potential biological role of these two genes,  we used the GeneNetwork approach, a functional-inference method based on the gene co-expression patterns extracted from microarray data from > 80,000 samples [36]. With this approach, we found strong evidence that STS8IA4 is involved in T cell activation (p value = 2.7 × 10 − 13 , Additional file 2: Table S4), and that ARHGAP27 is implicated in mitogen-activated protein kinase (MAPKinase) signaling (p value = 3.33 × 10 − 8 , Additional file 2: Table S5). Both biological processes have been previously associated with SLE etiology, and our results not only support their involvement in disease risk but also suggest new gene functions. Furthermore, expression   [37] shows a strong association between variation at rs114038709 and ARHGAP27 expression (p = 4.1 × 10 − 134 ), and the most significant eQTL evidence for rs55849330 is associated to STS8IA4 expression in immortalized B cells [38] (p = 5.6 × 10 − 10 ). Using a pathway-based analysis we have identified four biological pathways associated with SLE. Since the objective was to identify new genetic risk variation for SLE, our approach excluded all association signals from previously known SLE genes. We showed that by using biological pathway knowledge, it is still possible to capture genetic variation that is relevant for the disease. One limitation of this approach is that it relies on the specific knowledge of gene functions and pathway definitions, which is still relatively low for a substantial fraction of the genome [39]. Another limitation is that pathway association is performed on variants within or close to genes. Distant cis regulation and also trans regulation variants are also plausible mechanisms of action [40]. With better knowledge of regulatory effects, particularly on isolated cell types, pathway-based analysis will become an even more powerful approach to detect the missing disease heritability. Despite these limitations, our results are robust since they are based on strongly supported biological knowledge. Also, we provide statistical evidence of pathway association from two independent GWAS cohorts which, to our knowledge, has not been previously performed in SLE.
The BCR signaling pathway had the strongest association with SLE. This result is in agreement with the results found at the single-marker level, where variants at BCR signaling genes GRB2 and LAT2 were found to be associated with disease susceptibility. Within the BCR signaling pathway, however, there are multiple other single-marker hits in other genes indicating nominally significant association with disease susceptibility in both cohorts. Given that they belong to an associated biological pathway, these signals are strongly suggestive risk variants for SLE (Table 3). Of relevance, several of the proteins encoded by the genes in this pathway, like BTK or CTLA4, are currently being evaluated as therapeutic targets for SLE [41,42]. Finding an efficacious treatment in SLE has proven extremely difficult and our results support the importance of targeting this pathway. Genetic evidence, either direct or through associated gene networks, has been shown to improve drug efficacy prediction [43]. Based on the association signals found in the two cohorts, for example, the proteins encoded by LYN (p = 1.17 × 10 − 6 ) and NFATC1 (p = 5.26 × 10 − 6 ) could also be considered as new drug targets for SLE.
Two other associated pathways -the CTLA4 costimulatory signal and IL4 pathways -are strongly related to B cell activation. CTLA4 is a co-inhibitory molecule expressed on activated helper T -TH2 and follicular -cells. Inhibition of CTLA4 increases B cell activation after antigen binding, resulting in the production of antibodies [44]. IL-4 is a cytokine that is also expressed in helper T cells and it is essential in the activation of antigen-bound naïve B cells. Similar to the BCR signaling pathway, these two genetically associated biological processes that are deeply related to B cell activation could be the source of new effective drug targets for the disease [45]. In this regard, a fusion protein including the extracellular domain of CTLA4 (abatacept) is being currently evaluated as a therapy for more severe forms of SLE [46].

Conclusions
In the present study we have performed a GWAS metaanalysis approach to identify new genetic variation in SLE. We have found five new genome-wide significant risk loci and four biologic pathways associated with SLE risk. Single-marker associations involve BCR downstream signaling mechanisms with disease susceptibility, and autoantigen generation and immune cell activity regulation. Pathway-based analysis confirmed the relevance of BCR signaling pathway and other B cell activation mechanisms in the disease etiology. The results from this study significantly expand the knowledge of the biological processes implicated in susceptibility to SLE.

Additional files
Additional file 1: Table S1. Epidemiological features from the Spain GWAS cohort. Figure S1. Principal component analysis of the Spain GWAS cohort. Figure S2. Quantile-quantile(Q-Q) plots of observed and expected -log10(p values) of association between SNP genotype and SLE risk. Table S2. Epigenetic regulatory data associated with GRB2 risk locus. Figure S3. Regional association plot for the association with SLE risk independent of GRB2 SNP rs36023980. Figure S4. GRB2 gene expression during human B cell differentiation. (DOCX 627 kb) Additional file 2: Table S3. Association results for the 52 previously known SLE risk loci in the Spain GWAS. Table S4. Pathway association results after combining the two SLE cohorts (combined raw p value <0.05). Table S5. List of biological pathways significantly associated with ST8SIA4 gene network. This work has been also sponsored by the grant SEV-2011-00067 of Severo Ochoa Program, awarded by the Spanish Government. This work was supported by an EFSD/Lilly research fellowship. Josep M. Mercader was supported by Sara Borrell Fellowship from the Instituto Carlos III. Sílvia Bonàs was awarded an FI-DGR Fellowship from FI-DGR 2013 from Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya). The study sponsors had no role in the design of the study, collection, analysis or interpretation of the data.

Availability of data and materials
The complete results from the Spain GWAS and from the GWAS meta-analysis are available for download at http://urr.cat/data/GWAS_SLE_summaryStats.zip. Ethics approval and consent to participate All the procedures were followed in compliance with the principles of the Declaration of Helsinki and informed consent was obtained from all participants. The study and the consent procedure were reviewed and approved by the local institutional review boards.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Author details