Genome-wide pathway analysis identifies VEGF pathway association with oral ulceration in systemic lupus erythematosus

Background Systemic lupus erythematosus (SLE) is a genetically complex rheumatic disease characterized by heterogeneous clinical manifestations of unknown etiology. Recent studies have suggested the existence of a genetic basis for SLE heterogeneity. The objective of the present study was to identify new genetic variation associated with the clinically relevant phenotypes in SLE. Methods A two-stage pathway-based approach was used to identify the genetic variation associated with the main clinical phenotypes in SLE. In the discovery stage, 482 SLE patients were genotyped using Illumina Human Quad610 microarrays. Association between 798 reference genetic pathways from the Molecular Signatures Database and 11 SLE phenotypes was tested using the set-based method implemented in PLINK software. Pathways significantly associated after multiple test correction were subsequently tested for replication in an independent cohort of 425 SLE patients. Using an in silico approach, we analyzed the functional effects of common SLE therapies on the replicated genetic pathways. The association of known SLE risk variants with the development of the clinical phenotypes was also analyzed. Results In the discovery stage, we found a significant association between the vascular endothelial growth factor (VEGF) pathway and oral ulceration (P value for false discovery rate (P FDR) < 0.05), and between the negative regulation signaling pathway of retinoic acid inducible gene-I/melanoma differentiation associated gene 5 and the production of antinuclear antibodies (P FDR < 0.05). In the replication stage, we validated the association between the VEGF pathway and oral ulceration. Therapies commonly used to treat mucocutaneous phenotypes in SLE were found to strongly influence VEGF pathway gene expression (P = 4.60e-4 to 5.38e-14). Analysis of known SLE risk loci identified a strong association between PTPN22 and the risk of hematologic disorder and with the development of antinuclear antibodies. Conclusions The present study has identified VEGF genetic pathway association with the risk of oral ulceration in SLE. New therapies targeting the VEGF pathway could be more effective in reducing the severity of this phenotype. These findings represent a first step towards the understanding of the genetic basis of phenotype heterogeneity in SLE. Electronic supplementary material The online version of this article (doi:10.1186/s13075-017-1345-6) contains supplementary material, which is available to authorized users.


Study patients
In the discovery stage, a total of 482 SLE patients were recruited. SLE patients were collected from the outpatient's clinics of the rheumatology departments of 15 Spanish University Hospitals belonging to the Immune-Mediated Inflammatory Disease (IMID) Consortium [1]. The IMID Consortium is a Spanish network of researchers working on the genetic basis of IMIDs. All SLE patients were diagnosed using the 1982 revised ACR diagnosis criteria [2]. All patients included in this study were >16 years old at the time of sample collection and had >3 years of evolution from the diagnosis date. SLE patients with psoriasis, inflammatory bowel disease (Crohn's disease or ulcerative colitis) or other rheumatic diseases like rheumatoid arthritis, or multiple sclerosis were excluded from the study. All SLE patients were Caucasian European with all four grandparents born in Spain.
In the validation stage, an independent cohort of 425 SLE patients was used to replicate the genetic pathways that were significantly associated with the SLE phenotypes in the discovery stage. All patients from the validation cohort fulfilled the ACR diagnostic criteria for SLE and were also collected from the IMID Consortium, following the same inclusion and exclusion criteria as for the discovery cohort.

DNA extraction and genome-wide genotyping in the discovery and validation patients
Whole blood samples (5 mL) were collected from all SLE patients from the discovery and replication cohorts. Genomic DNA was then isolated using the Chemagic Magnetic Separation Module I (PerkinElmer, Waltham, MA).
<0.05, >5% of missing data (85.01%) as well as those SNPs that were not in Hardy-Weinberg equilibrium (0.03% SNPs, P<1e-4) in a cohort of 1,558 controls from the same population [4]. All 1,347 SNPs from the two genetic pathways associated in the discovery stage passed the quality control. In order to estimate the principal components (PCs) of variation, we analyzed the genome-wide genetic variation using the EIGENSOFT (v4.2) software. Based on the first 10 PCs of variation over 10 iterations, we identified 4 samples showing an outlier genetic background that were subsequently discarded from the pathway-based analysis. A total of 394 SLE patients and all 1,347 SNPs from the two genetic pathways passed the quality control and were available for the pathway-based analysis of the validation stage.

Imputation of genetic variants associated with SLE risk
From the established autosomal SLE risk SNPs (N=41 SNPs), the genetic variants that were not directly genotyped by the GWAS Quad610 genotyping array (N=17 SNPs) were imputed. SHAPEIT V2-644 (Oxford, UK) software was used to pre-phase the haplotypes of the SNPs that passed the genotyping quality control analysis [6], and these haplotypes were subsequently analyzed using IMPUTE V2 (Oxford, UK) software [7] to impute the missing SLE risk SNPs. The reference genotyping data used for imputation was obtained from the European cohort (N=379 samples) of the latest release of the 1000 Genomes Project (phase 1, version 3) [8]. In the imputation quality control analysis, we excluded those SNPs that had an IMPUTE2 info quality metric < 0.8 (N=1 SNP). After the imputation quality control, a total of 40 SLE risk SNPs were finally available for the association analysis with SLE phenotypes in the discovery cohort. In the validation cohort, the same procedure was followed to obtain the genotypes of the risk variants to be tested for replication.

Pathway-based association analysis
The set-or pathway-based association analysis implemented in the PLINK software is composed by four steps. In the first step, the raw genotype data is used to compute the linkage disequilibrium (r 2 ) between pathway SNPs in order to identify truly independent SNPs (r 2 <0.2). In the second step, the association between each SNP and SLE clinical phenotype is tested using the allelic χ2 test. The independent SNPs that are nominally associated with the SLE phenotype are then selected for each genetic pathway. In the third step, the pathway statistic is computed as the average of the χ2 statistics of the selected SNPs. In the last step, the statistical significance of the pathway association with the SLE phenotype is computed using a permutation-based approach. In the permutation procedure, the SLE phenotype is randomly assigned to the patient cohort and the χ2 statistic is subsequently computed for each pathway. This analysis is then repeated multiple times (n=1,000,000 permutations). Finally, the empirical P-value of each pathway is computed as the proportion of permuted χ2 statistics that are higher than the observed χ2 statistic.

Differential expression analysis of VEGF pathway genes after topical immunotherapies
from non-cancer individuals or cellular cultures (5th November 2015). From these, we looked for tissue or cell cultures treated with any of the common steroid and non-steroid topical immunotherapies most widely used in SLE. The former group was composed by 6 drugs (betamethasone valerate, clobetasol propionate, flucocinolone acetonide, fluocinonide, hydrocortisone butirate and triamcinolone acetonide) [10] and the latter group included a total of 12 drug (ABT-281, anthralin, calcipotriol, diphencyprone, imiquimod, intralesional interferon, pimecrolimus, sirolimus, squartic acid dibutyl ester, tacrolimus, topical interferon and topical zinc) [11]. Given that sample size is one of the major concerns to confidently identify differentially expressed genes between two experimental groups [12], small datasets were excluded from the search. We found a total of three gene expression datasets generated after treatment with four common immunotherapies for cutaneous SLE: betamethasone valerate and pimecrolimus (GSE32473), diphencyprone (GSE52360) and imiquimod (GSE68182). For each gene expression dataset, we performed quality control analysis and subsequent normalization on the log2-scale using the quantile normalization method ( Figure S3) [13]. The differential expression analysis for the VEGF pathway genes between treated and nontreated samples was performed using Student's t-test. The statistical significance of the global perturbation of the VEGF pathway was assessed using the Binomial test. All analyses were performed using the R statistical software [14]. Tables   2.1 Table S1. Association of SLE risk genes with SLE clinical phenotypes in the discovery cohort.
Abbreviations: Gene, gene from the VEGF genetic pathway (only the most significant probe was considered for those genes mapped by more than one probe); P BETAMETHASONE VALERATE , P-value obtained from the differential expression analysis comparing samples before (N=10) and after (N=10) betamethasone valerate treatment; P DIPHENCYPRONE , Pvalue obtained from the differential expression analysis comparing samples before (N=22) and after (N=22) diphencyprone treatment; P IMIQUIMOD , P-value obtained from the differential expression analysis comparing samples before (N=6) and after (N=54) imiquimod treatment; P PIMECROLIMUS , P-value obtained from the differential expression analysis comparing samples before (N=10) and after (N=10) pimecrolimus treatment.