An Immunochip-based interrogation of scleroderma susceptibility variants identifies a novel association at DNASE1L3

Introduction The aim of the study was to interrogate the genetic architecture and autoimmune pleiotropy of scleroderma susceptibility in the Australian population. Methods We genotyped individuals from a well-characterized cohort of Australian scleroderma patients with the Immunochip, a custom array enriched for single nucleotide polymorphisms (SNPs) at immune loci. Controls were taken from the 1958 British Birth Cohort. After data cleaning and adjusting for population stratification the final dataset consisted of 486 cases, 4,458 controls and 146,525 SNPs. Association analyses were conducted using logistic regression in PLINK. A replication study was performed using 833 cases and 1,938 controls. Results A total of eight loci with suggestive association (P <10-4.5) were identified, of which five showed significant association in the replication cohort (HLA-DRB1, DNASE1L3, STAT4, TNP03-IRF5 and VCAM1). The most notable findings were at the DNASE1L3 locus, previously associated with systemic lupus erythematosus, and VCAM1, a locus not previously associated with human disease. This study identified a likely functional variant influencing scleroderma susceptibility at the DNASE1L3 locus; a missense polymorphism rs35677470 in DNASE1L3, with an odds ratio of 2.35 (P = 2.3 × 10−10) in anti-centromere antibody (ACA) positive cases. Conclusions This pilot study has confirmed previously reported scleroderma associations, revealed further genetic overlap between scleroderma and systemic lupus erythematosus, and identified a putative novel scleroderma susceptibility locus. Electronic supplementary material The online version of this article (doi:10.1186/s13075-014-0438-8) contains supplementary material, which is available to authorized users.


Introduction
Systemic sclerosis (SSc) or scleroderma is a multisystem, autoimmune disorder characterised by progressive vascular, inflammatory and fibrotic dysfunction. Skin and visceral complications of cardiac, pulmonary, gastrointestinal, muscle and renal disease can have devastating effects on quality of life and life expectancy [1].
Scleroderma has a well-established genetic component [2][3][4]. Most of the identified SSc susceptibility loci overlap with those of other autoimmune diseases, in particular the rheumatic disorders such as rheumatoid arthritis and systemic lupus erythematosus (SLE) [5]. For example, Carmona et al. recently confirmed an association between SSc and the SLE risk haplotype at the IRF5 locus [6] and Martin et al. recently performed a pan-meta-analysis of SSc and SLE to look at the susceptibility overlap between the two diseases [7]. Using 6,835 cases and 14,274 controls they identified a novel pleiotropic locus at KIAA0319L on chromosome 1 and identified two SLE loci (near PXK and JAZF1) that also contribute to SSc [7].
To identify further susceptibility loci and to explore the genetic overlap with other antibody-mediated immune diseases, we undertook an SSc association study using the Immunochip, a custom array including SNPs of interest in a wide variety of autoimmune disorders [8]. A replication study was then performed using previously published case data from a genome-wide association study (GWAS) of scleroderma [9], and control data from a GWAS of bone density variation [10].

Samples
We selected 532 cases for genotyping from the Australian Scleroderma Cohort Study (ASCS) [11]; a prospective study of risk factors for clinically important outcomes in SSc. They fulfilled either the American College of Rheumatology (ACR) criteria for classification of SSc [12] or the Medsger criteria for limited SSc (to enable broad representation of the disease spectrum) [13].  [14]. Genotypes were called using the Illumina GenTrain clustering algorithm. Cases and controls were clustered separately.

Statistical analyses
Genotype data were analysed with PLINK [16] and R [17]. There were no duplicate or closely related cases. Case (n = 2) and control (n = 3) samples with call rates less than 90% were excluded. SNPs were excluded based on Hardy-Weinberg disequilibrium (P <10 −6 ), call rates less than 90%, fewer than two occurrences of the minor allele, and significantly different rates of missingness (P <10 −4 ) between cases and controls. Eigenstrat [18] was run on a pruned SNP set with default settings to exclude population ancestry outliers and ensure cases and controls were ethnically matched. Subjects lying more than six standard deviations from the mean of any principal component were excluded (Immunochip set 44 cases and 76 controls excluded; replication set 129 cases and 31 controls excluded).
Logistic regression was used for all association analyses using principal components derived from the Eigenstrat analysis as covariates to control for population stratification. A single principal component was used for the Immunochip analysis and two principal components for the replication dataset; including further principal components for either set did not reduce the genomic inflation factor further. A negative control set of 2,805 SNPs outside the major histocompatibility complex (MHC), associated with reading and learning, schizophrenia and psychosis [19], was used to estimate the genomic inflation factor and calculate adjusted P values. Genotype intensity cluster plots were manually examined for all suggestive associations with unadjusted P values less than 10 -4.5 , a threshold used in previous GWAS [21]. To test for secondary association signals at each locus, genotypes at the most significant variant were added to the logistic regression model as a covariate, and all other variants at the locus were tested. To correct secondary analyses for multiple testing, we estimated the effective number of independent tests at loci using the eigenvalues of the matrix of correlations between SNPs [22], as implemented in SNPSpD [23].
For variants associated with SSc at the nominally suggestive threshold (P <10 -4.5 ), we also tested for differences in allele frequencies between patients with different disease subtypes and MHC genotypes.
For replication, cases and controls genotypes were imputed as implemented in IMPUTE2 [24] with the use of the merged 1000 Genomes and UK10K reference dataset. All SNPs we were attempting to replicate had an info score of >0.7. Meta-analysis was performed using METAL weighted by inverse variance [25]. Power was calculated using the Genetic Power Calculator [26].

Results
After sample and SNP exclusions, in the Immunochip dataset genotypes were analysed at 145,921 autosomal SNPs in 486 cases and 4,458 controls, with a genomic inflation factor of 1.02 (Q-Q plot, Figure S1 in Additional file 1). Eighty-six percent of the cases were female, mean age was 60 years and mean disease duration 14 years. Twenty-five percent of cases had diffuse disease, 72% limited pattern, and 3% were intermediate; 43% were anticentromere antibody (ACA) positive and 15% anti-Scl-70 antibody positive. Considering the replication set, 700 scleroderma cases and 1,899 controls remained after quality control. Of the replication cases, 36% had diffuse disease, 64% limited pattern; 32% were ACA positive. The genomic inflation factor for the replication set was 1.045. The Immunochip study has 80% power to detect associations at P <10 -4.5 for a variant with minor allele frequency (MAF) of 0.3 with D' = 0.8 with a SNP with heterozygote odds ratio (OR) of 1.65.
We detected suggestive associations in the Immunochip data (uncorrected P <10 -4.5 ) at eight loci ( Table 1), five of which showed association in the replication cohort (HLA-DRB1, DNASE1L3, STAT4, TNPO3-IRF5, and VCAM1) ( Table 1, Manhattan plot Figure 1). There was also evidence of association at the previously reported genomewide significant CD247 SNP [9] (rs2056626, OR = 0.76, P = 1.1 × 10 −4 ), but no evidence of association at the genome-wide significant TNIP1 locus [27] (rs2233287, P = 0.94). The overlap between previously reported genome-wide significant SSc loci (outside the MHC) and our data, at an unadjusted P <0.01, is shown in Table 2. Not all previously associated SSc loci could be investigated owing to the limitations of markers available on the array.
The strongest SNP association (Table 1) was with rs2857130 in the MHC. Testing imputed MHC alleles for association, the strongest signal was for DRB1*11:04 (Table 3). There was also a protective association with DRB1*07:01, but no other alleles or SNPs showed evidence of association after conditioning on these two. In particular, there was little evidence of association with the rare DRB1*11:03 allele, which only differs from DRB1*11:04 at one site encoding part of a hypervariable, peptide-binding region (R71E; OR 1.46, P =0. 35), and a model with DRB1*11:03 and DRB1*11:04 dosage combined does not fit as well as a model with DRB1*11:04 dosage alone.
Consistent with previous findings, DRB1*11:04 is a particularly strong risk factor for Scl70-positive SSc [28] ( Table 3). However, compared to controls the frequency of this allele is also elevated in ACA-positive cases and in cases negative for both antibodies. The protective effect of DRB1*07:01 is strongest against ACA-positive disease, and there was no evidence that this allele protects against anti-Scl70 antibody-positive disease.
Association was observed and replicated at chromosome 3p14, spanning DNASE1L3 to AXOX2 (including PXK; Figure 2), a region that has previously been associated with both SLE [29] and SSc [7]. While the peak association with SLE was originally identified at an intronic SNP rs6445975 in PXK (a PX domain containing serine/threonine kinase), the strongest association with SSc on the Immunochip was at a missense SNP rs35677470 (R206C) in DNASE1L3 (deoxyribonuclease I-like 3); 187 kb distal of rs6445975. Linkage disequilibrium between the two SNPs is modest (r 2 = 0.13), and there is weak evidence of a secondary association with rs6445975 in our data after conditioning on rs35677470 (OR = 1.19, P = 0.03). No other associations at this locus are significant after correction for multiple testing. The association with rs35677470 is confined to ACA-positive cases (estimated OR 2.36, P = 2.3 × 10 −10 ), with no association in ACAnegative cases (P = 0.76). rs35677470 also showed association in the replication set (P =0.027), and in the combined analysis (P = 1.2 × 10 −6 ). As in the Immunochip analysis, in the replication set association was much stronger in the ACA-positive group (P = 3.0 × 10 −4 ), and overall (P = 8.71 × 10 −13 ). Association was particularly significant in limited scleroderma (P = 3.36 × 10 −9 in overall dataset) compared with diffuse scleroderma (P = 0.57), consistent with the association of ACA antibody status with limited disease. The non-synonymous DNASE1L3 variant is predicted to be deleterious to the protein product using in silico functional prediction tools including both SIFT [30] and PolyPhen [31]. Apart from the DNASE1L3 locus, no other associations showed evidence of heterogeneity by antibody status, or between limited and diffuse SSc.

Discussion
The major novel finding of this study is the significant association of a functional SNP (rs35677470) in DNASE1L3 Figure 1 Manhattan Plot for association study finding from Immunochip genotyped cases and controls. with ACA-positive SSc (P = 2.3 × 10 −10 ). This locus has been previously reported in a SLE GWAS [29] and metaanalysis [32], however the peak associations were reported for SNPs in the nearby gene PXK. While the PXK and DNASE1L3 associations may be independent, no SNPs on the Illumina HumanHap300 chip used in the SLE GWAS are good tags for the missense SNP in DNASE1L3 (maximum r 2 of 0.21). More recently, a functional variant in DNASE1L3 was implicated in a familial form of SLE [33], and a pan-meta-analysis of SLE and SSc also confirmed the locus near PXK, but in particular for ACA-positive SSc (rs2176082 [7]; P = 1.4 × 10 −4 in our data) strengthening the evidence that this locus plays a role in both diseases. These findings were independently identified in a study published whilst the current manuscript was in review [34]. The associated missense DNASE1L3 variant in our data, (rs35677470 encoding R206C) affects a highly conserved residue and there is very strong evidence that this results in loss of function of the protein [35]. DNASE1L3 encodes a member of the DNase family and functions as an endonuclease capable of cleaving DNA, mediating the breakdown of DNA during apoptosis. Al-Mayouf et al. [33] hypothesised that, in the context of SLE, dysfunction of this gene may lead to impaired DNA breakdown and clearance from apoptotic cells, resulting in the formation of self-directed DNA-specific antibodies and immune complexes. Since the same kinds of DNA-driven immune complexes (such as anti-nuclear and ACA antibodies) are also characteristic of SSc, this hypothesis is also applicable.
Suggestive association was also observed between a novel SNP in VCAM1 and overall scleroderma and in limited disease cases, both of which were also associated in the replication dataset. VCAM1 has not previously been reported to be associated with scleroderma or SLE. VCAM-1 is a member of the Ig superfamily and encodes a cell surface sialoglycoprotein expressed by cytokine-activated endothelium. It mediates leukocyte-endothelial cell adhesion and   signal transduction. VCAM-1 levels have previously been shown to be elevated in early, inflammatory-phase scleroderma [36] and in limited scleroderma [37]. There were no stand-out functional variants at the STAT4 and IRF5/TNPO3 loci. These associations included many highly correlated SNPs, indicating that larger sample sizes and/or functional studies will be needed to understand and dissect these associations. It would, however, be prudent to investigate any functional variation at these loci identified in related autoimmune diseases, particularly SLE. The other novel associations are merely suggestive and require confirmation in additional datasets.

Conclusions
There is a significant association of a functional SNP in DNASE1L3 with anti-centromere antibody-positive SSc, previously reported in SLE. There is strong evidence for a loss of function of the protein. A novel association was also observed and replicated with an intergenic SNP in VCAM1.
This study serves to highlight that, even with a small but well-characterised disease cohort, significant associations can be obtained by tools such as the Immunochip, which are targeted towards analysis of disease-relevant and occasionally functional variation.