Identification of potential susceptibility genes in patients with primary Sjögren’s syndrome-associated pulmonary arterial hypertension through whole exome sequencing

Background Pulmonary arterial hypertension (PAH) is a rare complication of primary Sjögren’s syndrome (pSS). Several genes have been shown to be associated with pSS and with PAH. However, no study has specifically addressed genetic susceptibility in pSS combined with PAH. Methods Thirty-four unrelated patients with pSS-PAH were recruited from April 2019 to July 2021 at Peking Union Medical College Hospital. Demographic and clinical data were recorded in detail, and peripheral blood samples were collected for whole-exome sequencing (WES). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed to predict the functional effects of the mutant genes. Genetic variants identified by WES were confirmed by polymerase chain reaction (PCR)-Sanger sequencing. Results In total, WES analysis identified 141 pathogenic variant loci in 129 genes among the 34 pSS-PAH patients. Patients with a family history of rheumatic diseases were more likely to carry FLG mutations or gene variations related to the biosynthesis of amino acids pathway (p < 0.05). After Sanger sequencing confirmation and pathogenicity validation, we identified five candidate pathogenic variants: FLG c.12064A > T, BCR c.3275_3278dupCCGG, GIGYF2 c.3463C > A, ITK c.1741C > T, and SLC26A4 c.919-2A > G. Conclusion Our findings provide preliminary exome sequencing data for identifying susceptibility loci for pSS-PAH and enrich our understanding of its genetic etiology. The candidate pathogenic genes may be potential genetic markers for early warning of this disease. Supplementary Information The online version contains supplementary material available at 10.1186/s13075-023-03171-y.


Introduction
Primary Sjögren's syndrome (pSS) is an autoimmune connective tissue disease (CTD) characterized by exocrine gland dysfunction, resulting predominantly in dryness of the mouth and eyes [1]. Pulmonary arterial hypertension (PAH) is a major cause of death in CTD patients, with a 5-year survival of 62.9% in China [2]. CTD-associated PAH (CTD-PAH) is classified as group I pulmonary hypertension, which also includes idiopathic PAH (IPAH), heritable PAH (HPAH), PAH due to drugs or toxins, and PAH associated with human immunodeficiency virus infection, portal hypertension, congenital heart diseases, and schistosomiasis [3]. The most common underlying diseases in Chinese patients with CTD-PAH are systemic lupus erythematosus (SLE), systemic sclerosis (SSc), and pSS [2]. PAH is a rare and severe complication of pSS with a poor prognosis [4], and the pathogenesis of pSS-associated PAH (pSS-PAH) remains unclear.
The aim of this study was to explore the genetic susceptibility to pSS-PAH and to establish a preliminary understanding of the association between genotypes and clinical phenotypes.

Study population
A total of 34 pSS-PAH patients were recruited between April 2019 and July 2021 from a clinical registry at Peking Union Medical College Hospital (PUMCH), a national referral center for CTD-PAH patients. All subjects satisfied the 2002 American-European Consensus Group classification criteria [13] and the 2016 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria for pSS [14]. Diagnoses of PAH were based on right heart catheterization (RHC) and defined as a mean pulmonary arterial pressure (mPAP) ≥ 25 mmHg at rest, a pulmonary artery wedge pressure (PAWP) ≤ 15 mmHg, and a pulmonary vascular resistance (PVR) ≥ 3 Wood units (WU) [15]. The exclusion criteria were the presence of any other CTD, left heart disease, interstitial lung disease, or chronic thromboembolic disease confirmed by ventilation perfusion scintigraphy (V/Q) or computed tomographic pulmonary angiography (CTPA). Written informed consent was obtained from all subjects. This study was approved by the Institutional Review Board of PUMCH (JS-2038).
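The three hemodynamic thresholds above can be written as a single check. The following is a minimal sketch for illustration only; the class and function names are hypothetical and are not part of the study's workflow.

```python
from dataclasses import dataclass


@dataclass
class RhcMeasurement:
    """Resting right heart catheterization values (hypothetical container)."""
    mpap_mmhg: float  # mean pulmonary arterial pressure
    pawp_mmhg: float  # pulmonary artery wedge pressure
    pvr_wu: float     # pulmonary vascular resistance in Wood units


def meets_pah_definition(m: RhcMeasurement) -> bool:
    """Apply the thresholds stated in the text: mPAP >= 25, PAWP <= 15, PVR >= 3."""
    return m.mpap_mmhg >= 25 and m.pawp_mmhg <= 15 and m.pvr_wu >= 3


# Illustrative values only, not patient data.
print(meets_pah_definition(RhcMeasurement(mpap_mmhg=42, pawp_mmhg=9, pvr_wu=8.5)))
print(meets_pah_definition(RhcMeasurement(mpap_mmhg=30, pawp_mmhg=18, pvr_wu=4.0)))
```

All three conditions must hold together; an elevated wedge pressure, as in the second illustrative case, fails the definition used here.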

Data and sample collection
The demographic characteristics, medical history, physical examination findings, laboratory profiles, echocardiography results, RHC data, and treatment information were recorded. pSS was evaluated using the pSS disease damage index (SSDDI) [16]. Peripheral blood samples were collected from all subjects. DNA was extracted from the peripheral blood by a standard procedure based on sodium dodecyl sulfate-proteinase K-phenol/chloroform extraction [17].

DNA sequencing
Genomic DNA from the 34 patients underwent WES. Purified DNA was fragmented, end-repaired, and A-tailed, followed by adaptor ligation and enrichment of DNA fragments. Next-generation sequencing was carried out on the HiSeq 4000 System (Illumina). Sequencing analysis was performed in all patients using an in-house developed analytical pipeline [18]. The sequencing reads were mapped to the GRCh37/hg19 human reference sequence using the Burrows-Wheeler Aligner (BWA)-MEM alignment algorithm. The BAM files were processed with Picard. HaplotypeCaller was used to call potential variant sites. The annotation and filtration of gene variants, including de novo variants, compound heterozygotes, and recessive inherited variants, were performed with Gemini (version 0.19.1). The functional assessments, including functional prediction algorithms, conservation scores, and ensemble scores, were computed using GERP++ [19], CADD [20], SIFT [21], and Polyphen-2 [22]. PCR-Sanger sequencing was performed on an Applied Biosystems 3730xl DNA Analyzer (Thermo Fisher Scientific, Waltham, MA, USA) to validate the candidate disease-related variants detected by WES. The PCR program was as follows: 95 °C for 3 min; 38 cycles of 94 °C for 30 s, 58 °C for 30 s, and 72 °C for 40 s; and 72 °C for 8 min. The sequencing results were aligned to reference sequences using CodonCode Aligner (version 6.0.2.6; CodonCode, Centerville, MA, USA).
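As a worked check of the PCR program above, the programmed hold time can be summed directly. This is a small sketch that counts only the stated hold steps; ramp times between temperatures depend on the thermocycler and are not included, and the constant and function names are illustrative.

```python
# Hold times taken from the PCR program described in the text.
INITIAL_DENATURATION_S = 3 * 60   # 95 °C for 3 min
CYCLE_STEPS_S = (30, 30, 40)      # 94 °C, 58 °C, 72 °C per cycle
CYCLES = 38
FINAL_EXTENSION_S = 8 * 60        # 72 °C for 8 min


def program_hold_time_s() -> int:
    """Total programmed hold time in seconds, excluding ramping."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


total = program_hold_time_s()
print(f"{total} s = {total // 60} min {total % 60} s")  # 4460 s = 74 min 20 s
```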

Bioinformation analysis
Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted for all the candidate variants detected by WES. The online software and human genome databases, including 1000 Genomes Project Phase 3 (Han Chinese in Beijing China), Mutation Taster, Polyphen-2, ACMG, and Mendelian Clinically Applicable Pathogenicity (M-CAP) Score, were applied to identify mutation frequencies and predict the functional effects of the variants.

Statistical analysis
Categorical variables were presented as number (percentage), and continuous variables were presented as median (interquartile range, IQR). Categorical data were compared using the chi-square test, and continuous data were compared using the Wilcoxon rank-sum test. A two-sided p < 0.05 was considered statistically significant. All statistical analyses were performed using SPSS V26.0, and R V4.2.0 was used for visualization.
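To illustrate the categorical comparison, the sketch below computes an uncorrected Pearson chi-square test for a 2 × 2 table using only the Python standard library. It is an assumption-laden illustration, not the SPSS procedure used in the study: whether a continuity correction or an exact test was applied is not stated in the text, and the counts in the example are invented.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = variant carrier yes/no, columns = family history yes/no.
stat, p = chi_square_2x2(4, 2, 3, 25)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

With cohorts of this size, expected cell counts can be small, so the result of such a sketch should be read as illustrative rather than as a reproduction of the reported p-values.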

Clinical characteristics of pSS-PAH patients
The demographic and clinical manifestations of the patients are shown in Table 1. The patients were mainly female (97.06%), with a median age at onset of symptoms attributable to pSS of 33.50 years (IQR, 29.25-40.00 years) and a median age at onset of symptoms attributable to PAH of 34.00 years (IQR, 30.50-41.50 years). Autoantibody profiling showed anti-SSA in 31 (91.18%) and anti-SSB in 11 (32.35%) cases. The majority of patients were in WHO functional class II (82.35%). Seven (20.59%) patients had a family history of rheumatic diseases.

Correlation between genotype and phenotype
Correlation analysis between genotypes and observed phenotypes in the patients with pSS-PAH (Fig. 3) found that patients carrying FLG mutations (r = 0.491, p < 0.01) and those with gene variations involved in the purine pathway (r = 0.405, p < 0.01) were more likely to have a family history of rheumatic diseases. BCR variations (r = 0.429, p < 0.01) and gene variations involved in the extracellular exosome pathway (r = 0.404, p < 0.01) were associated with higher SSDDI scores. Patients carrying PRKRA variations (r = 0.412, p < 0.01) tended to have a higher WHO functional class.
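The text does not state which correlation coefficient was used. As one possible reading, when both the genotype (carrier or not) and the phenotype (for example, family history present or not) are binary, the Pearson correlation reduces to the phi coefficient. The sketch below assumes that reading and uses invented data; the function name is hypothetical.

```python
import math
from typing import Sequence


def phi_coefficient(carrier: Sequence[bool], phenotype: Sequence[bool]) -> float:
    """Phi coefficient (Pearson r for two binary variables)."""
    if len(carrier) != len(phenotype):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(carrier, phenotype))
    a = sum(1 for g, p in pairs if g and p)          # carrier, phenotype present
    b = sum(1 for g, p in pairs if g and not p)      # carrier, phenotype absent
    c = sum(1 for g, p in pairs if not g and p)      # non-carrier, phenotype present
    d = sum(1 for g, p in pairs if not g and not p)  # non-carrier, phenotype absent
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("each variable must take both values at least once")
    return (a * d - b * c) / denom


# Invented example with 8 patients, for illustration only.
carrier = [True, True, True, False, False, False, False, False]
family_history = [True, True, False, False, False, False, True, False]
print(round(phi_coefficient(carrier, family_history), 3))
```

For ordinal phenotypes such as SSDDI score or WHO functional class, a rank-based coefficient would be a more natural choice than phi; which one the authors applied is not specified.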

Validation of susceptibility genes and pathogenicity prediction
Genes identified in more than one patient or identified in patient(s) with a family history were selected for confirmation by Sanger sequencing. A total of 28 susceptibility variant loci from 24 genes were confirmed (Fig. 4, Additional file 2). The following pathogenic variants were identified in more than one patient: FLG c.12064A > T (n = 4), BCR c.3275_3278dupCCGG (n = 3), GIGYF2 c.3463C > A (n = 3), ITK c.1741C > T (n = 2), and SLC26A4 c.919-2A > G (n = 2). These variants, except for the SLC26A4 variant, were all located in exons and resulted in amino acid substitutions or truncation (Table 2). In addition, MutationTaster predicted that c.12064A > T in FLG, c.3275_3278dupCCGG in BCR, c.1741C > T in ITK, and c.919-2A > G in SLC26A4 were disease-causing mutations. According to the ACMG criteria, these four variants were moderate pathogenic variants.
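The selection rule in the first sentence (a gene seen in more than one patient, or in at least one patient with a family history) can be sketched as a simple filter. The data layout, patient identifiers, and placeholder gene names below are hypothetical and serve only to show the logic.

```python
from collections import defaultdict
from typing import Iterable


def genes_for_sanger(
    calls: Iterable[tuple[str, str]], family_history: set[str]
) -> list[str]:
    """Select genes found in >1 patient or in any patient with a family history.

    calls: (patient_id, gene) pairs from variant calling.
    family_history: patient IDs with a family history of rheumatic disease.
    """
    patients_by_gene: dict[str, set[str]] = defaultdict(set)
    for patient_id, gene in calls:
        patients_by_gene[gene].add(patient_id)
    return sorted(
        gene
        for gene, patients in patients_by_gene.items()
        if len(patients) > 1 or patients & family_history
    )


# Invented calls; GENE_X and GENE_Y are placeholders, not genes from the study.
calls = [("P01", "FLG"), ("P02", "FLG"), ("P03", "GENE_X"), ("P04", "GENE_Y")]
print(genes_for_sanger(calls, family_history={"P04"}))  # ['FLG', 'GENE_Y']
```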

Discussion
This is the first WES study aiming to find genetic variants associated with pSS-PAH. In the present study, we identified pathogenic variants in FLG, BCR, ITK, and SLC26A4, and one likely pathogenic variant in GIGYF2, through WES and subsequent Sanger sequencing confirmation. Furthermore, patients with variants in FLG were more likely to have a family history of rheumatic diseases.
The subjects enrolled in our study were incident or prevalent pSS-PAH patients with regular medical follow-up in our center. PAH is a rare and severe complication of pSS, characterized by hypertrophy and remodeling of the right ventricle [4,28]. With the development of genetic technologies such as whole-genome and whole-exome sequencing, several key genes have been identified in patients with familial PAH and IPAH, especially BMPR2. Further analysis of cohorts of patients with CTD-PAH, mainly SSc-PAH, has identified additional susceptibility genes including TBX4, ABCC8, KCNA5, and GDF2/BMP9 [9,29]. To the best of our knowledge, genetic features have not previously been reported in pSS-PAH patients. This pilot study is the first to explore genetic susceptibility to this severe complication of Sjögren's syndrome. Our study suggests that several novel genes, rather than the known susceptibility genes for IPAH and other CTD-PAH, may determine the genetic susceptibility to developing pSS-PAH.
Solute carrier family 26 member 4 (SLC26A4), which maps to chromosome 7 at q22.3, encodes a membrane protein (pendrin) responsible for anion (especially chloride) exchange between the cytosol and the extracellular space in the inner ear and thyroid gland. Moreover, its genetic and epigenetic abnormalities have been identified in cancers such as prostate cancer [34], thyroid cancer [35], and acute myeloid leukemia [36]. Another study illustrated that mutant SLC26A4 results in excessive accumulation of chloride in the cytoplasm and thus induces cell apoptosis by inhibiting PI3K/Akt/mTOR pathway phosphorylation [37]. The PI3K/Akt/mTOR pathway is strongly linked to the occurrence of PAH [38].
In the present study, we observed that a pathogenic variant of the SLC26A4 gene may be associated with the risk of developing pSS-PAH. Replication in other CTD-PAH cohorts will be important to estimate the contribution of SLC26A4.
We also reported disease-causing variants in the gene BCR activator of RhoGEF and GTPase (BCR) and the gene encoding filaggrin (FLG), and a probably damaging variant in the Grb10 interacting GYF protein 2 (GIGYF2) gene. Furthermore, variations in the BCR gene were significantly associated with organ damage accrual in patients with pSS-PAH. We also detected a significant phenotype-genotype correlation between the FLG gene and a family history of rheumatic and musculoskeletal diseases among these pSS-PAH patients. The BCR gene, located on chromosome 22, is best known as the breakpoint for the reciprocal translocation between chromosomes 22 and 9, which produces the Philadelphia chromosome and is common in patients with chronic myelogenous leukemia [39]. Although the fusion gene has been extensively studied in the pathogenesis of leukemia, the function of BCR and whether it is a potential trigger of other tumors and diseases remain unclear. FLG variants are the most replicated and strongest genetic risk factors for eczema and eczema-associated asthma [40]. Furthermore, FLG variants contribute to susceptibility to psoriasis, as well as other autoimmune and skin disorders [41,42]. GIGYF2 variants are of interest for their important role in familial Parkinson's disease [25,43]. In addition, GIGYF2 protein was identified as an adapter protein that binds activated IGF-I and insulin receptors [44]. Thus, our study suggests for the first time roles for BCR, FLG, and GIGYF2 in the pathogenesis of pSS-PAH.
The study of susceptibility genes in multifactorial diseases such as pSS-PAH remains challenging. Although our sample size was relatively small and we lacked control group data, this is the first WES study to explore genotype-phenotype correlations in patients with pSS-PAH. Further studies should recruit healthy controls, pSS patients without PAH, IPAH patients, and a larger cohort of patients with pSS-PAH to conduct site-based association analysis for common variants and gene-based burden analysis for rare variants. Furthermore, more experiments are needed to elucidate the expression and related functions of these candidate genes.

Conclusion
Using WES in a rare-disease cohort, our work is the first to identify novel susceptibility genes associated with pSS-PAH. These variants in FLG, BCR, GIGYF2, ITK, and SLC26A4 may serve as potential biomarkers in Chinese pSS-PAH patients.