Human patient samples
Developed in 2003 and 2007, respectively, the NRCOS (University of Pittsburgh, PI - Torok) and MAC (University of Texas Southwestern, PI – Jacobe) cohorts collect and link patient data with biological specimens through well-developed, standardized methods for recruitment, data capture, data management, and handling of biological specimens, with the personnel and infrastructure to support these activities [8, 10,11,12, 25,26,27,28]. Research participants in both cohorts are required to meet the diagnostic criteria of LS [29] and undergo an IRB-approved consent process to collect predefined case report forms and biospecimens (blood, skin, saliva) [8, 10,11,12, 25,26,27,28]. Pediatric-onset disease is defined as onset at < 18 years of age. Participants enrolled in either cohort that meet inclusion criteria are eligible for the collection of skin for research purposes. For this pilot study, an additional IRB-approved consent was obtained for the collection of two 3-mm-punch biopsies of the areas affected by LS from adult patients in the MAC cohort.
Skin biopsy collection and shipment
Three LS subjects with lesional biopsy specimens were analyzed, each with a 3-mm-punch biopsy preserved in CryoStor® CS10 (frozen) and an adjacent 3-mm-punch biopsy preserved in RPMI (fresh). The average depth of the biopsies was 3.4 mm. These two biopsies were allocated for scRNAseq; one whole biopsy was placed in Roswell Park Memorial Institute (RPMI) 1640 Medium (Gibco®, Gaithersburg, MD) and put on ice while the other whole biopsy was placed in chilled CryoStor® CS10 cell preservation media (BioLife Solutions®, Bothell, WA), incubated at 4 °C for 10 min then put on dry ice per manufacturer’s instructions. Samples were then shipped overnight via FedEx for scRNAseq processing and analysis at the University of Pittsburgh.
Skin processing and single-cell RNA sequencing
Upon arrival at the University of Pittsburgh, Single Cell Genomics Core, Sequencing Facility, sample dissociation and processing was performed identically. Samples were thawed or kept on ice (dependent on shipping type) before being enzymatically digested (Miltenyi Biotec Whole Skin Dissociation Kit, human) for 2 h and further dispersed using the Miltenyi gentleMACS Octo Dissociator. The resulting cell suspension was filtered through 70-μm cell strainers twice and re-suspended in PBS containing 0.04% BSA.
Cells from biopsies were mixed with reverse transcription reagents then loaded into the Chromium instrument (10× Genomics), a commercial application of Drop-Seq [30]. This instrument separated cells into mini-reaction “partitions” formed by oil micro-droplets, each containing a gel bead and a cell, known as Gel Bead-In-Emulsions (GEMs). GEMs contain a gel bead, scaffold for an oligonucleotide that is composed of an oligo-dT section for priming reverse transcription, and barcodes for each cell (10×) and each transcript (unique molecular identifier, UMI), as described [31]. Approximately 1000-fold excess of partitions compared to cells ensured low capture of duplicate cells. ~ 2600–4300 cells were loaded into the instrument to obtain data on ~ 1100–2300 cells, anticipating a multiplet rate of ~ 1.2% of partitions. V2 single-cell chemistries were used per manufacturer’s protocol (10× Genomics).
Briefly, the reaction mixture/emulsion was removed from the Chromium instrument, and reverse transcription performed. The emulsion was then broken using a recovery agent, and following Dynabead and SPRI clean up, cDNAs were amplified by 11–12 cycles of PCR (C1000, Bio-Rad). cDNAs were sheared (Covaris) into ~ 200 bp length. DNA fragment ends were repaired, A-tailed, and adaptors ligated. The library was quantified using KAPA Universal Library Quantification Kit KK4824 and further characterized for cDNA length on a Bioanalyzer using a High Sensitivity DNA kit. RNA-seq. Libraries were sequenced (~ 200 million reads/sample), using the Illumina NextSeq-500 platform.
Data preprocessing and bioinformatics analysis
Sequencing reads were examined by quality metrics, transcripts mapped to reference human genome (GRCh38), and assigned to individual cells according to cell barcodes, using Cell Ranger (10× Genomics). Data from the study is deposited on NCBI Gene Expression Omnibus (GSE160536).
Further data analysis was performed using R (version 3.5), specifically the Seurat 3.0 package for normalization of gene expression and identification and visualization of cell populations [32, 33]. Briefly, the UMI matrix was filtered such that only cells expressing at least 200 genes were utilized in downstream analysis. Unwanted sources of variation were regressed out of the data by constructing linear models to predict gene expression based on the number of UMIs per cell as well as the percentage of mitochondrial gene content. Additional filtering was applied to tables examining differential gene expression between cells or groups of cells within clusters, by filtering cells out that were expressed in less than 10% of the cells showing upregulated expression.
The data was further normalized between samples using SCTransform which models technical noise using a regularized negative binomial regression model [34]. Principal component analysis (PCA) was subsequently performed on the scaled data of the identified highly variable genes.
Finally, cells were clustered using a smart local moving algorithm (SLM). Cell populations were identified based on gene markers in the associated transcriptomes and visualized by t-distributed stochastic neighbor embedding (t-SNE) [35]. Out of 33,538 detected genes, 5000 were determined to be variable between identified clusters. AddModuleScore was utilized to calculate the average expression levels of each input (either gene or cluster) on a single-cell level, subtracted by the aggregated expression of control feature sets.
Statistical analysis
Differential gene expression between sample types was assessed using Seurat’s implementation of the non-parametric Wilcoxon rank-sum test and a Bonferroni correction was applied to the results. Spearman’s correlations and other statistical tests run in R utilized the packages devtools and ggplot. GraphPad Prism version 8.0 was used to compare cellular populations between processing types using Wilcoxon matched-pairs signed-rank tests.