Spatial multi-omic map of human myocardial infarction – Nature

Ethics

The local ethics committee of the Ruhr University Bochum in Bad Oeynhausen, the RWTH Aachen University, Utrecht University and WUSTL approved all human tissue protocols (no. 220-640, EK151/09, 12/387 and no. 201104172 respectively). Human myocardial tissue was collected from non-transplanted donor hearts, patients after myocardial infarction undergoing heart transplantation, implantation of a total artificial heart or left ventricular assist device (LVAD) implantation. The study met all criteria of the code of conduct for responsible use of human tissue that is used in the Netherlands. The collection of the human heart tissue was approved by the scientific advisory board of the biobank of the University Medical Center Utrecht, The Netherlands (protocol no. 12/387). All patients provided informed consent and the study was performed in accordance with the Declaration of Helsinki. Written informed consent for collection and biobanking of tissue samples was obtained prior to LVAD implantation.

Human tissue processing and screening

Heart tissues were sampled by the surgeon and immediately frozen in liquid nitrogen. Tissues were homogenized in liquid nitrogen and 7–10 mm3 pieces were embedded in OCT compound (Tissue-Tek) and frozen on dry-ice. Ten-micrometre tissue cryosections were stained with H&E and the appropriate tissue regions were selected for further processing. In total 52 human tissue samples were screened this way and evaluated by a cardiac pathologist. For RNA quality control we minced a 3 × 3 mm3 heart tissue piece in liquid nitrogen and isolated the RNA using Qiagen RNeasy Mini kit (Qiagen) using a proteinase K digestion step as suggested in RNeasy Fibrous Tissue Mini Kit (Qiagen, 74704). RNA integrity number (RIN) analysis (Agilent) was performed using Bioanalyzer RNA 6000 Nano kits (Agilent, No. 5067). RIN ranged from >2 to a maximum of 8.8.

Spatial gene expression assay

Frozen heart samples were embedded in OCT (Tissue-Tek) and cryosectioned (Thermo Cryostar). The 10-µm section was placed on the pre-chilled Optimization slides (Visium, 10X Genomics, PN-1000193) and the optimal lysis time was determined. The tissues were treated as recommended by 10X Genomics and the optimization procedure showed an optimal permeabilization time of 12 or 18 min of digestion and release of RNA from the tissue slide. Spatial gene expression slides (Visium, 10X Genomics, PN-1000187) were used for spatial transcriptomics following the Visium User Guides. Brightfield histological images were taken using a 10X objective on the Nikon Eclipse TiE and a Leica Aperio Versa 200 scanner. Stitching of the raw images was performed using the NIS-Elements software. Next generation sequencing libraries were prepared according to the Visium user guide. Libraries were loaded at 300 pM and sequenced on a NovaSeq 6000 System (Illumina) as recommended by 10X Genomics.

Single-nuclei isolation of human hearts

Single-nuclei isolation was performed as previously described51. Briefly, heart tissue was cut into small pieces (0.5 mm3) in a sterile petri dish on ice and transferred to a tissue homogenizer. Nuclei isolation buffer 0.5 ml (EZ lysis buffer, NUC101, Sigma-Aldrich) plus RNase inhibitor (Protector RNase Inhibitor, Roche) were added to the tissue, and 10-15 strokes with pestle A were applied followed by 10–15 strokes of pestle B. The nuclei were stained with DAPI and FANS sorted using a Sony SH800 to enrich the nuclei. Nuclei isolation from three acute myocardial infarction samples from the WUSTL biobank was performed as described52.

scRNA-seq

Nuclei suspensions with a concentration ranging from 400–1000 nuclei per μl were loaded into the chromium controller (10X, Genomics, PN-120223′) on a Single Cell B chip (10X Genomics, PN-120262) and processed following the manufacturer’ original protocol to generate single-cell gel beads in the emulsion. The sequencing library was generated using the Chromium Single cell 3′ reagent Kit v3 (10X, PN-1000092) and Chromium i7 Multiplex Kit (10X Genomics, PN-120262). Quality control for the constructed library was performed by Tape Station. Libraries were sequenced on NovaSeq targeting 50,000 reads per cell 2 × 150 paired-end kits using the following read length: 28 bp Read1 for cell barcode and UMI, 8-bp I7 index for sample index, and 91-bp Read2 for transcript.

sc-ATAC-seq

The remaining nuclei after processing for 3′ scRNA-seq assay were centrifuged at 500g at 4 °C for 5 min and resuspended in 10 μl of nuclei suspension buffer. After tagmentation the nuclei suspension was loaded on the Chromium Chip E (10X Genomics, PN-1000082) in the Chromium controller according to the manufacturer’s protocol. The library was sequenced on an Illumina NovaSeq 6000 using the following read length: 50 bp Read1 for DNA fragments, 8 bp for i7 index for sample index, 16 bp i5 index for cell barcodes, and 50 bp Read2 for DNA fragments.

RNA in situ hybridization and image quantification

In situ hybridization was performed using formalin-fixed paraffin embedded tissue samples and the RNAscope Multiplex Detection KIT V2 (RNAscope, cat. no. 323100) and RNAscope 4-Plex Ancillary Kit (RNAscope, cat. no. 323120) following the manufacturer’s protocol with minor modifications. The antigen retrieval was performed for 30 min at 99 °C in a water bath (VWR). Tissue pretreatment and washing was performed as suggested by the RNAscope staining protocol. The following probes were used for the RNAscope assay: Hs-CD163 cat. no. 417061-C1, Hs-CCR2 cat. no. 438221-C1, Hs-ANKRD1 cat. no. 524241-C1, Hs-POSTN cat. no. 575941-C1, Hs-Col15a1 cat. no. 484001-C2, Hs-Col1a1 cat. no. 401891-C2, Hs-PECAM1-O2 cat. no. 487381-C2, Hs-NPPB cat. no. 448511-C2, Hs-TREM2 cat. no. 420491-C2, Hs-SPP1 cat. no. 420101-C2, Hs-NPR3 cat. no. 431241-C3, Hs-POSTN cat. no. 409181-C3, Hs-SCARA5 cat. no. 574781-C3, Hs-TNNT2 cat. no. 518991-C3, Hs-SPP1 cat. no. 420101-C4 and Hs-NFE2L1 cat. no. 53850.

Nuclei quantification of H&E stained Visium slides

To quantify nuclei from the H&E staining, we used VistoSeg53, an automated MATLAB pipeline for image analysis. Using this pipeline, the individual TIFF files were used for nuclei segmentation using k-means colour-based segmentation in the image processing toolbox. Next, the binary images were refined with the refineVNS() function for accurate detection of the nuclei. Then a CSV and JSON file was generated that contained the metrics to reconstruct the spot grid to allow for nuclei quantification per 10X Visium detection spot. Counting of nuclei was performed with the countNuclei() function. The images were checked individually with the spotspotcheck() function. Code is available at http://research.libd.org/VistoSeg.

Animal model of myocardial infarction

Myocardial infarction was performed as previously described54. In brief, 12-week-old male and female C57Bl/6J Pdgfrb-creER;tdTomato mice were subjected to chronic left anterior descending artery ligation. The mice were anaesthetized using isoflurane (2–2.5%). The mice were injected 30 min before surgery with metamizole (200 µg g−1 body weight) subcutaneously. Then they were intubated and ventilated with oxygen using a mouse respirator (Harvard Apparatus). Before incision, we injected bupivacaine (2.5 µg g−1 body weight) subcutaneously and intercostally for local analgesia. Then a left thoracotomy was performed, and myocardial infarction was induced by ligature of the left anterior descending artery with 0/7 silk (Seraflex, IO05171Z). The ribs, muscle layer and skin incision were closed. Metamizole was administered for three days via drinking water (1.25 mg ml−1, 1% sucrose) post-surgery. All mice were housed under standardized conditions in the Animal Facility of the University Hospital Aachen (Germany). The operating procedure was in accordance with European legislation and approved by local German authorities (LANUV, reference no. 81-02.04.2017.A410.). Mice were euthanized at different time points (sham, 4 days, 7 days and 14 days). As control, hearts from sham-operated, age-matched mice were taken (2 sham female and 2 sham male mice).

Inducible fate-tracing experiments

For inducible fate tracing, male and female Pdgfrb-creER;tdTomato mice (8 weeks of age) received tamoxifen (3 mg intraperitoneally) 4 times followed by a washout period of 21 days and were then subjected to myocardial infarction surgery or sham (12 weeks of age) as described above and euthanized at 4 days, 7 days and 14 days after surgery.

Echocardiography

Left ventricular heart function was determined by echocardiography performed on a small-animal ultrasound imager (Vevo 3100 and MX550D transducer, FUJIFILM Visualsonics). Recordings of short and long cardiac axis were taken in B mode (2D real-time) using a 40 MHz transducer (MX550D). During the procedure, mice were anaesthetized with 1–2% isoflurane. Ejection fraction (EF) and global longitudinal strain (GLS) were recorded and analysed with the VevoLab Software. The Simpson method was used to assess EF. The GLS was measured in the B-mode of the long axis.

smFISH spot quantification and nuclear segmentation

Images for smFISH were exported in native Nicon format (.nd2). Images were split by channel using bfconvert55 for further processing. RNA spots were quantified using the command line version of Radial Symmetry-FISH (RS-FISH)56. The sigma parameter from RS-FISH, determining spot size, was set to 2.9 for all images. Threshold settings in RS-FISH were manually determined for each channel and were set to the following values for cardiomyocyte state analysis: channel 1 (TNNT2) = 0.0107, channel 2 (ANKRD1) = 0.005, channel 3 (NPPB) = 0.0066. To remove spot counts resulting from low signal, high background images, we removed spots with an intensity lower than the 25th percentile of the channel intensity distribution across all images and applied a minimum intensity threshold of 600. For the quantification of CD163+SPP1+ macrophages, while we were not able to perform full cell segmentation, we performed nuclear segmentation using Mesmer57 with pre-trained nuclear segmentation models to identify all detectable nuclei in each image based on DAPI staining. We subsequently assigned spots to the closest nuclei based on euclidean distance and classified cells as positive or negative for the different markers (POSTN, CD163 and SPP1). Cells with more than 2 spots for a given marker were considered positive for that marker.

Masson trichrome staining

Masson’s trichrome staining was conducted using a ready-to-use kit (Trichrome Stain (Masson) Kit, HT15, Sigma-Aldrich) as described by the manufacturer.

Antibodies and immunofluorescence staining

Heart tissues were fixed in 4% formalin for 4 h at room temperature and then embedded in paraffin. For staining slides were blocked in 5% donkey serum followed by 1 h of incubation with the primary antibody, washing 3 times for 5 min in PBS, and subsequent incubation of the secondary antibodies for 45 min. Following DAPI (4′,6′-diamidino-2-phenylindole) staining (Roche, 1:10.000) the slides were mounted with ProLong Gold (Invitrogen, cat. no. P10144). The following antibodies were used: anti-ACTA2(aSMA)-Cy3 (C6198,1:250, Sigma-Aldrich), anti-SEMA3G (HPA001761, 1:100, Sigma-Aldrich), AF647 donkey anti-rabbit (1:200, Jackson Immuno Research).

Confocal imaging

Acquisition of images was performed using a Nikon A1R confocal microscope using 40× and 60× objectives (Nikon). Image processing was performed using the Nikon Software or ImageJ58.

Generation of a human PDGFRB
+ cardiac cell line

PDGFRB+ cells were isolated from a 69-year-old male patient, undergoing left ventricular assist device surgery. To generate a single-cell suspension, the tissue was homogenized in a gentleMACS dissociator (Miltenyi) and digested with liberase (200 µg ml−1, Roche cat. no. 5401020001) and DNase (60 U ml−1), for 30 min at 37 °C. After filtering the cell suspension (70 µm mesh), cells were stained in two steps using a specific PDGFRB antibody (R&D cat. no. MAB1263 antibody, dilution 1:100) followed by Anti-Mouse IgG1-MicroBeads solution (Miltenyi, cat. no. 130-047-102). Following MACS isolation, cells were cultured in DMEM media (Thermo Fisher cat. no. 31885) for 20 days and immortalized using SV40-LT and HTERT. Retroviral particles were produced by transient transfection of HEK293T cells using TransIT-LT (Mirus). Two types of amphotropic particles were generated by co-transfection of plasmids pBABE-puro-SV40-LT (Addgene #13970) or xlox-dNGFR-TERT (Addgene #69805) in combination with a packaging plasmid pUMVC (Addgene #8449) and a pseudotyping plasmid pMD2.G (Addgene #12259). Retroviral particles were 100x concentrated using Retro-X concentrator (Clontech) 48 h post-transfection. Cell transduction was performed by incubating the target cells with serial dilutions of the retroviral supernatants (1:1 mix of concentrated particles containing SV40-LT or rather hTERT) for 48 h. Subsequently at 72 h after transduction, the transduced PDGFRb+ cells were selected with 2 μg ml−1 puromycin for 7 days.

Lentiviral overexpression of RUNX1

The human cDNA of RUNX1 was PCR amplified using the primer sequences 5′- atgcgtatccccgtagatgcc −3′ and 5′- tcagtagggcctccacacgg −3′. Restriction sites and N-terminal 1xHA-Tag were introduced into the PCR product using the primer 5′- cactcgaggccaccatgtacccatacgatgttccagattacgctcgtatccccgtagatgcc −3′ and 5′- acggaattctcagtagggcctccacac −3′. Subsequently, the PCR product was digested with XhoI and EcoRI and cloned into pMIG (pMIG was a gift from W. Hahn) (Addgene plasmid #9044 ;http://n2t.net/addgene:9044; RRID:Addgene_9044). Retroviral particles were produced by transient transfection in combination with packaging plasmid pUMVC (pUMVC was a gift from B. Weinberg (Addgene plasmid #8449)) and pseudotyping plasmid pMD2.G (pMD2.G was a gift from D. Trono (Addgene plasmid #12259 ; http://n2t.net/addgene:12259; RRID:Addgene_12259)) using TransIT-LT (Mirus). Viral supernatants were collected at 48–72 h post-transfection, clarified by centrifugation, supplemented with 10% FCS and Polybrene (Sigma-Aldrich, final concentration of 8 µg ml−1) and filtered with a 0.45-µm PES filter membrane (Millipore; SLHP033RS). Cell transduction was performed by incubating the PDGFB+ cells with viral supernatants for 48 h. eGFP-expressing single cells were sorted with a SH800 Cell Sorter.

Quantitative PCR with reverse transcription

Cell pellets were collected and washed with PBS followed by RNA extraction using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions. Two-hundred nanograms total RNA was reverse transcribed with High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems) and quantitative PCR with reverse transcription was carried out as described29 Data were analysed using the 2Ct method. The primers used are listed in Supplementary Table 18.

Preprocessing of snRNA-seq, snATAC-seq and spatial transcriptome data

For snRNA-seq data, CellRanger software (v6.0.2) was used to perform the alignment with default options. Since the input consists of nuclei, we enabled the option ‘–include-introns’ to include intronic reads. For snATAC-seq data, the CellRanger ATAC pipeline (v2.2.0) was used with the default settings. For spatial transcriptome data, the SpaceRanger software (v1.3.2) was used to pre-process the sequencing data. The option ‘–reorient-images’ was enabled to allow for automatic image alignment. hg38 was used as the reference genome for human data alignment.

snRNA-seq data processing

To identify the major lineages representative of all of our specimens, we created a single-nuclei atlas analysing and integrating each snRNA-seq dataset using Seurat59 (v4.0.1).

Each dataset went through identical quality control processing. We discarded nuclei (1) in the top 1% in terms of the number of genes, (2) with less than 300 genes and less than 500 UMIs, (3) with more than 5% of mitochondrial gene expression, and (4) doublets as estimated using scDblFinder (v1.4.0)60 with default parameters. Count matrices were log-normalized for downstream analyses using a scaling factor of 10,000. We calculated a dissociation score for each cell using Seurat’s module score functions with a gene set provided by O’Flanagan et. al.61 and discarded the nuclei that belonged to the top 1%. To generate an integrated atlas of all samples, log-normalized expression matrices were merged and dimensionality reduction was performed on the collection of the top 3,000 most variable genes that were shared with most of the samples using principal component analysis (PCA). To select the collection of shared variable genes between samples, first we estimated the top 3,000 most variable genes per sample and then selected the top 3,000 most-recurrent genes from them across all samples. PCA correction was performed with harmony62 (v1.0) using as covariates the patient, sample, and batch labels. A shared nearest neighbour (SNN) graph was built with the first 30 principal components using Seurat’s FindNeighbors, and the cells were clustered with a Louvain algorithm with FindClusters. A high resolution (1) was selected to generate a large collection of nuclei clusters to capture representative major cell lineages, even if present in low proportions. Cluster markers were identified with Wilcoxon tests as implemented in Seurat’s FindAllMarkers function. Final assignment of cells to major cell lineages was based on literature marker genes. We filtered out small clusters (median number of nuclei across filtered clusters = 269) with low gene count distributions (median counts across filtered clusters = 756) or feature recovery (median number of genes across filtered clusters = 695), with marker genes that could not be assigned to known cell types of the heart. To visualize all nuclei in a two-dimensional embedding, a UMAP was created with Seurat’s RunUMAP function using the first 30 principal components of harmony’s PCA correction embedding. Major cell-type markers were estimated by performing differential expression analysis of cell-type and patient-specific pseudo-bulk profiles. Pseudo-bulk profiles were calculated by summing up the counts of all cells belonging to the same cell type and patient. Profiles coming from less than 10 cells or profiles from which the maximum gene expression was of less than 1,000 counts per library were discarded. Differentially expressed genes were calculated by fitting a quasi-likelihood negative binomial generalized log-linear model as implemented in edgeR (v3.32.1)63 (false discovery rate (FDR) < 0.15). Each cell type was compared against the rest.

Comparison with independent healthy and ischaemic human heart cell atlases

We compared our generated atlas with another reference human single-nuclei RNA-seq atlas4 at the molecular and compositional level. The counts matrix was downloaded directly from https://www.heartcellatlas.org and we selected the data coming from single-nuclei  and left ventricle samples. Nuclei with fewer than 200 genes, and genes expressed in less than 3 nuclei were excluded. Log normalization with a scaling factor of 10,000 was performed with scanpy’s64 (v1.7.0) normalize_total function.

To evaluate our major cell-type annotation, we calculated the enrichment of the top 200 marker genes based on log fold change of each cell type defined in the reference atlas in the list of the top 200 marker genes of each of our defined cell types with hypergeometric tests. Marker genes of the reference atlas were calculated with Wilcoxon tests as implemented in scanpy’s 64 (v1.7.0) rank_genes_groups (adj. P < 0.01). Each cell type was compared against the rest. To evaluate the compositional stability of our control samples, we calculated the Pearson correlation between the median proportion of each shared cell type of the reference atlas and our control, border zone, and remote zone samples. Similarly, we compared our atlas to an independent collection of human heart nuclei derived from three ischaemic specimens. First, we analysed and integrated the smaller collection of samples using identical procedures as the ones used in our provided atlas. After nuclei clustering, we assigned each cluster to a cell type using literature markers. Cell-type markers were calculated with Wilcoxon tests (adj. P < 0.01) and the top 200 genes based on log fold change were selected. Marker overlap and compositional stability comparison with ischaemic specimens from our atlas were performed as described previously.

snATAC-seq data processing

To control the data quality, the fragment files were used as input for the package ArchR (v1.0.1)65, and low-quality cells were filtered out based on transcription start site (TSS) enrichment (> 4) and the number of unique fragments (> 3,000 and < 100,000). Doublets were identified and removed by using the functions addDoubletScores and filterDoublets from ArchR with default settings. Next, peaks were identified by using the function addReproduciblePeakSet for each sample. All peaks were merged to create a union peak set of which each peak was annotated as distal, promoter, exonic and intronic. A count matrix was constructed with the function addPeakMatrix. For dimensionality reduction, the method scOpen (v1.0.0)35 was used to generate a low-dimensional matrix of the cells. The algorithm Harmony62 was applied to correct the batch effects and integrate the data and UMAP was used to generate a 2D embedding for visualization. Cells were clustered using the Leiden algorithm with a resolution of 1. To annotate the clusters, a gene activity score matrix was created using the function addGeneScoreMatrix and marker genes were detected for each cluster using the function getMarkerFeatures. The same markers from snRNA-seq data were used to annotate the clusters.

Comparison between snRNA-seq and snATAC-seq data

The Seurat66 label-transferring approach was used to compare the annotation of snRNA-seq and snATAC-seq. To do so, the snRNA-seq data were used as reference and the function FindTransferAnchors was applied to identify a set of anchors using gene expression from snRNA-seq and gene activity score from snATAC-seq. Next, the cell labels from snRNA-seq were transferred to snATAC-seq by running the function TransferData. An adjusted rand index was calculated to evaluate the agreement between annotated and predicted cell labels for snATAC-seq data.

Cell-type-specific transcription factor binding and regulon activity

To estimate transcription factor binding activity for each major cell type identified from snATAC-seq data, we first aggregated the reads within each cell type and created a pseudo-bulk profile. Next, we used MACS2 (v2.2.7)67 to perform peak calling and removed the peaks from chrY, mitochondrial and unassembled ‘random’ contigs. We then predicted the transcription factor binding sites and estimated transcription factor binding activity using HINT-ATAC (v0.13.2)68 based on the JASPAR2020 database69. We linked the transcription factor binding sites to the nearest genes to create a cell-type-specific transcription factor–gene interaction. The number of ATAC-seq reads in the region with 100 bp up-stream and downstream of the the transcription factor binding site were used to indicate how strong the interaction was: each transcription factor–gene interaction was weighted as the ratio between the number of ATAC-seq reads around the transcription factor binding site associated with that gene and the maximum number of reads observed in any binding site of the transcription factor. All interactions with a weight larger than 0.3 were considered in downstream analysis. This generated weighted and filtered cell-type-specific regulons. To infer a transcription factor regulon activity score, we estimated the mean expression of the target genes in each cell-type-specific regulon. Cell-type pseudo-bulk profiles were filtered to contain only genes with at least 10 counts in 5% of the samples, before the estimation of normalized weighted means using decoupleR’s70 (v1.1.0) wmean function with 1,000 permutations. Regulon activities were standardized and correlated with transcription factor binding activities using Spearman correlations. The minimum correlation of 0.5 was used as threshold and the top 5 transcription factors per cell type were selected for visualization.

Cell-type-specific GWAS signal enrichment

GWAS summary statistics for 4 MRI based left ventricle function parameters12 were downloaded from the Cardiovascular Disease Knowledge Portal (https://cvd.hugeamp.org/). For each phenotype, GWAS summary statistics were clumped with Plink (v1.9)71 to identify index SNPs (clump-p1 = 0.0001, clump-kb = 250, clump-r2 = 0.5) using the European samples from 1000 Genomes as a reference population.

Next, we lifted over the coordinates of index SNPs from hg19 to hg38 using the LiftOver tools. For each major cell type, we generated an average chromatin accessibility profile by using snATAC-seq data from all cells. The cell-type-specific GWAS signal enrichment was performed using gchromVAR (v0.3.2)72 and enrichment scores were normalized to z-scores. P-values were calculated based on the z-scores and were corrected by the Benjamini–Hochberg method.

Cell-type-specific integration of snATAC-seq and snRNA-seq data and sub-clustering

For each major cell type that was recovered by both snATAC-seq and snRNA-seq, we aimed to identify sub-clusters spanning multiple samples and modalities. To do so, we devised a multi-step approach to integrate and cluster the data by controlling quality from sample-, cell-type- and modality-specific aspects.

  1. (1)

    To minimize the sample-specific effects, we only considered samples with a minimum number of cells in both snATAC-seq and snRNA-seq: for cardiomyocytes and endothelial cells (n_cells_ATAC > 300 and n_cells_RNA > 400); for fibroblasts (n_cells_ATAC > 100 and n_cells_RNA > 400); and for myeloid (n_cells_ATAC > 50 and n_cells_RNA > 200). This step controls for samples with low recovery of cells in a particular modality.

  2. (2)

    To further filter cell-type-specific low-quality cells from snRNA-seq and snATAC-seq data, we integrated the samples as selected in step 1 using Harmony to correct batch effects from patients and regions based on PCA space (30 dimensions) for snRNA-seq and LSI space (30 dimensions) for snATAC-seq data. We then clustered the cells using Seurat (resolution = 0.4) for each modality independently. We next excluded the clusters that were (i) enriched in a single sample; (ii) showed a lower data quality; (iii) showed a higher doublet score compared with others. Specifically, for cardiomyocytes, we removed 3 clusters from snATAC-seq data: 2 clusters (481 cells) were enriched in a single sample and another cluster (171 cells) showed a low number of unique fragments (average = 8,102). For fibroblasts, we removed 1 cluster (49 cells) from snATAC-seq (98% of cells from a single sample) and 1 cluster (1,172 cells) from snRNA-seq (average doublet score of 0.12). This step controls for cell type and modality-specific low-quality cells.

  3. (3)

    We next integrated the cells from snATAC-seq and snRNA-seq data. To this end, we used the gene activity score matrix of snATAC-seq estimated by ArchR and the gene expression data from snRNA-seq data as input for canonical component analysis by Seurat. The integrated data were projected into a PCA space (30 dimensions) and Harmony was used to correct the batch effects from samples and modalities. This step generated a co-embedded and batch-corrected dataset composed of cells from snRNA-seq and snATAC-seq samples.

  4. (4)

    For each major cell type, we defined the sub-clusters based on the co-embedded data using the Seurat (resolution = 0.9 or 1). Marker genes were identified by using the function FindAllMarkers. We next filtered clusters that were mainly driven by a single sample or modality. Finally, we merged and annotated the clusters based on the markers. The final statistics of the sub-clustering results for each major cell type were provided in Supplementary Table 16.

Analysis of snRNA-seq data from mouse fibroblasts

Cellranger mkfastq and count functions (version v6.0.2) with default parameters were used for demultiplexing and aligning the reads, respectively. Reads were aligned to the mouse reference transcriptome (mm10, Version=2020-A). Prior to alignment, reads for tdTomato were added to the reference. Quantified counts from each sample were aggregated and cells with counts <1,500 and >20,000 were filtered out. Further, cells with >5% reads mapped to mitochondrial genes, as well as cells with <500 genes were removed. Scrublet73 was used to detect potential doublets and only the resulting 40,495 cells with <0.2 scrublet score were kept for further analyses. The highly_variable_genes() function with seurat_v3 flavour implemented in Scanpy (version 1.8.1) was used to obtain the top 2,000 most highly variable genes. Count data was log-normalized using sc.pp.normalize_total(target_sum=1e4) followed by sc.pp.log1p(). The data was subset to the 2,000 genes, unwanted sources of variation from n_umi and mito_fraction were regressed out using sc.pp.regress_out(), and the top 30 principal components were estimated using sc.tl.pca(). Harmony was then used to account for large differences across samples using ‘sample’ as the batch indicator. Network neighbourhood graph was constructed using the function sc.pp.neighbors() with 30 adjusted principal components, cosine distance and n_neighbors = 10. Leiden clustering with resolution 1.0 was used to cluster the cells into 17 clusters. Marker genes were identified using the Wilcoxon test implemented in sc.tl.rank_genes_groups() function in Scanpy. Clusters were manually annotated using the marker genes. We next cleaned up the data by removing clusters with low data quality and re-clustered the data with resolution of 0.2. To annotate the cells, we used the label transfer approach from Seurat based on the sub-clustering results from human fibroblasts.

Gene-regulatory network inference for cardiomyocytes and fibroblasts

We inferred an eGRN for cardiomyocytes and fibroblasts using a multi-step approach including modality pairing, transcription factors and genes selection, and network construction.

  1. (1)

    We first paired the cells between snATAC-seq and snRNA-seq based on the previously described co-embedding space using an optimal matching approach74. This method returns a matching of a snATAC-seq cell to a unique cell in snRNA-seq. Next, we produced a diffusion map75 and created trajectories in this space using the function addTrajectory from ArchR (v1.0.1)65. For cardiomyocytes, we inferred a trajectory from clusters vCM1, vCM2 and vCM3, where vCM1 were considered as roots and vCM3 as the terminal state. For fibroblasts, we built a trajectory with SCARA5+ fibroblasts as root and myofibroblasts as terminal state.

  2. (2)

    Next, we predicted a single-cell-specific transcription factor binding activity score using the R package chromVAR (v1.16)76 from the snATAC-seq data based on motif from the JASPAR2020 database69. In contrast to HINT-ATAC, chromVAR provides transcription factor activity scores at single-cell level. We next selected transcription factors that display concordant binding activity (snATAC-seq) and its gene expression (snRNA-seq) (Pearson correlation > 0.1). This analysis identified 65 transcription factors for cardiomyocytes and 44 transcription factors for fibroblasts. We considered these transcription factors to be potential regulators. We sorted the transcription factors along the trajectory as defined in step 1 and assigned a pseudotime label to each transcription factor. Next, we selected highly variable genes using the snRNA-seq data along the trajectories as described65. We kept the top 10% variable genes and considered them as potential transcription factor targets.

  3. (3)

    To associate regulators with targets (that is, transcription factors with genes), we explored the correlation of peak accessibility and gene expression to identify peak-to-gene links. Specifically, for each gene, we consider peaks that are within 125 kb on either side of the transcription start sites, while excluding the promoter regions. This analysis generated a list of enhancer-to-promoter links. We only considered significantly correlated links (FDR < 0.0001) with a positive correlation as before65. Finally, we associated a transcription factor with a target gene if this gene was linked to an enhancer and this enhancer was predicted to be found by this transcription factor.

  4. (4)

    To build a quantitative transcription factor–gene-regulatory network, we estimated the correlation of the transcription factor binding activity from snATAC-seq and target gene expression from snRNA-seq data and only considered those interactions with Pearson correlation >0.4. We visualized the network based on a force-layout, which places transcription factors (or target genes) with similar interactions close together. We coloured transcription factor nodes in the networks using the assigned pseudotime labels as inferred in step 2. To characterize the importance of transcription factors, we computed two measures: node betweenness (denoted by b)77 and pagerank (denoted by p)78. A final importance score for transcription factor i was calculated as:

    $${\rm{Importance}}\,i=\sqrt{({b}_{i}-min(b){)}^{2}+({p}_{i}-min(p){)}^{2}}$$

  5. (5)

    Finally, to map the inferred GRN into spatial transcriptomics data, we used the target genes for each transcription factor and calculated a module score by using the function AddModuleScore from Seurat (v4.1.0).

Characterization of spatial transcriptomics datasets

Single-slide processing

Filtered feature-barcode expression matrices from SpaceRanger (v1.3.2) were used as initial input for the spatial transcriptomics analysis using Seurat (v4.0.1). Spots with less than 300 measured genes and less than 500 UMIs were filtered out. Ribosomal and mitochondrial genes were excluded from this analysis. Individual count matrices were normalized with sctransform79, and additional log-normalized (size factor = 10,000) and scaled matrices were calculated for comparative analyses using default settings.

Cell-type compositions were calculated for each spot using cell2location80 (v0.05). Reference expression signatures of major cell types were estimated using regularized negative binomial regressions and our integrated snRNA-seq atlas. We fitted a model in six downsampled iterations of our snRNA-seq atlas (30%) and generated a final reference matrix by taking the mean estimation. Each slide was later deconvoluted using hierarchical bayesian models as implemented in run_cell2location. We provided the following hyperparameters: 8 cells per spot, 4 factors per spot, and 2 combinations per spot. Additionally, for each spot we calculated cell-type proportions using the cell-type-specific abundance estimations. Cell-type compositions of the complete slide were calculated adding the estimated number of cells of each type across all spots. To compare the stability of estimated cell compositions between our different data modalities, we calculated Spearman correlations between the estimated cell type proportions of each slide and the observed cell type proportions in its corresponding snRNA-seq and snATAC-seq dataset.

Estimation of functional information from spatial data

For each spot, we estimated signalling pathway activities with PROGENy’s81,82 (v1.12.0) model matrix using the top 1,000 genes of each transcriptional footprint and the sctransform normalized data. Spatially variable genes were calculated with SPARKX83 (v1.1.1) using log-normalized data (FDR < 0.001). To obtain overrepresented biological processes from each list of spatially variable genes, we performed hypergeometric tests using the set of canonical pathways provided by MSigDB84 (FDR < 0.05).

Estimation of cell death molecular footprints from spatial data

To associate the differences in nuclei capture in snRNA-seq between the different samples to cell death processes, we leveraged the information from spatial transcriptomics to estimate the general expression of genes associated to cell death for each sample. For each unfiltered slide we estimated per spot the normalized gene expression of BioCarta’s84 ‘death pathway’ and Reactome’s85 ‘regulated necrosis pathway’ using the decoupleR (v1.1.0) wmean method and the sctransform normalized data. To have a final pathway score per slide, we calculated for each slide the mean ‘pathway expression’ across all spots.

Mapping transcription factor binding activity and GWAS enrichment to spatial data

To visualize the transcription factor binding activity estimated from snATAC-seq data in space, we used the estimated cell type proportion calculated from cell2location scores for mapping. Specifically, for each spot i and transcription factor j, we calculated the transcription factor binding activity as follows:

$${{\rm{ACT}}}_{ij}\,={\sum }_{k=1}^{K}{{\rm{Proportion}}}_{ik}\times {{\rm{ACT}}}_{kj}^{{\prime} },$$

where Proportionik is the estimated proportion of cell type k, K is the number of cell types, and ACT′kj is the binding activity of transcription factor j in cell type k from snATAC-seq data. An equivalent approach was used to map GWAS scores into space.

Cell-state spatial mapping

To map the functional states of each cell type into spatial locations, we leveraged the deconvolution results of each slide and the set of differentially expressed genes of each recovered cell state. Given the continuous nature of cell states, we assumed that the collection of up and downregulated genes of a cell state represented its transcriptional fingerprint and could be summarized in a continuous score in locations where we could reliably identify the major cell type from which the state was derived. For a given major cell type of interest k, we identified spots where its inferred abundance was of at least 10%. To estimate state scores associated with cell type k, we used decoupleR’s (v1.1.0) normalized weighted mean method (wmean) and the set of the upregulated genes of each state defined with snRNA-seq and snATAC-seq (log fold change > 0; Wilcoxon tests, FDR < 0.05). The log fold change of each selected gene was used as the weight in the wmean function.

Analysis of ion channel-related genes

We related the expression of ion channel-related gene sets to the different cardiomyocyte cell states and their location in spatial transcriptomics. First we selected two different gene sets containing ion channel-related genes: (1) Reactome’s85 ‘ion channel transport’ and a curated list of transmural ion channels from Grant et al.86. Gene sets are provided in Supplementary Table 17. First, we calculated gene set scores for each spatial transcriptomics spot using decoupleR’s wmean function. Then we correlated these gene set scores to the spatial mapping of cardiomyocyte cell states in regions where we observed at least 10% of cardiomyocytes. Additionally, we evaluated if any of the genes belonging to these gene sets were differentially expressed between the vCM1 and stressed vCM3 population using Wilcoxon tests as implemented in scran’s (v1.18.5) findMarkers function (area under the curve (AUC) < 0.4, AUC > 0.6, FDR < 0.05).

Spatial map of cell dependencies

We used MISTy’s87 implementation in mistyR (v1.2.1) to estimate the importance of the abundance of each major cell type in explaining the abundance of the other major cell types. Cell-type cell2location estimations of all slides were modelled in a multi-view model using three different spatial contexts: (1) an intrinsic view that measures the relationships between the deconvolution estimations within a spot, (2) a juxta view that sums the observed deconvolution estimations of immediate neighbours (largest distance threshold = 5), and (3) a para view that weights the deconvolution estimations of more distant neighbours of each cell type (effective radius = 15 spots). The aggregated estimated standardized importances (median) of each view of all slides were interpreted as cell-type dependencies in different spatial contexts, such as colocalization or mutual exclusion. Nevertheless, the reported interactions did not imply any causal relation. Before aggregation, we excluded the importances of all predictors of target cell types whose R2 was less than 10% for each slide.

To associate tissue structures with tissue functions, we fitted a MISTy model to explain the distribution of PROGENy’s pathway activities standardized scores. The multi-view model consisted of the following predictors: (1) an intrinsic view to model pathway crosstalk within a spot, (2) a juxta view to model pathway crosstalk between neighbouring spots (largest distance threshold = 5), (3) a para view estimating pathway relations in larger tissue structures (effective radius = 15), (4) an intrinsic view and (5) a para view containing cell2location estimations (effective radius = 15). These last two views model explicitly the relations between cell-type compositions of spots and pathway activities. Cycling cells and TNF were not included in the described analyses. Before aggregation, we excluded the importances of all predictors of target pathway activities whose R2 was less than 10% for each slide.

Niche definitions from spatial transcriptomics data

To identify groups of spots in the different samples that shared similar cell-type compositions, we transformed the estimated cell-type proportions of each spatial transcriptomics spot and slide into isometric log ratios (ILR)88, and clustered spots into groups. These niches represent groups of spots that are similar in cell composition and represent potential shared structural building blocks of our different slides; we refer to these groups of spots as cell-type niches. Louvain clustering of spots was performed by first creating a shared nearest neighbour graph with k different number of neighbours (10, 20, 50) using scran’s89 (v1.18.5) buildSNNGraph function. Then, we estimated the clustering resolution that maximized the mean silhouette score of each cluster. We assigned overrepresented cell types in each structure by comparing the distribution of cell-type compositions within a cell-type niche versus the rest using Wilcoxon tests (FDR < 0.05). We tested if a given cell state was more representative of a cell-type niche by performing Wilcoxon tests between each niche and the rest (FDR < 0.05). Only positive state scores were considered in this analysis.

Additionally, to complement the repertoire of niches identified with cell-type compositions, we integrated and clustered the Visium spots of all slides using their log-normalized gene expression. We called these clusters molecular niches. Integration and clustering of spots was performed with the same methodology as the one used to create the snRNA-Seq atlas. A low resolution was used (0.2) to have a similar number of molecular niches as cell-type niches. Cell-type and cell-state enrichment was performed as mentioned before.

Differential expression analysis of molecular niches enriched with cardiomyocytes

Differential expression analysis between molecular niches enriched in cardiomyocytes (niche 0, niche 1, niche 3) was performed using the log-transformed expression of all spots belonging to a given niche. Wilcoxon tests were performed with scran’s89 (v1.18.5) findMarkers function. Genes with a summary AUC >0.55 and FDR <0.05 were considered upregulated genes.

Differential molecular profiles of the molecular niche 10 enriched with capillary endothelial cells

Differential expression analysis between ischaemic, fibrotic and myogenic-enriched spatial transcriptomic spots was performed with Wilcoxon tests as implemented in scran’s89 (v1.18.5) findMarkers function. To obtain overrepresented biological processes from upregulated genes, we performed hypergeometric tests using the set of hallmark pathways provided by MSigDB84. Normalized PROGENy’s pathway activities for each spot were calculated using decoupleR’s wsum method with 100 permutations on log-transformed data. Mean normalized pathway scores were calculated per slide and comparisons between groups were performed with Wilcoxon tests. Reported P-values were adjusted for multiple testing using the Benjamini–Hochberg procedure.

General differences in tissue organization

We annotated the different spatial transcriptomic slides into three groups based on histological differences with the help of pathologists: myogenic-enriched, fibrotic-enriched and ischaemic-enriched. A general comparison of the sampled patient specimens was performed at the compositional and molecular level.

Hierarchical clustering, with euclidean distances and Ward’s algorithm, was used to cluster the pseudo-bulk profiles of the spatial transcriptomics datasets (replicates where merged, n = 27). Genes with less than 100 counts in 85% of the sample size were excluded for this analysis. Log normalization (scale factor = 10,000) was performed. To visualize the general molecular differences between our samples, log-normalized pseudo-bulk profiles of the spatial transcriptomics datasets were projected in an UMAP embedding.

To identify compositional differences between our sample groups, we compared cell-type and niche compositions. To identify cell-type composition changes associated to the sample groups, mean cell-type compositions across single-cell and spatial datasets were compared with Kruskal–Wallis tests (FDR < 0.1). Pairwise comparisons of sample groups were performed with the Wilcoxon test. Additionally, to test which cell-type and molecular niches had different distributions between our group samples, we performed Kruskal–Wallis tests over the compositions of cell-type or molecular niches (FDR < 0.1). Additional pairwise comparisons were performed with Wilcoxon tests (P-values adjusted with Benjamini–Hochberg procedure). For this, we only consider slides where no single niche represents more than 80% of the spots. Also, we only consider niches representing more than 1% of the composition of at least 5 slides.

To identify differences between the structurally similar tissues captured in the myogenic-enriched group, we separated the samples into remote, border, and control zones and repeated the niche composition comparison described previously.

To identify patterns of tissue organization associated with a sample group, we tested if differential cell dependencies were captured by the MISTy models used to predict cell-type abundance (see ‘Spatial map of cell dependencies’). First, we filtered the standardized importance matrices of each sample’s MISTy model fitted to predict the abundance of major cell-types to contain only the values of target cell types predicted with an R2 greater than 0.05. Then, for each slide we created a spatial dependency vector where each element contains the importance of each possible pair of target and predicted cell types. Finally, we tested which cell interactions had higher importances in one of the sample groups compared to the rest using Wilcoxon tests (FDR < 0.25). To prioritize interactions, we only performed pairwise comparisons between sample groups for cell-type dependencies from which the maximum median importance across all groups was greater than 0.

Estimation of the effects of the spatial context on gene expression

We used mistyR (v1.2.1) to find the associations between the tissue organization and the spatial distribution of stressed cardiomyocytes and the different endothelial, myeloid and fibroblast cell states. We hypothesized that the distribution of specific cell states in the spatial transcriptomics slides could be modelled by the cell-type composition or cell-state presence of individual spots and their neighbourhood.

For a given collection of cell states of interest, we first defined regions of interest in every single slide as the collection of spots where the inferred abundance of the cell type from which the cell state was derived was at least 10%. These regions limit the target spots used in the MISTy model, however the whole slide is used to spatially contextualise the predictors. We used as predictors the abundances of cell types estimated with cell2location or cell states scores. To account only for the effects of the activation of a cell state, the state scores of predictor cell states were masked to 0 whenever their score was lower than 0. In all models we included two classes of spatially contextualized predictive views: an intrinsic (intra) and a local neighbourhood view (para, effective radius = 5).

Specifically we fitted the following models to answer four questions:

  1. (1)

    What are the main cell types whose abundance within a spot or in the local neighbourhood predict the stressed vCM3?

    vCM3 ~ intra(cell-type abundance) + para(cell-type abundance)

  2. (2)

    What are the main cell types whose abundance within a spot or in the local neighbourhood predict the endothelial subtypes? How do the different subtypes relate to each other?

    ECsubtypes ~ intra(ECsubtypes) + para(ECsubtypes) + intra(cell-type abundance) + para(cell-type abundance)

  3. (3)

    What are the myeloid cell states within a spot or in the local neighbourhood that better predict fibroblasts cell states? How do fibroblasts cell states relate to each other?

    FibroblastStates ~ intra(FibroblastStates) + para(FibroblastStates) + intra(MyeloidStates) + para(MyeloidStates)

  4. (4)

    What are the main cell types whose abundance within a spot or in the local neighbourhood predict the myeloid cell states? How do the different states relate to each other?

MyeloidStates ~ intra(MyeloidStates) + para(MyeloidStates) + intra(cell-type abundance) + para(cell-type abundance)

Specific view importances were compared between patient groups as described previously with an R2 filter of 0.1.

Cell–cell communication analysis

To estimate ligand–receptor interactions between the sub-populations of fibroblasts and myeloid cells, we extracted gene expression matrix from the integrated snRNA-seq and snATAC-seq data for each sample group (that is, myogenic, ischaemic and fibrotic) and combined the matrices from all sub-populations. We next used LIANA (v0.0.9)90, a framework that compiles the results of state-of-the-art cell communication inference methods, to infer ligand–receptor interactions. We focused on the CellPhoneDB91 ligand–receptor method with Omnipath’s ligand–receptor database92 implemented in LIANA90. This was done by combining snRNA-seq samples of myogenic, ischaemic and fibrotic groups and subsetting only the fibroblasts and myeloid cells sub-states. Next, we used CrossTalker (v1.3.1)93 to find changes in cell–cell communication by contrasting ligand–receptor interactions predicted in myogenic vs. ischaemic samples and myogenic vs. fibrotic samples. The interactions considered by CrossTalkeR were obtained by filtering the output of LIANA90 (P > 0.01).

Visualization, statistics, and reproducibility

In data represented as box plots (Figs. 2f, 4c,d,m,o, 5b and 6d,n) the middle line corresponds to the median, the lower and upper hinges describe the first and third quartiles, the upper whisker extends from the hinge to the largest value no further than 1.5 × inter-quartile range (IQR) from the hinge and the lower whisker extends from the hinge to the smallest value at most 1.5 × IQR of the hinge, and data beyond the end of the whiskers are outlying points that are plotted individually. In Figs. 4b and 5b,k, Colours refer to gene-weighted kernel density as estimated by using R package Nebulosa94. All reported P-values based on multi-comparison tests were corrected using the Benjamini–Hochberg method95. The depicted immunofluorescence micrographs are representative (Figs. 4c and 6n). The number of samples for each group was chosen on the basis of the expected levels of variation and consistency. The depicted RNAscope, immunofluorescence micrographs are representative and were performed at least twice, and all repeats were successful. Fig. 1a contains a panel from BioRender.com.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

#Spatial #multiomic #map #human #myocardial #infarction #Nature

Leave a Comment

Your email address will not be published.