Introduction

Seurat is an R package (Butler et al., Nature Biotechnology 2018 & Stuart, Butler, et al., Cell 2019) that provides functions for analyzing scRNA-Seq data from the R console.

All methods that SCTK provides for the Seurat workflow take a SingleCellExperiment object as input and return one as output.

Using a sample dataset:

sce <- importExampleData('pbmc3k')
sce
## class: SingleCellExperiment 
## dim: 32738 2700 
## metadata(0):
## assays(1): counts
## rownames(32738): MIR1302-10 FAM138A ... AC002321.2 AC002321.1
## rowData names(3): ENSEMBL_ID Symbol_TENx Symbol
## colnames(2700): pbmc3k_AAACATACAACCAC-1 pbmc3k_AAACATTGAGCTAC-1 ...
##   pbmc3k_TTTGCATGAGAGGC-1 pbmc3k_TTTGCATGCCTCAC-1
## colData names(12): Sample Barcode ... Date_published sample
## reducedDimNames(0):
## altExpNames(0):

1. Normalize Data

Once raw data is uploaded and stored in a SingleCellExperiment object, the seuratNormalizeData function can be used to normalize it. The function returns the input SingleCellExperiment object with the normalized data stored as a new assay.

Parameters to this function include useAssay (specify the assay that should be normalized), normAssayName (specify the new name of the normalized assay, defaults to “seuratNormData”), normalizationMethod (specify the normalization method to use, defaults to “LogNormalize”) and scaleFactor (defaults to 10000).

sce <- seuratNormalizeData(inSCE = sce, useAssay = "counts", normAssayName = "seuratNormData")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
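
To make the "LogNormalize" method concrete, here is a minimal base R sketch of the computation as it is commonly described: each cell's counts are divided by that cell's total, multiplied by the scale factor, and passed through log(1 + x). The toy matrix and the variable names are assumptions made for illustration; this is not the SCTK or Seurat implementation.

```r
# Toy count matrix (hypothetical data): 3 genes (rows) x 2 cells (columns).
counts <- matrix(c(1, 0, 3,
                   2, 2, 0),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))
scaleFactor <- 10000

# Total counts per cell (column sums).
libSizes <- colSums(counts)

# Divide every column by its total, multiply by the scale factor,
# then apply log(1 + x) so that zero counts stay at zero.
logNorm <- log1p(sweep(counts, 2, libSizes, "/") * scaleFactor)

print(round(logNorm, 3))
```

Zero counts stay zero after the transform, and every cell is put on the same total-count scale before the logarithm is taken.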

2. Scale Data

Normalized data can be scaled with the seuratScaleData function, which takes as input a SingleCellExperiment object that has previously been normalized by the seuratNormalizeData function. The scaled assay is stored back in the input object.

Parameters include useAssay (specify the name of normalized assay), scaledAssayName (specify the new name for scaled assay, defaults to “seuratScaledData”), model (specify the method to use, defaults to “linear”), scale (specify if the data should be scaled, defaults to TRUE), center (specify if the data should be centered, defaults to TRUE) and scaleMax (specify the maximum clipping value, defaults to 10).

sce <- seuratScaleData(inSCE = sce, useAssay = "seuratNormData", scaledAssayName = "seuratScaledData")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
## Centering and scaling data matrix
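
The effect of the center, scale and scaleMax parameters can be illustrated with a small base R sketch. It assumes the usual behaviour of this kind of scaling: each gene is centered to mean 0 and scaled to standard deviation 1 across cells, and values above the clipping threshold are set to that threshold. The toy data and variable names are hypothetical, and the sketch is not the code that SCTK or Seurat runs.

```r
set.seed(42)

# Toy normalized data (hypothetical): 5 genes (rows) x 200 cells (columns).
normData <- matrix(rnorm(5 * 200), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), NULL))
# Plant one extreme value so that clipping has a visible effect.
normData["gene1", 1] <- 100

scaleMax <- 10

# scale() works column-wise, so transpose to center and scale each gene,
# then transpose back to the genes x cells layout.
scaled <- t(scale(t(normData), center = TRUE, scale = TRUE))

# Clip values above the maximum.
scaled[scaled > scaleMax] <- scaleMax

print(range(scaled))
```

Clipping keeps a few extreme cells from dominating downstream steps such as PCA.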

3. Highly Variable Genes

Highly variable genes can be identified by first running the seuratFindHVG function, which computes the statistics for the selected HVG method and stores them in the rowData of the input object. The genes themselves can then be retrieved with the internal .seuratGetVariableFeatures function, and visualized with the seuratPlotHVG function. Parameters for seuratFindHVG include useAssay (specify the name of the scaled assay, defaults to “seuratScaledData”) and hvgMethod (specify the method to use for variable genes computation, defaults to “vst”).

sce <- seuratFindHVG(inSCE = sce, useAssay = "seuratScaledData", hvgMethod = "vst")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
print(singleCellTK:::.seuratGetVariableFeatures(sce, 100))
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
##   [1] "PPBP"       "LYZ"        "S100A9"     "IGLL5"      "GNLY"      
##   [6] "FTL"        "PF4"        "FTH1"       "GNG11"      "FCER1A"    
##  [11] "HLA-DRA"    "S100A8"     "CD74"       "CLU"        "GZMB"      
##  [16] "NKG7"       "C1QA"       "CST3"       "CCL4"       "HLA-DPB1"  
##  [21] "SDPR"       "C1QB"       "AL928768.3" "TYMS"       "TUBB1"     
##  [26] "RRM2"       "STMN1"      "GUSB"       "MZB1"       "C10orf32"  
##  [31] "GP9"        "IGJ"        "LYPD2"      "HAGH"       "TK1"       
##  [36] "FUS"        "LYAR"       "DSCR3"      "SLA"        "SMIM7"     
##  [41] "KIF5B"      "PPP6C"      "HBP1"       "TMX2"       "HBA1"      
##  [46] "INTS12"     "HLA-DPA1"   "KIAA0101"   "CA2"        "APOBEC3B"  
##  [51] "CCL3"       "LILRA4"     "PRDX1"      "GIMAP5"     "CD79A"     
##  [56] "HLA-DRB1"   "SERPINF1"   "MYL9"       "PRKCB"      "C16orf13"  
##  [61] "MLLT11"     "LST1"       "BIRC5"      "TNFRSF17"   "NDUFA12"   
##  [66] "CCL5"       "THOC7"      "RABL6"      "SIVA1"      "SDHB"      
##  [71] "STK38"      "CLDN5"      "TREML1"     "UBLCP1"     "IFI27"     
##  [76] "SPARC"      "HIST1H2AC"  "PTGDS"      "C14orf1"    "RALY"      
##  [81] "IL8"        "PRF1"       "CLEC2B"     "TNFSF10"    "FCGR3A"    
##  [86] "ODC1"       "MED30"      "STAMBP"     "GMPR"       "GZMH"      
##  [91] "CWC15"      "DNASE1L3"   "TMEM208"    "ACTB"       "FGFBP2"    
##  [96] "CXCL3"      "IDH2"       "TYROBP"     "NXT2"       "FCER1G"
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
## Warning: Transformation introduced infinite values in continuous x-axis
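
For intuition about what a "vst"-style ranking measures, the following base R sketch fits a smooth mean-variance trend on the log scale, standardizes each gene by the variance expected at its mean, and ranks genes by the variance of the standardized values. It is a simplified approximation of the idea on simulated data, with hypothetical variable names and a loess span chosen for illustration; it should not be read as the exact procedure used by Seurat or SCTK.

```r
set.seed(1)
nGenes <- 200
nCells <- 100

# Toy counts (hypothetical): each gene has its own Poisson rate.
# The matrix is filled column by column, so row i always uses rates[i].
rates <- runif(nGenes, min = 0.5, max = 20)
counts <- matrix(rpois(nGenes * nCells, lambda = rates), nrow = nGenes,
                 dimnames = list(paste0("gene", seq_len(nGenes)), NULL))

geneMean <- rowMeans(counts)
geneVar <- apply(counts, 1, var)

# Fit the mean-variance trend on the log10 scale.
fit <- loess(log10(geneVar) ~ log10(geneMean), span = 0.3)
expectedSd <- sqrt(10^fitted(fit))

# Standardize each gene with its observed mean and expected standard
# deviation, then clip large values at sqrt(number of cells).
z <- (counts - geneMean) / expectedSd
z[z > sqrt(nCells)] <- sqrt(nCells)

# Variance of the standardized values; larger means more variable than
# expected for a gene with that mean.
stdVar <- rowSums(z^2) / (nCells - 1)

topGenes <- names(sort(stdVar, decreasing = TRUE))[1:10]
print(topGenes)
```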

4. Dimensionality Reduction

PCA or ICA can be computed using the seuratPCA and seuratICA functions respectively. The results can be explored with seuratReductionPlot, seuratElbowPlot, seuratComputeJackStraw followed by seuratJackStrawPlot, and seuratComputeHeatmap.

sce <- seuratPCA(inSCE = sce, useAssay = "seuratScaledData", reducedDimName = "seuratPCA", nPCs = 20)
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
## PC_ 1 
## Positive:  CST3, TYROBP, LST1, AIF1, FTL, FCN1, LYZ, FTH1, S100A9, FCER1G 
##     TYMP, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, SPI1, IFITM3, PSAP 
##     CFP, SAT1, IFI30, COTL1, S100A11, NPC2, LGALS3, GSTP1, PYCARD, NCF2 
## Negative:  MALAT1, LTB, IL32, CD2, ACAP1, STK17A, CTSW, CD247, CCL5, GIMAP5 
##     AQP3, GZMA, CST7, TRAF3IP3, MAL, HOPX, ITM2A, GZMK, MYC, GIMAP7 
##     BEX2, ETS1, LDLRAP1, ZAP70, LYAR, RIC3, TNFAIP8, NKG7, KLRG1, SAMD3 
## PC_ 2 
## Positive:  NKG7, PRF1, CST7, GZMA, GZMB, FGFBP2, CTSW, GNLY, GZMH, SPON2 
##     CCL4, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX, CTSC 
##     TTC38, S100A4, ANXA1, IL32, IGFBP7, ID2, ACTB, XCL1, APOBEC3G, SAMD3 
## Negative:  CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DRA, HLA-DQB1, LINC00926, CD79B, HLA-DRB1, CD74 
##     HLA-DPB1, HLA-DMA, HLA-DQA2, HLA-DRB5, HLA-DPA1, HLA-DMB, FCRLA, HVCN1, LTB, BLNK 
##     KIAA0125, P2RX5, IRF8, IGLL5, SWAP70, ARHGAP24, SMIM14, PPP1R14A, FCRL2, C16orf74 
## PC_ 3 
## Positive:  HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, CD74, HLA-DPA1, MS4A1, HLA-DRB1, HLA-DRB5 
##     HLA-DRA, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, HVCN1, FCRLA, IRF8, BLNK 
##     KIAA0125, SMIM14, PLD4, P2RX5, IGLL5, SWAP70, LAT2, TMSB10, IGJ, MZB1 
## Negative:  PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU 
##     HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, CA2, PTCRA, ACRBP, MMD, NGFRAP1 
##     TREML1, F13A1, RUFY1, SEPT5, MPP1, TSC22D1, CMTM5, RP11-367G6.3, MYL9, GP1BA 
## PC_ 4 
## Positive:  HLA-DQA1, HIST1H2AC, PF4, CD79A, SDPR, CD79B, PPBP, GNG11, HLA-DQB1, SPARC 
##     MS4A1, CD74, GP9, HLA-DPB1, RGS18, NRGN, PTCRA, CD9, HLA-DQA2, AP001189.4 
##     CLU, TUBB1, CA2, HLA-DRB1, HLA-DPA1, ITGA2B, HLA-DRA, TCL1A, TMEM40, ACRBP 
## Negative:  VIM, S100A8, S100A6, S100A4, TMSB10, S100A9, IL32, GIMAP7, S100A10, LGALS2 
##     RBP7, MAL, FCN1, LYZ, CD2, S100A12, MS4A6A, FYB, S100A11, AQP3 
##     GIMAP4, FOLR3, ANXA1, MALAT1, AIF1, GIMAP5, IL8, IFI6, TRABD2A, ASGR1 
## PC_ 5 
## Positive:  LTB, VIM, AQP3, PPA1, MAL, KIAA0101, CD2, CORO1B, CYTIP, FYB 
##     IL32, TRADD, ANXA5, TUBA1B, HN1, PTGES3, TYMS, ITM2A, COTL1, GPR183 
##     ACTG1, TNFAIP8, ATP5C1, TRAF3IP3, GIMAP4, PRDX1, ZWINT, ABRACL, NGFRAP1, LDLRAP1 
## Negative:  GZMB, FGFBP2, NKG7, GNLY, PRF1, CCL4, CST7, SPON2, GZMA, GZMH 
##     CLIC3, XCL2, CTSW, TTC38, AKR1C3, CCL5, IGFBP7, XCL1, S100A8, CCL3 
##     TYROBP, HOPX, CD160, HAVCR2, S100A9, FCER1G, PTGDR, LGALS2, RBP7, S100A12
seuratReductionPlot(inSCE = sce, useReduction = "pca")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique

seuratElbowPlot(inSCE = sce)
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
sce <- seuratComputeJackStraw(inSCE = sce, useAssay = "seuratScaledData", dims = 20)
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
## Warning: Invalid name supplied, making object name syntactically valid. New
## object name is Seurat..JackStraw.RNA.pca; see ?make.names for more details on
## syntax validity
seuratJackStrawPlot(inSCE = sce, dims = 20)
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
## Warning: Removed 32235 rows containing missing values (geom_point).

seuratComputeHeatmap(inSCE = sce, useAssay = "seuratScaledData", useReduction = "pca", nfeatures = 20, dims = 2)
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
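
As a rough illustration of what the PCA step and the elbow plot summarize, the sketch below runs a PCA on a toy scaled matrix with base R's prcomp and extracts the standard deviation of each component, which is the quantity an elbow plot typically displays. The data and variable names are assumptions for illustration, and prcomp stands in for whatever solver Seurat uses internally.

```r
set.seed(7)

# Toy expression matrix (hypothetical): 50 genes (rows) x 100 cells (columns),
# centered and scaled per gene as in the scaling step.
raw <- matrix(rnorm(50 * 100), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("cell", 1:100)))
scaledData <- t(scale(t(raw)))

# prcomp expects observations (cells) in rows, so transpose.
# The data are already centered and scaled, hence both flags are off.
nPCs <- 20
pca <- prcomp(t(scaledData), center = FALSE, scale. = FALSE, rank. = nPCs)

embeddings <- pca$x          # cells x PCs coordinates
pcSd <- pca$sdev[1:nPCs]     # standard deviation of each retained PC

print(dim(embeddings))
print(round(pcSd[1:5], 3))
```

Plotting pcSd against the component index gives an elbow-style curve. The point where it flattens is a common heuristic for choosing how many components to pass on as dims.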

5. tSNE/UMAP

seuratRunTSNE and seuratRunUMAP can be used to compute tSNE and UMAP embeddings and store them in the input object. Parameters to both functions include inSCE (input SCE object), useReduction (specify the reduction to use, i.e. pca or ica), reducedDimName (name of the new reduction) and dims (number of dims to use). seuratReductionPlot can be used to visualize the results.

sce <- seuratRunTSNE(inSCE = sce, useReduction = "pca", reducedDimName = "seuratTSNE", dims = 10)
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
seuratReductionPlot(sce, "tsne")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique

sce <- seuratRunUMAP(inSCE = sce, useReduction = "pca", reducedDimName = "seuratUMAP", dims = 10)
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
seuratReductionPlot(sce, "umap")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique

6. Clustering

The seuratFindClusters function can be used to compute the clusters, which can later be plotted with cluster labels through the seuratReductionPlot function. The parameters to the function include inSCE (input SCE object), useAssay (name of the scaled assay), useReduction (specify which reduction to use, i.e. pca or ica), dims (number of dims to use) and algorithm (either louvain, multilevel or SLM).

sce <- seuratFindClusters(inSCE = sce, useAssay = "seuratScaledData", useReduction = "pca", dims = 10, algorithm = "louvain")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 2700
## Number of edges: 98199
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8285
## Number of communities: 11
## Elapsed time: 0 seconds

seuratReductionPlot can then be used to plot all reductions previously computed with cluster labels:

## Warning: Non-unique features (rownames) present in the input matrix, making
## unique

seuratReductionPlot(sce, "tsne")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique

seuratReductionPlot(sce, "umap")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique

7. Find Markers

Marker genes can be identified using the seuratFindMarkers function. This function can either use one specified column from the colData of the input object as the grouping variable, if all groups from that variable are to be used (allGroup parameter), or users can manually specify the cells included in one group versus the cells included in the second group.

sce <- seuratFindMarkers(inSCE = sce, allGroup = "Seurat_louvain_Resolution0.8")
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
print(head(metadata(sce)[["seuratMarkers"]]))
##   gene.id         p_val  avg_log2FC pct.1 pct.2     p_val_adj cluster
## 1   RPS27 7.886983e-113  -3.8080543 1.000 0.992 2.582041e-108       1
## 2    LDHB 3.508604e-102 -58.2224134 0.958 0.597  1.148647e-97       1
## 3   RPS25  5.117360e-95   0.7785624 1.000 0.974  1.675321e-90       1
## 4   RPS12  5.953365e-92  39.4769041 1.000 0.991  1.949013e-87       1
## 5   RPL31  7.054459e-92  -2.4053591 0.996 0.965  2.309489e-87       1
## 6    CCR7  4.031406e-90   1.1189456 0.491 0.118  1.319802e-85       1
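
To clarify what the p_val, pct.1, pct.2 and p_val_adj columns represent, here is a small base R sketch that compares one group of cells against another, gene by gene. It assumes a Wilcoxon rank-sum test and a Bonferroni correction, which are the defaults commonly documented for Seurat's marker detection. The data, group labels and variable names are hypothetical, and the fold-change column is left out because its exact definition is not given here.

```r
set.seed(3)
nCells <- 60

# Toy data (hypothetical): 4 genes x 60 cells, split into two clusters.
group <- rep(c("1", "2"), each = nCells / 2)
expr <- matrix(rexp(4 * nCells), nrow = 4,
               dimnames = list(paste0("gene", 1:4), NULL))
# Make gene1 clearly higher in cluster 1 so it behaves like a marker.
expr["gene1", group == "1"] <- expr["gene1", group == "1"] + 3

inGroup <- group == "1"

markers <- data.frame(
  gene.id = rownames(expr),
  # Wilcoxon rank-sum test of cluster 1 cells against all other cells.
  p_val = apply(expr, 1, function(x) {
    wilcox.test(x[inGroup], x[!inGroup])$p.value
  }),
  # Fraction of cells in each group with non-zero expression.
  pct.1 = rowMeans(expr[, inGroup] > 0),
  pct.2 = rowMeans(expr[, !inGroup] > 0)
)

# Bonferroni correction across all tested genes.
markers$p_val_adj <- p.adjust(markers$p_val, method = "bonferroni")
markers <- markers[order(markers$p_val), ]

print(markers)
```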

The identified marker genes can be visualized with one of the available plots: ridge plot, violin plot, feature plot, dot plot or heatmap plot. All marker gene visualizations are produced through the wrapper function seuratGenePlot, which must be given the SCE object (with markers previously computed), the name of the scaled assay (scaledAssayName), the type of plot (plotType, one of the plot types listed above, for example "ridge"), the features to plot (features, a character vector) and the grouping variable (groupVariable), which must be available in the colData slot of the input object. An additional parameter, ncol, sets the number of columns in which the plots are arranged.

seuratGenePlot(
        inSCE = sce,
        scaledAssayName = "seuratScaledData",
        plotType = "ridge",
        features = metadata(sce)[["seuratMarkers"]]$gene.id[1:4],
        groupVariable = "Seurat_louvain_Resolution0.8",
        ncol = 2
      )
## Warning: Non-unique features (rownames) present in the input matrix, making
## unique
## Picking joint bandwidth of 1.8
## Picking joint bandwidth of 0.366
## Picking joint bandwidth of 1.01
## Picking joint bandwidth of 1.99