For this section, we provide wrappers of five methods, listed in the table below. All functions take an input
SingleCellExperiment (SCE) object and other settings as arguments and return the same SCE object with results updated in the `metadata` slot.
| Method | Citation |
|---|---|
| wilcox | Aaron Lun and Jonathan Griffiths, 2016 |
| MAST | Greg Finak et al., 2015 |
| Limma | Gordon Smyth et al., 2004 |
| DESeq2 | Michael Love et al., 2014 |
| ANOVA | Jeffrey T. Leek et al., 2020 |
A generic wrapper for all five methods is also provided.
Differential expression analysis can be performed on any preprocessed SCE dataset. Here we introduce the workflow using the generic wrapper.
The most basic parameters include:
Besides these, there are also a few other required parameter sets:
These functions allow users to perform differential expression analysis with flexible condition settings. For the condition of interest and the control condition, comparison groups can be set by giving either one or more categories under a column in `colData`, or a prepared index vector, as long as the indices can subset the input SCE object.
Alternatively, users may specify only the condition of interest. The control is then set to all other cells, which turns the analysis into a biomarker-finding analysis.
Since the conditions can be set by using either indices or annotations, there are two groups of parameters, listed below. Note that only one of the two styles can be used at a time.
- `index1` and `index2` for index style setting.
- `class` to specify the annotation vector, either by directly giving a vector of proper length or by giving a column name of `colData`. Then use `classGroup1` and `classGroup2` to specify which categories in `class` are of interest.
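To make the two selection styles concrete, the following base R sketch uses a toy `data.frame` as a stand-in for `colData` and shows that picking categories under an annotation column and passing prepared index vectors can describe the same cells. The column name `cluster`, the cluster labels, and all variable names are made up for illustration; the sketch does not call any SCTK function.

```r
# Toy stand-in for colData(sce): one row per cell, one annotation column.
# The column name `cluster` and its values are assumptions for illustration.
cell_annot <- data.frame(
  cluster = c("1", "2", "2", "3", "4", "5", "2", "4"),
  stringsAsFactors = FALSE
)

# Annotation style: pick one or more categories under a column.
group1_cats <- c("2")
group2_cats <- c("4", "5")
ix1_from_class <- cell_annot$cluster %in% group1_cats
ix2_from_class <- cell_annot$cluster %in% group2_cats

# Index style: hand over prepared index vectors (integer positions here).
ix1_from_index <- c(2L, 3L, 7L)
ix2_from_index <- c(5L, 6L, 8L)

# Both styles describe the same cells.
same1 <- identical(which(ix1_from_class), ix1_from_index)
same2 <- identical(which(ix2_from_class), ix2_from_index)

# Only the condition of interest given: the control is every other cell.
ix_rest <- !ix1_from_class
```

Because each style fully determines the two groups, mixing them in one call would be ambiguous, which is presumably why only one style is accepted at a time.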
One distinctive aspect of our workflow is that users are required to specify name strings for the analysis itself and for each of the two comparison groups.
The reason is that we assume users are likely to perform multiple batches of analysis on a single dataset (e.g. group1 vs. group2 and then group1 vs. group3), and we want everything stored in one SCE object without leaving users confused when they look back at it. These names are also used by the DE analysis plotting functions, so that legends are properly annotated.
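As a rough illustration of why explicit names help, the sketch below stores two comparisons side by side in a plain list keyed by analysis name. The list `diff_exp`, the helper `make_analysis_name()`, and the `group1_VS_group2` naming pattern are hypothetical and only mimic the idea; they are not part of SCTK.

```r
# Plain list standing in for the place where results are kept
# (an assumption for illustration, not the SCTK object itself).
diff_exp <- list()

# Hypothetical helper: build one analysis name from two group names.
make_analysis_name <- function(group1, group2) {
  paste(group1, "VS", group2, sep = "_")
}

# Two comparisons that share the same group of interest.
comparisons <- list(c("group1", "group2"), c("group1", "group3"))
for (cmp in comparisons) {
  name <- make_analysis_name(cmp[1], cmp[2])
  # Refuse to silently overwrite an earlier analysis with the same name.
  stopifnot(!(name %in% names(diff_exp)))
  diff_exp[[name]] <- list(groupNames = cmp)
}

names(diff_exp)
```

With distinct names, each batch of results can be looked up later without guessing which groups it compared.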
To demonstrate a simple and clear example, here we use the “PBMC-3k” dataset from “10X” which can be easily imported with SCTK functions. The preprocessing only includes necessary steps to get cluster labels (i.e. QC and filtering are excluded).
```r
library(singleCellTK)
pbmc3k <- importExampleData("pbmc3k")
pbmc3k <- scaterlogNormCounts(pbmc3k, "logcounts")
# Go through the Seurat curated workflow to get basic clusters
pbmc3k <- seuratNormalizeData(inSCE = pbmc3k, useAssay = "counts")
pbmc3k <- seuratFindHVG(inSCE = pbmc3k, useAssay = "seuratNormData")
pbmc3k <- seuratScaleData(inSCE = pbmc3k, useAssay = "seuratNormData")
pbmc3k <- seuratPCA(inSCE = pbmc3k, useAssay = "seuratScaledData")
pbmc3k <- seuratRunUMAP(pbmc3k)
pbmc3k <- seuratFindClusters(inSCE = pbmc3k, useAssay = "seuratScaledData")
# Optional visualization
plotSCEDimReduceColData(inSCE = pbmc3k,
                        colorBy = "Seurat_louvain_Resolution0.8",
                        conditionClass = "factor",
                        reducedDimName = "seuratUMAP")
```
The results are saved in the `metadata` slot of the returned SCE object, following the structure below:
```
metadata(pbmc3k)
 |- $info1
 |- $info2
 |- ...
 |- $diffExp
     |- $AnalysisName1
     |- $AnalysisName2
     |- ...
     |- $c2_VS_c3_c8
         |- $useAssay = "logcounts"
         |- $groupNames = c("c2", "c3_c8")
         |- $select
         |   |- $ix1 = c(FALSE, TRUE, FALSE, ...)
         |   |- $ix2 = c(FALSE, FALSE, FALSE, ...)
         |   |  (Two logical vectors, each with `ncol(inSCE)` values,
         |      specifying which cells are selected for "c2" or "c3_c8")
         |
         |- $annotation = "Seurat_louvain_Resolution0.8"
         |- $result = (the `data.frame` of top DEG table, shown below)
         |- $method = "MAST"
```
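Since the stored results form an ordinary nested list, they can be explored with standard list tools. The sketch below builds a small mock list with the same field names as the structure above and shows how one might list the stored analyses and count the cells selected for each group. The mock values (four cells, an empty result table) are placeholders, not real output.

```r
# Mock of the nested list; all values are placeholders for illustration.
meta <- list(
  diffExp = list(
    c2_VS_c3_c8 = list(
      useAssay = "logcounts",
      groupNames = c("c2", "c3_c8"),
      select = list(
        ix1 = c(FALSE, TRUE, FALSE, TRUE),
        ix2 = c(TRUE, FALSE, FALSE, FALSE)
      ),
      annotation = "Seurat_louvain_Resolution0.8",
      result = data.frame(),
      method = "MAST"
    )
  )
)

# List the analyses stored so far.
analyses <- names(meta$diffExp)

# Pull one analysis by name and count the cells selected for each group.
res <- meta$diffExp[["c2_VS_c3_c8"]]
n_cells <- c(sum(res$select$ix1), sum(res$select$ix2))
names(n_cells) <- res$groupNames
n_cells
```

On a real object, the same pattern would start from `metadata(pbmc3k)` instead of the mock `meta` list.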
To fetch the result as a table of the top differentially expressed genes and their statistics:
```r
DEG <- metadata(pbmc3k)$diffExp$c2_VS_c4_c5$result
head(DEG)
```
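Once the table is in hand, it can be filtered like any other `data.frame`. The sketch below works on a toy table and keeps genes above a log2 fold change cutoff and below an FDR cutoff. The column names `Gene`, `Log2_FC` and `FDR`, the gene labels, and the numbers are all assumptions for illustration; check `colnames(DEG)` on a real result before reusing this pattern.

```r
# Toy table; the column names Gene, Log2_FC and FDR are assumptions.
deg_toy <- data.frame(
  Gene = c("geneA", "geneB", "geneC", "geneD"),
  Log2_FC = c(2.1, -1.4, 0.3, 1.2),
  FDR = c(0.001, 0.020, 0.400, 0.030),
  stringsAsFactors = FALSE
)

# Keep genes with |log2FC| above 1 and FDR below 0.05.
keep <- abs(deg_toy$Log2_FC) > 1 & deg_toy$FDR < 0.05
deg_sig <- deg_toy[keep, ]

# Order the remaining genes by absolute fold change, strongest first.
deg_sig <- deg_sig[order(abs(deg_sig$Log2_FC), decreasing = TRUE), ]
deg_sig$Gene
```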
To visualize the result in a heatmap:
```r
plotDEGHeatmap(pbmc3k, useResult = "c2_VS_c4_c5", log2fcThreshold = 1)
```