Introduction

singleCellTK provides a convenient way to identify the features with the highest biological variability so that they can be used in downstream analysis. Highly variable genes (HVGs) can be computed with seuratFindHVG or scranModelGeneVar; both compute variability statistics and store them in the rowData of the input SingleCellExperiment object. The getTopHVG function then retrieves the names of the top variable genes from these stored statistics, and the plotTopHVG function plots them.
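As a rough illustration of what computing a variability statistic and taking the top genes means, the base R sketch below ranks genes in a simulated count matrix by the variance of their log-normalized values. It is a toy stand-in, not the algorithm used by seuratFindHVG or scranModelGeneVar, and every object name in it (counts, logNorm, geneVar, topGenesToy) is hypothetical.

```r
# Toy illustration only: rank genes by the variance of log-normalized values.
# This is NOT the vst, mean.var.plot, dispersion, or modelGeneVar algorithm.
set.seed(1)
nGenes <- 200
nCells <- 50

# Simulated count matrix (genes in rows, cells in columns)
counts <- matrix(
  rpois(nGenes * nCells, lambda = 5),
  nrow = nGenes,
  dimnames = list(paste0("gene", seq_len(nGenes)),
                  paste0("cell", seq_len(nCells)))
)
# Make the first 10 genes deliberately more variable across cells
counts[1:10, ] <- matrix(rnbinom(10 * nCells, mu = 5, size = 0.2), nrow = 10)

# Library-size normalization (scaled to 10,000 per cell), then log1p
libSize <- colSums(counts)
logNorm <- log1p(t(t(counts) / libSize) * 1e4)

# Per-gene variability statistic: one value per row (gene)
geneVar <- apply(logNorm, 1, var)

# Names of the top n genes, ordered from most to least variable
topN <- 10
topGenesToy <- names(sort(geneVar, decreasing = TRUE))[seq_len(topN)]
```

The real methods additionally model the relationship between mean expression and variance, which this sketch ignores.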

General Workflow

A general workflow for the Feature Selection sub-tab is summarized in the figure below:

Workflow Guide

  1. Compute statistics for the highly variable genes using the wrapper function runFeatureSelection as below:
sce <- runFeatureSelection(
  inSCE = sce,
  useAssay = "normalizedCounts",
  hvgMethod = "vst"
)

In the above function, a normalized assay is recommended for the useAssay parameter. The available options for the hvgMethod parameter are vst, mean.var.plot and dispersion from the Seurat package, and modelGeneVar from the scran package.

  2. Get the names of the top genes using the getTopHVG function, specifying the same method that was used for the computation in step 1:
topGenes <- getTopHVG(inSCE = sce, method = "vst", n = 1000)
  3. Visualize the top genes using the plotTopHVG function, again specifying the same method:
plotTopHVG(inSCE = sce, method = "vst", hvgList = topGenes, labelsCount = 10)
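
The same three steps can be run with one of the other methods listed above. The sketch below assumes that singleCellTK is installed, that an object named sce with a "normalizedCounts" assay already exists as in the steps above, and that the installed version accepts the argument names used in this guide. The method string is stored once (in a hypothetical variable, hvgMethod) and passed unchanged to all three calls so that they stay consistent.

```r
# Method chosen once and reused so that all three steps agree
hvgMethod <- "modelGeneVar"

# Guard (assumption): run only when singleCellTK is installed and `sce` exists
if (requireNamespace("singleCellTK", quietly = TRUE) && exists("sce")) {
  library(singleCellTK)

  # Step 1: compute variability statistics and store them in the object
  sce <- runFeatureSelection(
    inSCE = sce,
    useAssay = "normalizedCounts",
    hvgMethod = hvgMethod
  )

  # Step 2: retrieve the names of the top 1000 variable genes
  topGenes <- getTopHVG(inSCE = sce, method = hvgMethod, n = 1000)

  # Step 3: plot the statistics and label the first 10 top genes
  print(plotTopHVG(inSCE = sce, method = hvgMethod,
                   hvgList = topGenes, labelsCount = 10))
}
```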