This section describes how to use the user interface (UI) for the singleCellTK (SCTK) differential expression (DE) analysis workflow. The underlying process is wrapped by the R function runDEAnalysis(). For the R console workflow, refer to the corresponding R console help page.
From anywhere in the UI, the DE panel can be accessed from the top navigation bar, at the circled tab shown below.
The UI consists of four main parts: 1. Assay input; 2. Condition selection; 3. Parameter settings; 4. Results and other visualizations.
A SingleCellExperiment (SCE) object, which stores all the expression data, dimension reductions, and metadata, is active behind the interface. The assay needed here is a matrix that contains the expression values for all cells and all features (genes), and is stored in the assay slot of the SCE object. Select the assay at the top of the panel, as shown in the screenshot below.
A DE analysis usually requires two conditions to be defined: the condition of interest and the control condition. The condition of interest is compared against the control condition with a statistical method, and the statistical results are returned. Selecting a condition here means deciding which cells should be grouped together as one condition.
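SCTK provides three reasonably flexible approaches to setting the conditions: 1. selecting categories from a single class of cell annotation (i.e. one column in colData of the SCE object); 2. filtering and selecting cells in a table of all available cell annotations; 3. pasting a list of cell identifiers.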
NOTE that the “Name of Condition” fields, which are shown later, are always required, although default text is provided. It is highly recommended that users enter easily understandable names there, both to avoid confusion when multiple batches of analysis are performed and to keep the automatic legends on DE-specific plots clean.
The first approach is designed for the fastest use: the condition of interest and the control condition are defined by categories of the same class in the cell annotation (i.e. within one single column in colData of the background SCE object).
First, users choose a single option from the selection list “Choose Annotation Class”. The UI then shows two columns, one for each condition. In each column, users select which categories belong to that condition at the selection input “Select Condition(s)”. One or more selections are acceptable. When selections are made for one condition and none for the other, all categories not used by the former are assigned to the latter. The text boxes, “Cells selected”, only show users which cells are selected, while the text span below each box summarizes the total number of cells selected.
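The selection rule described above, including the way an empty condition receives all remaining categories, can be sketched in a few lines. This is only an illustrative Python sketch of the logic, not SCTK's R implementation; the function name split_by_categories, the toy cluster labels, and the error raised when nothing is selected are assumptions made for the example.

```python
def split_by_categories(annotation, interest=(), control=()):
    """Return (interest_cells, control_cells) as lists of cell indices.

    annotation: one category label per cell (a single annotation column).
    interest, control: categories chosen for each condition. If exactly one
    of them is empty, it receives every category not used by the other.
    """
    interest, control = set(interest), set(control)
    if not interest and not control:
        # Assumption of this sketch: at least one condition must be chosen.
        raise ValueError("select at least one category for one condition")
    all_categories = set(annotation)
    if not control:
        control = all_categories - interest
    elif not interest:
        interest = all_categories - control
    interest_cells = [i for i, c in enumerate(annotation) if c in interest]
    control_cells = [i for i, c in enumerate(annotation) if c in control]
    return interest_cells, control_cells


# Toy cluster labels for eight cells (hypothetical values).
clusters = ["2", "4", "5", "1", "2", "4", "3", "5"]

# Cluster 2 as the condition of interest, clusters 4 and 5 as the control.
g1, g2 = split_by_categories(clusters, interest=["2"], control=["4", "5"])
print(len(g1), "cells of interest;", len(g2), "control cells")

# Leaving the control empty compares cluster 2 against all other clusters.
g1_all, g2_all = split_by_categories(clusters, interest=["2"])
```

Returning cell indices rather than category names keeps the result independent of the annotation class that was used to build it.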
In the example shown in the figure above, we define the conditions based on the clustering result generated by Seurat’s Louvain clustering with resolution 0.8, "Seurat_louvain_Resolution0.8". We define the cells assigned to “cluster 2” as the condition of interest, and the cells assigned to “cluster 4” and “cluster 5” as the control against “cluster 2”. (This is the same example as in the R console tutorial for DE.)
The second approach lets users make full use of all the available cell annotations. Individual cells can also be selected one at a time, which gives the highest flexibility, although this can be relatively inefficient.
This approach uses a data table, to which filters can be applied, for a flexible definition of each condition. By default, all classes of annotation are displayed. In the selection input “Columns to display”, users can select one or more annotation classes to display and use for filtering. The blank box under each column title is where the filter is applied. For categorical columns, one or more of the available categories can be selected; for numeric columns (with continuous values), a value range can be set. Applying filters does not select any cells; it only changes which cells are displayed in the table, as reflected in the first text span summary below the table area. After applying all necessary filters, click the “Add all filtered” button below the table area to actually make the selection. In addition, each row in the table can be clicked to select or deselect a single cell. The advantage of this approach is that conditions that can only be defined by combining multiple classes of annotation are supported.
As with the first approach, the figure shows an equivalent condition definition.
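The difference between filtering the table and actually selecting cells can be made concrete with a small sketch. The Python code below is illustrative only and is not how SCTK implements the table; the function apply_filters, the column names, and the cell IDs are hypothetical.

```python
def apply_filters(cells, categorical=None, numeric=None):
    """Return the rows that pass every column filter (what the table displays).

    cells: list of dicts, one per cell, mapping annotation name to value.
    categorical: {column: set of allowed values}
    numeric: {column: (low, high)}, both bounds inclusive
    """
    categorical = categorical or {}
    numeric = numeric or {}
    shown = []
    for row in cells:
        ok_cat = all(row[col] in allowed for col, allowed in categorical.items())
        ok_num = all(low <= row[col] <= high for col, (low, high) in numeric.items())
        if ok_cat and ok_num:
            shown.append(row)
    return shown


# Hypothetical annotation table; the column names are made up for illustration.
cells = [
    {"id": "cell_1", "cluster": "2", "total_counts": 1800},
    {"id": "cell_2", "cluster": "4", "total_counts": 950},
    {"id": "cell_3", "cluster": "2", "total_counts": 400},
    {"id": "cell_4", "cluster": "5", "total_counts": 2100},
]

selected = set()  # the actual selection starts empty

# Filtering only changes what is displayed; nothing is selected yet.
shown = apply_filters(cells, categorical={"cluster": {"2"}},
                      numeric={"total_counts": (500, 5000)})

# "Add all filtered": commit the displayed rows to the selection.
selected.update(row["id"] for row in shown)

# Clicking a single row toggles that cell in or out of the selection.
selected ^= {"cell_3"}
print(sorted(selected))
```

Combining a categorical filter with a numeric range is what allows a condition to depend on more than one class of annotation.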
The final approach serves users who have no useful annotation but still know which cells are of interest. They simply paste a list of cell identifiers into the text box for each condition. Note that the cell identifiers must be the default cell IDs in the background SCE object (i.e. colnames). The input text should contain one ID per line, with no separator symbols. After pasting, the summary text span below each text box is dynamically updated with the number of valid IDs found in the input.
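The validation of pasted identifiers described above amounts to splitting the text into lines and checking each one against the known cell IDs. The following Python sketch shows one way this could work; it is not SCTK's code, and the function name parse_pasted_ids, the example IDs, and the choice to trim whitespace and ignore duplicates are assumptions.

```python
def parse_pasted_ids(text, known_ids):
    """Split pasted text into one ID per line and keep the IDs found in known_ids."""
    known = set(known_ids)
    valid = []
    seen = set()
    for line in text.splitlines():
        cell_id = line.strip()  # assumption: surrounding whitespace is ignored
        if cell_id and cell_id in known and cell_id not in seen:
            valid.append(cell_id)
            seen.add(cell_id)  # assumption: a repeated ID is counted once
    return valid


# Hypothetical cell IDs standing in for the column names of the SCE object.
known = ["AAAC-1", "AAAG-1", "AACT-1", "ACGT-1"]

# Pasted input with a blank line, an unknown ID, and a duplicate.
pasted = "AAAC-1\nAACT-1\n\nNOT_A_CELL\nAAAC-1\n"

valid = parse_pasted_ids(pasted, known)
print(f"{len(valid)} valid IDs found")
```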
colData of the SCE object). Multiple selections are acceptable.
This tab shows a table of all the genes that are considered differentially expressed in the condition of interest against the control condition, that is, genes that are highly significant and pass all filter parameters. The table lists the gene names (the default identifiers in the background object, not necessarily gene symbols), p-values, log2FC values, and FDR values. The table is saved in the background under the analysis name entered before running. Users can also download the table in comma-separated values (CSV) format by clicking the “Download Result Table” button.
In the violin plot tab, the UI presents a standard violin plot showing how the expression of the top N selected genes differs between the cells of the two conditions.
In the linear modeling plot tab, an analysis of covariance (ANCOVA) is performed, showing how the expression of the top N selected genes differs between the cells of the two conditions.
This tab provides a limited heatmap visualization for the selected DE analysis. Here SCTK automatically groups the cells by the condition to which they are assigned, and groups the genes by their log2FC values, which indicate whether a gene is up-regulated or down-regulated.
Furthermore, if more sophisticated settings for the DE heatmap are needed, users can move to the generic heatmap viewer and use the “import from analysis” functionality.
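The automatic grouping used by the DE heatmap (cells by condition, genes by the direction of regulation) can be illustrated with a short sketch. The Python code below only computes a row and column order under the assumption that a positive log2FC means up-regulated in the condition of interest; it is not SCTK's plotting code, and heatmap_layout and all names and values in the example are hypothetical.

```python
def heatmap_layout(cell_condition, gene_log2fc):
    """Return (cell order, gene order) for a two-block DE heatmap.

    cell_condition: {cell ID: condition name}
    gene_log2fc: {gene: log2 fold change}
    """
    # Conditions in first-seen order; sorted() is stable, so cells keep
    # their original relative order within each condition.
    conditions = list(dict.fromkeys(cell_condition.values()))
    cell_order = sorted(
        cell_condition,
        key=lambda cell: conditions.index(cell_condition[cell]),
    )
    # Up-regulated genes first, then the remaining (down-regulated) genes.
    up = [gene for gene, fc in gene_log2fc.items() if fc > 0]
    down = [gene for gene, fc in gene_log2fc.items() if fc <= 0]
    return cell_order, up + down


# Hypothetical condition assignment and DE result.
cell_condition = {
    "cell_1": "cluster2",
    "cell_2": "cluster4_5",
    "cell_3": "cluster2",
    "cell_4": "cluster4_5",
}
gene_log2fc = {"GeneA": 1.5, "GeneB": -2.0, "GeneC": 0.8, "GeneD": -0.6}

cell_order, gene_order = heatmap_layout(cell_condition, gene_log2fc)
print(cell_order)
print(gene_order)
```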