Dimensionality Visualization
David Rach
University of Maryland, Baltimore
drach@som.umaryland.edu
1 June 2025
Source:vignettes/DimensionalityVisualization.Rmd
knitr::opts_chunk$set(eval = FALSE)
Introduction
Spectral flow cytometry data consist of many acquired cellular events, measured across an increasing number of markers. Many unsupervised analysis approaches rely on clustering events on the basis of the median fluorescence intensity of these markers. A common next step is to project the cells into a low-dimensional space using one of the various available algorithms. While this can provide useful information, we agree with the points the Pachter lab raises in their paper “The Specious Art of Single-Cell Genomics”, which highlights how easy it is to overinterpret what may be sheer artefactual noise.
Dimensionality visualization works both on unmixed files and, in a novel application, on unstained samples still in raw form. To help characterize what actually ends up in these islands (whether brightness, a unique signature, or random noise), we have implemented wrappers for the most frequently used dimensionality visualization algorithms to facilitate their exploration in the context of our paper.
While these wrappers are useful, we would like to reiterate that they are for exploratory and convenience purposes only; please refer to the original packages for any specialized argument implementation. We may modify them, so don’t build your own R package around our implementation. Instead, take advantage of the copyleft license and the nature of free and open-source software: fork the code, modify it, and include it in your own package.
Getting Started
Create a GatingSet
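# Packages providing the functions used throughout this vignette
library(Luciernaga)
library(flowCore)
library(flowWorkspace)
library(openCyto)
library(data.table)
library(dplyr)
library(purrr)
library(gt)
File_Location <- system.file("extdata", package = "Luciernaga")
FCS_Pattern <- "\\.fcs$"
FCS_Files <- list.files(path = File_Location, pattern = FCS_Pattern,
full.names = TRUE, recursive = FALSE)
head(FCS_Files[10:30], 5)
# Keep the unstained files, then drop the bead controls.
# !grepl() is used for the exclusion because -grep() returns an empty
# vector when no file matches the pattern.
UnstainedFCSFiles <- FCS_Files[grep("Unstained", FCS_Files)]
UnstainedCells <- UnstainedFCSFiles[!grepl(
"Beads", UnstainedFCSFiles)]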
MyCytoSet <- load_cytoset_from_fcs(UnstainedCells,
truncate_max_range = FALSE,
transform = FALSE)
MyCytoSet
MyGatingSet <- GatingSet(MyCytoSet)
MyGatingSet
FileLocation <- system.file("extdata", package = "Luciernaga")
MyGates <- fread(file.path(FileLocation, "Gates.csv"))
gt(MyGates)
MyGatingTemplate <- gatingTemplate(MyGates)
gt_gating(MyGatingTemplate, MyGatingSet)
MyGatingSet[[1]]
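The file selection above keeps paths that match one pattern and drops paths that match another. As a minimal sketch on hypothetical file names (not the files shipped with Luciernaga), the following shows why a logical index from grepl() is a safer way to exclude matches than a negative index from grep() when nothing matches the pattern.

```r
# Hypothetical file names, for illustration only
Files <- c("Unstained_Cells.fcs", "Unstained_Beads.fcs", "CD4_Cells.fcs")

# Keep unstained files, then drop bead controls
Unstained <- Files[grepl("Unstained", Files)]
Cells <- Unstained[!grepl("Beads", Unstained)]
Cells

# Pitfall: when nothing matches, grep() returns integer(0), and
# x[-integer(0)] selects zero elements rather than all of them
NoBeads <- c("Unstained_Cells.fcs", "Unstained_PBMC.fcs")
NegativeIndex <- NoBeads[-grep("Beads", NoBeads)]
LogicalIndex <- NoBeads[!grepl("Beads", NoBeads)]
length(NegativeIndex)
length(LogicalIndex)
```

With the example data in this vignette both forms behave the same, since bead files are present; the difference only appears when the excluded pattern is absent.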
Create an Unmixed GatingSet
Unmixed_FullStained <- FCS_Files[grep("Unmixed", FCS_Files)]
UnmixedCytoSet <- load_cytoset_from_fcs(Unmixed_FullStained,
truncate_max_range = FALSE,
transform = FALSE)
UnmixedGatingSet <- GatingSet(UnmixedCytoSet)
UnmixedGatingSet
Markers <- colnames(UnmixedCytoSet)
KeptMarkers <- Markers[!grepl(
"Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
MyBiexponentialTransform <- flowjo_biexp_trans(channelRange = 256,
maxValue = 1000000,
pos = 4.5, neg = 0,
widthBasis = -1000)
TransformList <- transformerList(KeptMarkers, MyBiexponentialTransform)
UnmixedGatingSet <- transform(UnmixedGatingSet, TransformList)
FileLocation <- system.file("extdata", package = "Luciernaga")
UnmixedGates <- fread(file.path(FileLocation, "GatesUnmixed.csv"))
UnmixedGating <- gatingTemplate(UnmixedGates)
gt_gating(UnmixedGating, UnmixedGatingSet)
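The detector selection above relies on a single regular expression to drop non-fluorescence parameters before the transform is applied. A small sketch on hypothetical column names (assumed for illustration, not read from the example files) shows what the pattern removes, and that a broad term such as AF would also drop any fluorophore whose name contains it.

```r
# Hypothetical parameter names, for illustration only
Markers <- c("Time", "FSC-A", "FSC-H", "SSC-A", "SSC-W", "SSC-B-A",
             "BUV496-A", "PE-A", "AF-A", "AF647-A")
Pattern <- "Time|FS|SC|SS|Original|-W$|-H$|AF"

Dropped <- Markers[grepl(Pattern, Markers)]
Kept <- Markers[!grepl(Pattern, Markers)]

Dropped
Kept
```

Printing both vectors before building the transformerList is a cheap way to confirm that only the intended parameters were excluded.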
Helper Functions
Utility_ColAppend
Originally an internal function supporting our dimensionality
visualization functions, Utility_ColAppend() appends new data.frame
columns to an .fcs file as new markers/parameters, provided the new
columns contain the same number of rows as the existing data. To do
this, you feed the function a cytoset object, a data.frame of the
flowFrame exprs data, and a data.frame of the new data columns.
Utility_ColAppend() then adds the appropriate parameters and returns a
flowFrame object with these new parameters added.
Since it has been highly useful to us, we mention it here in case you need to append additional information or alternate dimensionality visualizations to your own .fcs files. We have tried to keep the returned .fcs file compliant for ease of use in other software, but bugs may arise when the function is used outside its original internal purpose. If you encounter one, please report it via GitHub.
# "live" is passed as the gating node (argument y of gs_pop_get_data)
ff <- gs_pop_get_data(UnmixedGatingSet[1], y = "live",
inverse.transform = FALSE)
BeforeParameters <- ff[[1, returnType = "flowFrame"]]
BeforeParameters
# Main Expression Data From Original
MainDataFrame <- as.data.frame(exprs(ff[[1]]), check.names = FALSE)
# Creating Artificial Data To Mimic Metadata to Append
NewData <- MainDataFrame %>% mutate(
ExposureStatus = sample(1:3, n(), replace = TRUE))
NewData <- NewData %>% select(ExposureStatus)
AfterParameters <- Utility_ColAppend(ff=ff, DF=MainDataFrame,
columnframe = NewData)
AfterParameters
For anyone wanting to continue on and create an .fcs file from the flowframe, example code is provided below:
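The sketch below is one way to do this with flowCore::write.FCS(). To stay self-contained it writes a small synthetic flowFrame to a temporary file rather than the AfterParameters object from above; the column names, values and file name are placeholders, and the block only runs when flowCore is installed.

```r
# Sketch only: a small synthetic flowFrame stands in for AfterParameters
OutFile <- file.path(tempdir(), "AppendedExample.fcs")
ReadBack <- NULL

if (requireNamespace("flowCore", quietly = TRUE)) {
  set.seed(1989)
  # Placeholder data: 100 events, two detectors and one appended column
  Values <- matrix(runif(300, min = 0, max = 1000), ncol = 3,
                   dimnames = list(NULL, c("B1-A", "B2-A", "ExposureStatus")))
  Example <- flowCore::flowFrame(Values)

  # Write the flowFrame to disk as an .fcs file
  flowCore::write.FCS(Example, filename = OutFile)

  # Read it back without transformation or truncation to check the contents
  ReadBack <- flowCore::read.FCS(OutFile, transformation = FALSE,
                                 truncate_max_range = FALSE)
  nrow(flowCore::exprs(ReadBack))
}
```

For your own data, replace Example with the flowFrame returned by Utility_ColAppend() and OutFile with the desired output path.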
Utility_Downsample
For many dimensionality visualization protocols, it’s recommended
that you downsample so that each specimen has roughly similar
representation in the final plot. While this can be helpful when you
have some specimens with 10,000 cells and others with a million cells,
it encounters issues when some specimens have millions of cells but
others have just 200. Consequently, whether to downsample is a decision
that you need to make and justify. We have implemented
Utility_Downsample() to facilitate the process.
# plot(UnmixedGatingSet)
CountData <- gs_pop_get_count_fast(UnmixedGatingSet)
CountData %>% filter(Population %in% "/singletsFSC/singletsSSC/singletsSSCB/nonDebris/lymphocytes/live") %>%
select(name, Count)
#Single Sample Returned as a data.frame
removestrings <- c("DTR_2023_ILT_15_Tetramers-",
"-Ctrl_Tetramer_Unmixed", ".fcs")
SingleSample <- Utility_Downsample(UnmixedGatingSet[1],
sample.name = "GROUPNAME",
removestrings=removestrings,
subsets = "live", subsample = 2500,
internal = FALSE, export = FALSE,
inverse.transform=TRUE)
SingleSample
# Multiple Samples
MultipleSamples <- map(.x=UnmixedGatingSet, .f=Utility_Downsample,
sample.name = "GROUPNAME", removestrings=removestrings,
subsets = "live", subsample = 2500, internal = FALSE,
export = FALSE, inverse.transform=TRUE)
MultipleSamples
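The trade-off described above can be made concrete with a minimal base R sketch. It is not the implementation of Utility_Downsample(); the data and the helper name DownsampleRows are hypothetical. Each sample is reduced to at most a target number of events, and samples that fall short of the target are kept whole and flagged so the imbalance stays visible.

```r
set.seed(1989)

# Hypothetical per-sample event tables with very different sizes
Samples <- list(
  SampleA = data.frame(Marker = rnorm(10000)),
  SampleB = data.frame(Marker = rnorm(200))
)

# Hypothetical helper: draw up to `target` rows without replacement
DownsampleRows <- function(df, target) {
  n <- min(nrow(df), target)
  df[sample(nrow(df), n), , drop = FALSE]
}

Downsampled <- lapply(Samples, DownsampleRows, target = 2500)

# Compare event counts before and after, flagging samples below the target
Summary <- data.frame(
  Sample = names(Samples),
  Before = vapply(Samples, nrow, integer(1)),
  After = vapply(Downsampled, nrow, integer(1))
)
Summary$BelowTarget <- Summary$After < 2500
Summary
```

A table like Summary makes the decision explicit: here SampleB contributes less than a tenth of the events that SampleA does, which is worth stating when the resulting plot is interpreted.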
Utility_Concatinate
A common feature of dimensionality visualization approaches for cytometry data is concatenating different samples into a single file. This sometimes includes first downsampling to an equivalent number of cells in a particular cell subset, so as to not overly influence the result on the basis of a single cell subset. Similarly, this can be done before or after cells are normalized using one of the various algorithms.
To facilitate this process, we have implemented the
Utility_Concatinate() function. Unlike other approaches, we
implemented it at the GatingSet level, allowing you to bring only the
cells at the gating node of interest into active memory. The final file
can then be saved as its own .fcs file. We have attempted to retain the
FCS format so that it can be used across software without issues. If
you encounter issues, please reach out!
removestrings <- c("DTR_", ".fcs")
StorageLocation <- file.path("C:", "Users", "JohnDoe", "Desktop")
#Return Types: "data.frame", "flow.frame", "fcs"
ConcatinatedReturn <- Utility_Concatinate(gs=UnmixedGatingSet,
sample.name = "GROUPNAME",
removestrings=removestrings,
subsets="live", subsample = 2000,
ReturnType = "flow.frame",
newName = "MyConcatinatedFile",
outpath = StorageLocation,
export = FALSE, inverse.transform=TRUE)
ConcatinatedReturn
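Conceptually, concatenation stacks the events of each sample and records which sample every event came from. The following base R sketch uses hypothetical data and is not how Utility_Concatinate() is implemented. It stores the sample identity as a numeric key, on the assumption that values written to an .fcs file need to be numeric, and keeps a lookup table to translate the key back to a name.

```r
set.seed(1989)

# Hypothetical per-sample expression tables sharing the same columns
Samples <- list(
  INF = data.frame(`PE-A` = rnorm(5), `BV711-A` = rnorm(5),
                   check.names = FALSE),
  ctrl = data.frame(`PE-A` = rnorm(3), `BV711-A` = rnorm(3),
                    check.names = FALSE)
)

# Lookup table translating the numeric key back to the sample name
Lookup <- data.frame(SampleKey = seq_along(Samples), Sample = names(Samples))

# Tag every event with its sample key, then stack the samples row-wise
Tagged <- Map(function(df, key) cbind(df, SampleKey = key),
              Samples, Lookup$SampleKey)
Concatenated <- do.call(rbind, unname(Tagged))
rownames(Concatenated) <- NULL

table(Concatenated$SampleKey)
Lookup
```

Keeping the lookup table next to the concatenated file lets the per-sample identity be recovered later, for example when colouring a plot by sample.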
Utility_tSNE
Now that we have a GatingSet containing our raw .fcs files, let’s
figure out what markers/parameters are present, and save the detectors
as “KeptMarkers”. From there, let’s identify the unstained file within
the GatingSet. After this we will run tSNE on the data at the gating
level of interest (the nonDebris gate in the code below), using the
Utility_tSNE() function.
This function can return an .fcs file (which is what we typically use).
For the purpose of this vignette, it returns a flowCore flowFrame. For
use with our visualization functions that work on GatingSet objects, we
convert it to a cytoframe, add it to a new cytoset, and then convert
that into a GatingSet. Finally, we visualize using
Utility_ThirdColorPlots().
Markers <- colnames(MyCytoSet)
KeptMarkers <- Markers[!grepl("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
KeptMarkers
pData(MyGatingSet[[3]]) %>% pull(name)
nrow(MyGatingSet[[3]])
plot(MyGatingSet)
tSNE_Output <- Utility_tSNE(x=MyGatingSet[[3]], sample.name = "GUID",
removestrings=c("_Cells", ".fcs"),
subset = "nonDebris", columns=KeptMarkers,
export=FALSE)
cf <- flowFrame_to_cytoframe(tSNE_Output)
TheNewCS <- cytoset()
cs_add_cytoframe(cs=TheNewCS, sn="Test", cf=cf)
NewGatingSet <- GatingSet(TheNewCS)
TSNEPlot <- Utility_ThirdColorPlots(x=NewGatingSet[[1]], subset = "root",
xaxis="tSNE_1", yaxis = "tSNE_2",
zaxis ="B3-A", splitpoint = "continuous",
sample.name = "TUBENAME",
removestrings = c("Dimensionality", ".fcs"),
thecolor = "orange", tilesize = 0.6)
TSNEPlot
We can now repeat the previous step, but using our unmixed GatingSet.
The process is similar: identify the markers present in the .fcs file
(fluorophores in this case), select our markers of interest, and pass
them to the Utility_tSNE() function.
Markers <- colnames(UnmixedCytoSet)
KeptMarkers <- Markers[-grep("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
SubsetMarkers <- c("BUV496-A", "BUV805-A", "Pacific Blue-A", "BV711-A",
"BV786-A", "Spark Blue 550-A", "PE-A", "APC-Fire 750-A")
pData(UnmixedGatingSet[[3]]) %>% pull(name)
nrow(UnmixedGatingSet[[3]])
plot(UnmixedGatingSet)
removestrings <- c(".fcs")
tSNE_Output <- Utility_tSNE(x=UnmixedGatingSet[[3]], sample.name = "TUBENAME",
removestrings=removestrings, subset = "live",
columns=SubsetMarkers, export=FALSE)
#BUGGED: flowCore_$P42Rmax not contained in Text section!
cf <- flowFrame_to_cytoframe(tSNE_Output)
TheNewCS <- cytoset()
cs_add_cytoframe(cs=TheNewCS, sn="Test", cf=cf)
NewGatingSet <- GatingSet(TheNewCS)
Sample_TSNEPlot <- Utility_ThirdColorPlots(x=NewGatingSet[[1]], subset = "root",
xaxis="tSNE_1", yaxis = "tSNE_2",
zaxis ="Spark Blue 550-A",
splitpoint = "continuous",
sample.name = "GROUPNAME",
removestrings = removestrings,
thecolor = "orange", tilesize = 0.6)
Sample_TSNEPlot
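In line with the caution in the introduction, it can help to check how much of a tSNE layout survives a change of random seed. The sketch below is stand-alone: it calls the Rtsne package directly on hypothetical, structureless noise, as an assumed stand-in that does not reproduce the internals or defaults of Utility_tSNE(). It only runs when Rtsne is installed.

```r
set.seed(1989)

# Hypothetical input: 300 events x 6 detectors of pure noise
Noise <- matrix(rnorm(300 * 6), ncol = 6)

RunA <- NULL
RunB <- NULL

if (requireNamespace("Rtsne", quietly = TRUE)) {
  # Same data, same settings, two different seeds
  set.seed(1)
  RunA <- Rtsne::Rtsne(Noise, dims = 2, perplexity = 30,
                       check_duplicates = FALSE)$Y
  set.seed(2)
  RunB <- Rtsne::Rtsne(Noise, dims = 2, perplexity = 30,
                       check_duplicates = FALSE)$Y
  colnames(RunA) <- colnames(RunB) <- c("tSNE_1", "tSNE_2")

  # Side-by-side coordinates of the first events from each run
  head(cbind(RunA, RunB))
}
```

Plotting RunA and RunB next to each other for your own data gives a quick impression of which islands are stable and which depend on the seed.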
Utility_UMAP
Now that we have a GatingSet containing our raw .fcs files, let’s
figure out what markers/parameters are present, and save the detectors
as “KeptMarkers”. From there, let’s identify the unstained file within
the GatingSet. After this we will run UMAP on the data at the gating
level of interest (the nonDebris gate in the code below), using the
Utility_UMAP() function.
This function can return an .fcs file (which is what we typically use).
For the purpose of this vignette, it returns a flowCore flowFrame. For
use with our visualization functions that work on GatingSet objects, we
convert it to a cytoframe, add it to a new cytoset, and then convert
that into a GatingSet. Finally, we visualize using
Utility_ThirdColorPlots().
Markers <- colnames(MyCytoSet)
KeptMarkers <- Markers[!grepl("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
KeptMarkers
pData(MyGatingSet[[3]]) %>% pull(name)
nrow(MyGatingSet[[3]])
plot(MyGatingSet)
UMAP_Output <- Utility_UMAP(x=MyGatingSet[[3]], sample.name="GUID",
removestrings=c("_Cells", ".fcs"),
subset="nonDebris", columns=KeptMarkers,
export=FALSE)
cf <- flowFrame_to_cytoframe(UMAP_Output)
TheNewCS <- cytoset()
cs_add_cytoframe(cs=TheNewCS, sn="Test", cf=cf)
NewGatingSet <- GatingSet(TheNewCS)
UMAPPlot <- Utility_ThirdColorPlots(x=NewGatingSet[[1]], subset = "root",
xaxis="UMAP_1", yaxis = "UMAP_2",
zaxis ="B3-A", splitpoint = "continuous",
sample.name = "TUBENAME",
removestrings = c("Dimensionality", ".fcs"),
thecolor = "orange", tilesize = 0.3)
UMAPPlot
We can now repeat the previous step, but using our unmixed GatingSet.
The process is similar: identify the markers present in the .fcs file
(fluorophores in this case), select our markers of interest, and pass
them to the Utility_UMAP() function.
Markers <- colnames(UnmixedCytoSet)
KeptMarkers <- Markers[-grep("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
SubsetMarkers <- c("BUV496-A", "BUV805-A", "Pacific Blue-A", "BV711-A",
"BV786-A", "Spark Blue 550-A", "PE-A", "APC-Fire 750-A")
pData(UnmixedGatingSet[[3]]) %>% pull(name)
nrow(UnmixedGatingSet[[3]])
plot(UnmixedGatingSet)
removestrings <- c(".fcs")
UMAP_Output <- Utility_UMAP(x=UnmixedGatingSet[[3]], sample.name="GUID",
removestrings=removestrings, subset="live",
columns=SubsetMarkers, export=FALSE)
cf <- flowFrame_to_cytoframe(UMAP_Output)
TheNewCS <- cytoset()
cs_add_cytoframe(cs=TheNewCS, sn="Test", cf=cf)
NewGatingSet <- GatingSet(TheNewCS)
Sample_UMAPPlot <- Utility_ThirdColorPlots(x=NewGatingSet[[1]], subset = "root",
xaxis="UMAP_1", yaxis = "UMAP_2",
zaxis ="Spark Blue 550-A",
splitpoint = "continuous",
sample.name = "GROUPNAME",
removestrings = c("Dimensionality", ".fcs"),
thecolor = "orange", tilesize = 0.3)
Sample_UMAPPlot
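The wrappers return the original parameters together with the new embedding coordinates, which is what allows a marker to be used as the colour axis above. As a stand-alone sketch of that idea, the following computes a UMAP with the uwot package on hypothetical data and appends the two coordinates as extra columns. Using uwot here is an assumption for illustration and does not reproduce the internals or defaults of Utility_UMAP(); the block only runs when uwot is installed.

```r
set.seed(1989)

# Hypothetical expression data: 200 events x 3 fluorophores
Events <- data.frame(`PE-A` = rnorm(200), `BV711-A` = rnorm(200),
                     `BV786-A` = rnorm(200), check.names = FALSE)

WithUMAP <- NULL

if (requireNamespace("uwot", quietly = TRUE)) {
  # Two-dimensional embedding, one row per event
  Embedding <- uwot::umap(as.matrix(Events), n_neighbors = 15, min_dist = 0.1)
  colnames(Embedding) <- c("UMAP_1", "UMAP_2")

  # Append the coordinates as new columns next to the original markers
  WithUMAP <- cbind(Events, Embedding)
  colnames(WithUMAP)
}
```

Because the embedding has one row per input event, the append is a plain column bind; the same row-count requirement applies to Utility_ColAppend().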
Utility_PaCMAP
Unlike the other dimensionality visualization algorithms, which are
implemented in R, both PaCMAP and PHATE are primarily implemented in
Python. Using the basilisk package, we implemented a method that
generates these plots in a Python environment isolated within
Luciernaga. To use it, you need to install either the devel or
WithPython branch when downloading from GitHub.
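Since these wrappers are only present in some branches, it can be worth checking that your installed copy exposes them before running the code. A minimal sketch, in which HasWrapper is a hypothetical helper and not part of Luciernaga:

```r
# Hypothetical helper: TRUE when `pkg` is installed and defines `fun`
HasWrapper <- function(fun, pkg = "Luciernaga") {
  requireNamespace(pkg, quietly = TRUE) &&
    exists(fun, envir = asNamespace(pkg), inherits = FALSE)
}

# Check both Python-backed wrappers used in this vignette
Available <- vapply(c("Utility_PaCMAP", "Utility_Phate"),
                    HasWrapper, logical(1))
Available
```

If either entry is FALSE, the installed branch most likely does not include the Python-backed functions.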
Repeating similar steps as with the tSNE and UMAP examples above for both raw and unmixed data:
Markers <- colnames(MyCytoSet)
KeptMarkers <- Markers[!grepl("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
KeptMarkers
pData(MyGatingSet[[3]]) %>% pull(name)
nrow(MyGatingSet[[3]])
plot(MyGatingSet)
PaCMAP_Output <- Utility_PaCMAP(x=MyGatingSet[[3]], sample.name="GUID",
removestrings=c("_Cells", ".fcs"),
subset="nonDebris", columns=KeptMarkers,
export=FALSE)
cf <- flowFrame_to_cytoframe(PaCMAP_Output)
TheNewCS <- cytoset()
cs_add_cytoframe(cs=TheNewCS, sn="Test", cf=cf)
NewGatingSet <- GatingSet(TheNewCS)
PaCMAPPlot <- Utility_ThirdColorPlots(x=NewGatingSet[[1]], subset = "root",
xaxis="PaCMAP_1", yaxis = "PaCMAP_2",
zaxis ="B3-A", splitpoint = "continuous",
sample.name = "TUBENAME",
removestrings = c("Dimensionality", ".fcs"),
thecolor = "orange", tilesize = 0.3)
PaCMAPPlot
Markers <- colnames(UnmixedCytoSet)
KeptMarkers <- Markers[-grep("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
SubsetMarkers <- c("BUV496-A", "BUV805-A", "Pacific Blue-A", "BV711-A",
"BV786-A", "Spark Blue 550-A", "PE-A", "APC-Fire 750-A")
pData(UnmixedGatingSet[[3]]) %>% pull(name)
nrow(UnmixedGatingSet[[3]])
plot(UnmixedGatingSet)
removestrings <- c(".fcs")
PaCMAP_Output <- Utility_PaCMAP(x=UnmixedGatingSet[[3]], sample.name="GUID",
removestrings=removestrings, subset="live",
columns=SubsetMarkers, export=FALSE)
cf <- flowFrame_to_cytoframe(PaCMAP_Output)
TheNewCS <- cytoset()
cs_add_cytoframe(cs=TheNewCS, sn="Test", cf=cf)
NewGatingSet <- GatingSet(TheNewCS)
Sample_PaCMAPPlot <- Utility_ThirdColorPlots(x=NewGatingSet[[1]], subset = "root",
xaxis="PaCMAP_1", yaxis = "PaCMAP_2",
zaxis ="Spark Blue 550-A",
splitpoint = "continuous",
sample.name = "GROUPNAME",
removestrings = c("Dimensionality", ".fcs"),
thecolor = "orange", tilesize = 0.3)
Sample_PaCMAPPlot
Utility_Phate
And similarly for PHATE, with downsampling for brevity:
Markers <- colnames(MyCytoSet)
KeptMarkers <- Markers[!grepl("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
KeptMarkers
pData(MyGatingSet[[3]]) %>% pull(name)
nrow(MyGatingSet[[3]])
plot(MyGatingSet)
Phate_Output <- Utility_Phate(x=MyGatingSet[[3]], sample.name="GUID",
removestrings=c("_Cells", ".fcs"),
subset="nonDebris", subsample=15000,
columns=KeptMarkers,
export=FALSE)
cf <- flowFrame_to_cytoframe(Phate_Output)
TheNewCS <- cytoset()
cs_add_cytoframe(cs=TheNewCS, sn="Test", cf=cf)
NewGatingSet <- GatingSet(TheNewCS)
PhatePlot <- Utility_ThirdColorPlots(x=NewGatingSet[[1]], subset = "root",
xaxis="Phate_1", yaxis = "Phate_2",
zaxis ="B3-A", splitpoint = "continuous",
sample.name = "TUBENAME",
removestrings = c("Dimensionality", ".fcs"),
thecolor = "orange", tilesize = 0.0001)
PhatePlot
Markers <- colnames(UnmixedCytoSet)
KeptMarkers <- Markers[-grep("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]
SubsetMarkers <- c("BUV496-A", "BUV805-A", "Pacific Blue-A", "BV711-A",
"BV786-A", "Spark Blue 550-A", "PE-A", "APC-Fire 750-A")
pData(UnmixedGatingSet[[3]]) %>% pull(name)
nrow(UnmixedGatingSet[[3]])
plot(UnmixedGatingSet)
removestrings <- c(".fcs")
Phate_Output <- Utility_Phate(x=UnmixedGatingSet[[3]], sample.name="GUID",
removestrings=removestrings, subset="live",
subsample=5000,
columns=SubsetMarkers, export=FALSE)
cf <- flowFrame_to_cytoframe(Phate_Output)
TheNewCS <- cytoset()
cs_add_cytoframe(cs=TheNewCS, sn="Test", cf=cf)
NewGatingSet <- GatingSet(TheNewCS)
Sample_PhatePlot <- Utility_ThirdColorPlots(x=NewGatingSet[[1]], subset = "root",
xaxis="Phate_1", yaxis = "Phate_2",
zaxis ="Spark Blue 550-A",
splitpoint = "continuous",
sample.name = "GROUPNAME",
removestrings = c("Dimensionality", ".fcs"),
thecolor = "orange", tilesize = 0.0001)
Sample_PhatePlot
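The tilesize values used in this vignette differ by orders of magnitude between algorithms (0.6 and 0.3 for tSNE, UMAP and PaCMAP, 0.0001 for PHATE). Assuming tilesize is expressed in the units of the plotted axes, which we infer from these values rather than from the function documentation, one way to pick a starting value is to divide the larger axis range by a desired number of tiles. The helper SuggestTileSize and the simulated coordinates below are hypothetical.

```r
# Hypothetical helper: starting tile size from the larger axis range
SuggestTileSize <- function(x, y, tiles = 200) {
  max(diff(range(x)), diff(range(y))) / tiles
}

set.seed(1989)
# Simulated embeddings on very different coordinate scales
WideEmbedding <- data.frame(D1 = runif(1000, -30, 30),
                            D2 = runif(1000, -30, 30))
NarrowEmbedding <- data.frame(D1 = runif(1000, -0.02, 0.02),
                              D2 = runif(1000, -0.02, 0.02))

WideTile <- SuggestTileSize(WideEmbedding$D1, WideEmbedding$D2)
NarrowTile <- SuggestTileSize(NarrowEmbedding$D1, NarrowEmbedding$D2)
WideTile
NarrowTile
```

Treat the result as a first guess to adjust by eye, since the appropriate value also depends on how many events are plotted.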