Skip to contents

Abstract

When working with high-dimensional cytometry data, whether from Spectral Flow Cytometry (SFC) or Mass Cytometry (MC), how many clusters of cells are present given certain markers is often a question that arises. “It depends” is the frequent non-commital answer. Coereba is an R package that attempts to provide tools to qualitatively and quantitatively assess the variation present for individual panels. It additionally leverages a semi-supervised analytical approach to enable working with datasets where batch effects limit current unsupervised approaches that rely heavily on consistent MFI across individual experiments and files.

Introduction

Historically, manual (supervised) gating has been widely used in cytometry, especially conventional flow cytometry (CFC) utilizing commercial software with graphical user interfaces, with hierarchical boolean gates drawn around cell populations of interest, sequentially filtering down to smaller subsets of cells that are of interest to the researcher. This approach has encountered limitations given the increase in resolvable markers (initially by mass cytometry (MC), more recently by spectral flow ctyometry (SFC)), and unsupervised methods utilizing algorithms to cluster cells based on median fluorescent intensity for individual markers have become increasingly widespread. By contrast, determining number of biological cell clusters present remains a concern for these methods, with individual methods outcomes dependent on the decisions driven by fully-algorithmic clusterings. Since these are dependent on the MFI values being consistent across individuals and experiments, they are susceptible to batch effects.

This is particularly the case with SFC data, which although comparable marker coverage as MC datasets, can collect far large number of cellular events. It is susceptible to loss resolution when improper unmixing controls are used, or when fluorophores degrade, additional autofluorescences are present. The combination of these result in unique challenges that we have noted elsewher. Alternative approaches implementing boolean gating templates checked by algorithms are in active development but struggle from individual variation due to biological heterogeneity, instrumental and experimental artifact.

Coereba is a collection of tools that takes a hybrid approach. Compared with traditional manual approaches that focus gating marker combinations at the single-individual before proceeding to the next, Coereba implements visualization across all specimens within a single view, and implements semi-supervised gating. It then allows for validation/correction at the individual level. The iteration is then over additional markers in the panel. From the sum of the gates across all markers on the basis of individual specimens, individual cells identities given markers of interest can be derrived and quantified using algorithmic tools. T

This extends the ability of manual gating to enable handling a greater number of markers, while allowing flexibility to handle batch effects and variation that algorithms dependent on normalized MFI would struggle to correctly classify. From this hybrid approach, we implement functions that showcase some novel uses in answering questions that neither fully supervised or unsupervised methods, both open-source or commercial, have been able to address. This package does not claim to be solution for all cytometry analysis, but we hope it can be a stepping stone to realizing the underlying problems that the current paradigms have, so that the next generation of developers can benefit from the knowledge base.

Getting Started

Installing Coereba

We are in the process of preparing Coereba for submission to Bioconductor later this year, for the moment however, it is available to install via GitHub.

if(!require("remotes")) {install.packages("remotes")}
remotes::install_github("https://github.com/DavidRach/Coereba")

# install.packages("BiocManager")
# BiocManager::install("Coereba")

Loading Libraries

Coereba relies on underlying infrastructure provided flowWorkspace and other Bioconductor cytometry packages. It leverages functional programming principles implemented in tidyverse packages available via CRAN. It is important to make sure that all the required packages are installed on your computer, and that library is called for each.

Locating .fcs files

To get started, you will first need to provide the location on your computer where the .fcs files of interest are being stored. An example of how the author does this is provided below and can be modified for your own user and desired computer folder.

File_Location <- file.path("C:", "Users", "JohnDoe", "Desktop", "TodaysExperiment")
FCS_Pattern <- ".fcs$"
FCS_Files <- list.files(path = File_Location, pattern = FCS_Pattern, full.names = TRUE, recursive = FALSE)

For this vignette, we will using several small .fcs files that can be found in Coereba’s extdata folder for our example.

File_Location <- system.file("extdata", package = "Coereba")
FCS_Pattern <- ".fcs$"
FCS_Files <- list.files(path = File_Location, pattern = FCS_Pattern, full.names = TRUE, recursive = FALSE)
head(FCS_Files, 3)
#> [1] "C:/Users/12692/AppData/Local/R/win-library/4.4/Coereba/extdata/DTR_2023_ILT_15_Tetramers-INF071-Ctrl_Tetramer_Unmixed.fcs"
#> [2] "C:/Users/12692/AppData/Local/R/win-library/4.4/Coereba/extdata/DTR_2023_ILT_15_Tetramers-INF149-Ctrl_Tetramer_Unmixed.fcs"
#> [3] "C:/Users/12692/AppData/Local/R/win-library/4.4/Coereba/extdata/DTR_2023_ILT_15_Tetramers-INF179-Ctrl_Tetramer_Unmixed.fcs"

The folder contains three unmixed full-stained samples from a 29-color SFC panel, characterizing Innate-like T cells (ILT) in cord blood mononuclear cells (CBMCs). As rare circulating cells, the original .fcs files contained many cellular events, which makes them useful to show off Coereba’s capabilities at characterizing cells that were not the original focus.

Creating a GatingSet for Unmixed .fcs files

Working with the unmixed full-stained samples mentioned above, we will proceed by bringing the .fcs files found within Coereba’s extdata folder into a GatingSet object where we can interact with them.

UnmixedFCSFiles <- FCS_Files[c(1:3)]
UnmixedCytoSet <- load_cytoset_from_fcs(UnmixedFCSFiles, truncate_max_range = FALSE, transformation = FALSE)
UnmixedGatingSet <- GatingSet(UnmixedCytoSet)
UnmixedGatingSet
#> A GatingSet with 3 samples

Now that we have created a GatingSet object, we will identify the markers/fluorophores present within the .fcs files, and exclude markers from the list don’t require transformation (as is the case with FSC, SSC). We will then bi-exponentially transform the data using flowWorkspace flowWorkspace::flowjo_biexp_trans() before applying a openCytogating template that was customized for this particular panel, which can be found in the extdata folder.

Markers <- colnames(UnmixedCytoSet)
KeptMarkers <- Markers[-grep("Time|FS|SC|SS|Original|-W$|-H$|AF", Markers)]

MyBiexponentialTransform <- flowjo_biexp_trans(channelRange = 256, maxValue = 1000000, pos = 4.5, neg = 0, widthBasis = -1000)
TransformList <- transformerList(KeptMarkers, MyBiexponentialTransform)
UnmixedGatingSet <- transform(UnmixedGatingSet, TransformList)

UnmixedGates <- fread(file.path(path = File_Location, pattern = 'GatesUnmixed.csv'))
UnmixedGating <- gatingTemplate(UnmixedGates)
#> Adding population:singletsFSC
#> Adding population:singletsSSC
#> Adding population:singletsSSCB
#> Adding population:nonDebris
#> Adding population:lymphocytes
#> Adding population:live
gt_gating(UnmixedGating, UnmixedGatingSet)
#> Gating for 'singletsFSC'
#> done!
#> done.
#> Gating for 'singletsSSC'
#> done!
#> done.
#> Gating for 'singletsSSCB'
#> done!
#> done.
#> Gating for 'nonDebris'
#> done!
#> done.
#> Gating for 'lymphocytes'
#> The prior specification has no effect when usePrior=no
#> Using the serial version of flowClust
#> The prior specification has no effect when usePrior=no
#> Using the serial version of flowClust
#> The prior specification has no effect when usePrior=no
#> Using the serial version of flowClust
#> done!
#> done.
#> Gating for 'live'
#> done!
#> done.
#> finished.
plot(UnmixedGatingSet)

We can additionally verify that the gating of the cell populations of interest was correct, by visualizing the gates using ggcyto or the Luciernaga package.

library(Luciernaga)

MyPlot <- Utility_GatingPlots(x=UnmixedGatingSet[1], sample.name=c("GROUPNAME", "TUBENAME"), removestrings=".fcs", subset="live", gtFile=UnmixedGates, DesiredGates=NULL, outpath = getwd(), export=FALSE, therows=3, thecolumns=2)

MyPlot
#> $`1`

Coereba_GateCutoffs

Having set up a GatingSet with transformations and gates applied, we can proceed to test the Coereba workflow. The first step is to create a .csv file, containing the individual specimens, and the individual markers of interest. Each cell within the .csv file will correspond to the MFI for a given marker for a given individual where positive expression of a marker transitions to becoming a negative (referred here to as a splitpoint).

These could be written out individually, or based on automated calculations. To simplify the process, in Coereba, we have implemented two approaches to estimating these splitpoints. One function is the in-house Coereba_GateCutoffs(), which does an “okay-ish but work-in progress”. The flip approach is leveraging the openCyto pipeline (in development) and retrieving the values.

Both approaches can be computationally heavy, but still faster than eye-balling everything and typing in the values one-by-one. Save the eyestrain for subsequent validation steps.

TheGateCutoff <- Coereba_GateCutoffs(x=UnmixedGatingSet[1],
  subset="live", sample.name="GROUPNAME", remove.strings=".fcs",
  marker=c("BUV805-A"))

TheGateCutoffs <- map(.x=UnmixedGatingSet[1:3], .f=Coereba_GateCutoffs,
  subset="live", sample.name="GROUPNAME", remove.strings=".fcs",
  marker=c("BUV805-A")) %>% bind_rows()
TheGateCutoffs
#>   specimen BUV805-A
#> 1   INF071      104
#> 2   INF149       94
#> 3   INF179       95

Visualizing Gate Cutoffs

Once we have generated a GateCutoffs.csv file, we can visualize how well the splitpoints were by plotting them as red-lines using the Luciernaga packages Utility_NbyNPlots() to visualize the markers for a given individual and Utility_UnityPlots() to visualize the markers across individuals. This a good way of understanding how well the competing gating-algorithms did in the context of your individual panel, and how much effort will be required for the next step in correcting the gates individually.

Coereba_App

Previously, all adjustments to the above splitpoints had to eye-balled in the pdf, and then adjusted in the .csv file, and repeated ad nauseum. This obviously did not scale well. We have subsequently created a ShinyApp that will bring in the GatingSet object, and plot an interactive version of Utility_UnityPlot() from the Luciernaga package, visualizing the splitpoint for a given marker across all individuals as red vertical lines.

To get started, first make sure you know where the Coereba_GateCutoffs() .csv output is saved at, as you will need to select the file within the App.

File_Location <- system.file("extdata", package = "Coereba")
TheCSV <- file.path(File_Location, "GateCutoffsForNKs.csv")
TheCSVData <- read.csv(TheCSV, check.names=FALSE)
head(TheCSVData)
#>   specimen BUV395-A BUV496-A BUV563-A BUV615-A BUV661-A BUV737-A BUV805-A
#> 1   INF071       90      105      155      100      100      105       95
#> 2   INF149       90      100      170      105       95      100      100
#> 3   INF179       95      110      165      105       95      115      100
#>   BV421-A Pacific Blue-A BV480-A BV510-A BV605-A BV650-A BV711-A BV750-A
#> 1     100            105      80     120      90     135     120     150
#> 2      95             95      80     110      90     140     115     150
#> 3     100             95      80     110      95     145     115     150
#>   BV786-A Alexa Fluor 488-A Spark Blue 550-A PerCP-Cy5.5-A PE-A PE-Cy5-A
#> 1     130                90               95           105  110      125
#> 2     120                90               95           100  105      110
#> 3     135                95              100           105  105      115
#>   PE-Dazzle594-A PE-Vio770-A APC-A Alexa Fluor 647-A APC-R700-A Zombie NIR-A
#> 1             95         120   115               150        142          106
#> 2             94         115   115               130        146          104
#> 3             95         120   110               130        142          104
#>   APC-Fire 750-A APC-Fire 810-A
#> 1            100            102
#> 2            100            100
#> 3            100            102

Next, make sure to remember the name of your GatingSet (“UnmixedGatingSet” for this case), and the sample.name keyword to identify individual specimens, as you will need to input these values within the app.

Then, run the following command in your console:

The first tab view will let you upload the GateCutoff.csv file generated by the previous function, by letting you navigate to the storage folder and clicking on it.

Once this is done, navigate to the Upload a GatingSet tab. Enter the information about your GatingSet (“UnmixedGatingSet” for this example), select your x and y parameters, enter the SampleName keyword. Desired bins allows adjustments to the visualization given the number of cells present, clearance multiplier adjust the margins wiggle-room, removestrings arguments. You specify margin and gate subsets like you would for Utility_UnityPlots() and click Display Estimated Gate Cutoffs. Finally, you click on Generate Plots and go retrieve your coffee/tea/beverage-of-choice.

Upon loading of the plots, scroll until you encounter a splitpoint line that was placed incorrectly, you can simply click on the correct location of the axis, which will save the coordinate as a splitpoint for that given marker and individual to a data.frame. This will later be ported to update the corresponding splitpoint value for a given marker and individual in the exisiting GateCutoff .csv. By just clicking, you can therefore quickly correct cases where the algorithm failed to gate properly due to batch effects, bad staining, or algorithmic incompetence (sorry, the coders are still learning as well).

Once you have corrected some/all the markers, navigate to the Click Data tab. You can then export the click-data information by specifying a filename and providing an outpath to the folder you want to save at. Unfortunately, due to security issues in ShinyApps, you will need to type out the designated outpath manually.

And repeat until the markers you are interested have been validated and you have a folder of clickpoint .csvs awaiting conversion into the existing GateCutoff folder.

Since the code is within a ShinyApp running in R, it has both the advantages and disadvantages of a R Shiny App. It scales reasonably well, but can take a while to load the next iterated marker. During testing, loading a new marker to inspect took anywhere from 3-10 additional minutes. This area is a active work in progress to speed up, but the author/maintainer hasn’t had the time to finish reading Mastering Shiny by Hadley Wickham yet or implement it in Rust, if you have, feel free to assist!

Coereba_UpdateGates

Once you have completed validating the GateCutoffs with Coereba_App(), it is time to convert your validated click-data and update the GateCutoff.csv. To fascilitate this, provide the location of the folder containing just the clickpoint .csv’s to Coereba_UpdateGates() and fill out the required arguments. It will proceed to update the GateCutoffs.

For this example, we will use ClickDataExamples.csv stored within Coereba’s extdata folder:

File_Location <- system.file("extdata", package = "Coereba")
TheOldCSV <- file.path(File_Location, "GateCutoffsForNKs.csv")
TheClickInfo <- file.path(File_Location, "ClickDataExample.csv")
TheClickData <- read.csv(TheClickInfo, check.names=FALSE)
TheClickData
#>   Plot_Name  X_Label X_Coordinate                       Time
#> 1    INF071 BUV496-A          106  2024-10-16 15:26:43.98856
#> 2    INF149 BUV496-A          104 2024-10-16 15:26:47.052693
#> 3    INF179 BUV496-A          114 2024-10-16 15:26:50.278519

The old data:

TheOldData <- read.csv(TheOldCSV, check.names=FALSE)
TheOldData
#>   specimen BUV395-A BUV496-A BUV563-A BUV615-A BUV661-A BUV737-A BUV805-A
#> 1   INF071       90      105      155      100      100      105       95
#> 2   INF149       90      100      170      105       95      100      100
#> 3   INF179       95      110      165      105       95      115      100
#>   BV421-A Pacific Blue-A BV480-A BV510-A BV605-A BV650-A BV711-A BV750-A
#> 1     100            105      80     120      90     135     120     150
#> 2      95             95      80     110      90     140     115     150
#> 3     100             95      80     110      95     145     115     150
#>   BV786-A Alexa Fluor 488-A Spark Blue 550-A PerCP-Cy5.5-A PE-A PE-Cy5-A
#> 1     130                90               95           105  110      125
#> 2     120                90               95           100  105      110
#> 3     135                95              100           105  105      115
#>   PE-Dazzle594-A PE-Vio770-A APC-A Alexa Fluor 647-A APC-R700-A Zombie NIR-A
#> 1             95         120   115               150        142          106
#> 2             94         115   115               130        146          104
#> 3             95         120   110               130        142          104
#>   APC-Fire 750-A APC-Fire 810-A
#> 1            100            102
#> 2            100            100
#> 3            100            102

We can then run Coereba_UpdateGates() and watch the values for BUV469-A are updated.

UpdatedCSV <- Coereba_UpdateGates(Clicks=TheClickInfo, Old=TheOldCSV,
  export=FALSE, outpath=NULL, fileName="UpdatedCSV")

UpdatedCSV
#>   specimen BUV395-A BUV496-A BUV563-A BUV615-A BUV661-A BUV737-A BUV805-A
#> 1   INF071       90      106      155      100      100      105       95
#> 2   INF149       90      104      170      105       95      100      100
#> 3   INF179       95      114      165      105       95      115      100
#>   BV421-A Pacific Blue-A BV480-A BV510-A BV605-A BV650-A BV711-A BV750-A
#> 1     100            105      80     120      90     135     120     150
#> 2      95             95      80     110      90     140     115     150
#> 3     100             95      80     110      95     145     115     150
#>   BV786-A Alexa Fluor 488-A Spark Blue 550-A PerCP-Cy5.5-A PE-A PE-Cy5-A
#> 1     130                90               95           105  110      125
#> 2     120                90               95           100  105      110
#> 3     135                95              100           105  105      115
#>   PE-Dazzle594-A PE-Vio770-A APC-A Alexa Fluor 647-A APC-R700-A Zombie NIR-A
#> 1             95         120   115               150        142          106
#> 2             94         115   115               130        146          104
#> 3             95         120   110               130        142          104
#>   APC-Fire 750-A APC-Fire 810-A
#> 1            100            102
#> 2            100            100
#> 3            100            102

We can further validate this by re-running Luciernaga Utility_UnityPlots() using the updated GateCutoff.csv and seeing if the previous issues have been corrected.

Note on Cell Populations and Splitpoint

During development, we have noticed that for SFC data, the splitpoint location can vary substantially for individual cell populations (B cells, NK cells, T cells, etc), even in the abscence of biology. We believe this is due to uncertainty in the unmixing, as we have observed a correlation of negative splitpoint MFI in relation to the individual cells kappa value (matrix complexity essentially). Try your best to have the gate reflect what you are interested in, reduce the amount of error as much as you can, and in numbers veritas. Keep track of the exceptions (more later) and write up a Cyto paper.

How Coereba works

Having a GateCutoff.csv containing the splitpoint information for each marker validated for each individual is immensely useful, allowing you to do many things you wouldn’t be capable of otherwise. Namely, we can for an individual anotate each cell within their .fcs file.

What do I mean? Let’s first think about all the cells circulating in an individuals bloostream. We don’t have any prior information in this example, but we want to profile these cells into clusters of similar cells. On one end of the spectra, there is a single cluster that contains every single cell from the sample, regardless of their marker expression. At the opposite end of the spectrum, every single cell clusters in an it’s own cluster with just itself. In between these two extremes, depending on our question, lies a meaningful cluster number that will match the underlying biology to reduce the amount of variance that lies in clumping/splitting populations needlessly.

Coereba takes the approacch by individually annotating an individuals cells one by one, on the basis of where their MFI value lands related to the validated splitpoint. So if the splitpoint for FITC is at 50, and the individual cell MFI for FITC is 80, it returns as FITC_pos. If the splitpoint for APC is at 70, and the individual cell MFI is 30, it returns as APC_Neg.

When you iterate over the splitpoints for each marker, each cell derrives an identity. FITC_pos-APC_Neg-BV421_Pos…. etc. We can then group cells with matching identies and derrive information from these terminal nodes.

Individually, this may not mean much. After all, dichotomous splitting of a 30-marker panel is around half-a billion potential terminal nodes or clusters. Similarly, cytometers have instrumental, batch, experimental noise, etc.

But when we scale to all the events collected within a typical sample, we get far fewer terminal nodes, in the hundred to lower thousand range for a 30-color panel. The reason is cells (and the panels by which we investigate them) are not uniformely distributed by markers across high-dimensional space, cells fall within specific phenotypes (cell types) that undergo division of similar clones.

Consequently, we can iteratively leverage the number of markers included to target questions of interest, broad or narrow scope, in context of our individual panels, and rely on methods described below to gain insight into biological systems.

Utility_Coereba

Utility_Coereba() is the function that implements the above approach, labelling for each individual specimen the individual cells on the basis of their splitpoint identity, and then returning the enumeration of the resulting clusters. It iterates over the different specimens in the GatingSet, returning information for every individual that can be used in subsequent steps. MultiReach() is the older version of the proccess.

Both functions are currently waiting on the ILT paper to be sent to collaborators so that David can implement his 5-subject binder worth of notes to implement a S4 OOP class to make manipulating the final outputs easier than the current .qmd files full of tidyverse data tidying.

CoerebaIDs <- Utility_Coereba(x=UnmixedGatingSet[1], subsets="live",
  sample.name="GROUPNAME", reference=TheCSV, starter="Spark Blue 550-A")
head(CoerebaIDs, 5)
#>     Time    SSC-W   SSC-H   SSC-A    FSC-W   FSC-H   FSC-A  SSC-B-W SSC-B-H
#> 1 489151 694896.9 1177500 1363735 662892.6 1401655 1548578 669076.8  709453
#> 2 635509 677491.4 1338055 1510868 657557.9 1233853 1352216 668161.4  884191
#> 3 120155 667338.2 1088296 1210436 674967.2 1192673 1341692 643509.0  824423
#> 4 618423 663132.6 1000850 1106160 672943.4 1502920 1685634 655194.3  624452
#> 5 819955 705859.9 1006763 1184389 667948.1 1483832 1651871 683451.8  695573
#>    SSC-B-A    BUV395-A   BUV496-A  BUV563-A  BUV615-A  BUV661-A   BUV737-A
#> 1 791130.9  1413.02148  1389.7384 19557.074 -638.5002  804.1327 1086.85168
#> 2 984637.2   555.09723  -836.7988 22731.787 -170.2382  409.6551  -67.00184
#> 3 884206.0  1078.77063 90767.8750  5996.069 -720.3354  724.7059 1675.47351
#> 4 681895.6 10403.56543 -1566.0366  4183.058 1039.9783  299.8669  474.69431
#> 5 792317.7    97.86935  -120.9047 30680.348 -559.2359 -444.4741 1276.98914
#>     BUV805-A   BV421-A Pacific Blue-A    BV480-A  BV510-A     BV605-A   BV650-A
#> 1  -209.6748  860.4299     22337.6641  -481.8431 91305.57   893.18616 10386.127
#> 2  -170.9810  827.4207     16259.1738  1185.9927 68115.00   539.20172 33165.855
#> 3   515.8447 6500.4741       174.5799  2141.1963 59365.43    20.77773 15070.970
#> 4 37031.8125 6435.6250       587.8873 -1033.0054  1139.49 -1223.56738  9815.647
#> 5 -1093.8087  502.5733     17365.7285  -210.7086 79221.05   304.48538 14291.300
#>       BV711-A   BV750-A   BV786-A Alexa Fluor 488-A Spark Blue 550-A
#> 1    559.8732 3667.3572 22802.184          699.9141         364.2365
#> 2   -302.7471 4064.6692 28469.223         -688.5756         720.5591
#> 3  50595.2422 2971.7854  2358.043          393.8389       17707.6602
#> 4 156903.0781  407.5031  5380.509        -1023.9891       27055.3672
#> 5   -183.1048 3351.0212 17993.840         -706.1895         258.5292
#>   PerCP-Cy5.5-A       PE-A PE-Dazzle594-A  PE-Cy5-A PE-Vio770-A     APC-A
#> 1     1075.7617   862.5351      1339.5979  505.8365   1125.7090  107.1248
#> 2     1223.7046  2884.0522       598.5916 4258.1323   2320.9243 2740.9094
#> 3     7140.4067 37509.9492       176.0248 1513.7799    801.8569  182.4485
#> 4     8106.1313  3367.9668       153.3160 1452.1102   2121.6941 -593.1661
#> 5      830.9669  1437.2294       769.9387 8262.6836   1377.1327  759.8859
#>   Alexa Fluor 647-A APC-R700-A Zombie NIR-A APC-Fire 750-A APC-Fire 810-A
#> 1          27.81903   3060.771    1198.2706       244.2509      51100.332
#> 2       -2298.00781  43118.539    3153.7075      -241.0270      10196.429
#> 3        -392.17029   5435.576    1092.3571     29247.9004       7603.124
#> 4         305.19461   7350.345     140.0913     44089.8281      38608.691
#> 5         547.90692   2891.267     868.9808       377.6864     123643.883
#>       AF-A Original_ID
#> 1 1539.555      993788
#> 2 4283.683     1295630
#> 3 1599.856      240648
#> 4 6442.587     1260488
#> 5 5996.440     1679432
#>                                                                                                                                                                                                                                                                                                Cluster
#> 1 SparkBlue550negBUV395negBUV496negBUV563negBUV615negBUV661negBUV737negBUV805negBV421negPacificBlueposBV480posBV510posBV605negBV650negBV711negBV750negBV786posAlexaFluor488negPerCPCy55negPEnegPEDazzle594negPECy5negPEVio770negAPCnegAlexaFluor647negAPCR700negZombieNIRnegAPCFire750negAPCFire810pos
#> 2 SparkBlue550negBUV395negBUV496negBUV563negBUV615negBUV661negBUV737negBUV805negBV421negPacificBlueposBV480posBV510posBV605negBV650posBV711negBV750negBV786posAlexaFluor488negPerCPCy55negPEnegPEDazzle594negPECy5negPEVio770negAPCnegAlexaFluor647negAPCR700posZombieNIRnegAPCFire750negAPCFire810pos
#> 3 SparkBlue550posBUV395negBUV496posBUV563negBUV615negBUV661negBUV737negBUV805negBV421posPacificBluenegBV480posBV510posBV605negBV650negBV711posBV750negBV786negAlexaFluor488negPerCPCy55posPEposPEDazzle594negPECy5negPEVio770negAPCnegAlexaFluor647negAPCR700negZombieNIRnegAPCFire750posAPCFire810pos
#> 4 SparkBlue550posBUV395posBUV496negBUV563negBUV615negBUV661negBUV737negBUV805posBV421posPacificBluenegBV480posBV510negBV605negBV650negBV711posBV750negBV786negAlexaFluor488negPerCPCy55posPEnegPEDazzle594negPECy5negPEVio770negAPCnegAlexaFluor647negAPCR700negZombieNIRnegAPCFire750posAPCFire810pos
#> 5 SparkBlue550negBUV395negBUV496negBUV563negBUV615negBUV661negBUV737negBUV805negBV421negPacificBlueposBV480posBV510posBV605negBV650negBV711negBV750negBV786posAlexaFluor488negPerCPCy55negPEnegPEDazzle594negPECy5posPEVio770negAPCnegAlexaFluor647negAPCR700negZombieNIRnegAPCFire750negAPCFire810pos
#>   GROUPNAME
#> 1    INF071
#> 2    INF071
#> 3    INF071
#> 4    INF071
#> 5    INF071

Filtering Coereba Clusters

Some things are literally poisson noise. Others are cell clusters with varying abundance across heteregenous human patients. Others are individually unique terminal nodes that can be the result of individual biology…. or unmixing errors … or your lab tech messing up the staining panel. You can leverage the filter functions to identify all of these quickly.

Coereba_MarkerExpressions

Having generated a Marker by Cluster data.frame (binaryData) and a Cluster by Specimen data.frame (dataData), we can leverage the combination of the two data.frames to aggregate the clusters on the basis of marker presence, returning the proportion of of cells that are positive for a given marker.

For this demonstration, we will focus on NKTs, using stored .csv files for their binary and data outputs stored within the Coereba extdata folder.

File_Location <- system.file("extdata", package = "Coereba")

panelPath <- file.path(File_Location, "ILTPanelTetramer.csv")
panelData <- read.csv(panelPath, check.names=FALSE)

binaryPath <- file.path(File_Location, "HeatmapExample.csv")
binaryData <- read.csv(binaryPath, check.names=FALSE)

dataPath <- file.path(File_Location, "ReadyFileExample.csv")
dataData <- read.csv(dataPath, check.names=FALSE)

We will first just return the marker expressions for all markers:

library(Coereba)

AllMarkers <- Coereba_MarkerExpressions(data=dataData, binary=binaryData,
  panel=panelData, starter="SparkBlue550")
  
head(AllMarkers, 5)
#>      bid  ptype ART   Viral artexposure infant_sex screen_vl isc_artreg CD3
#> 1 INF018     HU  HU NoViral     Limited       Male         0          0   1
#> 2 INF028     HU  HU NoViral     Limited       Male         0          0   1
#> 3 INF039 HEU-hi HEU   Viral     Limited     Female     41638          1   1
#> 4 INF041 HEU-lo HEU NoViral  Prolongued     Female         0          1   1
#> 5 INF052 HEU-hi HEU   Viral     Limited       Male     15569          1   1
#>          CD16      CD27      CD38     CD107a hCD1d       hMR1      CD62L
#> 1 0.000000000 0.9565217 0.2608696 0.00000000     1 0.00000000 0.08695652
#> 2 0.005586592 0.9720670 0.4301676 0.03910615     1 0.08938547 0.06703911
#> 3 0.000000000 0.8947368 0.5263158 0.00000000     1 0.00000000 0.10526316
#> 4 0.012500000 0.9875000 0.4375000 0.02500000     1 0.00000000 0.15000000
#> 5 0.000000000 0.9259259 0.2716049 0.00000000     1 0.00000000 0.17283951
#>          CD8       CD69      CCR4         Vd2      CXCR3       CD4     CD127
#> 1 0.00000000 0.00000000 0.6956522 0.000000000 0.04347826 1.0000000 0.8695652
#> 2 0.04469274 0.01117318 0.7877095 0.005586592 0.01117318 0.8994413 0.9608939
#> 3 0.10526316 0.05263158 0.5789474 0.000000000 0.00000000 0.8947368 0.9473684
#> 4 0.03750000 0.03750000 0.6875000 0.000000000 0.01250000 0.9000000 0.9125000
#> 5 0.02469136 0.02469136 0.6543210 0.000000000 0.01234568 0.9506173 0.9259259
#>       CD161     CD45RA   CD56       CCR7       CD7        IFNg       CCR6
#> 1 0.6956522 0.04347826 0.0000 0.08695652 1.0000000 0.000000000 0.04347826
#> 2 0.5977654 0.11173184 0.0000 0.02793296 1.0000000 0.005586592 0.13966480
#> 3 0.4210526 0.26315789 0.0000 0.15789474 1.0000000 0.105263158 0.10526316
#> 4 0.6750000 0.21250000 0.0125 0.01250000 1.0000000 0.000000000 0.13750000
#> 5 0.4691358 0.06172840 0.0000 0.08641975 0.9876543 0.000000000 0.08641975
#>        NKG2D      CD25   TNFa       PD1 DumpChannel      CD26   Viability
#> 1 0.08695652 0.8695652 0.0000 0.7391304   0.0000000 0.6521739 0.000000000
#> 2 0.11173184 0.8156425 0.0000 0.6871508   0.0000000 0.5977654 0.005586592
#> 3 0.31578947 0.5263158 0.0000 0.5789474   0.1052632 0.3157895 0.000000000
#> 4 0.08750000 0.6250000 0.0375 0.7875000   0.0000000 0.6500000 0.012500000
#> 5 0.03703704 0.6790123 0.0000 0.6790123   0.0000000 0.2345679 0.000000000

We can additionally by specifying returnType = “Combinatorial” derrive proportions within quadrant gates for two markers of interest. To do so, we specify the fluorophores corresponding to the markers as a list in the CombinatorialArgs.

MemoryQuadrants <- Coereba_MarkerExpressions(data=dataData, binary=binaryData,
  panel=panelData, starter="SparkBlue550", returnType = "Combinatorial", 
  CombinatorialArgs = c("BV510", "BUV395"))

head(MemoryQuadrants, 5)
#>      bid  ptype ART   Viral artexposure infant_sex screen_vl isc_artreg
#> 1 INF018     HU  HU NoViral     Limited       Male         0          0
#> 2 INF028     HU  HU NoViral     Limited       Male         0          0
#> 3 INF039 HEU-hi HEU   Viral     Limited     Female     41638          1
#> 4 INF041 HEU-lo HEU NoViral  Prolongued     Female         0          1
#> 5 INF052 HEU-hi HEU   Viral     Limited       Male     15569          1
#>   CD45RA-CD62L+ CD45RA+CD62L+ CD45RA+CD62L- CD45RA-CD62L-
#> 1    0.04347826    0.04347826    0.00000000     0.9130435
#> 2    0.03910615    0.02793296    0.08379888     0.8491620
#> 3    0.05263158    0.05263158    0.21052632     0.6842105
#> 4    0.06250000    0.08750000    0.12500000     0.7250000
#> 5    0.14814815    0.02469136    0.03703704     0.7901235

Utility_MarkerPlots

Having returned aggregated data above with Coereba_MarkerExpressions, we can plot this data using Utility_MarkerPlots(). These are combination of ggplot2 geom_boxplot() and ggbeeswarm geom_beeswarm plots. We provide some arguments that allow for filtering and reordering the markers shown in these plots, some basic customizations to the plots, and the ability to save as a .png file to a designated folder.

To do so for your own individual spots, you will need to specify a metadata column by which to factor by, and provide a list of shape and fill values corresponding to each factor level.

shape_ptype <- c("HU" = 22, "HEU-lo" = 21, "HEU-hi" = 21)
fill_ptype <- c("HU" = "white", "HEU-lo" = "darkgray", "HEU-hi" = "black")

We can start by viewing all markers:

ThePlot <- Utility_MarkerPlots(data=AllMarkers, panel=panelData,
  myfactor="ptype", shape_palette = shape_ptype, fill_palette = fill_ptype)

ThePlot
#> Warning: In `position_beeswarm`, method `center` discretizes the data axis (a.k.a the
#> continuous or non-grouped axis).
#> This may result in changes to the position of the points along that axis,
#> proportional to the value of `cex`.
#> This warning is displayed once per session.

Let’s abbreviate the number of columns by specifying the marker names to include (filterForThese), and rearrange them in desired X-axis order

ThePlot <- Utility_MarkerPlots(data=AllMarkers, panel=panelData,
  myfactor="ptype", shape_palette = shape_ptype, fill_palette = fill_ptype,
  filterForThese=c("CD7", "CD4", "CD8"), XAxisLevels = c("CD7", "CD4", "CD8"))

ThePlot

And to ggsave directly to a desired folder:

Utility_Stats

The Utility_Stats() and the Utility_Behemoth() functions are either: “Great!!!”, or guaranteed to make your friendly-local statistician cry (Opinions vary).

Basically, Utility_Stats() is a coding-attempt to replicate the typical immunologist workflow, cutting out the need to copy-paste data from a .csv file into GraphPad Prism (trademark), run a normality test, determine number of groups, and follow up with the corresponding t-test/anova/non-parametric equivalents. Utility_Behemoth() then takes these outputs, and adds the p-value results to a ggbeeswarm/ggplot2 boxplot of the underlying data. In combination with purrr and Luciernaga Utility_Patchwork(), these can be quickly formatted into a .pdf.

Does this workflow rapidly profile all the columns in your data.frame. Yes it does. Is it emblematic of potential issues arising from basing science entirely on null-hypothesis statistical testing? Also yes, seeing how many plots return “significant” due to a couple outliers. In the end, it mimics the current workflow of many researchers, just speeding it up. Google Statistical Rethinking on YouTube for food-for-thought.

To begin, we will use the Coereba_MarkerExpressions data we generated above to test out Utility_Stats().

Result <- Utility_Stats(data=AllMarkers, var="CD62L",
  myfactor="ptype", normality="dagostino", correction="none")

Result
#>   marker             pvalue     normality
#> 1  CD62L 0.0768724663243597 nonparametric

In combination with purrr we can rapidly iterate over all the markers, but will need to skip over the metadata columns that are present at the begining of the dataframe.

library(purrr)
library(dplyr)
# colnames(AllMarkers)
TheLength <- ncol(AllMarkers)

TheData <- map(names(AllMarkers)[c(9:TheLength)], ~ Utility_Stats(
  data = AllMarkers, var = .x, myfactor = "ptype", 
  normality = "dagostino")) %>% bind_rows()
#> Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
#> exact p-value with ties
#> Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
#> exact p-value with ties
#> Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
#> exact p-value with ties

The results can be filtered using regular dplyr functions.

MaybeSignificant <- TheData %>% filter(pvalue < 0.05)

MaybeSignificant
#>   marker             pvalue     normality
#> 1   IFNg 0.0260580686409208 nonparametric

Utility_Behemoth

Utility_Behemoth() continues from where Utility_Stats() left-off, appending the resulting p-value information onto a ggplot2 plot. The plot is a combination of ggplot2 geom_boxplot() and ggbeeswarm geom_beeswarm plots, and accepts arguments to customize these and rearrange the group order. As with most Coereba plots, we need to provide the shape and fill for our respective factors levels.

To do so for your own individual spots, you will need to specify a metadata column by which to factor by, and provide a list of shape and fill values corresponding to each factor level.

shape_ptype <- c("HU" = 22, "HEU-lo" = 21, "HEU-hi" = 21)
fill_ptype <- c("HU" = "white", "HEU-lo" = "darkgray", "HEU-hi" = "black")
SinglePlot <- Utility_Behemoth(data=AllMarkers, var="CD62L",
  myfactor="ptype", normality="dagostino", correction="none",
  shape_palette=shape_ptype, fill_palette=fill_ptype,
  XAxisLevels = c("HU", "HEU-lo", "HEU-hi"))
SinglePlot

In combination with purrr we can rapidly iterate over all the markers (skipping the initial metadata colums) and generate all the plots. These can then be passed to Luciernaga Utility_Patchwork() to rearrange into patchwork objects with our desired layout, and passed to a .pdf file.

# colnames(AllMarkers)
TheLength <- length(AllMarkers)

AllPlots <- map(names(AllMarkers)[c(9:TheLength)], ~ Utility_Behemoth(data=AllMarkers, var=.x,
  myfactor="ptype", normality="dagostino", correction="none",
  shape_palette=shape_ptype, fill_palette=fill_ptype, corral.width=0.7,
  XAxisLevels = c("HU", "HEU-lo", "HEU-hi")))
#> Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
#> exact p-value with ties
#> Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
#> exact p-value with ties
#> Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
#> exact p-value with ties

These in turn can be passed to Luciernaga `Utility_Patchwork() function to arrange desired layout and output to a .pdf file for late reference.

library(Luciernaga)

StorageLocation <- file.path("C:", "Users", "JohnDoe", "Desktop")

TheAssembledPlot <- Utility_Patchwork(x=AllPlots, filename="NKT_Markers",
 outfolder=NULL, thecolumns=2, therows=3, width=7, height=9,
 returntype="patchwork")
TheAssembledPlot[1]
#> $`1`

Utility_Heatmap

The original visualization for the Coereba clusters, in a Bananaquit color-scheme. Next time someone insist there are just 5 clusters in their FlowSOM, point them here.

ThePlot <- Utility_Heatmap(binary=binaryData, panel=panelPath,
  export=FALSE, outpath=NULL, filename=NULL)
ThePlot

Diffcyt port

Wrote the function to send the outputs to diffcyt for the edgeR/limma/GLMM modeling of the results (vs the above t-test approach). It works, but does it mean it’s less p-hacky? TBD. Also, given diffcyt’s current maintenance status (looking at you Luke), may not be best approach vs a new implementation.

#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 22631)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] Luciernaga_0.99.1    htmltools_0.5.8.1    plotly_4.10.4       
#>  [4] gt_0.11.1            stringr_1.5.1        purrr_1.0.2         
#>  [7] dplyr_1.1.4          data.table_1.16.2    ggcyto_1.32.0       
#> [10] ncdfFlow_2.50.0      BH_1.84.0-0          ggplot2_3.5.1       
#> [13] openCyto_2.16.1      flowWorkspace_4.16.0 flowCore_2.16.0     
#> [16] Coereba_0.99.0       BiocStyle_2.32.1    
#> 
#> loaded via a namespace (and not attached):
#>   [1] RBGL_1.80.0           gridExtra_2.3         rlang_1.1.4          
#>   [4] magrittr_2.0.3        matrixStats_1.4.1     ggridges_0.5.6       
#>   [7] compiler_4.4.1        dir.expiry_1.12.0     png_0.1-8            
#>  [10] systemfonts_1.1.0     vctrs_0.6.5           reshape2_1.4.4       
#>  [13] pkgconfig_2.0.3       fastmap_1.2.0         backports_1.5.0      
#>  [16] labeling_0.4.3        utf8_1.2.4            promises_1.3.0       
#>  [19] rmarkdown_2.28        graph_1.82.0          ggbeeswarm_0.7.2     
#>  [22] ragg_1.3.3            xfun_0.48             zlibbioc_1.50.0      
#>  [25] cachem_1.1.0          jsonlite_1.8.9        highr_0.11           
#>  [28] later_1.3.2           SnowballC_0.7.1       broom_1.0.7          
#>  [31] parallel_4.4.1        R6_2.5.1              bslib_0.8.0          
#>  [34] stringi_1.8.4         RColorBrewer_1.1-3    reticulate_1.39.0    
#>  [37] lubridate_1.9.3       jquerylib_0.1.4       figpatch_0.2         
#>  [40] Rcpp_1.0.13           bookdown_0.41         knitr_1.48           
#>  [43] zoo_1.8-12            httpuv_1.6.15         Matrix_1.7-0         
#>  [46] timechange_0.3.0      tidyselect_1.2.1      rstudioapi_0.17.0    
#>  [49] yaml_2.3.10           viridis_0.6.5         timeDate_4041.110    
#>  [52] lattice_0.22-6        tibble_3.2.1          plyr_1.8.9           
#>  [55] withr_3.0.1           shiny_1.9.1           Biobase_2.64.0       
#>  [58] basilisk.utils_1.16.0 evaluate_1.0.1        Rtsne_0.17           
#>  [61] desc_1.4.3            xml2_1.3.6            pillar_1.9.0         
#>  [64] lsa_0.73.3            BiocManager_1.30.25   filelock_1.0.3       
#>  [67] stats4_4.4.1          generics_0.1.3        S4Vectors_0.42.1     
#>  [70] munsell_0.5.1         scales_1.3.0          timeSeries_4041.111  
#>  [73] xtable_1.8-4          glue_1.8.0            lazyeval_0.2.2       
#>  [76] tools_4.4.1           hexbin_1.28.4         spatial_7.3-17       
#>  [79] fBasics_4041.97       fs_1.6.4              XML_3.99-0.17        
#>  [82] grid_4.4.1            flowClust_3.42.0      tidyr_1.3.1          
#>  [85] RProtoBufLib_2.16.0   colorspace_2.1-1      patchwork_1.3.0      
#>  [88] basilisk_1.16.0       beeswarm_0.4.0        vipor_0.4.7          
#>  [91] cli_3.6.3             textshaping_0.4.0     fansi_1.0.6          
#>  [94] cytolib_2.16.0        viridisLite_0.4.2     uwot_0.2.2           
#>  [97] Rgraphviz_2.48.0      gtable_0.3.5          sass_0.4.9           
#> [100] digest_0.6.37         BiocGenerics_0.50.0   htmlwidgets_1.6.4    
#> [103] farver_2.1.2          pkgdown_2.1.1         lifecycle_1.0.4      
#> [106] httr_1.4.7            mime_0.12