Principal Investigator:
Bioinformatics Analysis: Dr. Heather Kates (hkates@ufl.edu)
This analysis examines changes in B cell population composition and gene expression associated with treatment and age, comparing lymph node and tumor tissues from young and aged mice.
Sample Groups:
Treatment Conditions:
Sequencing: 10X Genomics single-cell RNA sequencing with cell hashing and VDJ-B/T sequencing was performed on all samples.
Analysis Pipeline: Raw sequencing data was processed through Cell Ranger Multi, followed by standard Seurat-based single-cell analysis including quality control, normalization, clustering, and cell type annotation. All sample groups were integrated using Harmony batch correction.
Input Dataset: The integrated dataset contains 62412 high-quality cells across all conditions, with 26914 BCR-positive B cells identified through VDJ sequencing.
Complete datasets, intermediate objects, and reproducible code are available at: https://data.rc.ufl.edu/secure/cancercenter-dept/zhangw/GE-8124/
Contact Dr. Heather Kates (hkates@ufl.edu) for access credentials.
B cells were identified using BCR (B cell receptor) information from the VDJ-B sequencing data. This approach is more specific than using gene expression markers alone, as it captures cells that have successfully undergone V(D)J recombination.
## Original dataset: 62412 cells
## B cells identified: 26914 cells
## B cell percentage: 43.12 %
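The BCR-based selection and the percentage reported above can be sketched in base R as follows. The metadata table and its column names (`barcode`, `has_bcr`) are hypothetical stand-ins for whatever the VDJ-B annotation step attaches to each cell; only the two totals come from this report.

```r
# Toy per-cell metadata; `has_bcr` is a hypothetical flag marking barcodes
# that have at least one BCR contig in the VDJ-B output.
cell_meta <- data.frame(
  barcode = paste0("cell_", 1:8),
  has_bcr = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only BCR-positive barcodes.
b_cell_barcodes <- cell_meta$barcode[cell_meta$has_bcr]

# Same calculation with the totals reported above.
n_total <- 62412
n_b_cells <- 26914
pct_b_cells <- round(100 * n_b_cells / n_total, 2)
cat("B cell percentage:", pct_b_cells, "%\n")
```

With a Seurat object, a barcode vector like `b_cell_barcodes` could then be passed to `subset()` through its `cells` argument; whether the original analysis did it this way is not stated.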
Because B cells are a heterogeneous population with distinct functional states, this subset was re-analyzed (re-normalization, dimensionality reduction, and clustering) to resolve B cell subpopulations at higher resolution than in the full dataset.
Multiple clustering resolutions (0.05-0.8) were tested, yielding 4-17 clusters. Resolution 0.1 was selected for producing 6 biologically interpretable clusters that capture major B cell functional states without over-clustering.
```r
library(Seurat)
library(harmony)

# Re-normalize and reduce the B cell subset
b_cells <- NormalizeData(b_cells)
b_cells <- FindVariableFeatures(b_cells, selection.method = "vst", nfeatures = 2000)
b_cells <- ScaleData(b_cells)
b_cells <- RunPCA(b_cells, features = VariableFeatures(object = b_cells))

# Correct for per-sample batch effects, then embed and build the neighbor graph
b_cells <- RunHarmony(b_cells, group.by.vars = "sample_id")
b_cells <- RunUMAP(b_cells, reduction = "harmony", dims = 1:30)
b_cells <- FindNeighbors(b_cells, reduction = "harmony", dims = 1:30)

# Cluster at each resolution and store the number of clusters
resolutions <- c(0.05, 0.1, 0.2, 0.3, 0.5, 0.8)
clustering_results <- data.frame(
  Resolution = numeric(),
  Number_of_Clusters = numeric()
)

invisible(capture.output({
  for (res in resolutions) {
    b_cells <- FindClusters(b_cells, resolution = res)
    n_clusters <- length(unique(b_cells[[paste0("RNA_snn_res.", res)]][, 1]))
    clustering_results <- rbind(clustering_results,
                                data.frame(Resolution = res,
                                           Number_of_Clusters = n_clusters))
  }
}))

# Use the resolution 0.1 clustering as the B cell identities
Idents(b_cells) <- b_cells$RNA_snn_res.0.1
b_cells$bcell_clusters <- Idents(b_cells)

# Display results table
knitr::kable(clustering_results,
             col.names = c("Resolution", "Number of Clusters"),
             caption = "B Cell Clustering Results at Different Resolutions")
```

| Resolution | Number of Clusters |
|---|---|
| 0.05 | 4 |
| 0.10 | 6 |
| 0.20 | 10 |
| 0.30 | 12 |
| 0.50 | 14 |
| 0.80 | 17 |
The plots show the identified B cell subpopulations and how they are distributed across experimental conditions.
To identify distinct B cell functional states, differential expression analysis was performed to find cluster-specific marker genes. These markers were then used to annotate clusters into biologically meaningful B cell subtypes.
| Cluster | Top 3 Markers | Log2FC Range |
|---|---|---|
| 0 | Vpreb3, Fcer2a, Chchd10 | 0.7-0.73 |
| 1 | Ifit3, Usp18, Ifit2 | 4.41-6.12 |
| 2 | Cd80, Tbc1d9, Ctla4 | 4.73-5.07 |
| 3 | Themis, Icos, Gm2682 | 8.63-8.73 |
| 4 | Camk1d, St6galnac3, Zfp407 | 1.99-2.79 |
| 5 | Tnfsf13, Ifitm6, Ifitm1 | 10.33-10.55 |
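A summary like the marker table above can be derived from a per-cluster marker table. The sketch below assumes a data frame with `cluster`, `gene`, and `avg_log2FC` columns, similar in layout to what Seurat's `FindAllMarkers()` returns; the gene names and fold changes are placeholders, not results from this analysis, and `summarise_top` is a hypothetical helper.

```r
# Toy marker table; gene names and fold changes are placeholders, not results.
markers <- data.frame(
  cluster = c(0, 0, 0, 0, 1, 1, 1, 1),
  gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF", "GeneG", "GeneH"),
  avg_log2FC = c(0.9, 0.4, 0.7, 0.8, 5.0, 6.1, 4.4, 3.0),
  stringsAsFactors = FALSE
)

# Summarise the top `n` markers of one cluster, ranked by log2 fold change.
summarise_top <- function(df, n = 3) {
  top <- head(df[order(-df$avg_log2FC), ], n)
  data.frame(
    Cluster = top$cluster[1],
    Top_Markers = paste(top$gene, collapse = ", "),
    Log2FC_Range = paste(range(top$avg_log2FC), collapse = "-"),
    stringsAsFactors = FALSE
  )
}

# Apply the summary to each cluster and stack the rows.
marker_summary <- do.call(rbind, lapply(split(markers, markers$cluster), summarise_top))
print(marker_summary, row.names = FALSE)
```

Ranking by fold change alone is a simplification; in practice markers are usually also filtered by adjusted p-value and by the fraction of cells expressing the gene.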
Initial attempts using automated reference-based annotation tools (SingleR, Azimuth) and predefined gene signature matching yielded inconsistent results with poor cluster resolution. Therefore, a manual curation approach was adopted using cluster-specific top markers and established B cell subtype signatures.
Key B cell subtype markers were visualized to assess cluster identity based on spatial expression patterns in the UMAP embedding.
Based on the spatial expression patterns of cluster-specific markers and known B cell biology, clusters were manually annotated into functionally distinct subtypes.
| Cluster | Manual Annotation | Cell Count | Percentage |
|---|---|---|---|
| 0 | Transitional B cells | 24292 | 90.3 |
| 1 | Interferon-stimulated B cells | 1046 | 3.9 |
| 2 | Activated B cells | 678 | 2.5 |
| 3 | Alternative activated B cells | 449 | 1.7 |
| 4 | Immature B cells | 269 | 1.0 |
| 5 | Type I IFN B cells | 180 | 0.7 |
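The manual annotation amounts to a lookup from cluster ID to label. The base R sketch below rebuilds the table above from the reported counts and labels; the variable names and the `example_ids` vector are illustrative and not taken from the original code.

```r
# Cluster-to-label mapping taken from the annotation table above.
cluster_labels <- c(
  "0" = "Transitional B cells",
  "1" = "Interferon-stimulated B cells",
  "2" = "Activated B cells",
  "3" = "Alternative activated B cells",
  "4" = "Immature B cells",
  "5" = "Type I IFN B cells"
)

# Cell counts per cluster as reported above.
cluster_counts <- c("0" = 24292, "1" = 1046, "2" = 678, "3" = 449, "4" = 269, "5" = 180)

# Percentages are each cluster's share of all B cells, rounded to one decimal.
annotation_table <- data.frame(
  Cluster = names(cluster_counts),
  Manual_Annotation = unname(cluster_labels[names(cluster_counts)]),
  Cell_Count = unname(cluster_counts),
  Percentage = round(100 * unname(cluster_counts) / sum(cluster_counts), 1),
  stringsAsFactors = FALSE
)
print(annotation_table, row.names = FALSE)

# Relabelling a per-cell vector of cluster IDs works the same way.
example_ids <- c("0", "0", "2", "5")  # hypothetical cluster assignments
example_labels <- unname(cluster_labels[example_ids])
```

The counts sum to the 26914 B cells reported earlier, which is a quick consistency check on the table.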
A .cloupe file was created from the Seurat object of the subsetted, re-analyzed B cells to enable interactive visualization and exploration in the 10x Genomics Loupe Browser.
Download: re-clustered annotated B cells
Differential expression analysis was performed using a pseudobulk approach, which aggregates expression across the cells of each sample before statistical testing. This treats biological samples, not individual cells, as the statistical units, which properly accounts for biological replication and reduces false positives compared to cell-level testing.
The six comparisons are:
## Expression matrix dimensions: 21262 132
## Pseudobulk samples: 132
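The aggregation step behind a genes-by-samples pseudobulk matrix like the one reported above can be illustrated in base R. This is a minimal sketch on a made-up 3 x 6 count matrix; the sample labels are hypothetical, and the report does not state which function was actually used for aggregation.

```r
# Toy gene-by-cell count matrix (3 genes x 6 cells); values are made up.
# matrix() fills column-wise, so each row of numbers below is one cell.
counts <- matrix(
  c(1, 0, 2,
    0, 3, 1,
    4, 1, 0,
    2, 2, 2,
    0, 0, 5,
    1, 1, 1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Hypothetical sample of origin for each cell.
sample_of_cell <- c("S1", "S1", "S2", "S2", "S3", "S3")

# rowsum() sums rows by group, so transpose to cells x genes, sum per sample,
# and transpose back to genes x samples.
pseudobulk <- t(rowsum(t(counts), group = sample_of_cell))
print(pseudobulk)
```

Summing raw counts (rather than averaging normalized values) keeps the data on the count scale that negative binomial models expect. Seurat's `AggregateExpression()` offers a comparable aggregation on a Seurat object.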
DESeq2 was used for differential expression testing on pseudobulk data. This method properly models the negative binomial distribution of RNA-seq count data and controls for multiple testing across genes.
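A two-group DESeq2 test on pseudobulk counts might look like the sketch below. The function name, its arguments, and the `condition` column are hypothetical; the actual design formula and contrasts used in this analysis are not shown in the report, so this should be read as one plausible setup rather than the pipeline itself. It requires the Bioconductor package DESeq2 when called.

```r
# Sketch only: calling this function requires DESeq2 to be installed.
# `pb_counts` is a genes x samples integer matrix; `sample_info` is a data
# frame with one row per pseudobulk sample (same order as the columns) and a
# `condition` column. All argument names here are hypothetical.
run_pseudobulk_deseq <- function(pb_counts, sample_info, test_level, ref_level) {
  # Restrict to the two groups being compared.
  keep <- sample_info$condition %in% c(test_level, ref_level)
  sample_info <- sample_info[keep, , drop = FALSE]
  sample_info$condition <- factor(sample_info$condition,
                                  levels = c(ref_level, test_level))

  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = pb_counts[, keep, drop = FALSE],
    colData = sample_info,
    design = ~ condition
  )

  # Estimates size factors and dispersions, then fits the negative binomial model.
  dds <- DESeq2::DESeq(dds)

  # results() reports Benjamini-Hochberg adjusted p-values in `padj` by default.
  res <- DESeq2::results(dds, contrast = c("condition", test_level, ref_level))
  as.data.frame(res)
}
```

The contrast is ordered so that positive log2 fold changes mean higher expression in `test_level` than in `ref_level`.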
Sample Size Requirements: DESeq2 requires sufficient biological replicates to estimate gene-wise variance accurately. A minimum of 4 total samples was required per comparison, a threshold that accommodates the missing YT_Vehicle_2 sample while maintaining statistical power.
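The sample-size rule can be expressed as a small check run before each comparison. In this sketch only the threshold of 4 and the missing YT_Vehicle_2 sample come from the report; the second group name, the replicate counts, the helper `has_enough_samples`, and the added requirement that both groups be present are assumptions for illustration.

```r
# Minimum total number of pseudobulk samples per comparison, as stated above.
min_total_samples <- 4

# Hypothetical sample sheet; YT_Vehicle_2 is absent, as noted above.
# The second group name and all replicate counts are placeholders.
sample_info <- data.frame(
  sample_id = c("YT_Vehicle_1", "YT_Vehicle_3", "GroupB_1", "GroupB_2", "GroupB_3"),
  condition = c("YT_Vehicle", "YT_Vehicle", "GroupB", "GroupB", "GroupB"),
  stringsAsFactors = FALSE
)

# TRUE when both groups are present and together reach the minimum.
has_enough_samples <- function(info, group_a, group_b, min_total = min_total_samples) {
  n_a <- sum(info$condition == group_a)
  n_b <- sum(info$condition == group_b)
  n_a > 0 && n_b > 0 && (n_a + n_b) >= min_total
}

ok <- has_enough_samples(sample_info, "YT_Vehicle", "GroupB")
print(ok)
```

A comparison that fails the check would be skipped rather than passed to DESeq2, since dispersion estimates from very few samples are unreliable.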
## Completed 6 of 6 comparisons