1 Study Overview

Principal Investigator:
Bioinformatics Analysis: Dr. Heather Kates (hkates@ufl.edu)

1.1 Experimental Design

This analysis examines changes in B cell population composition and gene expression associated with treatment and with age, comparing lymph node and tumor tissues from young and aged mice.

Sample Groups:

AL: Aged mice, Lymph node samples
AT: Aged mice, Tumor samples
YL: Young mice, Lymph node samples
YT: Young mice, Tumor samples

Treatment Conditions:

Vehicle control (3 biological replicates per group)
Treatment (3 biological replicates per group)
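
The design above implies up to 24 samples (4 groups x 2 conditions x 3 replicates). The following base R sketch enumerates them. The sample ID pattern `<group>_<condition>_<replicate>` is an assumption, inferred from the single sample name (`YT_Vehicle_2`) that appears in this report.

```r
# Enumerate the planned samples: 4 groups x 2 conditions x 3 replicates.
# The ID pattern <group>_<condition>_<replicate> is an assumption for illustration.
design <- expand.grid(
  replicate = 1:3,
  condition = c("Vehicle", "Treatment"),
  group     = c("AL", "AT", "YL", "YT"),
  stringsAsFactors = FALSE
)

# Decode the two-letter group code into age and tissue.
design$age    <- ifelse(substr(design$group, 1, 1) == "A", "Aged", "Young")
design$tissue <- ifelse(substr(design$group, 2, 2) == "L", "Lymph node", "Tumor")
design$sample_id <- paste(design$group, design$condition, design$replicate, sep = "_")

nrow(design)                           # number of planned samples
table(design$group, design$condition)  # replicates per group and condition
```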

1.2 Data Generation and Processing

Sequencing: 10X Genomics single-cell RNA sequencing with cell hashing and VDJ-B/T sequencing was performed on all samples.

Analysis Pipeline: Raw sequencing data was processed through Cell Ranger Multi, followed by standard Seurat-based single-cell analysis including quality control, normalization, clustering, and cell type annotation. All sample groups were integrated using Harmony batch correction.

Input Dataset: The integrated dataset contains 62412 high-quality cells across all conditions, with 26914 BCR-positive B cells identified through VDJ sequencing.

1.3 Data Access

Complete datasets, intermediate objects, and reproducible code are available at: https://data.rc.ufl.edu/secure/cancercenter-dept/zhangw/GE-8124/
Contact Dr. Heather Kates (hkates@ufl.edu) for access credentials.

2 B Cell Population Analysis

2.1 Identifying B Cells Using VDJ Data

B cells were identified using BCR (B cell receptor) information from the VDJ-B sequencing data. This approach is more specific than using gene expression markers alone, as it captures cells that have successfully undergone V(D)J recombination.

## Original dataset: 62412 cells

## B cells identified: 26914 cells

## B cell percentage: 43.12 %
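
As a rough illustration of BCR-based identification, the sketch below flags barcodes that carry at least one productive immunoglobulin contig. It then recomputes the percentage reported above. The toy table mimics the `barcode`, `chain`, and `productive` columns of Cell Ranger's `filtered_contig_annotations.csv`. The exact rule used in this analysis is not stated, so the "one or more productive IGH/IGK/IGL contigs" criterion is an assumption.

```r
# Toy stand-in for Cell Ranger's filtered_contig_annotations.csv (values are made up).
contigs <- data.frame(
  barcode    = c("AAAC-1", "AAAC-1", "CCGT-1", "GGTA-1", "TTAG-1"),
  chain      = c("IGH", "IGK", "IGH", "TRB", "IGL"),
  productive = c("true", "true", "false", "true", "true"),
  stringsAsFactors = FALSE
)
all_barcodes <- c("AAAC-1", "CCGT-1", "GGTA-1", "TTAG-1", "ACGT-1")

# Assumed rule: a cell is BCR-positive if it has >= 1 productive IGH/IGK/IGL contig.
is_bcr <- contigs$chain %in% c("IGH", "IGK", "IGL") &
  tolower(contigs$productive) == "true"
bcr_barcodes <- unique(contigs$barcode[is_bcr])
bcr_positive <- all_barcodes %in% bcr_barcodes

# Percentage of B cells, recomputed from the two cell counts in this report.
pct_b_cells <- round(100 * 26914 / 62412, 2)
pct_b_cells
```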

2.2 Clustering B Cell Subpopulations

Because B cells are a heterogeneous population with distinct functional states, this subset was re-analyzed (re-normalization, dimensionality reduction, and clustering) to identify B cell subpopulations at a finer resolution than is possible in the full dataset.

Multiple clustering resolutions (0.05-0.8) were tested, yielding 4-17 clusters. Resolution 0.1 was selected because it produces 6 biologically interpretable clusters that capture the major B cell functional states without over-clustering.

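library(Seurat)   # NormalizeData(), RunPCA(), FindNeighbors(), FindClusters(), ...
library(harmony)  # RunHarmony()

# Re-normalize and re-embed the B cell subset, correcting for sample with Harmony
b_cells <- NormalizeData(b_cells)
b_cells <- FindVariableFeatures(b_cells, selection.method = "vst", nfeatures = 2000)
b_cells <- ScaleData(b_cells)
b_cells <- RunPCA(b_cells, features = VariableFeatures(object = b_cells))
b_cells <- RunHarmony(b_cells, group.by.vars = "sample_id")
b_cells <- RunUMAP(b_cells, reduction = "harmony", dims = 1:30)

# Build the neighbor graph once, then cluster at each candidate resolution
b_cells <- FindNeighbors(b_cells, reduction = "harmony", dims = 1:30)
resolutions <- c(0.05, 0.1, 0.2, 0.3, 0.5, 0.8)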

# Store clustering results
clustering_results <- data.frame(
  Resolution = numeric(),
  Number_of_Clusters = numeric()
)

invisible(capture.output({
  for(res in resolutions) {
    b_cells <- FindClusters(b_cells, resolution = res)
    n_clusters <- length(unique(b_cells[[paste0("RNA_snn_res.", res)]][,1]))
    clustering_results <- rbind(clustering_results, 
                              data.frame(Resolution = res, 
                                       Number_of_Clusters = n_clusters))
  }
}))

Idents(b_cells) <- b_cells$RNA_snn_res.0.1
b_cells$bcell_clusters <- Idents(b_cells)

# Display results table
knitr::kable(clustering_results, 
             col.names = c("Resolution", "Number of Clusters"),
             caption = "B Cell Clustering Results at Different Resolutions")

B Cell Clustering Results at Different Resolutions
Resolution	Number of Clusters
0.05	4
0.10	6
0.20	10
0.30	12
0.50	14
0.80	17

2.3 B Cell Cluster Visualization

The plots show the identified B cell subpopulations and how they are distributed across experimental conditions.

2.4 B Cell Subpopulation Characterization

To identify distinct B cell functional states, differential expression analysis was performed to find cluster-specific marker genes. These markers were then used to annotate clusters into biologically meaningful B cell subtypes.

Top Differential Expression Markers by Cluster
Cluster	Top 3 Markers	Log2FC Range
0	Vpreb3, Fcer2a, Chchd10	0.7-0.73
1	Ifit3, Usp18, Ifit2	4.41-6.12
2	Cd80, Tbc1d9, Ctla4	4.73-5.07
3	Themis, Icos, Gm2682	8.63-8.73
4	Camk1d, St6galnac3, Zfp407	1.99-2.79
5	Tnfsf13, Ifitm6, Ifitm1	10.33-10.55
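
A table like the one above can be derived from a per-cluster marker table by keeping the top genes by log2 fold change and reporting their range. The sketch below uses a toy data frame with the `cluster`, `gene`, and `avg_log2FC` columns that Seurat's `FindAllMarkers()` returns. The gene names and values are placeholders, and the helper `summarise_top()` is hypothetical.

```r
# Toy marker table in the shape of FindAllMarkers() output (placeholder values).
markers <- data.frame(
  cluster    = c(0, 0, 0, 0, 1, 1, 1, 1),
  gene       = c("g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"),
  avg_log2FC = c(0.70, 0.73, 0.71, 0.40, 6.12, 4.41, 5.00, 2.00),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n genes by log2FC for one cluster, plus their range.
summarise_top <- function(df, n = 3) {
  df  <- df[order(-df$avg_log2FC), ]
  top <- head(df, n)
  data.frame(
    cluster      = top$cluster[1],
    top_markers  = paste(top$gene, collapse = ", "),
    log2fc_range = paste(range(round(top$avg_log2FC, 2)), collapse = "-"),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to each cluster and stack the one-row results.
top_table <- do.call(rbind, lapply(split(markers, markers$cluster), summarise_top))
top_table
```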

2.5 Annotation Strategy: From Automated to Manual Curation

Initial attempts using automated reference-based annotation tools (SingleR, Azimuth) and predefined gene signature matching yielded inconsistent results with poor cluster resolution. Therefore, a manual curation approach was adopted using cluster-specific top markers and established B cell subtype signatures.

2.6 Manual Marker Visualization

Key B cell subtype markers were visualized to assess cluster identity based on spatial expression patterns in the UMAP embedding.

2.6.1 Final Manual Annotation

Based on the spatial expression patterns of cluster-specific markers and known B cell biology, clusters were manually annotated into functionally distinct subtypes.

Final B Cell Cluster Annotations and Cell Distributions
Cluster	Manual Annotation	Cell Count	Percentage
0	Transitional B cells	24292	90.3
1	Interferon-stimulated B cells	1046	3.9
2	Activated B cells	678	2.5
3	Alternative activated B cells	449	1.7
4	Immature B cells	269	1.0
5	Type I IFN B cells	180	0.7
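
The annotation step amounts to a lookup from cluster ID to label. The sketch below stores the labels and cell counts from the table above and recomputes the percentages. The object names are illustrative.

```r
# Cluster-to-label map and cell counts, copied from the annotation table above.
annotation <- c(
  "0" = "Transitional B cells",
  "1" = "Interferon-stimulated B cells",
  "2" = "Activated B cells",
  "3" = "Alternative activated B cells",
  "4" = "Immature B cells",
  "5" = "Type I IFN B cells"
)
cell_counts <- c("0" = 24292, "1" = 1046, "2" = 678, "3" = 449, "4" = 269, "5" = 180)

# Percentage of all B cells in each cluster, rounded to one decimal place.
pct <- round(100 * cell_counts / sum(cell_counts), 1)
data.frame(cluster = names(annotation), annotation = unname(annotation),
           cells = unname(cell_counts), percent = unname(pct))

# Relabelling per-cell cluster IDs (a toy vector here) is a simple lookup.
cluster_ids <- c("0", "0", "2", "5")
cell_labels <- unname(annotation[cluster_ids])
cell_labels
```

On the Seurat object, the same lookup could plausibly be stored as metadata, for example `b_cells$annotation <- unname(annotation[as.character(b_cells$bcell_clusters)])`. The column name `annotation` is hypothetical.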

2.7 Download Loupe Browser File

A .cloupe file was created from the Seurat object of the subsetted, re-analyzed B cells to enable interactive visualization and exploration in the 10X Genomics Loupe Browser. Download re-clustered annotated B-cells

3 Differential Expression Analysis

3.1 Experimental Comparisons

Differential expression analysis was performed using the pseudobulk method, which aggregates expression across the cells within each sample before statistical testing. This approach treats biological samples, not individual cells, as the statistical units, so it properly accounts for biological replication and reduces false positives compared to cell-level testing.
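
The aggregation step can be illustrated in base R by summing a genes x cells count matrix over a per-cell sample label. This is a minimal sketch on simulated counts with hypothetical sample IDs, not the code used for this report. The 132 pseudobulk columns reported below suggest that aggregation was done per sample and cluster; in that case the grouping label would be a combined sample-cluster key.

```r
# Toy genes x cells count matrix; real input would be the raw RNA counts.
set.seed(1)
counts <- matrix(rpois(4 * 8, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:8)))

# Per-cell sample labels (hypothetical IDs following the report's naming).
sample_id <- rep(c("YL_Vehicle_1", "YL_Vehicle_2",
                   "YL_Treatment_1", "YL_Treatment_2"), each = 2)

# rowsum() adds up rows that share a group label, so transpose to put cells
# in rows, sum within each sample, then transpose back to genes x samples.
pseudobulk <- t(rowsum(t(counts), group = sample_id))

dim(pseudobulk)  # genes x samples
pseudobulk
```

Seurat's `AggregateExpression()` offers a comparable aggregation directly on a Seurat object.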

The six comparisons are:

Treatment effects in aged lymph node: Treatment vs Vehicle in aged mice (lymph node)
Treatment effects in aged tumor: Treatment vs Vehicle in aged mice (tumor)
Treatment effects in young lymph node: Treatment vs Vehicle in young mice (lymph node)
Treatment effects in young tumor: Treatment vs Vehicle in young mice (tumor)
Age effects in lymph node: Young vs Aged (vehicle-treated, lymph node)
Age effects in tumor: Young vs Aged (vehicle-treated, tumor)
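
The six comparisons can be written down as a small contrast table, one row per test, with the group whose expression is reported as "up" listed as the numerator. The group labels (for example `AL_Treatment`) and the factor name `group` are assumptions for illustration.

```r
# One row per comparison: numerator vs denominator group.
# Group labels are assumed names, not taken from the analysis code.
comparisons <- data.frame(
  name = c("Aged Lymphnode Treatment vs. Vehicle",
           "Aged Tumor Treatment vs. Vehicle",
           "Young Lymphnode Treatment vs. Vehicle",
           "Young Tumor Treatment vs. Vehicle",
           "Young vs. Aged Lymphnode Vehicle",
           "Young vs. Aged Tumor Vehicle"),
  numerator   = c("AL_Treatment", "AT_Treatment", "YL_Treatment",
                  "YT_Treatment", "YL_Vehicle", "YT_Vehicle"),
  denominator = c("AL_Vehicle", "AT_Vehicle", "YL_Vehicle",
                  "YT_Vehicle", "AL_Vehicle", "AT_Vehicle"),
  stringsAsFactors = FALSE
)

# A row maps onto a DESeq2-style contrast vector: c(factor, numerator, denominator).
contrast_1 <- c("group", comparisons$numerator[1], comparisons$denominator[1])
contrast_1
```

With this orientation, a positive log2 fold change in an age comparison means higher expression in young mice.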

## Expression matrix dimensions: 21262 132

## Pseudobulk samples: 132

3.2 Statistical Analysis with DESeq2

DESeq2 was used for differential expression testing on pseudobulk data. This method models RNA-seq count data with a negative binomial distribution and adjusts p-values for multiple testing across genes.

Sample Size Requirements: DESeq2 requires sufficient biological replicates to estimate gene-wise dispersion (variance) accurately. A minimum of 4 total samples is required per comparison; this threshold accommodates the missing YT_Vehicle_2 sample while retaining enough replicates for testing.
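
The sample-size rule and a single DESeq2 comparison can be sketched as follows. The counts are simulated, the sample and group names are hypothetical, and `enough_samples()` is an illustrative helper rather than a function from the analysis. The DESeq2 step runs only if the package is installed.

```r
set.seed(42)

# Toy pseudobulk metadata: young tumor with one vehicle replicate missing.
col_data <- data.frame(
  group = factor(c("YT_Vehicle", "YT_Vehicle",
                   "YT_Treatment", "YT_Treatment", "YT_Treatment"),
                 levels = c("YT_Vehicle", "YT_Treatment")),
  row.names = c("YT_Vehicle_1", "YT_Vehicle_3",
                "YT_Treatment_1", "YT_Treatment_2", "YT_Treatment_3")
)

# Minimum of 4 total samples per comparison, as stated in the text.
enough_samples <- function(col_data, groups, min_total = 4) {
  sum(col_data$group %in% groups) >= min_total
}
ok <- enough_samples(col_data, c("YT_Treatment", "YT_Vehicle"))

# Simulated negative binomial counts for 500 genes; purely illustrative.
gene_means <- 2^runif(500, min = 4, max = 10)
counts <- matrix(rnbinom(500 * nrow(col_data), mu = gene_means, size = 10),
                 nrow = 500,
                 dimnames = list(paste0("gene", 1:500), rownames(col_data)))

if (ok && requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = col_data,
                                        design = ~ group)
  dds <- DESeq2::DESeq(dds, quiet = TRUE)
  # Positive log2FoldChange means higher expression in Treatment.
  res <- DESeq2::results(dds, contrast = c("group", "YT_Treatment", "YT_Vehicle"))
  print(head(res[order(res$padj), ]))
}
```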

## Completed 6 of 6 comparisons

3.3 Results Summary and Download

3.4 Analysis Summary

Overview of differential expression results across all comparisons.

Comparison	Significant Genes	Upregulated	Downregulated	Clusters Analyzed
Aged Lymphnode Treatment vs. Vehicle	28	8	20	6
Aged Tumor Treatment vs. Vehicle	54	17	37	6
Young Lymphnode Treatment vs. Vehicle	22	16	6	5
Young Tumor Treatment vs. Vehicle	6	3	3	6
Young vs. Aged Lymphnode Vehicle	104	33	71	5
Young vs. Aged Tumor Vehicle	100	32	68	6

## Significant genes by B cell cluster:
## # A tibble: 6 × 3
##   cluster Total_Significant Avg_LogFC
##   <chr>               <int>     <dbl>
## 1 0                     242      1.62
## 2 1                      24      3.31
## 3 2                      24      3.56
## 4 3                      12      4.96
## 5 5                      11      3.96
## 6 4                       1      3.18
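
As a quick consistency check, the per-comparison and per-cluster counts reported above should add up to the same total if each significant gene is counted once per comparison and cluster. The sketch below copies the reported numbers; the object names are illustrative.

```r
# Significant-gene counts per comparison, copied from the summary above.
by_comparison <- data.frame(
  comparison = c("Aged Lymphnode Treatment vs. Vehicle",
                 "Aged Tumor Treatment vs. Vehicle",
                 "Young Lymphnode Treatment vs. Vehicle",
                 "Young Tumor Treatment vs. Vehicle",
                 "Young vs. Aged Lymphnode Vehicle",
                 "Young vs. Aged Tumor Vehicle"),
  up   = c(8, 17, 16, 3, 33, 32),
  down = c(20, 37, 6, 3, 71, 68),
  stringsAsFactors = FALSE
)
by_comparison$total <- by_comparison$up + by_comparison$down

# Significant-gene counts per B cell cluster, copied from the tibble above.
by_cluster <- c("0" = 242, "1" = 24, "2" = 24, "3" = 12, "4" = 1, "5" = 11)

sum(by_comparison$total)  # total across comparisons
sum(by_cluster)           # total across clusters

# Share of all significant results that come from cluster 0.
cluster0_share <- round(100 * by_cluster[["0"]] / sum(by_cluster), 1)
cluster0_share
```

Both totals come to 314, and about 77% of the significant results fall in cluster 0, which also holds about 90% of the B cells.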

Effect of Treatment on the immune landscape in young and old mice

B Cell Subset Analysis and Differential Expression

09/18/25