| analysis_date | h5ad_file | species | total_cells | design_type | edviz_grammar | factors | tool_version |
|---|---|---|---|---|---|---|---|
| 2025-11-12 | GSE290106_analysed.h5ad | Mouse (Mus musculus) | 21417 | 2×2 Nested Design (Hierarchical) | Genotype(2) > Sample[5103\|3881\|7904\|4537] : CellType(6) | | 0.1.0 |
- File: GSE290106_analysed.h5ad
- Analysis Date: 2025-11-12
- Species: Mouse (Mus musculus)
- Total Cells: 21,417
- Experiment Type: Tumor microenvironment profiling study
- Research Question: How does Keap1 knockout affect tumor microenvironment composition and immune cell infiltration compared to wild-type?
Factor Descriptions:
- Genotype: Two experimental conditions: KeapKO (Keap1 knockout - constitutive Nrf2 activation and altered oxidative stress response) and KeapWT (wild-type control)
- Sample: Biological replicates (2 per genotype), tumor tissue samples
- Cell Type: Six cell populations identified in tumor microenvironment: CAF (cancer-associated fibroblasts), DC (dendritic cells), Macrophages (dominant population at 92%), NK cells, Neutrophils, and Tumor cells
| Factor | Levels | Type |
|---|---|---|
| Genotype | 2 | Treatment |
| Sample | 4 | Replicate |
| Cell Type | 6 | Observation |
This dataset has a nested design. The Sample factor is nested within Genotype, meaning each sample belongs to exactly one genotype condition. Cell types are observed in every sample, so Cell Type is crossed with the nested Genotype > Sample structure.
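
The nesting claim can be checked directly from per-cell labels. The sketch below is a minimal illustration, not the tool's own logic: `is_nested` is a hypothetical helper, and the sample IDs (`KO_1`, `WT_1`, ...) are invented because the report does not list them.

```python
from collections import defaultdict

def is_nested(pairs):
    """Return True if every inner level appears under exactly one outer level."""
    outers = defaultdict(set)
    for inner, outer in pairs:
        outers[inner].add(outer)
    return all(len(levels) == 1 for levels in outers.values())

# Hypothetical per-cell (sample, genotype) labels; the sample IDs are made up.
cells = [
    ("KO_1", "KeapKO"), ("KO_1", "KeapKO"), ("KO_2", "KeapKO"),
    ("WT_1", "KeapWT"), ("WT_2", "KeapWT"), ("WT_2", "KeapWT"),
]
print(is_nested(cells))  # each sample maps to a single genotype

# Cell type against sample is crossed: one cell type spans several samples.
cell_types = [("Macrophages", "KO_1"), ("Macrophages", "WT_1"), ("DC", "KO_1")]
print(is_nested(cell_types))
```

With real data, the same check would run over the sample and genotype columns of the cell metadata.
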
┌──────────────────── Design Structure ────────────────────┐
│                                                          │
│  Genotype(2)                                             │
│      ↓                                                   │
│  Sample([5103 | 3881 | 7904 | 4537])                     │
│      :                                                   │
│  CellType(6)                                             │
│                                                          │
└──────────────────────────────────────────────────────────┘
Genotype(2) > Sample[5103|3881|7904|4537] : CellType(6)
- Cells per Genotype: 8,982 - 12,442 (mean: 10,712)
- Cells per Sample: 3,881 - 7,904 (mean: 5,356)
- Cells per Cell Type: 32 - 19,789 (mean: 3,573)
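
The per-sample range and mean can be reproduced from the four counts in the grammar string. This is a small sketch for illustration; `summarize` is a hypothetical helper, not part of the reporting tool.

```python
from statistics import mean

# Per-sample cell counts as written in the grammar string.
cells_per_sample = [5103, 3881, 7904, 4537]

def summarize(counts):
    """Return (min, max, mean rounded to the nearest integer) for a list of counts."""
    return min(counts), max(counts), round(mean(counts))

lo, hi, avg = summarize(cells_per_sample)
print(f"Cells per Sample: {lo:,} - {hi:,} (mean: {avg:,})")
# Cells per Sample: 3,881 - 7,904 (mean: 5,356)
```
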
This design structure has implications for statistical analysis:
Random Effects Modeling: The nesting of sample within genotype indicates that sample-specific variation should be modeled as a random effect. When testing for genotype effects, use mixed-effects models with random intercepts for sample (e.g., ~ genotype + (1|sample) in lme4 notation).
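
Written out, the random-intercept formula corresponds to the model below. The symbols are introduced here for illustration and do not appear in the report: $y_{ij}$ is the response for cell $i$ in sample $j$, $x_j$ is a genotype indicator for sample $j$ (assumed 0 for KeapWT and 1 for KeapKO), $u_j$ is the sample-specific random intercept, and $\varepsilon_{ij}$ is the residual.

```latex
y_{ij} = \beta_0 + \beta_1 x_j + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Because $x_j$ is constant within a sample, information about the genotype effect $\beta_1$ comes from differences between samples, not from differences between cells of the same sample.
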
Aggregation Strategy: For differential expression testing, pseudobulking to the sample level preserves the experimental unit structure. Aggregate cells to sample-by-cell_type pseudobulk profiles before applying standard DE methods, treating samples as biological replicates.
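
As a rough sketch of the aggregation step, the function below sums per-cell counts within each sample-by-cell-type group. It uses only the standard library and toy input; `pseudobulk`, the sample IDs, the gene names, and the counts are all illustrative assumptions, and a real analysis would operate on the count matrix in the h5ad file.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum per-cell gene counts within each (sample, cell_type) group."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample, cell_type, counts in cells:
        for gene, n in counts.items():
            totals[(sample, cell_type)][gene] += n
    return {key: dict(genes) for key, genes in totals.items()}

# Toy input: (sample, cell_type, {gene: count}); every value is made up.
cells = [
    ("KO_1", "Macrophages", {"Nqo1": 5, "Hmox1": 2}),
    ("KO_1", "Macrophages", {"Nqo1": 3}),
    ("KO_1", "DC", {"Hmox1": 1}),
    ("WT_1", "Macrophages", {"Nqo1": 1, "Hmox1": 4}),
]
profiles = pseudobulk(cells)
print(profiles[("KO_1", "Macrophages")])  # {'Nqo1': 8, 'Hmox1': 2}
```

Each key of `profiles` is one pseudobulk profile, so a DE method applied downstream sees one observation per sample and cell type.
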
Contrast Specification: When comparing genotypes, ensure contrasts are computed at the sample level, not the cell level, to avoid pseudoreplication and inflated Type I error rates.
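
One way to picture a sample-level contrast: average cells within each sample first, then compare genotypes using those sample means, so the number of units equals the number of samples. The helper name, sample IDs, and values below are hypothetical, and the sketch computes only the point estimate, not a test statistic.

```python
from statistics import mean

def sample_level_contrast(values, sample_of, genotype_of, a, b):
    """Return (mean of genotype a minus mean of genotype b, number of samples),
    where genotype means are taken over per-sample means, not over cells."""
    per_sample = {}
    for value, sample in zip(values, sample_of):
        per_sample.setdefault(sample, []).append(value)
    sample_means = {s: mean(v) for s, v in per_sample.items()}

    def group_mean(genotype):
        return mean(m for s, m in sample_means.items() if genotype_of[s] == genotype)

    return group_mean(a) - group_mean(b), len(sample_means)

# Toy per-cell measurements; all values and sample IDs are made up.
values = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0, 7.0]
sample_of = ["WT_1", "WT_1", "WT_2", "KO_1", "KO_1", "KO_2", "KO_2"]
genotype_of = {"WT_1": "KeapWT", "WT_2": "KeapWT", "KO_1": "KeapKO", "KO_2": "KeapKO"}

diff, n_units = sample_level_contrast(values, sample_of, genotype_of, "KeapKO", "KeapWT")
print(diff, n_units)  # 3.5 4
```

The contrast rests on 4 units (samples), not on the 7 cells in the toy input, which is the distinction that guards against pseudoreplication.
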
- Highly unbalanced cell type distribution, with Macrophages comprising 92.4% of all cells
- Unbalanced sample sizes, ranging from 3,881 to 7,904 cells per sample
- KeapWT samples have more cells overall (12,442) than KeapKO samples (8,982)
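
The imbalance figures above follow from counts quoted elsewhere in this report. The sketch below assumes that the largest per-cell-type count (19,789) is the Macrophage count; `share` and `imbalance_ratio` are hypothetical helper names.

```python
# Figures quoted in this report.
total_cells = 21_417
largest_cell_type = 19_789  # assumed to be the Macrophage count
cells_per_sample = [5103, 3881, 7904, 4537]

def share(part, whole):
    """Percentage of `whole` made up by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

def imbalance_ratio(counts):
    """Largest group size divided by the smallest."""
    return max(counts) / min(counts)

print(share(largest_cell_type, total_cells))         # 92.4
print(round(imbalance_ratio(cells_per_sample), 2))   # 2.04
```
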