Skip to content

Instantly share code, notes, and snippets.

@jnhutchinson
Created July 27, 2017 18:37
Show Gist options
  • Select an option

  • Save jnhutchinson/3c7548b494521a9852fe94639175f75c to your computer and use it in GitHub Desktop.

Select an option

Save jnhutchinson/3c7548b494521a9852fe94639175f75c to your computer and use it in GitHub Desktop.
RUVseq
# Batch Correction
There is a batch effect, which we will need to eliminate as it is confounded with the sample class. Ideally, all the A samples would cluster together.
Try using [RUVseq](http://www.nature.com/nbt/journal/v32/n9/full/nbt.2931.html) to remove variation associated wtih batch.
Leverage the two technical replicates we have that span the batches.
```{r ruvseqs}
counts <- counts(dds, normalized=TRUE) # count matrix from the DESeq2 object
#RUVs using replicates
## specify replicates, check this to be sure you did it right
replicates <- matrix(data=c(c(1,2), c(6,7)), byrow=TRUE, nrow=2)
## run RUVseq on size-adjusted data from DESeq2
# k of 2 completely overlaps replicates on PCA, iterated on this to optimize
batch_ruvs_calc <- RUVs(x=as.matrix(counts), cIdx=rownames(counts), k=2, scIdx=replicates, isLog=FALSE)
## variance stabilize RUVseq adjusted results
normed.ruvs.vs <- varianceStabilizingTransformation(batch_ruvs_calc$normalizedCounts)
## PCA plot the variance stabilized RUVseqS adjusted results
# have to shove it in this object for DESeq2::plotPCA to work
se.ruvs <- SummarizedExperiment(normed.ruvs.vs, colData=colData(dds))
```
## PCA plots on batch corrected, variance stabilized data
### batch and sample type
```{r ruvpca1}
plotPCA(DESeqTransform(se.ruvs), intgroup=c("sampletype", "batch"))
```
# Differential Expression Analysis
RUVseq batch correction is a great improvement over the uncorrected data but it still doesn't completely remove the batch effect (ignoring the one outlier, the lung samples still split themselves by batch). However, this is pretty good, especially with the data we have.
Differential gene expression analysis of count data was performed using the Bioconductor R package, DESeq2. The count data was fit to a negative binomial model and dispersion estimates were generated using the mean values from the maximum likelikhood estimate of log2 fold changes, optimizing the Cox-Reid adjusted profile likelihood.
The models used to estimate differential expression accounted for the matched sample design and the RUVseq generated batch correction.
As there was a batch effect, we recommend caution in interpreting the results. Significantly differentially expressed genes identified in these analyses will require lab verification.
```{r, DE}
ruvs <- batch_ruvs_calc$W
dds.ruvs <- DESeqDataSetFromMatrix(countData=rawcounts, colData=cbind(ruvs,pd), design=~W_1+W_2+day+sampletype)
dds.ruvs <- DESeq(dds.ruvs)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment