Created
July 27, 2017 18:37
-
-
Save jnhutchinson/3c7548b494521a9852fe94639175f75c to your computer and use it in GitHub Desktop.
RUVseq
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # Batch Correction | |
| There is a batch effect, which we will need to eliminate as it is confounded with the sample class. Ideally, all the A samples would cluster together. | |
| Try using [RUVseq](http://www.nature.com/nbt/journal/v32/n9/full/nbt.2931.html) to remove variation associated wtih batch. | |
| Leverage the two technical replicates we have that span the batches. | |
| ```{r ruvseqs} | |
| counts <- counts(dds, normalized=TRUE) # count matrix from the DESeq2 object | |
| #RUVs using replicates | |
| ## specify replicates, check this to be sure you did it right | |
| replicates <- matrix(data=c(c(1,2), c(6,7)), byrow=TRUE, nrow=2) | |
| ## run RUVseq on size-adjusted data from DESeq2 | |
| # k of 2 completely overlaps replicates on PCA, iterated on this to optimize | |
| batch_ruvs_calc <- RUVs(x=as.matrix(counts), cIdx=rownames(counts), k=2, scIdx=replicates, isLog=FALSE) | |
| ## variance stabilize RUVseq adjusted results | |
| normed.ruvs.vs <- varianceStabilizingTransformation(batch_ruvs_calc$normalizedCounts) | |
| ## PCA plot the variance stabilized RUVseqS adjusted results | |
| # have to shove it in this object for DESeq2::plotPCA to work | |
| se.ruvs <- SummarizedExperiment(normed.ruvs.vs, colData=colData(dds)) | |
| ``` | |
| ## PCA plots on batch corrected, variance stabilized data | |
| ### batch and sample type | |
| ```{r ruvpca1} | |
| plotPCA(DESeqTransform(se.ruvs), intgroup=c("sampletype", "batch")) | |
| ``` | |
| # Differential Expression Analysis | |
| RUVseq batch correction is a great improvement over the uncorrected data but it still doesn't completely remove the batch effect (ignoring the one outlier, the lung samples still split themselves by batch). However, this is pretty good, especially with the data we have. | |
| Differential gene expression analysis of count data was performed using the Bioconductor R package, DESeq2. The count data was fit to a negative binomial model and dispersion estimates were generated using the mean values from the maximum likelikhood estimate of log2 fold changes, optimizing the Cox-Reid adjusted profile likelihood. | |
| The models used to estimate differential expression accounted for the matched sample design and the RUVseq generated batch correction. | |
| As there was a batch effect, we recommend caution in interpreting the results. Significantly differentially expressed genes identified in these analyses will require lab verification. | |
| ```{r, DE} | |
| ruvs <- batch_ruvs_calc$W | |
| dds.ruvs <- DESeqDataSetFromMatrix(countData=rawcounts, colData=cbind(ruvs,pd), design=~W_1+W_2+day+sampletype) | |
| dds.ruvs <- DESeq(dds.ruvs) | |
| ``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment