Tuesday, August 18, 2009

coverage 2

I would like to suggest the ShortRead and IRange packages in R/Bioconductor. If you have enough memory to load a lane in memory 8GB should be enough, they provide excellent functions to compute per base coverage and many other things.

for example for a solexa export file:
Code:
require(ShortRead)
aln<-readAligned("/path/to/file","filename_export.txt",type="SolexaExport")
# Filtering of reads e.g.:
aln <- aln[alignData(aln)$filtering=="Y" & !is.na(strand(aln)) ]
#Remove duplicated reads
aln<-aln[!srduplicated(aln)]
#Coverage
cvg<-coverage(aln,extend=60L) #in this case reads are extended 60 bp 3'
One can then use the package rtracklayer to export it as a wig
Code:
require(rtracklayer)
export.ucsc(as(cvg,"RangedData")),"test.wig",subformat="wig")
you might need to change the chromosome names afterwards if your original names already contained chr.

No comments: