F-UT-URE

Thursday, February 24, 2011

Usage examples from the wild:

============================================================
Example 1
Summary: Compute the "spanning" or "physical" coverage of my paired-end
dataset on the reference genome. I will use hg18.
Contributor: Aaron Quinlan
Date: 06-July-2010
Tools used: samtools, bamToBed, genomeCoverageBed
============================================================
Workflow:
# 1. Sort the BAM files by read ID so that step 2 can correctly compute the "span" of the
# paired-end fragment.
samtools sort -n -m 50000000000 sample.sorted.dup.bam sample.namesorted.dup

# 2. Create BED from the spanning cooridnates and sort them by chrom for genomeCoverageBed.
# NOTE: Using -bedpe format from bamToBed so that I have the starts/ends for each mate.
# NOTE: Only computing coverage based upon "properly-paired" reads. (i.e., -f 0x2)
samtools view -uf 0x2 sample.namesorted.dup.bam | bamToBed -i stdin -bedpe | awk '{OFS="\t"; print $1,$2,$6,$7,$8}' | sort -k 1,1 > sample.namesorted.dup.bam.bed

# 3. Compute the ___spanning___ coverage histogram for the sample
# NOTE: We are creating a histogram where we lump all positions with depth > 60 into a single bin (-max 60)
genomeCoverageBed -i sample.namesorted.dup.bam.bed -g human.hg18.genome -max 60 > sample.spanning.cov

============================================================
Example 2
Summary: Convert a BAM into a bedgraph or a bigWig.
Contributor: Davide Cittaro
Date: 08-July-2010
Tools used: bamToBed, slopBed, genomeCoverageBed, samtools, wigToBigWig
============================================================
Workflow:
# 1. Remove duplicates from the BAM file. I'm using samtools, Picard may be also used. I'm assuming it's a single read file
samtools rmdup -s sample.bam sample.nodup.bam

# 2. Get chromosome sizes. You may get this from UCSC or from BAM headers
samtools view -H sample.nodup.bam | awk '/@SQ/ {OFS="\t"; gsub("SN:", "", $2); gsub("LN:", "", $3); print $2, $3}' > chromsizes.tab

# 3a. Convert the BAM file into a gzipped bedgraph
# NOTE: You should define the extension size (ssize) as the average library size minus the read length. There's no need to be precise here, if you have library at 200 bp and 36 bp reads, you can set ssize in a range between 160-180 bp
bamTobed -i sample.nodup.bam | slopBed -i stdin -g chromsizes.tab -s -l 0 -r ${ssize} | genomeCoverageBed -g chromsizes.tab -i stdin -bg | gzip -c - > sample.bdg.gz

# 3b. Convert the BAM file into a bigWig file
bamTobed -i sample.nodup.bam | slopBed -i stdin -g chromsizes.tab -s -l 0 -r ${ssize} | genomeCoverageBed -g chromsizes.tab -i stdin -bg | wigToBigWig stdin chromsizes.tab sample.bw

============================================================
Example 3
Summary: A slightly different way (than ex. 2) to create BigWig files from BAM
Contributor: Assaf Gordon
Date: 09-July-2010
Tools used: samtools, genomeCoverageBed, bedGraphToBigWig
More Information : http://cancan.cshl.edu/labmembers/gordon/files/viz2.pdf
============================================================
Workflow:
# 1. Convert SAM to BAM
samtools view -S -b -o sample.bam sample.sam

# 2. Sort the BAM file
samtools sort sample.bam sample.sorted

# 3. Create BedGraph coverage file
genomeCoverageBed -bg -ibam sample.sorted.bam -g chromsizes.txt > sample.bedgraph

# 4. Convert the BedGraph file to BigWig
bedGraphToBigWig sample.bedgraph chromsizes.txt sample.bw

============================================================
Example 4
Summary: Create two coverage tracks - one per strand
Contributor: Assaf Gordon
Date: 09-July-2010
Tools used: samtools, genomeCoverageBed, bedGraphToBigWig, awk
More Information : http://cancan.cshl.edu/labmembers/gordon/files/viz2.pdf
============================================================
Workflow:
# 1. Convert SAM to BAM
samtools view -S -b -o sample.bam sample.sam

# 2. Sort the BAM file
samtools sort sample.bam sample.sorted

# 3. Create BedGraph coverage file - for positive strand
genomeCoverageBed -bg -ibam sample.sorted.bam -g chromsizes.txt -strand + > sample.pos.bedgraph

# 4. Create BedGraph coverage file - for negative strand
# use AWK to negate the coverage values
genomeCoverageBed -bg -ibam sample.sorted.bam -g chromsizes.txt -strand - | awk -v OFS="\t" '{ $4 = - $4 ; print $0 }' > sample.neg.bedgraph

# 5. Create BigWig tracks from both bedgraph files
bedGraphToBigWig sample.pos.bedgraph chromsizes.txt sample.pos.bw
bedGraphToBigWig sample.pos.bedgraph chromsizes.txt sample.neg.bw

# 6. In the UCSC Genome Browser tracks, add both tracks, with different color values:
track name="Positive Strand" color=0,0,255 type=bigWig bigDataUrl=http://myserver.edu/SAMPLE.POS.BW
track name="Negative Strand" color=255,0,0 type=bigWig bigDataUrl=http://myserver.edu/SAMPLE.NEG.BW

============================================================
Example 5
Summary: Create two coverage tracks - for exons only
Contributor: Assaf Gordon
Date: 09-July-2010
Tools used: samtools, genomeCoverageBed, bedGraphToBigWig
More Information : http://cancan.cshl.edu/labmembers/gordon/files/viz2.pdf
============================================================
Workflow:
# 1. Convert SAM to BAM
samtools view -S -b -o sample.bam sample.sam

# 2. Sort the BAM file
samtools sort sample.bam sample.sorted

# 3. Create BedGraph coverage file - calculating the coverage of exons only
genomeCoverageBed -bg -ibam sample.sorted.bam -g chromsizes.txt -split > sample.bedgraph

# 4. Create BigWig tracks from the bedgraph files
bedGraphToBigWig sample.bedgraph chromsizes.txt sample.bw

=============================================================
Example 6
Summary: Find all genomic regions where all samples have >=X coverage
Contributor: Aaron Quinlan
Date: 18-July-2010
Tools used: samtools, intersectBed, genomeCoverageBed, mergeBed, awk
=============================================================
Workflow:
##############################################################
# NOTE: The example below is for three samples in the interest
# of clarity/brevity. It could be extended to many more
# samples with a BASH or Perl/Python script.
##############################################################

# 1. Let's assume you have three human samples, 1, 2, and 3.
# Let's also assume that you have aligned the sequences for
# these samples and they are now in BAM format.
ls
1.bam 2.bam 3.bam

# 1a. These BAM files contain both concordant and discordant reads.
# Also, they contain alignments with _all_ mapping qualities.
# Let's cull the files to solely those alignments that are
# "properly-paired" (0x2) and have a mapping quality >=10.
samtools view -q 10 -f 0x2 -b 1.bam > 1.clean.bam
samtools view -q 10 -f 0x2 -b 2.bam > 2.clean.bam
samtools view -q 10 -f 0x2 -b 3.bam > 3.clean.bam

# 2. First create a BEDGRAPH of the coverage for each sample.
genomeCoverageBed -ibam 1.clean.bam -g human.hg19.genome -bga > 1.clean.bam.bedg
genomeCoverageBed -ibam 2.clean.bam -g human.hg19.genome -bga > 2.clean.bam.bedg
genomeCoverageBed -ibam 3.clean.bam -g human.hg19.genome -bga > 3.clean.bam.bedg

# 3. Now merge all intervals with sufficient depth (say >=6X).
# This will create a BED file.
awk '$4 >= 6' 1.clean.bam.bedg | mergeBed -i stdin > 1.clean.bam.gte6.bed
awk '$4 >= 6' 2.clean.bam.bedg | mergeBed -i stdin > 2.clean.bam.gte6.bed
awk '$4 >= 6' 3.clean.bam.bedg | mergeBed -i stdin > 3.clean.bam.gte6.bed

# 4. Intersect them all to get the set of intervals having depth >= 6 in _all_ 3.
intersectBed -a 1.clean.bam.gte6.bed -b 2.clean.bam.gte6.bed | \
intersectBed -a stdin -b 3.clean.bam.gte6.bed > all.clean.bam.gte6.bed

# 5. What is the total number of base pairs having >=6X in _all_ 3 samples?
awk '{sum += $3-$2} END{print sum}' all.clean.bam.gte6.bed

# 6. One could now filter variant calls against this BED file to restrict
# calls to regions that you believe to be of sufficient quality, etc.

Sunday, December 26, 2010

mac

172.21.163.217

Thursday, December 9, 2010

grep匹配TAB

1、grep -P '\t'

2、grep [[:space:]] // 所有空白字符

3、直接grep tab字符 // 命令行下用"ESC TAB"输入

4、grep $'\t'

Friday, December 3, 2010

NGS tools'

http://manuals.bioinformatics.ucr.edu/home/ht-seq

Thursday, December 2, 2010

good article for introduction of install R on cluster

http://technical.bestgrid.org/index.php/Installing_R_on_a_Rocks_Cluster

Wednesday, December 1, 2010

solexa_ucla

username: nelsonlabguest
password: dna345

Monday, November 22, 2010

editplus SVN 设置

kairyou 发表于 2008-12-17, 12:27 PM. 发表在: WebDev
SVN在团队开发的时候很强大。VS有VisualSVN插件，但是我不喜欢用VS。vim电脑上也有，虽然很多人说vim也很强大，不过我目前还是习惯editplus。也许以后会去适应vim吧。

1、首先这里有一篇介绍，edtiplus使用SlikSVN来实现update、commit。当然前提也是要安装TortoiseSVN的。

2、发现了一个更好的介绍，是deitplus wiki里面的介绍的方法。

说明：方法1使用的SlikSVN是命令行端，方法2使用的TortoiseProc.exe是GUI端。

下面我把方法二的实现方法稍微翻译一下：

Subversion Commit
说明：用TortoiseSVN校检文件并提交文件到服务器（当然，前提是你安装了TortoiseSVN）
添加方法:编辑edtiplus 工具-用户工具-添加工具-程序
菜单文本：SVN Commit
命令: C:\Program Files\TortoiseSVN\bin\TortoiseProc.exe
参数: /command:commit /path:"$(FilePath)" /notempfile /closeonend:0
初始目录: $(FileDir) Check: "Capture output", "Save open files"
勾上"保存打开文件"。可以不勾"捕获输出"，根据个人喜好吧，我是没勾。

说明：TortoiseSVN 使用临时文件在 shell 扩展和主程序之间传递多个参数，（低于1.5.0版，必须增加/notempfile参数，如果不这样做，该命令将无法正常工作，/path指定的文件将被删除。）从 TortoiseSVN 1.5.0 开始，废弃/notempfile参数，不再需要增加此参数。

Subversion Update, Delete, Rename, Checkout etc
说明：SVN的更新、删除、重命名、校检等命令
方法：只需要把上面的"参数"里的：/command:commit 用下面的替换（例如：/command:about）

:about 显示关于对话框
:log 打开日志对话框
:checkout 打开检出对话框
:import 打开导入对话框
:update 将工作副本的/path更新到HEAD，如果给定参数/rev，就会弹出一个对话框询问用户需要更新到哪个修订版本。为了避免指定修订版本号/rev:1234的对话框，需要加上/nonrecursive和/ignoreexternals参数（这2个参数我没加，还没遇到上述问题）
:commit 打开提交对话框
:add 将/path的文件添加到版本控制
:revert 撤消一个文件自上次更新后的所有的变更
:cleanup 递归清理工作拷贝，删除未完成的工作拷贝锁定
:resolve 将/path指定文件的冲突标示为解决，如果加上/noquestion，将不会提示确认操作。
:repocreate 在/path创建一个版本库
:switch 切换至分枝/标记
:export 将/path的工作副本导出到另一个目录
:merge 打开合并对话框
:mergeall 打开合并所有对话框
:copy 复制工作副本至URL
:settings 打开设置对话框
:remove 从版本控制里移除/path中的文件
:rename 重命名/path的文件
:diff 启动TortoiseSVN设置的外置比较程序
:help 打开帮助文件
:relocate 打开重定位对话框
:help 打开帮助文件
:repobrowser 打开版本库浏览器对话框
:ignore 将/path中的对象加入到忽略列表，仅对文件夹有效。
:blame 打开文件的追溯对话框
:createpatch 创建/path下的补丁文件。
:revisiongraph 显示/path目录下的版本变化图。
:lock 锁定一个文件，可以输入锁定的原因。
:rebuildiconcache 重建windows的图标缓存，当系统图标缓存出了问题才需要这样做（会导致桌面的图标会重新排列）
:properties 显示 /path 给出的路径之属性对话框。

更多的命令看：tortoisesvn docs吧

我只用了update、commit、add、revert、rename、remove、export、lock、unlock、resolve、checkout、blame、merge，后面几个都是不太常用的了。

另外在editplus wiki，还发现了一个不错的东西：打开当前文件的文件夹（使用svn的话，这个功能就很实用了）。方法，在用户工具里添加-程序：
菜单文本：Current Location(当前文件的文件夹)
命令：%systemroot%\explorer.exe /e,/root,\local disk, 参数：$(FileDir)
初始目录：空着
勾上：退出时关闭窗口、保存打开文件