F-UT-URE: 2010

Sunday, December 26, 2010

Thursday, December 9, 2010

Matching TAB with grep

1. grep -P '\t'

2. grep '[[:space:]]' // matches any whitespace character

3. grep with a literal tab character // on the command line, type it with "ESC TAB"

4. grep $'\t'
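As a rough Python analogue of the difference between variants 1 and 2 above, the sketch below filters a few made-up sample lines first for a literal tab and then for any whitespace. The sample lines and variable names are assumptions for illustration only.

```python
import re

# Hypothetical sample input: one line with a tab, one with a space, one with neither.
lines = ["a\tb", "a b", "ab"]

# Equivalent in spirit to grep -P '\t' or grep $'\t': only a real tab matches.
with_tab = [line for line in lines if "\t" in line]

# Equivalent in spirit to grep '[[:space:]]': tabs and spaces both match.
with_space = [line for line in lines if re.search(r"\s", line)]

print(with_tab)
print(with_space)
```

The point of the comparison is that the whitespace class is broader than a tab, so it also selects lines that contain only ordinary spaces.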

Friday, December 3, 2010

NGS tools

http://manuals.bioinformatics.ucr.edu/home/ht-seq

Thursday, December 2, 2010

A good introductory article on installing R on a cluster

http://technical.bestgrid.org/index.php/Installing_R_on_a_Rocks_Cluster

Wednesday, December 1, 2010

solexa_ucla

username: nelsonlabguest
password: dna345

Monday, November 22, 2010

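Posted by kairyou on 2008-12-17, 12:27 PM. Filed under: WebDev
SVN is very powerful for team development. Visual Studio has the VisualSVN plugin, but I don't like using VS. I also have vim installed, and although many people say vim is very powerful, for now I am still used to EditPlus. Maybe I will adapt to vim later.

1. First, there is an introduction here on using SlikSVN from EditPlus to run update and commit. Of course, this also requires TortoiseSVN to be installed.

2. I then found a better write-up: the method described in the EditPlus wiki.

Note: the SlikSVN used in method 1 is a command-line client, while the TortoiseProc.exe used in method 2 is a GUI client.

Below is my rough translation of method 2: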

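Subversion Commit
Description: use TortoiseSVN to check the files and commit them to the server (provided, of course, that TortoiseSVN is installed)
How to add: in EditPlus, Tools - User Tools - Add Tool - Program
Menu text: SVN Commit
Command: C:\Program Files\TortoiseSVN\bin\TortoiseProc.exe
Argument: /command:commit /path:"$(FilePath)" /notempfile /closeonend:0
Initial directory: $(FileDir) Check: "Capture output", "Save open files"
Tick "Save open files". "Capture output" is optional and a matter of taste; I left it unticked.

Note: TortoiseSVN uses temporary files to pass multiple arguments between the shell extension and the main program. (Before version 1.5.0 you must add the /notempfile parameter; otherwise the command will not work properly and the file given by /path will be deleted.) Starting with TortoiseSVN 1.5.0, /notempfile is obsolete and no longer needs to be added.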

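Subversion Update, Delete, Rename, Checkout etc
Description: SVN update, delete, rename, checkout and other commands
Method: simply replace /command:commit in the "Argument" above with one of the following (for example: /command:about)

:about Shows the About dialog
:log Opens the log dialog
:checkout Opens the checkout dialog
:import Opens the import dialog
:update Updates the working copy at /path to HEAD. If /rev is given, a dialog asks which revision to update to. To skip the dialog, specify a revision number such as /rev:1234; /nonrecursive and /ignoreexternals are further options (I did not add these two and have not run into any problem so far)
:commit Opens the commit dialog
:add Adds the files in /path to version control
:revert Reverts all changes made to a file since the last update
:cleanup Recursively cleans up the working copy, removing locks left by unfinished operations
:resolve Marks the conflict on the file given by /path as resolved; with /noquestion, no confirmation is requested.
:repocreate Creates a repository at /path
:switch Switches to a branch/tag
:export Exports the working copy at /path to another directory
:merge Opens the merge dialog
:mergeall Opens the merge all dialog
:copy Copies the working copy to a URL
:settings Opens the settings dialog
:remove Removes the files in /path from version control
:rename Renames the file in /path
:diff Starts the external diff program configured in TortoiseSVN
:help Opens the help file
:relocate Opens the relocate dialog
:repobrowser Opens the repository browser dialog
:ignore Adds the objects in /path to the ignore list; only effective for folders.
:blame Opens the blame dialog for the file
:createpatch Creates a patch file for /path.
:revisiongraph Shows the revision graph for the /path directory.
:lock Locks a file; a reason for the lock can be entered.
:rebuildiconcache Rebuilds the Windows icon cache. Only do this when the system icon cache is corrupted (it causes the desktop icons to be rearranged)
:properties Shows the properties dialog for the path given in /path.

For more commands, see the TortoiseSVN docs.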
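The "Argument" line follows one fixed pattern, so it can be generated rather than retyped for each command. The following is only a sketch: the helper name `tortoiseproc_argument` and its keyword parameters are made up for illustration, and it uses only the switches quoted in this post (/command, /path, /notempfile, /closeonend).

```python
def tortoiseproc_argument(command, path="$(FilePath)", notempfile=False, closeonend=0):
    """Build the EditPlus "Argument" string for a TortoiseProc command.

    Hypothetical helper: it only concatenates the switches shown in the post.
    """
    parts = ["/command:" + command, '/path:"' + path + '"']
    if notempfile:
        # Per the note above, only needed for TortoiseSVN older than 1.5.0.
        parts.append("/notempfile")
    parts.append("/closeonend:" + str(closeonend))
    return " ".join(parts)


commit_argument = tortoiseproc_argument("commit", notempfile=True)
update_argument = tortoiseproc_argument("update")
print(commit_argument)
print(update_argument)
```

The first call reproduces the commit argument listed earlier; the second shows the same line with the command swapped and /notempfile dropped.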

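I only use update, commit, add, revert, rename, remove, export, lock, unlock, resolve, checkout, blame and merge; the last few are not used very often.

Also, on the EditPlus wiki I found another handy thing: opening the folder of the current file (very practical when you use SVN). To set it up, add a Program under User Tools:
Menu text: Current Location (folder of the current file)
Command: %systemroot%\explorer.exe /e,/root,\local disk, Argument: $(FileDir)
Initial directory: leave empty
Tick: Close window on exit, Save open files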

Friday, November 19, 2010

My IP

172.21.161.229

RNAseq FTP

172.21.162.91

xusheng

kahuna

Sunday, October 31, 2010

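Transcript of Tang Jun's speech

A man who gets nervous speaking in front of a crowd is a low-key man

Just as I made my comeback, I received an invitation from Huaxia Times (华夏时报). I want to say two things here. First, I am truly, truly grateful to Huaxia Times and to many people across the country for their understanding and kindness toward me.

Second, for Huaxia Times to invite me to its CEO forum at a moment like this, I think Huaxia Times really has taste.

Today's Tang Jun is very different from the Tang Jun of four months ago. The CEO forum is no unfamiliar podium to me, but today, standing on this stage, I do feel a little lost and a little nervous. Don't be fooled by these rough features of mine; they can hardly hide the nervousness inside. I looked it up in a book, and the book says that a normal man who gets nervous standing in front of a crowd is a low-key man.

Many of us here today are CEOs, or future CEOs. I too have CEOs I look up to. One is Mr. Bill Gates, for whom I worked for ten years and whom I consider the most respected figure in global business. The other is Steve Jobs, CEO of Apple, the company now sweeping the world, as each of us can feel. The two have utterly different styles. Look at Gates's nobility, refinement and distinctive insight into the future; look at Jobs's transcendent innovation and his do-it-my-way attitude. You cannot help but admire them.

Over the past ten-plus years I have carefully observed what these two have in common and where they differ. In America and around the world they are competitors, and also two of the most respected CEOs, with completely different styles. After watching for so many years I found almost nothing in common between them, except one thing. What is it? Over those years I listened to and watched 87 of Gates's speeches and recordings and 12 of Jobs's speeches, and the two share just one trait: whenever they step onto the stage, what is their first sentence? "I am very, very excited today." That is the American style. What do Americans value? A good CEO must be confident, must make an impact on others, must have leadership. Especially on stage, what he wants most is to bring every listener impact, confidence and a future. To the Chinese, such an opening feels a bit high-profile. What do the Chinese like best? I think the Chinese prefer to be low-key. What does low-key mean? Low-key is when you stand on stage not nervous at all, yet you say you are very nervous, like me. That is low-key.

The summer of 2010 was hot, and for me especially it was hot to the point of vexation. What is vexation (烦)? It is the character written with the fire radical, and it means having fire inside that you cannot let out. That is real vexation. So this past summer I did something that, to me, was truly rare: I gave myself three weeks off. For anyone who has lived in America or the West this is simply unthinkable, because for me it was the first three-week holiday in sixteen years of my career. I chose a city in Northern Europe and spent my three weeks there. Many people do not understand: why had Tang Jun never taken a holiday in sixteen years? It is true, I never had. Why? Because in the workplace, compared with others, I feel I lack that kind of wisdom, that kind of genius. If I have had any small achievement in my career over the past decade and more, whether the ten years at Microsoft, the four years at Shanda, or today at New Huadu (新华都), what did it rest on? Luck. So much of it feels like a dream to me. When all of it landed on me I felt it should not have been mine, yet it was, so I can only put it down to being very lucky.

In ten years at Microsoft I was diligent enough to move myself

If I go on: apart from luck, is there anything special about you? I thought for a long time and could not come up with anything. The only thing I feel I can bring up is this: I am a little more diligent than others. What is diligence? Diligence is extending my working hours. I remember that when I left Microsoft I was extremely sad. Leaving the greatest company in the world, which I had loved for ten years, I wrote Mr. Gates a very, very long letter. I am told the letter was quite sentimental. How sentimental? I forwarded the email to the more than one thousand Chinese employees at Microsoft at the time, and I am told many of the female employees wept after reading it. So women really are very, very kind.

In my ten years at Microsoft I think I was very diligent. How diligent? Diligent enough to move even myself, so you can tell how diligent that was. In my letter to Gates I told him: Mr. Gates, although I cannot say I am the most diligent of Microsoft's 85,000 employees, I can tell you that among those 85,000 employees not one can stand up and say he is more diligent than I, Tang Jun. The point is really the same.

What is diligence? Whether as a good employee or as an outstanding CEO, I think diligence is a necessary ingredient of success on our career path, and a very important one. Without diligence, I believe a person will be very, very far from success. If a person who has succeeded stays diligent, I believe he will go from success to success. Back to my earlier topic. In the summer of 2010 I did a very great thing: I chose a city in Northern Europe and spent three weeks such as I had never had before. Those three weeks gave me a great deal of inspiration and many reflections.

What struck me is that the West today really pays a great deal of attention to China, to the point of paying attention to me, Tang Jun. I also felt the immense power of the renminbi. Even more, I felt that being Chinese is actually quite good. We are all Chinese, yet sometimes we complain that many things in China are irregular, unreasonable, a bit dirty, a bit messy, even a bit vexing. But think about it: if China had none of these, would you still be used to it? I at least would not. So I came away with many insights, and I believe they will also bring many achievements to my future life and career.

We wonder how to be a good CEO, a successful CEO, especially a successful CEO in China today. As I see it, to be a good CEO you must first be a good person. What kind of good person? An upright, sunny person. What is sunny? To me, sunny means always looking at the positive side of society and always seeing the future. If you have that sunniness, I believe bright sunshine will accompany your life, your future and your career. As we often say, if you want to be liked by others, you must first like the people around you; once you do, you will find that they like you too.

The same goes for a good CEO. You want your employees to love, respect and follow you. What matters most? You must like your employees from the bottom of your heart, respect them and thank them. Then, I believe, they will thank you from the bottom of theirs.

A good CEO must likewise learn to tolerate everything, including the mistakes others make. I believe that for every CEO and every senior executive here, there is no subordinate who never makes mistakes. Think about it: if they made no mistakes and were on the same level as you, do you think you would last long as CEO? You would not. If every one of your employees surpassed you in values, in market sense or in technical depth, could you still be the leader? So we must be tolerant.

Tolerance is also this: when people around you do not understand you, you should tell yourself it may be caused by asymmetric information. When many people understand you differently and see you differently, what is tolerance? Tolerance is saying: never mind, time will prove everything. That is what we mean by tolerance.

I believe that to be a successful CEO you really need many qualities others lack. Because you are the leader, the standard-bearer, what do you need? You need the people around you to follow you. You are their role model; never underestimate the power of example. Only when you have become a role model can you lead such a company.

Likewise, China today is already part of a globalised picture. What does globalisation mean? Whether it is our products or the future of our companies, we are already tied to the world. In future you may not much care whether something is a Chinese brand or a world brand, Chinese-style management or world-style management. I think we must all go out into the world. What does that mean? Looking at the company with the eyes of the world and holding oneself to the world's standards. That is the standard a CEO of a global company should meet.

A few years ago I took a group of entrepreneurs to Paris. Paris, as everyone knows, is a very romantic place; even with many people on strike it is still romantic. Because the Chinese have grown wealthy and expect to enjoy finer food, the seven or eight people in our group asked me to arrange dinner at the best French restaurant in Paris. We picked the most upscale French restaurant in the city centre, because to the Chinese the most expensive is the best. As soon as we walked in I asked whether they could give us a private room. I believe the notion of a private dining room exists only in China. Why one is needed I still cannot work out, but in the West there is no such notion; to a Westerner, if you want a private room you might as well eat at home, since home is the biggest private room. At my repeated request she gave us a relatively private corner, and we began to order. Before we ordered, the waiter asked what red wine we wanted. I told him at once that we seldom come to France, so we wanted the best French wine. He reeled off a long list of wines and we recognised none of them, because the only name we had ever heard was Lafite, which we took to be the best red wine in the world. He said they did not carry it but had another very good one. I asked for the most expensive, because the Chinese value system is simple: the more expensive, the better. He picked a very expensive wine for me, 650 euros a bottle. I told him straight away to bring us five bottles to start with. The Frenchman did not understand: seven people drinking five bottles, what does that mean? I said don't worry, we have money and credit cards. He still did not understand. I said five bottles to start because I know how much my companions can drink. The Chinese and foreigners think differently. What do the Chinese want? Results. Drinking is drinking; isn't the point to drink until you feel you are floating? And Westerners? Westerners drink to savour the process, just as they sip coffee slowly, whereas the Chinese drink it like a soft drink. That is the difference between Eastern and Western culture.

He still did not understand and brought two bottles. We poured the two bottles into four glasses, and not even full ones. At that point the Frenchman could not bear to watch. He asked me to step outside with him. I immediately took out my credit card and said, rest assured, we will pay. He said it was not a matter of money: "We cannot accept this." I said, how can you not accept it? I pay and drink your wine; what is there not to accept? He said, "The way you Chinese drink is something we French cannot accept. It does not suit the taste and style of our restaurant." So I say taste really is very, very important.

Then he said, "I have a request: please leave this restaurant as soon as possible." Imagine: I had finally found such an excellent, expensive restaurant, and I did not want to leave. It would be a loss of face, with all those people watching us Chinese being thrown out. "Or else," he said, "drink the way we French drink." I was very embarrassed. I went back to the table of Chinese entrepreneurs and said, I need to discuss something with you: in this place you can drink at most two bottles of red wine, and then you move on somewhere else; that is how the French drink. They said, that can't be right, we have drunk in other restaurants in France before. I said, this is the No. 1 restaurant in France, they care about style, two bottles at most. And so we chose to stay in the so-called best restaurant in Paris. Of course, as you know, we Chinese care about results: after leaving the restaurant we went on to a bar, because we had not yet reached our goal, which was to drink ourselves light-headed. That is one difference between Chinese and Western culture.

As future global CEOs, I hope we will all understand such differences. Likewise, I believe Westerners who come to China also feel the cultural barrier. When I was at Microsoft, one of my senior executives, named Laurie (劳瑞), had studied a good deal of Chinese and could speak a little, so I often took her to meet Chinese clients. Whenever the clients heard me say this American could speak Chinese, everyone got excited, just as we Chinese feel embarrassed if we go to America and cannot speak English. I find this strange: English is now an international language and Chinese is not yet. So at the banquet people kept chatting with Laurie in Chinese. One official stood up and said, "Excuse me, I need to go out for a moment of convenience (方便一下)." Laurie did not understand what "convenience" meant. Another official explained at once that in China going to the toilet is called "convenience". Laurie said, "So going to the toilet is called convenience," and when she left the table midway she too told everyone she was going out for a moment of convenience. She learned fast.

When we were finally leaving, the Chinese official was very friendly and asked Laurie, "Will you be in China for a while longer?" Laurie said she would be in China for the coming week. The official said, "Then next time, when it is convenient for you (在你方便的时候), I will come to see you." Laurie looked at him, not knowing what to say, extremely embarrassed. Seeing her embarrassment, the official asked back, "Is it not usually convenient for you?" Laurie said, "It is convenient for me every day." The official said, "Good. Next time, when it is convenient for you, I will hurry over to see you." At this point Laurie said to me in English, "He actually wants to rush over to see me while I am on the toilet. You Chinese men are too fierce."

Doing business in China requires Eastern wisdom

It is not that we Chinese men are too fierce. What is it? It is the cultural difference. And where does that difference come from? From the difference in language, because I think a language really stands for a culture. Don't you think so? Look at our Chinese characters: they carry culture; five thousand years of civilisation and history have shaped the culture of today's Chinese script. Now look at Western English: how complicated can it get? So, comparatively, Westerners are more direct and simpler than we Easterners. For a global company, I think a good CEO should blend the best of Chinese and Western culture. A good nation should above all absorb the strengths of other nations; only then can it keep growing and developing. So we too must learn. In corporate management, for example, when we ask what Chinese-style management should look like: Chinese-style companies should take in some Western management models, structures and ideas. Among those ideas, one very important element, which I strongly agree with and often raised in earlier years, is the simple model.

It is like our business model. What is a good business model? A good business model is a simple one. If your business model depends too much on others, its chance of success is very small. So we say we should value simplicity; only by truly valuing simplicity can our companies become more sound in management.

So simplicity is also something we value highly in corporate management, in the management model and the management structure alike. It is like relationships between people: which kind puts you most at ease? I believe simple relationships, whether simple friendships, simple relations with employees or simple relations of any kind, are the ones that feel most comfortable.

So we say that in managing a company we should value simplicity; only then can the company go further. If your company's management becomes complicated, what will you find? You will find that much of the company's time is consumed in sorting out complicated internal personnel relationships. That is a great loss to the company.

Likewise, as CEOs, especially CEOs in today's China, we need to care about politics. China's culture and China's system mean that every CEO here cannot avoid caring about Chinese politics.

How to care? The Fifth Plenary Session of the 17th Central Committee (十七大五中全会) has just ended. I suspect you did not pay it much attention; let me tell you how I did. I took part in almost the whole session, by television of course. Why? Because from the spirit of the meeting I hope to learn about our next five-year plan, what changes will take place in China, what effect they will have on our companies, and how we should respond. Isn't that for the sake of the company's development? It is also a very important factor in growing a company in China. I remember a conversation a few years ago with Mr. Gates, the CEO I respect most, which left a deep impression on me. Mr. Gates told me that over the years he had received countless Chinese guests. He still remembered that the first Chinese guest he received at Microsoft was a Chinese government official, whom he received at Microsoft headquarters in Seattle and who left a deep impression on him. The official said only three sentences to him. The first: America is a beautiful country. Gates was delighted. The second: Microsoft is the most famous IT company in the world. Of course it is, so Gates was delighted again. The third: Microsoft's employees had received them very thoughtfully. Gates said he was stunned. Why? Because the Chinese really have such a sense of the big picture: from America to Microsoft, from Microsoft to the employees. Only three sentences, yet they covered everything, so he admired the Chinese official greatly. He asked why. I said, need you ask? Because we in China are studying the Three Represents.

He said, "The second time I received a leader of Chinese business, a so-called entrepreneur." This entrepreneur also said three sentences to Gates. What were they? First, America is a beautiful country; second, Microsoft is a very famous IT company; third, the reception by Microsoft's employees was very thoughtful. Gates said, "When I heard a Chinese person say this again, I was truly stunned again." I said, don't be stunned. Why? Because the Chinese are all studying the Three Represents, and you will run into it again.

He said, "Sure enough, the third time I again heard a Chinese delegate say to me: first, China; second, Microsoft;" and at "third" Gates cut in: "Are you going to say our reception was very good?" He said, "Yes, but I also have a fourth." Gates asked what. He said, "The speed and the thinking behind Microsoft's development are very, very good." Gates said, "I was completely baffled. Why?" I said, need you ask? Did you know we in China are now studying the Scientific Outlook on Development?

He asked why the Chinese all have such a sense of the big picture. I said this is the difference between Chinese and Western culture: we Chinese always put the country first and ourselves second, whereas you Americans always put the individual first, and there is no second. That is the difference between Chinese and Western culture.

On the degree affair again: I do not know where I went wrong

Back to my opening. A few days ago I had a conversation with a friend in the media that moved me deeply. As you can imagine, everyone now talks to me about the so-called July storm. He said, "Tang Jun, tell me, what does this July storm really have to do with you? Besides, what we care about is your ability; we do not care about your degree at all." I said, "You can't put it that way; you make it sound as if I have no degree." He said, "Do you know what the problem in this affair is, where it lies?" I said, "Either it is asymmetric information or many people have misunderstood." He said, "Neither. The biggest problem is you, Tang Jun." When I heard that, I thought it a rather novel insight.

I said, "What about me? Where did I go wrong?" He said, "Where you went wrong is that you did not come forward at once and admit your mistake." I said, "I don't even know what my mistake is; how can you ask me to admit it?" He said, "That is exactly where you are wrong. In Chinese eyes, what you did right or wrong is not so important. What matters? Whether you admit fault. Admitting fault shows your attitude; not admitting it is a problem of attitude."

"If a person has an attitude problem, will people let him off?" I said, "My attitude is certainly fine. If I am wrong I will admit it; if I am not, I will not." He said, "That is the American way of thinking. In China you admit the mistake first, and then discuss whether it was actually wrong or right." I looked at him blankly. He said, "Think again. During the Cultural Revolution, weren't many of our leading cadres struck down?" I said, "Yes, my father was one." He said, "Many leading cadres had done nothing wrong during the Cultural Revolution, yet they admitted fault." I said, "So did my father." He said, "Have you noticed that those cadres who admitted fault were rehabilitated after a while, and after rehabilitation were promoted one grade above where they had been?" I said, "That is even more true of my father. He was a section-level cadre (科级) and was later promoted to deputy division level (副处)."

But after hearing all this I was truly full of feeling. About what? What I most want to say on this occasion is: I was wrong. Although I do not know where I was wrong, and although what I want to say even more is: please help set the record straight for me as soon as you can. But on second thought that will not do either; if I said that, what would it be? An attitude problem again.

Thank you all once more. My attitude is very good. Thank you!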

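Multiples in English

A few days ago I read a short article about translating English multiples into Chinese. I had not realised until then how awkward the question of "multiples" (倍数) is. Never mind English; even in Chinese one can get thoroughly confused. After spending some time on it, I concluded that the problem lies neither in English expression nor in Chinese translation but in the underlying arithmetic. If the arithmetic is not clear, neither the Chinese nor the English can be understood. For example, when we speak of an increase of so many times, is the original amount included or not? Including it is one wording, excluding it is another, and Chinese is very particular about which words are used. For decreases, division is replaced by multiplication, and again there are two ways of speaking: stating the factor of the decrease, or stating the net value that remains. A brief account follows.

I. Increases

1. “是……倍” (is ... times) versus “大、多、长……倍” (bigger/more/longer by ... times)

[Chinese] 这本书的篇幅是那本书的3倍。
[English] This book is three times as long as that one.
Or: This book is three times longer than that one.
Or: This book is three times the length of that one.

In Chinese, “是……倍” includes the original amount. In this example, if that book (A) has 100 pages, “3倍” means 300 pages.

But the sentence can also be put another way:

[Chinese] 这本书的篇幅比那本书长2倍。(or 大2倍, 多2倍, etc.; literally "longer by 2 times")

At first this threw me too: A was clearly said to be 3 times B, so why is it now 2 times? On reflection the subtlety of the Chinese became clear. “大、多、长” and the like state the net difference, which does not include the original amount. For example, if B is 200 and A is 800, there are two ways to say it: A is 4 times B (A是B的4倍, including the original), or A exceeds B by 3 times (A比B多3倍, excluding the original).

On the English side, note that the equative "three times as long as" and the comparative "three times longer than" mean the same thing. The comparative in particular also means “是……倍”, not the net “大、多、长”.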

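2. “增加到、增加至” (increase to) versus “增加、增加了” (increase by)

For example:

increase 5 times / 5-fold [increase used directly]

increase to 5 times [with the preposition to]

increase by 5 times / by a factor of 5 [with the preposition by]

The three English forms above mean the same thing. In Chinese, however, the question of whether the original is included (that is, the net increase) arises again. One translation is “增加到(至)5倍” (or “是……5倍”), which includes the original; the other is “增加、增加了4倍”, which is the net increase and excludes the original. When translating English into Chinese, pay special attention to this difference. From the English point of view, "increase to" and "increase by" are the same thing: both include the original. Compare the following examples:

The production of integrated circuits has been increased to three times as compared with last year.
集成电路的产量比去年增加了2倍。(net increase of 2 times, i.e. 3 times last year's output)

The output of chemical fertilizer has been raised five times as against 1986.
化肥产量比1986年增加了4倍。(net increase of 4 times, i.e. 5 times the 1986 output)

That can increase metabolic rates by two or three times.
那可使代谢率提高到原来的2倍或3倍。(rendered with “增加到”, "to 2 or 3 times the original"; equivalent to “增加了1倍或2倍”, a net increase of 1 or 2 times)

The drain voltage has been increased by a factor of four.
漏电压增加了3倍。(net increase of 3 times, i.e. 4 times the original)

A record high increase in value of four times was reported.
据报道,价值破记录地增长了3倍。(net increase of 3 times, i.e. 4 times the original)

[Note] In these patterns increase can often be replaced by raise, grow, go, step up, multiply, etc.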
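The inclusive versus net distinction is plain arithmetic, so a tiny sketch may help. The function name `increase_wordings` is made up for illustration; it simply returns the number to use with each Chinese wording for an English "increase N times".

```python
def increase_wordings(factor):
    """For English 'increase <factor> times', return (inclusive, net).

    inclusive: the number used with “是……倍” / “增加到……倍” (original included).
    net:       the number used with “增加了……倍” (original excluded).
    """
    inclusive = factor
    net = factor - 1
    return inclusive, net


# 'increased by a factor of four' from the examples above.
inclusive_4, net_4 = increase_wordings(4)
print(inclusive_4, net_4)

# B is 200 and A is 800: A is 4 times B, or A exceeds B by 3 times.
ratio = 800 // 200
print(increase_wordings(ratio))
```

The second call restates the 200 versus 800 example from section 1 with the same rule.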

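3. twice, again versus double, treble, quadruple

The English words above denote specific multiples.

Of these, twice and again correspond to “是……2倍” (is 2 times), which includes the original. Expressed as a net increase, it is “增加、增加了1倍” (increased by 1 time). For example:

A is as much (large, long, ...) again as B.
A is twice as much (large, long, ...) as B.
A是B的2倍。(or: A比B多1倍。) That is, A is 2 times B, or A exceeds B by 1 time.

A is half as much (large, long, ...) again as B.
A is one and a half times as much (large, long, ...) as B.
A是B的1.5倍。(or: A比B多一半。) That is, A is 1.5 times B, or A exceeds B by half.

However, double (增加1倍), treble (增加2倍) and quadruple (增加3倍) are all rendered with “增加、增加了”, that is, as the net increase, excluding the original. For example:

The efficiency of the machines has been more than trebled or quadrupled.
这些机器的效率已提高了2倍或3倍多。(net increase of more than 2 or 3 times, i.e. more than 3 or 4 times the original)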

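II. Decreases

The cases of increase were covered above; now for decreases. These are more complicated: there is not only the question of the net reduction but also a conversion to a fraction. For example, saying that A has been reduced 3 times (减少了3倍) means, as a fraction, that it is now one third of the original: division becomes multiplication, A ÷ 3 = A × 1/3.

So a decrease by a multiple is purely a matter of arithmetic rather than of English expression. The translator must keep a clear head.

1. The net value that remains: “是……1/n” (is 1/n of ...)

This is the most common form in Chinese: the current value is 1/n of the original. It states the net value left after the decrease.

For example:

decrease 6 times / 6-fold
decrease by 6 times
decrease by a factor of 6

All three mean "the net value left after the decrease is 1/6 of the original". decrease can also be replaced by reduce, shorten, go down, slow down, etc.

Another example:

A is 10 times as small (light, slow, ...) as B.
A is 10 times smaller (lighter, slower, ...) than B.

Both sentences mean the same: the value of A is 1/10 of B's. If it was originally 10 kg, for instance, it is now only 1 kg.

See the examples:

The hydrogen atom is nearly 16 times as light as the oxygen atom.
氢原子的重量约是氧原子的1/16。(the hydrogen atom weighs about 1/16 as much as the oxygen atom)

This sort of membrane is twice thinner than ordinary paper.
这种薄膜比普通纸张要薄一半。(the membrane is 1/2 the thickness of ordinary paper)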

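2. To state the net value after a decrease, one can also say directly that the weight, output, quantity, etc. is “减至、减少到” (reduced to). These words give the net value remaining after the decrease. For example:

There is an 8-fold decrease/reduction...
某种东西的净重量(产量等)减少至1/8。(the net weight, output, etc. of something is reduced to 1/8)

A rapid decrease by a factor of 7 was observed.
发现迅速减少到1/7。(a rapid decrease to 1/7 was observed)

The principal advantage of the products is a two-fold reduction in weight.
这些产品的主要优点是重量减轻了1/2。(the weight is reduced by 1/2)

3. Stating only the amount of the decrease, not the net value that remains.

Chinese can also express a decrease another way: by stating only how much was lost, without the remaining value. See the examples:

Switching time of the new-type transistor is shortened 3 times.
新型晶体管的开关时间缩短了2/3。(shortened by 2/3; the value remaining is 1/3 of the original)

When the voltage is stepped up by ten times, the strength of the current is stepped down by ten times.
电压升高9倍，电流强度便降低了90％。(the voltage rises by a net 9 times and the current falls by 90%; the value remaining is 1/10 of the original)

The equipment reduced the error probability by a factor of 5.
该设备误差概率降低了4/5。(reduced by 4/5; the value remaining is 1/5 of the original)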
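For decreases the two wordings are "what remains" and "what was lost", and both follow from the factor n. A minimal sketch, with the made-up function name `decrease_wordings`, using exact fractions so that the results match the examples literally:

```python
from fractions import Fraction


def decrease_wordings(factor):
    """For English 'decrease <factor> times', return (remaining, lost).

    remaining: fraction of the original left, used with “减少到……” / “是……1/n”.
    lost:      fraction removed, used with “减少了……” / “降低了……”.
    """
    remaining = Fraction(1, factor)
    lost = 1 - remaining
    return remaining, lost


# 'shortened 3 times', 'by a factor of 5', 'stepped down by ten times'
for n in (3, 5, 10):
    remaining, lost = decrease_wordings(n)
    print(n, remaining, lost)
```

For n = 10 the lost share is 9/10, which is the 90% used in the voltage and current example.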

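III. "More than ..." (比……多)

Chinese also has some loose ways of stating quantity, such as 100多人 (a hundred-odd people), 足足1小时 (a full hour) and 20多个 (twenty-odd). English has some special expressions too, for example:

A hundred and one factories (100多个工厂, a hundred-odd factories)
A long hundred (100多个, over a hundred; traditionally 120)

Twenty odd (二十多个, twenty-odd)
twenty and odd (二十多个, twenty-odd)

A small gross (十打, ten dozen, i.e. 120)

A long hour (足足1小时, a full hour)

A score (二十, twenty)

A decade (十, ten)

One last point: the views in this article may be only the original author's opinion. Whether they are right is for readers to judge against other sources. This article is for reference only.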

Thursday, October 28, 2010

GeneChip® Mouse Expression Set 430

What is a probe set? What do the different suffixes attached to a probe set name mean?
A probe set is a collection of probes designed to interrogate a given sequence. A probe set name is used to refer to a probe set, which looks like the following:
12345_at or 12345_a_at or 12345_s_at or 12345_x_at
The last three characters (_at) identify the probe set strand. Probe sets that are designed to detect the anti-sense strand of the gene of interest are annotated with "_at".
There are different types of probe sets that can result from the probe selection process. Most probe sets have an extension of an underscore and a letter to designate the probe set type, except for unique probe sets. These different probe set types are the _a, _s and _x parts of the example names above.
Probes in a gene family probe set (_a set) all cross-hybridize to the same set of sequences that belong to the same gene family (i.e. having same name in the "geneCluster" column). This probe set type is only created if the "geneCluster" column is included in the Instruction File and contains information.
Probes in a unique probe set do not cross-hybridize to any other sequences in the design (including any additional pruning sequences provided).
Probes in an identical probe set (_s set) all cross-hybridize to the same set of sequences that are used for the design (including any additional pruning sequences if provided). These sequences are not defined as from the same gene family for one of the following reasons: the values in the "geneCluster" column are different, or the gene family information is not provided.
Probes in a mixed probe set (_x set) contain at least one probe that cross-hybridizes with other sequence(s) used for the design. Cross-hybridizing probes have a cross-hybridization penalty applied to their raw probe scores, thus favoring unique probes of the same quality over cross-hybridizing probes.
The following diagram is a graphical representation of these different probe set types.
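The naming scheme above can be turned into a small classifier. This is a sketch under the assumption that names always end in "_at" with an optional one-letter type before it, as in the four example names; the function name `classify_probe_set` and the label strings are made up for illustration.

```python
import re

# Type letters described in the FAQ text; no letter means a unique probe set.
PROBE_SET_TYPES = {None: "unique", "a": "gene family", "s": "identical", "x": "mixed"}

# <id>[_<type letter>]_at, e.g. 12345_at or 12345_s_at
_NAME = re.compile(r"^(?P<id>.+?)(?:_(?P<kind>[asx]))?_at$")


def classify_probe_set(name):
    """Return (probe set id, type label) or raise ValueError for other names."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError("not an _at probe set name: " + name)
    return match.group("id"), PROBE_SET_TYPES[match.group("kind")]


for example in ("12345_at", "12345_a_at", "12345_s_at", "12345_x_at"):
    print(example, classify_probe_set(example))
```

Names that use other suffixes are rejected rather than guessed at, since the FAQ text only describes these four forms.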

blat parameters

standalone blat:

blat search:
blat -stepSize=5 -repMatch=2253 -minScore=0 -minIdentity=0 database.2bit query.fa output.psl

Thursday, October 7, 2010

node problem

>> $ tentakel -g compute /bin/hostname
>> $ tentakel -g power /bin/hostname

Saturday, October 2, 2010

grant writing

http://www.spo.berkeley.edu/Links/writing.html

角斗士

我爱背单词8：0003147249

我爱背句子2：0999690578

角斗士奇迹阅读2:0012963827

Tuesday, September 28, 2010

UCSC genome

User: 'browser', password: 'genome' - full database access permissions
User: 'readonly', password: 'access' - read only access for CGI binaries.
User: 'readwrite', password: 'update' - readwrite access for hgcentral DB

Friday, September 24, 2010

Interpreting samtools flagstat output

I get the following output from running the flagstat command in samtools for one particular bam file:

22283920 in total

0 QC failure

0 duplicates

20536595 mapped (92.16%)

22283920 paired in sequencing

11141960 read1

11141960 read2

17996862 properly paired (80.76%)

19980715 with itself and mate mapped

555880 singletons (2.49%)

852545 with mate mapped to a different chr

92547 with mate mapped to a different chr (mapQ>=5)

I am not sure what the 8th and 9th lines mean:

line 8: 17996862 properly paired (80.76%)

line 9: 19980715 with itself and mate mapped

I would have thought (looking at line 8 alone), that this means that 17996862 of the 20536595 mapped tags are "properly" (on the same chr, within the limits of the allowed insert size and same(?) orientation) paired with their mate. But I have no idea what line 9 means. Any help would be greatly appreciated!

Thanks

asked Jun 17 at 15:31 by sasseq

One Answer:
At the risk of being overly technical, line 8, "properly paired", simply means the properly paired flag is set - it is entirely up to the program writing it to define "proper". Generally, this is set by an aligner to indicate what you describe: same chr, opposite orientation, and within a few deviations from the expected insert size. But this varies depending on the aligner used.

Line 9, means that both the forward and reverse are mapped -- somewhere, anywhere. Properly paired is a subset of these reads. You'll also find that the total of line 9 and the read pairs in which only one tag is successfully mapped (singletons) is equal to the total mapped reads. 19980715 (both mapped) + 555880 (singletons) = 20536595 (total mapped reads).

Lastly, since the sum of "proper pairs" and "itself and mate mapped to different chr" is less than "itself and mate mapped", it appears your aligner does require additional directionality or distance constraints for a "proper pair".
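The arithmetic in the answer can be checked directly from the numbers in the question. The sketch below hard-codes those counts; the variable names are made up, and the last quantity assumes that reads whose mate maps to a different chromosome are never flagged as properly paired, which depends on the aligner.

```python
# Counts copied from the flagstat output quoted in the question.
total = 22283920
mapped = 20536595
properly_paired = 17996862
both_mapped = 19980715      # "with itself and mate mapped"
singletons = 555880
mate_diff_chr = 852545


def pct(part, whole):
    """Percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Line 9 plus singletons accounts for every mapped read.
assert both_mapped + singletons == mapped

# Properly paired reads are a subset of reads whose mate is also mapped.
assert properly_paired <= both_mapped

# Reads with both ends mapped to the same chr that are still not "proper"
# (assumes different-chr pairs are never flagged proper).
same_chr_not_proper = both_mapped - mate_diff_chr - properly_paired

print(pct(mapped, total), pct(properly_paired, total), pct(singletons, total))
print(same_chr_not_proper)
```

A positive value for the last quantity is what the answer means by the aligner applying extra orientation or distance constraints.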

~J

answered Jun 22 at 10:24 by jmanning2k

Thursday, September 23, 2010

trackDb

UCSC genome browser
Kent source tree

Get it here: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip
Or view it online: http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/

Important README files: kent/src/product/README.xxx
System design

The genome browser is composed of the following parts:
  For service
    server-side CGI binaries and HTML files
    hg.conf configuration file
  For administration
    utility tools
    .hg.conf file
  Data
    stored in MySQL
    stored in external files (optional, such as bigBed, bigWig, ...)
Design principles:
  Each species occupies one MySQL database. Inside each database:
    each track corresponds to one table, and each track belongs to a group
    one trackDb table exists per species and describes its tracks
    each record in trackDb defines one track
    one grp table holds the group info
    one hgFindSpec table
  Apart from the databases for individual species, special databases exist:
    hgFixed
    hgcentral

Constructing new database

Download kent source and compile everything. See src/product/README.trackDb on how to compile everything.
Create hg.conf file at /cgi-bin/ directory. Sample file can be found at: http://genome-test.cse.ucsc.edu/~kent/src/unzipped/product/ex.hg.conf
Create .hg.conf file at your home directory:
$ cat > ~/.hg.conf
db.host=127.0.0.1
db.user=hguser
db.password=hguser
db.trackDb=trackDb
db.grp=grp
central.db=hgcentral
central.host=127.0.0.1
central.user=hguser
central.password=hguser
$ chmod 600 ~/.hg.conf
Create the MySQL database, and make sure all folders in the MySQL data directory are owned by user/group mysql.mysql. If not, MySQL will throw "errno: 13" when you try to modify the database!
To add new tracks:

Refer to general guide: kent/src/product/README.trackDb
generate track data in appropriate format (http://genome.ucsc.edu/FAQ/FAQformat.html)
Create a table to hold the track data. Need to identify the format of the track data file, and use corresponding loader program to load it. Following example is for bed format:
$ hgLoadBed dbName trackTableName file.bed
Realistic example: $ hgLoadBed -noBin -bedGraph=4 hg19 track_name data.bedGraph
Loader program sources are located at: kent/src/hg/makeDb/
Create/update the trackDb:
Compose a new trackDb.ra file (by editing the old one) with configuration section for the new track
Example *.ra files: src/hg/makeDb/trackDb/[organism name]/[genome version]
Information on trackDb options: src/hg/makeDb/trackDb/README. Also see next section.
Compose a makefile in the directory where the trackDb.ra file resides. It could look like:
trackDbSql=/home/cgs/twlab/xzhou/kent/src/hg/lib/trackDb.sql
DB=hg19
all::
	~/latest/hgTrackDb . ${DB} trackDb ${trackDbSql} .
Run $ make all to update the trackDb table.
Also tracks could be in bigBed or bigWig formats. See: http://www.mail-archive.com/genome@lists.soe.ucsc.edu/msg00924.html

hgsql hg19 -e 'drop table if exists myLocalBigWig; create table myLocalBigWig (fileName varchar(255) not null); insert into myLocalBigWig values ("/gbdb/hg19/bbi/myLocalBigWig.bw");'
About trackDb.ra files:

Example files for human hg19 can be found at: kent/src/hg/makeDb/trackDb/human/hg19/
use blank lines to separate tracks
each line begins with an attribute name and value, separated by space
Configurations
field: track
track trackName [override]
field: type
a lot of types there...
Track height: use maxHeightPixels 128:32:16
Track color: color r,g,b
Sub groups: subGroup1 sampleType Sample_Type fetalK=fetalK CD34=CD34 ....
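The two format rules above (blank lines separate tracks, each line is an attribute name followed by its value) are enough for a minimal reader. This is a sketch, not the parser used by hgTrackDb; the function name `parse_ra`, the sample track names and the treatment of lines starting with "#" as comments are assumptions for illustration.

```python
def parse_ra(text):
    """Split .ra text into a list of dicts, one per blank-line-separated stanza."""
    tracks = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current stanza, if any.
            if current:
                tracks.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # assumed comment syntax
        # The attribute name is everything up to the first space.
        name, _, value = line.partition(" ")
        current[name] = value.strip()
    if current:
        tracks.append(current)
    return tracks


# Hypothetical two-track file using the attributes mentioned in the notes.
sample = """track myTrack
type bedGraph 4
maxHeightPixels 128:32:16
color 255,0,0

track otherTrack
type bed 3
"""

parsed = parse_ra(sample)
print([t["track"] for t in parsed])
print(parsed[0]["color"])
```

Keeping each stanza as a plain dict makes it easy to check that every track has at least the track and type fields before running hgTrackDb.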

Thursday, September 16, 2010

session for UCSC genomebrowser

I've enabled the wiki track with sessions by following these steps:
1.
Download and install mediawiki , better in a root subfolder of your mirror

2.
As written in kent/src/hg/wikiMod/README
$IP = MEDIAWIKIINSTDIR
cd kent/src/hg/wikiMod/
cp SpecialUserloginUCSC.php $IP/includes/specials/
cp SpecialUserlogoutUCSC.php $IP/includes/specials/
cp configuration.SpecialUserloginUCSC.php $IP/SpecialUserloginUCSC.php
cp configuration.SpecialUserlogoutUCSC.php $IP/SpecialUserlogoutUCSC.php
cp UserloginUCSC.php $IP/includes/templates/

at the end of $IP/LocalSettings.php add these rows :

require_once( "$IP/extensions/SpecialUserloginUCSC.php" );
require_once( "$IP/extensions/SpecialUserlogoutUCSC.php" );
require_once( "$IP/includes/templates/UserloginUCSC.php" );

3.
Check if db "hgcentral" has this table: namedSessionDb.
If not, use this SQL to create it:
CREATE TABLE namedSessionDb (
    userName varchar(64) not null,      # User name (from genomewiki).
    sessionName varchar(255) not null,  # Name that user assigns to this session
    contents longblob not null,         # CGI string of var=val&... settings.
    shared tinyint not null,            # 1 if this session may be shared with other users.
    firstUse datetime not null,         # Session creation date.
    lastUse datetime not null,          # Session most-recent-usage date.
    useCount int not null,              # Number of times this session has been used.
    settings longblob not null,         # .ra-formatted metadata
    #Indices
    PRIMARY KEY(userName,sessionName)
);

3a.
rsync your local goldenPath/html dir with remote
htdocs/goldenPath/html/ dir.

4.
I've installed mediawiki in a subfolder of a root folder. I called it w.
Modify hg.conf adding these rows:
wiki.host=SERVER/w # w is MY installation dir of mediawiki.
wiki.userNameCookie=wikidbUserName ######### use the firebug to get the right name for wikidbUserName wikidbUserID and wikidb_session
wiki.loggedInCookie=wikidbUserID
wiki.sessionCookie=wikidb_session
wikiTrack.URL=http://SERVER/w
wikiTrack.browser=SERVER
wikiTrack.dbList=hg18,mm9,hg19

Note. If you have problem with genome session login, please check with
firefox live_http_header plugin what data is passed after login from
mediawiki. And modify wiki.userNameCookie or wiki.loggedInCookie or
wiki.sessionCookie values.

#LDAP AUTH
To use LDAP authentication with mediawiki, download the LdapAuthentication extension from here:
cd MEDIAWIKIINSTDIR/extensions/
wget -v http://upload.wikimedia.org/ext-dist/LdapAuthentication-trunk-r65285.tar.gz
tar xvzf LdapAuthentication-trunk-r65285.tar.gz
cd MEDIAWIKIINSTDIR

modify LocalSettings.php adding these rows (at the bottom! ) :

#enable LDAP
require_once( "$IP/extensions/LdapAuthentication/LdapAuthentication.php" );
$wgAuth = new LdapAuthenticationPlugin();

#$wgLDAPDebug = 3; //to enable ldap debug

$wgLDAPDomainNames = array( "CAMPUS" );
$wgLDAPServerNames = array( "CAMPUS" => "LDAPSERVER");
$wgLDAPSearchAttributes = array('CAMPUS' => 'uid');
$wgLDAPBaseDNs = array( "CAMPUS"=>"YOUR BASE DN" );
$wgLDAPEncryptionType = array( "CAMPUS" => "tls" );

Add this row :
$this->mName=strtolower($this->mName);
after row 763 of SpecialUserloginUCSC.php in MEDIAWIKIDIR/includes/special/
#END LDAP AUTH

Wednesday, September 15, 2010

bwa from others

Here's what I use for bwa alignment (without removing PCR dups).
You can replace the paths with your own and put the commands into a bash script for automation.
Comments or corrections welcome!

#Visit kevin-gattaca.blogspot.com to see updates of this template!
#Creates colorspace index
bwa index -a bwtsw -c hg18.fasta
#convert to fastq.gz
perl /opt/bwa-0.5.7/solid2fastq.pl Sample-input-prefix-name Sample
#aln using 4 threads
bwa aln -c -t 4 /data/public/bwa-color-index/hg18.fasta Sample.single.fastq.gz > Sample.bwa.hg18.sai
#for bwa samse
bwa samse /data/public/bwa-color-index/hg18.fasta Sample.bwa.hg18.sai Sample.single.fastq.gz > Sample.bwa.hg18.sam
#creates bam file from pre-generated .fai file
samtools view -bt /data/public/hg18.fasta.fai -o Sample.bwa.hg18.sam.bam Sample.bwa.hg18.sam
#sorts bam file
samtools sort Sample.bwa.hg18.sam.bam{,.sorted}
#From a sorted BAM alignment, raw SNP and indel calls are acquired by:
samtools pileup -vcf /data/public/bowtie-color-index/hg18.fasta Sample.bwa.hg18.sam.bam.sorted.bam > Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup
#resultant output should be further filtered by:
/opt/samtools/misc/samtools.pl varFilter Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup | awk '$6>=20' > Sample.bwa.hg18.sam.bam.sorted.bam.raw.pileup.final.pileup
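To run the same steps for several samples, the command lines can be generated from a sample name. The sketch below only builds the strings and does not execute anything; the function name `bwa_colorspace_commands` and its parameters are made up, and every flag is copied from the template above (the samtools sort form is the older "input, output prefix" syntax that the template relies on).

```python
def bwa_colorspace_commands(sample, index_fasta, fai, threads=4):
    """Return the aln/samse/view/sort command lines for one sample."""
    fastq = sample + ".single.fastq.gz"
    sai = sample + ".bwa.hg18.sai"
    sam = sample + ".bwa.hg18.sam"
    bam = sam + ".bam"
    return [
        "bwa aln -c -t %d %s %s > %s" % (threads, index_fasta, fastq, sai),
        "bwa samse %s %s %s > %s" % (index_fasta, sai, fastq, sam),
        "samtools view -bt %s -o %s %s" % (fai, bam, sam),
        # Same as: samtools sort Sample.bwa.hg18.sam.bam{,.sorted}
        "samtools sort %s %s.sorted" % (bam, bam),
    ]


commands = bwa_colorspace_commands(
    "Sample",
    "/data/public/bwa-color-index/hg18.fasta",
    "/data/public/hg18.fasta.fai",
)
for command in commands:
    print(command)
```

Printing the commands first makes it easy to review them before pasting them into a bash script.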

bwa 4 bxd29

~/Biosoft/bwa-0.5.8a/bwa aln -c -t 8 ~/GenomeSequence/WholeGenome.fa bsd29mfrag_6_28_10_Sample1_fastq.single.fastq >bsd29mfrag_6_28_10_Sample1_fastq.sai &

~/Biosoft/bwa-0.5.8a/bwa samse ~/GenomeSequence/WholeGenome.fa bsd29mfrag_6_28_10_Sample1_fastq.sai bsd29mfrag_6_28_10_Sample1_fastq.single.fastq >bsd29mfrag_6_28_10_Sample1.sam

Saturday, September 11, 2010

writing comparison

more + noun + than

adverb:
more: significantly more; slightly more; considerably more;
less: substantially less;
as many + noun + as ... : Nearly as many as

while/however/by contrast/in comparison with/although

more to one of the ...

Tuesday, September 7, 2010

corpus

http://xpq.blogspot.com/2007/04/blog-post.html

Saturday, September 4, 2010

stylewriter

http://sciencenet.cn/m/user_content.aspx?id=328098

Language techniques for English academic papers

[Repost] Language techniques for English academic papers

a) How to point out the gaps in current research and purposefully lead into the importance of your own study
Usually, after describing previous work, use However to introduce the gap, for example:
However, little information ...
         little attention ...
         little work ...
         little data
         little research
      or few studies
         few investigations ...
         few researchers ...
         few attempts ...
      or no
         none of these studies
    has (have) been less
         done on ...
         focused on
         attempted to
         conducted
         investigated
         studied
         (with respect to)
Previous research (studies, records) has (have)
         failed to consider
         ignored
         misinterpreted
         neglected to
         overestimated, underestimated
         misled
thus, these previous results are
         inconclusive, misleading, unsatisfactory, questionable, controversial ...
Uncertainties (discrepancies) still exist ...
This kind of lead-in usually goes on to propose a new method or a new direction. If your method and direction are the same as in earlier work, you can stress the contribution of your own work in the following way:
However, data is still scarce
                       rare
                       less accurate
         there is still a dearth of
We need to
   aim to
   have to
         provide more documents
                      data
                      records
                      studies
         increase the dataset
Further studies are still necessary ...
                          essential ...
To stress the importance of your study, you should generally also present, before the However, the opposite side or another aspect of your research question, and so on.
For example:
1) Time
If the problem you study is recent in time, you can discuss at length the research on older problems and its importance, and then say (However) that problems on the newer time scale are under-studied.
2) Material properties and research methods
If you are applying a new method or research direction, you can describe the currently popular methods and material properties, and then say that the direction and method you study have received little attention.
3) Study area
First summarise the research on neighbouring or other areas, then stress that this area is under-studied.
4) Uncertainty
Although the problem has been studied extensively, there are currently two or more competing views; such uncertainties and ambiguities deserve further clarification.
5) Proposing your own hypothesis to test
If your research is entirely new and there is no earlier work to compare with, you can say with confidence that, given the proposed process, such a result is possible, and that this paper sets out to confirm it.
We aim to test the feasibility (reliability) of the ...
It is hoped that the question will be resolved (fall away) with our proposed method (approach).
b) Stating your own position
We aim to
This paper reports on
provides results
extends the method..
focus on
The purpose of this paper is to
Furthermore, Moreover, In addition, we will also discuss...
c) Delimiting the scope of your research
Another role of the introduction is to tell readers (including reviewers) what your paper mainly covers. If this is handled badly, reviewers will raise harsh comments, for example that you did not consider some possibility or some research method.
To reduce such disputes, state the scope of the paper explicitly at the end of the introduction:
1) Time scale
If your problem involves a long time series, you can state explicitly that the paper is concerned only with a particular time range.
We preliminarily focus on the older (younger)...
Or, if there are two time scales (long-term and short-term), you can say that both matter but that this paper addresses only one of them.
2) Study area
As with time, state explicitly that you are concerned only with this region.
d) The closing wrap-up
At the very end of the introduction you can also state, in summary, how this research helps other research.
Or say: further studies on ... will be summarized in our next study (or elsewhere)
In short, the aim is to focus the reader's attention on the problem you want to discuss and to reduce arguments.
As for vocabulary and common structures, collect them regularly; only by reading and imitating a great deal will you master them.
-------------------------------------------------------------
How to put forward a claim
When putting forward your own claims, the strategy you adopt matters.
Ill-chosen sentences usually draw questions from reviewers.
1) If the claim is not new to this paper, use
We confirm that...
2) For a claim you are confident in, use
We believe that...
3) More commonly, when a conclusion is inferred from the data, use
Results indicate, infer, suggest, imply that...
4) Only in very special cases may you use
We put forward (discover, observe..) .. "for the first time".
to stress your own novelty.
5) If you are not entirely sure of the claim, use
We tentatively put forward (interpret this to..)
Or The results may be due to (caused by, attributed to, resulted from) ..
Or This is probably a consequence of
It seems that .. can account for (interpret) this..
Or It is possible that it stems from...
---------------------------------------------------------
Connectives and logic
One of the most common faults in English papers is unclear logic. Remedies include:
1) Sentences must connect to their neighbors; do not leave sentences isolated.
Common connectives include: However, also, in addition, consequently, afterwards, moreover, Furthermore, further, although, unlike, in contrast, Similarly, Unfortunately, alternatively, parallel results, In order to, despite, For example, Compared with other results, thus, therefore...
Used well, these connectives give your argument structure and make it clearer.
For example, when describing events or literature in chronological order:
for the earliest work, use AA advocated it for the first time.
next, use Then BB further demonstrated that..
after that, use Afterwards, CC..
if there are more, use More recent studies by DD..
When describing two views, separate them sharply:
AA put forward that...
In contrast, BB believe
or Unlike AA, BB suggest
or On the contrary (this signals that the preceding view is wrong; to present merely two opposing views, use in contrast), BB..
If the two views are similar, use
AA suggest
Similarly, alternatively, BB..
Or Also, BB
or BB also does ..
To express cause and effect or sequence, use
Consequently, therefore, as a result,
To express progression, use furthermore, further, moreover, in addition,
After writing a passage of English, first check whether you have used these connectives well.
2) Overall logic of a passage
We often need to describe several aspects of one problem. In that case, pay close attention to the logical structure.
The opening paragraph must tell the reader plainly how many parts you will discuss:
...Therefore, there are three aspects of this problem that have to be addressed.
The first question involves...
The second problem relates to
The third aspect deals with...
The example above lays out the points clearly, one layer at a time.
Or you can simply use First, Second, Third...Finally,..
Of course, Furthermore, in addition, etc. can be used for supplementary remarks.
3) Overall structure of the discussion section
Subheadings are a good way to divide the discussion into several parts.
Normally the first part presents the paper's most important data and conclusions. Supplementary material can go in the last part.
Bear in mind that readers come at several levels. Besides specialists in your own field, you must find ways to make the paper understandable to more readers from outside it.
So you can split the discussion in two: one part states the claims, and the other details the process and the supporting evidence. Readers outside the field can then grasp the main claims and treat the more specialized discussion as a black box, while specialists can study that part further.
For clarity, when a concept is first introduced, it is best to add a parenthesis giving a fuller explanation.
If the paper uses many abbreviations, there are two remedies:
1) Add an Appendix to the paper that lists all abbreviations
2) Repeat the meaning of the abbreviations from time to time on different pages, to remind the reader.
In short, the purpose of writing is to be understood, and understood clearly, so take every measure that makes things easier for the reader.
---------------------------------------------------------
Never reject previous work wholesale, even if you think its conclusions are completely wrong. This is the minimum respect owed to earlier work; in English it is called giving credit to others' work.
So avoid strongly negative assessments such as Their results are wrong, very questionable, have no commonsense, etc.
In such cases, put it tactfully:
Their studies may be more reasonable if they had considered this situation.
Their results could be more convincing if they ...
Or Some uncertainties may remain in their conclusion.
What else does the discussion section include?
1. A summary of the main features of the data
2. The main conclusions and a comparison with previous views
3. The limitations of this paper
Most authors consider the third point inadvisable. In fact, stating a paper's limitations is an important way to protect it. Deliberately hiding its weaknesses in the belief that nobody will notice is very unwise.
Limitations include the following:
1. The problem studied is somewhat narrow
In the discussion, be sure to say:
It should be noted that this study has examined only..
We concentrate (focus) on only...
We have to point out that we do not..
Some limitations of this study are...
2. The conclusions have some shortcomings
The results do not imply,
The results can not be used to determine
be taken as evidence of
Unfortunately, we can not determine this from these data
Our results lack ...
However, after pointing out these limitations, you must then reinforce the importance of the paper and indicate possible ways of addressing them, laying the groundwork for the next step of research by others or by yourself.
Notwithstanding its limitation, this study does suggest..
However, these problems could be solved if we consider
Despite its preliminary character, this study can clearly indicate..
In Chinese terms, this part is about covering both sides. Address in advance the questions a reviewer will think of, and show that you are already considering them but cannot yet answer them because of limits on paper length, experimental progress, or experimental methods. With some of your suggestions, however, these questions may be resolved in future research.

Friday, September 3, 2010

Bioscope

http://hera.uthsc.edu:8080/bioscope/views/bioscope/BioScopeMain.faces

Monday, August 30, 2010

SNP from RNAseq

~/Biosoft/samtools-0.1.6_x86_64-linux/samtools pileup -f ~/GenomeSequence/WholeGenome.fa david_li_bc_transcriptme_07_20_10_bcSample1_F3_bxd_34.bam >david_li_bc_transcriptme_07_20_10_bcSample1_F3_bxd_34.pileup

java -jar ~/Biosoft/varscan/VarScan.v2.2.jar pileup2snp david_li_bc_transcriptme_07_20_10_bcSample1_F3_bxd_34.pileup >david_li_bc_transcriptme_07_20_10_bcSample1_F3_bxd_34.snp
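The two commands above repeat one long sample prefix, which is easy to mistype. Below is a minimal sketch of a helper that derives both commands from a BAM file name. It only prints the commands (a dry run) so they can be checked before running. The function name `snp_commands` and the file name `sample1.bam` are hypothetical, the tool paths are copied from the commands above, and launching VarScan through `java -jar` is an assumption based on it being distributed as a jar file.

```shell
#!/bin/sh
# Dry run: print the pileup and SNP-calling commands for one BAM file.
# snp_commands is a hypothetical helper; paths mirror the notes above.
snp_commands() {
    bam="$1"
    prefix="${bam%.bam}"   # strip the .bam suffix to get the sample prefix
    # Inside double quotes "~" stays literal, so the printed command is copy-pasteable.
    echo "~/Biosoft/samtools-0.1.6_x86_64-linux/samtools pileup -f ~/GenomeSequence/WholeGenome.fa ${prefix}.bam > ${prefix}.pileup"
    # Assumption: VarScan is started with java -jar.
    echo "java -jar ~/Biosoft/varscan/VarScan.v2.2.jar pileup2snp ${prefix}.pileup > ${prefix}.snp"
}

snp_commands sample1.bam
```

Piping the output to `sh` would run the two steps in order; printing first keeps the sketch safe to try without the tools installed.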

Saturday, August 28, 2010

text mining

http://www.textpresso.org/

Sunday, July 25, 2010

Results--- writing

The result section should have three parts:
1. General observations
2. Specific observations
3. Case study

Figure

1. scatter plot
The pattern looks like ___________
• A random dispersal of points
• A straight line
• A parabola
• A sine wave
• A circular patch of points
• A shapeless clump of points
• Dense points on the left grading to sparse points on the right

Add as many details as you can. Perhaps you can write:
• “The pattern looks like a straight line, beginning at the data point (1, 13) and ending at the data point (7, 3). One data point, (3, 12), is farther from this line than any of the other 24 data points.”

• “The pattern looks like one clump of points.”
You might be able to write:
• “The pattern appears to be a roughly spherical cluster of data points, centered
at approximately (50, 29). Specifically, all the data points are contained within a
circle centered at (50, 29) with a radius of 18 graph units, and, to my eye, the data
points appear fairly uniformly distributed within this circular area.”

Sunday, July 18, 2010

sentence

Beginning
1. In this paper, we focus on the need for
2. This paper proceeds as follows.
3. The structure of the paper is as follows.
4. In this paper, we shall first briefly introduce fuzzy sets and related concepts
5. To begin with we will provide a brief background on the

Introduction
1. This will be followed by a description of the fuzzy nature of the problem and a detailed presentation of how the required membership functions are defined.
2. Details on xx and xx are discussed in later sections.
3. In the next section, after a statement of the basic problem, various situations involving possibility knowledge are investigated: first, an entirely possibility model is proposed; then the case of a fuzzy service time with stochastic arrivals and a non-fuzzy service rule is studied; lastly, fuzzy service rules are considered.

Review
1. This review is followed by an introduction.
2. A brief summary of some of the relevant concepts in xxx and xxx is presented in Section 2.
3. In the next section, a brief review of the .... is given.
4. In the next section, a short review of ... is given with special regard to ...
5. Section 2 reviews relevant research related to xx.
6. Section 1.1 briefly surveys the motivation for a methodology of action, while 1.2 looks at the difficulties posed by the complexity of systems and outlines the need for development of possibility methods.

Body
1. Section 1 defines the notion of robustness, and argues for its importance.
2. Section 1 is devoted to the basic aspects of the FLC decision making logic.
3. Section 2 gives the background of the problem which includes xxx
4. Section 2 discusses some problems with, and approaches to, natural language understanding.
5. Section 2 explains how flexibility which often ... can be expressed in terms of fuzzy time window
6. Section 3 discusses the aspects of fuzzy set theory that are used in the ...
7. Section 3 describes the system itself in a general way, including the ….. and also discusses how to evaluate system performance.
8. Section 3 describes a new measure of xx.
9. Section 3 demonstrates the use of fuzzy possibility theory in the analysis of xx.
10. Section 3 is a fine description of fuzzy formulation of human decision.
11. Section 3 is devoted to the modeling and processing of fuzzy decision rules.
12. The main idea of the FLC is described in Section 3 while Section 4 describes the xx strategies.
13. Section 3 and 4 show experimental studies for verifying the proposed model.
14. Section 4 discusses a previous fuzzy set based approach to cost variance investigation.
15. Section 4 gives a specific example of xxx.
16. Section 4 is the experimental study to make a fuzzy model of memory process.
17. Section 4 contains a discussion of the implications of the results of Sections 2 and 3.
18. Section 4 applies this fuzzy measure to the analysis of xx and illustrates its use on experimental data.
19. Section 5 presents the primary results of the paper: a fuzzy set model ..
20. Section 5 contains some conclusions plus some ideas for further work.
21. Section 6 illustrates the model with an example.
22. Various ways of justification and the reasons for their choice are discussed very briefly in Section 2.
23. In Section 2, the block diagram expression of a whole model of the human DM system is presented.
24. In Section 2 we shall list a collection of basic assumptions which a ... scheme must satisfy.
25. In Section 2 of this paper, we present representation and uniqueness theorems for the fundamental measurement of fuzziness when the domain of discourse is order dense.
26. In Section 3, we describe the preliminary results of an empirical study currently in progress to verify the measurement model and to construct membership functions.
27. In Section 5, the inference process is analyzed through the two kinds of inference experiments...

This Section
1. In this section, the characteristics and environment under which MRP is designed are described.
2. We will provide in this section basic terminologies and notations which are necessary for the understanding of subsequent results.

Next Section
2. The next section describes the mathematics that goes into the computer implementation of such fuzzy logic statements.
3. However, it is cumbersome for this purpose and in practical applications the formulae were rearranged and simplified as discussed in the next section.
4. The three components will be described in the next two sections, and an example of xx analysis of a computer information system will then illustrate their use.
5. We can interpret the results of Experiments I and II as in the following sections.
6. The next section summarizes the method in a form that is useful for arguments based on xx

Summary
1. This paper concludes with a discussion of future research considerations in Section 5.
2. Section 5 summarizes the results of this investigation.
3. Section 5 gives the conclusions and future directions of research.
4. Section 7 provides a summary and a discussion of some extensions of the paper.
5. Finally, conclusions and future work are summarized.
6. The basic questions posed above are then discussed and conclusions are drawn.
7. Section 7 is the conclusion of the paper.

Chapter 0. Abstract
1. A basic problem in the design of xx is presented by the choice of a xx rate for the measurement of experimental variables.
2. This paper examines a new measure of xx in xx based on fuzzy mathematics which overcomes the difficulties found in other xx measures.
3. This paper describes a system for the analysis of the xx.
4. The method involves the construction of xx from fuzzy relations.
5. The procedure is useful in analyzing how groups reach a decision.
6. The technique used is to employ a newly developed and versatile xx algorithm.
7. The usefulness of xx is also considered.
8. A brief methodology used in xx is discussed.
9. The analysis is useful in xx and xx problems.
10. A model is developed for a xx analysis using fuzzy matrices.
11. Algorithms to combine these estimates and produce a xx are presented and justified.
12. The use of the method is discussed and an example is given.
13. Results of an experimental application of this xx analysis procedure are given to illustrate the proposed technique.
14. This paper analyses problems in
15. This paper outlines the functions carried out by ...
16. This paper includes an illustration of the ...
17. This paper provides an overview and information useful for approaching
18. Emphasis is placed on the construction of a criterion function by which the xx in achieving a hierarchical system of objectives are evaluated.
19. The main emphasis is placed on the problem of xx
20. Our proposed model is verified through experimental study.
21. The experimental results reveal interesting examples of fuzzy phases of: xx, xx
22. The compatibility of a project in terms of cost, and xx are likewise represented by linguistic variables.
23. A didactic example is included to illustrate the computational procedure

Chapter 1. Introduction
Time
1. Over the course of the past 30 years, .. has emerged from intuitive
2. Technological revolutions have recently hit the industrial world
3. The advent of ... systems for ... has had a significant impact on the
4. The development of ... is explored
5. During the past decade, the theory of fuzzy sets has developed in a variety of directions
6. The concept of xx was investigated quite intensively in recent years
7. There has been a turning point in ... methodology in accordance with the advent of ...
8. A major concern in ... today is to continue to improve...
9. A xx is a latecomer in the part representation arena.
10. At the time of this writing, there is still no standard way of xx
11. Although a lot of effort is being spent on improving these weaknesses, the efficient and effective method has yet to be developed.
12. The pioneer work can be traced to xx [1965].
13. To date, none of the methods developed is perfect and all are far from ready to be used in commercial systems.

Objective / Goal / Purpose
1. The purpose of the inference engine can be outlined as follows:
2. The ultimate goal of the xx system is to allow non-experts to utilize the existing knowledge in the area of manual handling of loads, and to provide intelligent, computer-aided instruction for xxx.
3. The paper concerns the development of a xx
4. The scope of this research lies in
5. The main theme of the paper is the application of rule based decision making.
6. These objectives are to be met with such thoroughness and confidence as to permit ...
7. The objectives of the ... operations study are as follows:
8. The primary purpose/consideration/objective of
9. The ultimate goal of this concept is to provide
10. The main objective of such a ... system is to
11. The aim of this paper is to provide methods to construct such a probability distribution.
12. In order to achieve these objectives, an xx must meet the following requirements:
13. In order to take advantage of their similarity
14. More research is still required before the final goal of ... can be achieved
15. In this trial, the objective is to generate...
16. For the sake of concentrating on ... research issues
17. A major goal of this report is to extend the utilization of a recently developed procedure for the xx.
18. For illustrative purposes, four well-known OR problems are studied in the presence of fuzzy data: xx.
19. A major thrust of the paper is to discuss approaches and strategies for structuring ..methods
20. This illustration points out the need to specify
21. The ultimate goal is both descriptive and prescriptive.

Chapter 2. Literature Review
23. A wealth of information is to be found in the statistics literature, for example, regarding xx
24. A considerable amount of research has been done .. during the last decade
25. A great number of studies report on the treatment of uncertainties associated with xx.
26. There is a considerable amount of literature on planning
27. However, these studies do not pay much attention to uncertainty in xx.
28. Since then, the subject has been extensively explored and it is still under investigation, both in methodological aspects and in concrete applications.
29. Many research studies have been carried out on this topic.
30. The problem of xx has recently drawn more and more attention in system analysis.
31. Attempts to resolve this dilemma have resulted in the development of
32. Many complex processes, unfortunately, do not yield to this design procedure and have, therefore, not yet been automated.
33. Most of the methods developed so far are deterministic and/or probabilistic in nature.
34. The central issue in all these studies is to
35. The problem of xx has been studied by other investigators; however, these studies have been based upon classical statistical approaches.
36. Applied ... techniques to
37. Characterized the ... system as
38. Developed an algorithm to
39. Developed a system called ... which
40. Uses an iterative algorithm to deduce
41. Emphasized the need to
42. Identifies six key issues surrounding high technology
43. A comprehensive study of the... has been undertaken
44. Much work has been reported recently in these fields
45. Proposed/Presented/State that/Described/Illustrated/
Indicated/Has shown / showed/Address/Highlights
46. Point out that the problem of
47. A study on ...was done / developed by []
48. Previous work, such as [] and [], deal only with
49. The approach taken by [] is
50. The system developed by [] consists
51. A paper relevant to this research was published by []
52. []'s model requires consideration of...
53. []'s model draws attention to evolution in human development
54. []'s model focuses on...
55. Little research has been conducted in applying ... to
56. The published information that is relevant to this research...
57. This study further shows that
58. Their work is based on the principle of
59. More history of ... can be found in xx et al. [1979].
60. Studies have been completed to establish
61. The ...studies indicated that
62. Though application of xx in the field of xx has proliferated in recent years, effort in analyzing xx, especially xx, is lacking.

Problem / Issue / Question
63. Unfortunately, real-world engineering problems such as manufacturing planning do not fit well with this narrowly defined model. They tend to span broad activities and require consideration of multiple aspects.
64. Remedy / solve / alleviate these problems
67. ... is a difficult problem, yet to be adequately resolved
68. Two major problems have yet to be addressed
69. An unanswered question
70. This problem in essence involves using x to obtain a solution.
71. An additional research issue to be tackled is ....
72. Some important issues in developing a ... system are discussed
73. The three prime issues can be summarized:
74. The situation leads to the problem of how to determine the ...
75. There have been many attempts to
76. It is expected to be a serious barrier to
77. It offers a simple solution in a limited domain for a complex

1. There are several ways to get around this problem.
2. As difficult as it seems to be, xx is by no means new.
3. The problem is to recognize xx from a design representation.
4. A xx problem can trace its roots to xx.
5. xx [1987] used a heuristic approach to simplify the complexity of the problem.
6. Several problems are associated with them.
7. Although some progress has been made in this area, at least two major obstacles must be overcome before a fully automated system can be realized.
8. Most problems in practice are complicated
9. More problems surface here.
10. Hamper effort toward a xx system
11. In order to overcome the limitations due to incomplete and imprecise xx knowledge, a xx program has been developed, which bases its knowledge upon the statistical analysis of a sample population of xx
12. The above difficulties are real challenges faced by researchers attempting to develop
13. This type of mapping raises no controversy to the issue of membership function determination.
14. However, attempts to quantify the xx have met both theoretical and empirical problems.
15. It has become apparent that in order to apply this new methodological framework to real world problems and data, we have to pay attention to the problems of xx and xx.

Chapter 3. Proposed methodology
Assumption
1. In the case when the assumption of a xx seems to be too restrictive or inadequate, the formulation with Fuzzy termination time, i.e. given by a fuzzy set in the space of control stages, may be applied.
2. We assume here the fuzzy constraints to be state dependent, and the fuzzy goal to be the same for all the control states, xx, which stems from the problem's nature.
3. An approach to the solution of this problem is presented under the assumption that the sampling rate decision can be made prior to the execution of the experiment, as opposed to being made while the experiment is in progress.
4. Another assumption made above is that there are precise odds at which the expert is indifferent.
5. Main simplifying assumptions are:
6. This, in our view, is a questionable assumption.

Outline / Structure / Module
1. An outline of the research
2. Information is incorporated within the scheme
3. Is built into ... structure
4. A nice modular structure.
5. The principles of ... are applied as modularized criteria

Classification
1. A xx system comprises three main components:
2. Must decompose the original .. into a set of ..
3. Consists of the following steps:
4. This is summarized in the following steps:
5. Can be broadly classified into the following areas:
• Can be characterized by its function of effectively processing the
• Can allow further breadth of application of ... into more
• The following steps should be followed
• xx can be classified in different ways.
• Based on the xx, one may classify xx into the following:
• This catalog may change due to wear, breakage, and purchasing.

System
• Unlike many conventional programs, expert systems do not usually deal with problems for which there is clearly a right or wrong answer.
• The system consists of both ... and ...
• The system has a hierarchical modular architecture organized on three levels.
• Expert system domains are areas of expertise
• To develop a xx system for xx, the following factors must be considered:
• The system has been developed / designed to determine
• The system has proven to be able to
• The domain in which an expert system operates is a particular domain
• The system comprises a ... with
• The system is [feature-oriented ] / based on the ... technique
• The system environment must be relatively stable
• The system is utilized to generate, load, store, update and retrieve ...
• The development of a xx system has two stages: xx stage and xx stage.
• The most essential part of .. system is the ...
• The successful developments in ESs have made them an important tool in the development of
• An automated system was developed for
Author: zhaokelun1975 Posted: 2006-6-15
• In this case, the system can be considered to be generative.
• An interactive automatic ... system
• A .. is commonly thought of as a truly integrated .. system
• Should be capable of being generated from a ... system
• xx is an important part of the integrated system.
• The model consists of four rule bases, each of which addresses a separate problem in the hierarchy of scheduling decision.
• The rule bases are linked to each other in a chain-like manner in the sense that the consequent of one rule base constitutes a part of the antecedent of the next rule base.
• The rule base consists of all possible combinations of the linguistic terms associated with the linguistic variable of the antecedent of a rule.

Computer System
• The system has been implemented using the Prolog language in an MS-DOS environment. Prolog was chosen because it offers a well-known and flexible environment in which fuzzy reasoning may be easily implemented.
• The current version of the xx program, when compiled with WATFOR77, results in executable code of about 270K bytes. Typical run time, when run on a XX computer (an IBM-compatible machine) operating at 4.77 MHz with 640K RAM, ranges from 10 min to 2 h, depending on the size (or complexity) of the problem.
• Time-consuming procedures have been implemented in the C language and directly linked to the Prolog environment.
• The xx process, once the xx's data has been entered, requires approximately 180 seconds.
• It should be noted that the computation was done with a 20 MHz, 80386-based microcomputer equipped with an 80387 math coprocessor.
• The computer programs used for the analyses, one based on the xx method and the other based on the new method, were written in FORTRAN with a compiler that supports the math coprocessor.
• Lisp and Prolog give maximum flexibility but also maximize development time.
• Internal representation is the way a model is represented in the computer.
• An interactive menu-driven procedure is used in this study
• Shells can be developed very fast at the cost of fairly severe limitations.
• While there is no measurable saving of time for the case involving five criteria, the saving is dramatic for the case involving 10 criteria -- the computation time reduces from 10 hr 40 min to about 1 min.
• This combination is being implemented in an object-oriented programming environment (Smalltalk 80 system) to solve problems encountered in construction xxx.

Method / Approach / Study / Process Model / Equation /Algorithm / Rule / Formula / Technique
l A discussion is presented of a problem-solving system
l To improve the efficiency of the method, the following approach may be applied.
l In order to an investigation was made to find the causes of the
l Although large collecti***** of rules and equati***** have been complied, none are generally accepted
l This approach will be explained and discussed thoroughly in the body of the report.
l This can be accomplished by
l This algorithm to compute the total cost can be described step by step as follows:
l The above preliminary analysis has provided important information
l Various methods have been proposed for selecting an optimum...
l These concepts have been applied to
l On the basis of the concept mentioned above,
l This can be achieved by
l This fact suggests that a new concept
l This was accomplished by taking ...
l The preparatory stage is very time c*****uming process.
l Test are performed for validity, completeness, and compatibility
l There is little hope of achieving successful ...
l There has been an increasing awareness of the potential of using most ..so far made have not taken this approach, with the exception of
l Only a few studies can be found.
l It is a very tedious process to go through
l It is only when .. has been completed that .. may be effected
l The entire interpretation process is conducted in one's head.
l These approaches are sometimes very tedious.
l Several techniques can be used
l A polynomial parametric model can be written as [the following]/[follows]:
l A xx model is c*****tructed/formulated using xx.
l A xx model represents an xx by its xx.
l A process decision model captures the logic essential to
l From the equation above, xx is equal to the summation of xx times the ...
l The validity of a xx model can be checked using Euler's formula.
l Given a model, one can mathematically determine whether ... or ...
l Equati***** for xx need to be derived and implemented in the system.
l A number of heuristic rules have been developed for
l Optimum .. techniques can be made more reliable by ... so that
l An algorithm based on the characteristic ... is used to determine
l Euler's formula states the following:
l The completed model should agree with the formula.
l For manufacturing purposes, a detailed and precise model of the object is necessary
l Engineering design models are very well defined; therefore,
l To keep the domain narrow enough to be implementable, yet wide enough to be useful.
Point of View
l From an implementation standpoint,
l From the point of view of this application,
l From this point of view, Zadeh suggested an inference rule named xxx (CRI for short).
l Information is the meaningful interpretation and correlation of some aggregation of data in order to allow one to make decisions.

l From a practical point of view, the computational aspects of an FLC require a simplification of the fuzzy control algorithm.
l The use of a hammer to insert screws, although partly effective, tends to distort, destroy, and generally defeat the purpose of using a screw [Kusiak AI Implications for CIM p.129]

Justification
l We choose the so called xx in our experiment because it has received wide acceptance and can
l Prolog was chosen because it offers a well known and flexible environment in which fuzzy reasoning may be easily implemented.
l The rationale behind this is that it can be much easier for an estimator to rate a cost as high than to attempt to place a dollar value on the estimate.
l This strategy has been widely used in fuzzy control applications since it is natural and easy to implement.
l A function definition expresses the membership function of a fuzzy set in a functional form, typically a bell-shaped function, etc. Such functions are used in FLC because they lend themselves to manipulation through the use of fuzzy arithmetic.
l It should be noted that in our daily life most of the information on which our decisions are based is linguistic rather than numerical in nature. Seen in this perspective, fuzzy control rules provide a natural framework for the characterization of human behavior and decision analysis.
l Many experts have found that fuzzy control rules provide a convenient way to express their domain knowledge. This explains why most FLCs are based on the knowledge and experience which are expressed in the language of fuzzy "if-then" rules.

Chapter 4. Examples

Example/ Data
l The data used in the following example was taken from an experiment in which xx was measured between x and x using a xx technique.
l The data consists of over xx measurements.
l An example of xx is discussed and the control rules of xx are compared with a xx
l Examples of complex processes to which this technique may be applied are xx, xx, etc.
l The following example is constructed only for the purpose of illustrating the computational procedure discussed.
l This example clearly demonstrates that the profile of an individual xx, or a very small group of xx, without enough data to be studied statistically, can be meaningfully analyzed by fuzzy possibilistic methods.
l There is no space here to go into detail on all these methods, but they deserve a mention, and the bibliography will point to detailed references for those wishing this level of detail.
l Note that the golf ball spotting example is used throughout the paper.

Comparisons
l As well, the pros and cons of these representations from a process planning point of view will be discussed.
l The method of using xx to implement xx described by Zadeh (1973) appeared more suitable
l As discussed [in the previous section]/[previously],

Relation
l We cannot invert F' directly because it defines a many-to-one mapping.
l The relationships appear very complicated
l Lifting tasks involve complex and imprecise relationships between the task variables and the human operator's characteristics.
l These methods are based on the relationship between ... and ...
l The fundamental concept of a fuzzy rating language is that we can establish a relationship among terms such as high, medium, and low, and then modify these relationships.
l This article will thus mention the latter as well as the former.
l The former two bear a close relation to a fuzzy Cartesian product.

Importance
l The emphasis is on an implementation of a general approach to rule based decision making.

Consideration / Attention
l Careful evaluation is necessary to ensure
l Such a formulation does not change further considerations.
l Considerable attention has been paid to
l Attention should be paid to an important finding of this investigation.
l Caution should be exercised in this process to avoid ...
l Primary consideration is given to ... components, though others can be accommodated
l After ... has been defined by ..., a careful analysis is carried out/performed to determine
l A number of factors such as ... need to be taken into consideration before making the appropriate decision.
l It should be noted that
Author: zhaokelun1975 Posted: 2006-6-15
l It is important to point out that ...
l These considerations have heightened interest in the possibility of providing ...
l We should stress the fundamental importance of the xx

Chapter 5. Results.
Advantages / Disadvantages

l One of the major advantages of this new measure of xx is that it can be applied to the experimental study of
l One advantage of using a .. is the ease of preparing it.
l The xx system is versatile
l It has a very fast decision making process
l All the algorithms involve mostly logical operations.
l It can be implemented easily and without additional cost in a microprocessor-based environment.
l It can reduce the waste of designing from scratch.
l The advantages of using a xx to represent xx are the following:
l However, xx is not without its shortcomings.
l In most cases, the xxx shows an improvement over the existing xxx.
l Compared to the existing xx, the impacts of the xx are generally reduced by 5% to 9%.
l The "best case" results show savings of 6% to 9%.
l Most of the existing works based on the xx approach can only recognize a xx.
l Most of the above methods are computationally expensive and limited to xx.
l Some other advantages of xx are the following:
l The problem is the limitation of this method to a limited domain of parts.
l It proved limited in application because it demanded precision in system modeling that was impossible in practice.
l There are advantages to be gained in the structuring of costs and benefits, the use of xx,
l The disadvantages of this method are also disadvantages of conventional xx approaches.
l This combines the best features of both techniques
l Hopefully, this tool can serve as a reference framework for developing a xx platform and for supporting the administration, marketing, and knowledge management activities in virtual communities.

Results
l An improvement on the result shown above can be made based on the data provided
l Discussion of these theories is beyond the scope of this review
l Based on the information contained in this
l The result can be categorized into nine classes
l The results are illustrated by an example
l The experimental results for each xx time are reported in Table 2.
l From the results obtained so far, it seems that
l Because of the inaccuracy of the ..., a conclusion cannot be drawn as
l Although much effort has been made to ..., this is still far from completion.
l The results indicate that the total benefits are higher than the total costs.
l Their results may then serve as guidelines for lower level models, less fuzzy and more detailed.

Chapter 6. Conclusion
l From the discussion, one may conclude that ...
l From the above discussion, the conclusion can be reached that
l The conclusions drawn are also valid
l In conclusion to this, it becomes obvious that the problem of xx lies not only in...
l We have attempted to introduce some concepts associated with a theory of xx based on fuzzy sets.
l Considerably more work, hopefully, will be done in this area
l A fuzzy set procedure is proposed to solve xx selection problems interwoven with imprecise data
l Employing the compositional rule of inference, the assessment of the xx compatibility in achieving prescribed xx projectiles in any level of the hierarchy is made possible.
l This paper has presented a theoretical and experimental study of the xx process and xx concept.
l The experimental research results will hopefully serve as useful feedback information for improvements for xx work.
l The scope of this contribution was to introduce a xx method.
l In general, fuzzy sets theory provides an alternative foundation for xx analysis in a fuzzy environment.

Future Research
l Thus, a first extension of the approach could be,
l Present some cues for a further approach from Fuzzy Sets Theory application to
l Some improvements to the scheduling aspect of the model may be brought through additional levels in the hierarchy for more detailed representation of the scheduling activity.

Monday, June 21, 2010

sam2wig

/bin/sh /share/apps/corona/bin/pipelinesam2wig.sh -Dlog.file=./B6D2F1.log -i B6D2F1.bam -s 24 -v 0 -o B6D2F1 -b wigwam -w true -m unique -c 3

Saturday, June 19, 2010

UCSC RNAseq

Brain_eye_log2
track type=bigWig name="Plus eye B6" description="mRNA expression of eye in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6_eye_positive_log2.bw
track type=bigWig name="Minus eye B6" description="mRNA expression of eye in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6_eye_negative_log2.bw
track type=bigWig name="Plus eye D2" description="mRNA expression of eye in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2_eye_positive_log2.bw
track type=bigWig name="Minus eye D2" description="mRNA expression of eye in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2_eye_negative_log2.bw
track type=bigWig name="Plus brain B6" description="mRNA expression of whole brain in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6.negative.wig_all.bw
track type=bigWig name="Minus brain B6" description="mRNA expression of whole brain in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6.positive.wig_all.bw
track type=bigWig name="Plus brain D2" description="mRNA expression of whole brain in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2.negative.wig_all.bw
track type=bigWig name="Minus brain D2" description="mRNA expression of whole brain in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2.positive.wig_all.bw
track type=bigWig name="Plus brain F1" description="mRNA expression of whole brain in F1" visibility=hide color=0,0,200 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/F1.negative.wig_all.bw
track type=bigWig name="Minus brain F1" description="mRNA expression of whole brain in F1" visibility=hide color=0,0,200 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/F1.positive.wig_all.bw

eye_RNAseq_log2
track type=bigWig name="Plus eye B6" description="mRNA expression of eye in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6_eye_positive_log2.bw
track type=bigWig name="Minus eye B6" description="mRNA expression of eye in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6_eye_negative_log2.bw
track type=bigWig name="Plus eye D2" description="mRNA expression of eye in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2_eye_positive_log2.bw
track type=bigWig name="Minus eye D2" description="mRNA expression of eye in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2_eye_negative_log2.bw

RNAseq_Brain_Log
track type=bigWig name="Plus brain B6" description="mRNA expression of whole brain in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6.negative.wig_all.bw
track type=bigWig name="Minus brain B6" description="mRNA expression of whole brain in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6.positive.wig_all.bw
track type=bigWig name="Plus brain D2" description="mRNA expression of whole brain in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2.negative.wig_all.bw
track type=bigWig name="Minus brain D2" description="mRNA expression of whole brain in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2.positive.wig_all.bw
track type=bigWig name="Plus brain F1" description="mRNA expression of whole brain in F1" visibility=hide color=0,0,200 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/F1.negative.wig_all.bw
track type=bigWig name="Minus brain F1" description="mRNA expression of whole brain in F1" visibility=hide color=0,0,200 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/F1.positive.wig_all.bw

Friday, June 11, 2010

whole brain

track type=bigWig name="B6 Positive" description="mRNA expression of whole brain in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6.negative.wig_all.bw

track type=bigWig name="B6 Negative" description="mRNA expression of whole brain in B6" visibility=full color=255,0,0 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/B6.positive.wig_all.bw

track type=bigWig name="D2 Positive" description="mRNA expression of whole brain in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2.negative.wig_all.bw

track type=bigWig name="D2 Negative" description="mRNA expression of whole brain in D2" visibility=full color=0,100,100 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/D2.positive.wig_all.bw

track type=bigWig name="F1 Positive" description="mRNA expression of whole brain in F1" visibility=hide color=0,0,200 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/F1.negative.wig_all.bw

track type=bigWig name="F1 Negative" description="mRNA expression of whole brain in F1" visibility=hide color=0,0,200 altColor=0,0,0 yLineMark=4 yLineOnOff=on priority=10 maxHeightPixels=256:256:64 autoScale=off yLineMark=4 viewLimits=2:10 yLineOnOff=on gridDefault=on bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/F1.positive.wig_all.bw

Help for bigwig

http://genome.ucsc.edu/goldenPath/help/wiggle.html
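
The track lines in these posts differ only in name, description, color, visibility, and URL, so they could be generated instead of pasted. A minimal Python sketch, assuming the settings used above; the function name and its defaults are my own, and each setting is written once (the pasted lines repeat yLineMark and yLineOnOff):

```python
def bigwig_track_line(name, description, url, color="255,0,0",
                      visibility="full", view_limits="2:10"):
    """Build one UCSC custom-track line for a bigWig file (hypothetical helper)."""
    settings = [
        ("type", "bigWig"),
        ("name", f'"{name}"'),
        ("description", f'"{description}"'),
        ("visibility", visibility),
        ("color", color),
        ("altColor", "0,0,0"),
        ("yLineMark", "4"),
        ("yLineOnOff", "on"),
        ("priority", "10"),
        ("maxHeightPixels", "256:256:64"),
        ("autoScale", "off"),
        ("viewLimits", view_limits),
        ("gridDefault", "on"),
        ("bigDataUrl", url),
    ]
    return "track " + " ".join(f"{key}={value}" for key, value in settings)


BASE_URL = "ftp://tyche.uthsc.edu/user/xusheng/"
line = bigwig_track_line("Plus eye B6", "mRNA expression of eye in B6",
                         BASE_URL + "B6_eye_positive_log2.bw")
print(line)
```

Looping over a small table of (name, description, file, color) rows would produce a whole track set like the ones above.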

Friday, May 28, 2010

sequencing

jingels Fish123!

Thursday, May 27, 2010

File size

du --block-size=1

track

track type=bigWig name="F1 sense" description="mRNA Expression of sense in F1" visibility=full autoScale=on color=0,0,255 yLineMark=3 yLineOnOff=on priority=10 bigDataUrl=ftp://tyche.uthsc.edu/user/xusheng/F1.negative.wig_all.bw
browser position chr2:136,539,186-136,608,164

Friday, May 14, 2010

wig2bw

1. generating bw file
/home/xusheng/mRNAseq_12samples/wigToBigWig B6.negative.wig_all.log_remove ../../bc_1/output/sam2wig_new2/mm9.chrom.sizes B6.negative.wig_all.bw &

2. track type=bigWig name="Example One" description="A bigWig file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/bigWigExample.bw

track type=wiggle_0 name="variableStep" description="variableStep format" \
visibility=full autoScale=off viewLimits=0.0:25.0 color=50,150,255 \
yLineMark=11.76 yLineOnOff=on priority=10

Wednesday, April 28, 2010

sge to pbs

service pbs_server restart
service maui restart

Tuesday, April 20, 2010

bigwigs

>>>
>>> Are you talking about the database contents of
>>> the wiggle data, or the source ascii files that
>>> are given to wigEncode ?
>>>
>>> If they are just the source ascii files, they
>>> can all go together with 'cat':
>>> $ cat files*.wig > result.wig
>>>
>>> If they have custom track and browser lines in them:
>>> $ egrep -v "^track|^browser" files*.wig > result.wig
>>>
>>> Or, all these can be sent into wigEncode:
>>> $ egrep -v "^track|^browser" files*.wig | wigEncode stdin result.wig
>>> result.wib
>>>
>>> We now try to distinguish source files with the suffix:
>>> .wigFixed or .wigVar
>>> for fixedStep or variableStep wiggle ascii data.
>>>
>>> You may also be interested in the more efficient
>>> encoding mechanism of wigToBigWig and the resulting .bw
>>> file which can be a URL resource. See also:
>>> http://genome.ucsc.edu/goldenPath/help/bigWig.html
>>> This encoder works for bedGraph file types too and is
>>> much more efficient:
>>> http://genome.ucsc.edu/goldenPath/help/bedgraph.html
>>>

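The concatenation described in the quoted message (join several ascii wiggle files and drop their track and browser lines) can also be sketched in Python. This is only an illustration; the function names are hypothetical. One difference from grep: when grep is given several files it prefixes each output line with the file name unless -h is used, which this version avoids.

```python
def strip_track_lines(lines):
    """Yield wiggle data lines, skipping custom-track and browser lines."""
    for line in lines:
        if line.startswith(("track", "browser")):
            continue
        yield line


def merge_wig(paths, out_path):
    """Concatenate several wiggle text files into one file without header lines."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                out.writelines(strip_track_lines(handle))


sample = [
    'track type=wiggle_0 name="example"\n',
    "browser position chr1:1-100\n",
    "variableStep chrom=chr1\n",
    "10 1.5\n",
]
print("".join(strip_track_lines(sample)), end="")
```
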
Tuesday, February 9, 2010

Thursday, January 14, 2010

Find nnnn in genome

perl -e 'open(FILE,"chr1.fa") or die $!; my $content=""; while(<FILE>){chomp $_; $content.=$_;} while($content=~/(NNNNNNN.*?N[ACGTacgt])/g){print length($1),"\t",pos($content),"\n";}'
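
A Python variant of the same idea, as a rough sketch rather than a drop-in replacement. It uses a different approach from the one-liner: it skips the FASTA header line and reports the 1-based start and the length of every run of at least 7 N (or n). The function names and the min_len parameter are my own.

```python
import re


def read_fasta_sequence(path):
    """Return the sequence of a single-record FASTA file as one string."""
    parts = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):  # skip the header line
                parts.append(line.strip())
    return "".join(parts)


def find_n_runs(sequence, min_len=7):
    """Return (start, length) for each run of at least min_len N/n; start is 1-based."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start() + 1, m.end() - m.start()) for m in pattern.finditer(sequence)]


demo = "ACGT" + "N" * 8 + "ACG" + "NNN" + "T" + "n" * 7
for start, length in find_n_runs(demo):
    print(start, length, sep="\t")
```

For a real chromosome, the call would be find_n_runs(read_fasta_sequence("chr1.fa")).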

Monday, January 4, 2010

maq

Session 2 - Mapping short sequences to a genome with MAQ

Introduction

In this session we will learn to use MAQ (Mapping and Assembly with Qualities), which is described in the article "Mapping short DNA sequencing reads and calling variants using mapping quality scores" (pdf).

We will use data like those we obtained at the end of the previous practical, that is, fastq files that have already been cleaned and trimmed. So that we can all work with the same data, I am including a link to a folder.

The following is needed for this practical:

maq: installed (src)
maqview: not installed (src)
Files in fastq format: folder
Reference genome: Salmonella CT18
The maq manual comes with the src and is also available as a pdf in case you need to consult it.

Once again, throughout this web page, sections of example code that you can copy directly will be shown like this:

echo "Hello world!"
Results or outputs will be shown like this:

Hello world

MAQ functions and processing of input files

Within the single maq program there are many functions, which generate files with different purposes. The following flowchart shows these functions, along with their inputs and outputs.

As the figure shows, maq takes as input a reference genome in fasta format and sequence files in fastq format. Internally, however, it works with binary file formats for efficiency. The first thing we need to do is convert all our sequence files to their corresponding binary format.

maq fasta2bfa genome.fa genome.bfa # For reference genomes, fasta format
maq fastq2bfq reads.fq reads.bfq # For sequence files, fastq format
With an example using the files we are working with:

gzip -dc Salmonella_CT18.fa.gz | maq fasta2bfa - ref.bfa
-- 3 sequences have been converted.
gzip -dc Typhi_E022759_solexa_3.a.filtered.fq.gz | maq fastq2bfq - solexa_3.a.bfq
-- finish writing file 'solexa_3.a.bfq'
-- 248010 sequences were loaded.

Mapping sequences to the reference genome

The next step is to map the sequences to the reference genome:

maq map # Showing only the options we are interested in
Usage: maq map [options]

Options: -1 INT length of the first read (<=127) [0]
-e INT maximum allowed sum of qualities of mismatches [70]
-n INT number of mismatches in the first 24bp [2]
-u FILE dump unmapped and poorly aligned reads to FILE [null]
Because our sequences are already trimmed, we do not need the -1 option. The next two options control how many sequences we will be able to map. If for some reason we expect our sequences to have more than 2 errors, and those positions to be of high quality, we could increase them a little. However, it is unlikely that there is more than one SNP within a short read. If we increase them, we will find more SNPs, but we will also get more false positives. Increasing -n will also make maq use more memory and run more slowly. If we want to keep the sequences that cannot be mapped (for later analysis, for example to try to assemble them), the -u option will be useful. We can then try:

maq map -n 2 -e 70 -u solexa_3.a.unmap solexa_3.a.map ref.bfa solexa_3.a.bfq
-- maq-0.7.1
[ma_load_reads] loading reads...
...

# took 2:30 minutes on palenque
The sequences that could not be mapped with the parameters we chose can be viewed directly, since the file is plain text:

more solexa_3.a.unmap
If at this point we want to see the result of the alignment, we have to extract it with the following command:

maq mapview solexa_3.a.map > solexa_3.a.map.txt
head solexa_3.a.map.txt
IL2_30_2_3_400_762 chr 121 - 0 0 85 85 85 0 0 1 0 30 cGgGCAGATACTTTAACCAATATAGGAATA 0I2IIIIIIIIIIIIIIIIIIIIIIIIIII
IL2_30_2_17_1003_180 chr 168 + 0 0 76 76 76 0 0 1 0 30 AATGACAGAGTACACAAcAtcCaTgaAcCg IIIIIIIIIIIIIIIII5I0*I0I.,I8G%
The columns of this file are described below:

1 read identifier
2 name of the reference sequence
3 position in the reference sequence
4 strand of the reference sequence
5-6 information for paired-end mapping
7-9 mapping quality, identical because the reads are not paired
10 number of mismatches of the best hit
11 sum of qualities of the mismatches of the best hit
12 number of hits with 0 mismatches in the first 24 bases
13 number of hits with 1 mismatch in the first 24 bases
14 read length
15 read sequence
16 read qualities
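
The column list above maps naturally onto a small record type. A minimal Python sketch, assuming whitespace-separated fields as in the sample output; the field names are my own labels for the 16 columns (columns 5-6 and 7-9 are named generically because only their general meaning is given here):

```python
from collections import namedtuple

# Hypothetical field names for the 16 mapview columns described above.
MapviewRecord = namedtuple("MapviewRecord", [
    "read_id", "ref_name", "position", "strand",
    "paired_info_1", "paired_info_2",
    "map_qual_1", "map_qual_2", "map_qual_3",
    "best_hit_mismatches", "best_hit_mismatch_qual_sum",
    "hits_0mm_first24", "hits_1mm_first24",
    "read_length", "sequence", "qualities",
])

TEXT_FIELDS = {"read_id", "ref_name", "strand", "sequence", "qualities"}


def parse_mapview_line(line):
    """Split one mapview line into a MapviewRecord, converting numeric columns to int."""
    fields = line.split()
    if len(fields) != len(MapviewRecord._fields):
        raise ValueError("expected 16 columns, got %d" % len(fields))
    values = [value if name in TEXT_FIELDS else int(value)
              for name, value in zip(MapviewRecord._fields, fields)]
    return MapviewRecord(*values)


rec = parse_mapview_line(
    "IL2_30_2_3_400_762 chr 121 - 0 0 85 85 85 0 0 1 0 30 "
    "cGgGCAGATACTTTAACCAATATAGGAATA 0I2IIIIIIIIIIIIIIIIIIIIIIIIIII"
)
print(rec.read_id, rec.position, rec.strand, rec.map_qual_1)
```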

The .map file is one of the central files in the maq process. Once we have all the .map files corresponding to the initial fastq files, we can combine them using:

maq mapmerge all.map solexa_3.a.map solexa_3.b.map solexa_3.c.map ...
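
With many fastq files, the convert, map, and merge steps shown so far can be assembled programmatically. The sketch below only builds the command lists, using the subcommands and options that appear above; the function name, the output naming scheme, and the default file names are assumptions.

```python
import os


def maq_commands(fastq_paths, ref_bfa="ref.bfa", merged_map="all.map"):
    """Return the maq commands (as argument lists) for converting, mapping and merging."""
    commands = []
    map_files = []
    for fastq in fastq_paths:
        # Derive output names from the input name (hypothetical naming scheme).
        stem = os.path.splitext(os.path.basename(fastq))[0]
        bfq, out_map, unmapped = stem + ".bfq", stem + ".map", stem + ".unmap"
        commands.append(["maq", "fastq2bfq", fastq, bfq])
        commands.append(["maq", "map", "-n", "2", "-e", "70",
                         "-u", unmapped, out_map, ref_bfa, bfq])
        map_files.append(out_map)
    commands.append(["maq", "mapmerge", merged_map] + map_files)
    return commands


cmds = maq_commands(["solexa_3.a.fq", "solexa_3.b.fq"])
for cmd in cmds:
    print(" ".join(cmd))
```

Each list could then be passed to subprocess.run(cmd, check=True) on a machine where maq is installed.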
Other functions for extracting information from the mapping file:

maq mapcheck -s ref.bfa all.map
Number of reference sequences: 3
Length of reference sequences exlcuding gaps: 5133713
Length of gaps in the reference sequences: 0
Length of non-gap regions covered by reads: 4845450
Length of 24bp unique regions of the reference: 4694282
Reference nucleotide composition: A: 24.01%, C: 25.90%, G: 25.97%, T: 24.12%
Reads nucleotide composition: A: 23.58%, C: 26.32%, G: 26.37%, T: 23.73%
Average depth across all non-gap regions: 33.773
Average depth across 24bp unique regions: 33.854

A C G T : AC AG AT CA CG CT GA GC GT TA TC TG : 0? 1? 2? 3? 4? : 0? 1? 2? 3? 4?
1 25.3 25.4 25.0 24.3 : 0 0 0 3 0 1 2 1 0 2 1 1 : 4 12 20 30 933 : 248 169 0 0 0
2 24.6 24.8 25.9 24.7 : 1 1 0 2 0 0 0 0 0 1 1 2 : 5 15 25 35 919 : 200 33 7 3 1
3 24.6 25.8 25.0 24.6 : 1 1 0 4 0 1 0 0 0 1 1 2 : 6 18 29 40 906 : 191 30 8 4 1
4 23.9 25.7 26.7 23.7 : 1 1 0 3 0 0 0 0 0 1 1 3 : 6 17 28 38 909 : 204 34 9 4 1
5 25.1 24.2 26.6 24.1 : 2 2 1 5 1 1 2 1 1 2 2 4 : 10 17 28 37 907 : 405 41 8 4 1
...
25 23.6 26.6 26.4 23.5 : 15 5 2 9 5 2 1 3 3 3 7 8 : 126 206 161 120 385 : 99 8 2 1 1
26 23.6 26.0 26.9 23.6 : 13 9 3 12 13 4 1 2 3 3 6 13 : 149 209 159 117 364 : 114 10 3 2 1
27 23.2 26.7 26.8 23.4 : 23 9 3 12 10 4 1 5 5 4 10 13 : 178 238 166 114 304 : 117 10 3 2 1
28 23.4 26.0 26.9 23.7 : 21 11 6 16 14 8 2 4 8 4 8 12 : 203 248 165 109 273 : 121 12 3 2 1
29 22.7 26.7 27.1 23.5 : 46 19 9 20 20 10 3 10 10 7 18 20 : 245 265 161 100 228 : 169 18 4 2 1
30 22.9 26.1 27.0 24.0 : 39 20 16 27 25 20 4 11 16 8 16 20 : 278 274 156 93 199 : 173 25 0 0 0

<-base content-> <-frequency of observed changes-> <-read qualities-> <-changes/quality->
maq mapstat all.map
-- Total number of reads: 5791504
-- Sum of read length: 173745120
-- Error rate: 0.016155
-- Average read length: 30.00

-- Mismatch statistics:

-- MM 0 4051603
-- MM 1 1036434
-- MM 2 423348
-- MM 3 206417
-- MM 4 64118
-- MM 5 9584

-- Mapping quality statistics:

-- MQ 00-09 146191 146191
-- MQ 10-19 150298 150298
-- MQ 20-29 18452 18452
-- MQ 30-39 240481 240481
-- MQ 40-49 389043 389043
-- MQ 50-59 638169 638169
-- MQ 60-69 475203 475203
-- MQ 70-79 1216844 1216844
-- MQ 80-89 2312666 2312666
-- MQ 90-99 204157 204157
To get a summary, for each base of our reference, of the coverage and of the reads that map there, we can generate a file called a pileup.

maq pileup -v ref.bfa all.map | head -n 20
chr 1 A 0 @ @ @
chr 2 G 0 @ @ @
chr 3 A 0 @ @ @
chr 4 G 0 @ @ @
chr 5 A 0 @ @ @
chr 6 T 1 @, @I @v
chr 7 T 1 @, @I @v
chr 8 A 2 @,g @I$ @vy
chr 9 C 3 @,., @III @vyh
chr 10 G 4 @,.,. @IIII @vyhz
chr 11 T 4 @,.,. @IIII @vyhz
chr 12 C 5 @,.,.. @II8II @vyhz{
chr 13 T 6 @,.,.., @I4II1I @vyhz{{
chr 14 G 9 @,.,..,,,. @IIIIIIII9 @vyhz{{uv{
chr 15 G 9 @,.,..,,,. @IIIIIIII< @vyhz{{uv{
chr 16 T 9 @,.,..,,,. @HIIIIIIII @vyhz{{uv{
chr 17 T 10 @,.,..,,,., @I?IIIIIIII @vyhz{{uv{v
chr 18 G 11 @,.,..,,,.,, @IIIIIIIIIII @vyhz{{uv{vw
chr 19 C 11 @,.,..,,,.,, @IIHIIIIIIII @vyhz{{uv{vw
chr 20 A 11 @,.,..,,,.,, @IIIIIIIIIII @vyhz{{uv{vw
It includes the chromosome, position, reference base, coverage at that position, and 3 encoded columns: read bases, read qualities, and mapping qualities. These last columns always start with "@". A read base is shown as "." or "," when it is identical to the reference; otherwise the base itself is shown. The "+" strand is indicated by "," or an uppercase letter, and the "-" strand by "." or a lowercase letter.
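
The encoding of the read-bases column can be illustrated with a short decoder. This Python sketch follows only the rules stated above (leading "@", "," and uppercase for the "+" strand, "." and lowercase for the "-" strand); the function name and the dictionary keys are my own.

```python
def summarize_pileup_bases(bases_column):
    """Count matches and mismatches per strand in the read-bases column of a pileup line."""
    counts = {"fwd_match": 0, "rev_match": 0, "fwd_mismatch": 0, "rev_mismatch": 0}
    # Drop the leading "@" that every encoded column starts with.
    symbols = bases_column[1:] if bases_column.startswith("@") else bases_column
    for symbol in symbols:
        if symbol == ",":
            counts["fwd_match"] += 1
        elif symbol == ".":
            counts["rev_match"] += 1
        elif symbol.isupper():
            counts["fwd_mismatch"] += 1
        elif symbol.islower():
            counts["rev_mismatch"] += 1
    return counts


# Read-bases columns taken from positions 8 and 14 of the pileup output above.
print(summarize_pileup_bases("@,g"))
print(summarize_pileup_bases("@,.,..,,,."))
```

The four counts should add up to the coverage column of the same line (2 and 9 for the two examples).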

Generating the mapping consensus and predicting SNPs

The next central file in the maq process is the consensus (.cns). We obtain it like this:

maq assemble -N 1 -s consensus.cns ref.bfa all.map
From the consensus, several functions are available to extract files with different kinds of information. For example, we can extract the complete consensus, that is, the sequence of the new genome according to the mapping consensus, along with the quality value of each base, in fastq format:

maq cns2fq consensus.cns > cns.fq
head cns.fq
@chr
nnnnnttaCGTCTGGTTGCAAGAGATCATAACAGGGGAAATTGATTGAAAATAAATATAT
CGCCAGCAGCACATGAACAAGTTTCGGAATGTGATCAATTTAAAAATTTATTGACTTAGG
CGGGCAGATACTTTAACCAATATAGGAATACAAGACAGACAAATAAAAATGACAGAGTAC
ACAACATCCATGAACCGCATCAGCACCACCACCATTACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGAACAGTGCGG
GCTTTTTTTTCGACCAGAGATCACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT
ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATTCC
AGGCAAGGGCAGGTAGCGACCGTACTTTCCGCCCCCGCGAAAATTACCAACCATCTGGTG
GCGATGATTGAAAAAACTATCGGCGGCCAGGATGCTTTGCCGAATATCAGCGATGCCGAA
The goal of many resequencing studies is to find SNPs. To obtain all positions that differ between the consensus and the reference genome, we use cns2snp:

maq cns2snp consensus.cns > cns.snp
head cns.snp
chr 12478 C T 255 39 1.00 63 62 C 255 Y
chr 17187 C T 255 33 1.00 63 62 N 255 N
chr 22623 T C 255 36 1.00 63 62 A 255 M
The fields of this output are described below:

1 name of the reference chromosome
2 position on the chromosome
3 reference base
4 consensus base
5 consensus quality
6 depth of coverage
7 average number of hits of the reads covering this position
8 highest mapping quality of the reads covering this position
9 minimum consensus quality in the 3 bases on each side of the site
10 second consensus base
11 log likelihood ratio of the second and third bases
12 third consensus base

In particular, field 5 gives the consensus quality and therefore the confidence we have that the SNP is real. It is also worth considering the quality of the neighboring bases, given in field 9. A low-quality region may indicate a mapping problem rather than a point variation. Column 7 gives us information about the redundancy of this position. If the number is close to 1, the region is unique; if it is larger, it may indicate a duplication or a repetitive region.

Because these potential SNPs may include many false positives, it is worth filtering them to choose the best candidates. For this, maq includes a perl script that helps us:

maq.pl SNPfilter
Usage: maq.pl SNPfilter [options]

Options: -d INT minimum depth to call a SNP [3]
-D INT maximum depth (<=254), otherwise ignored [256]
-n INT minimum neighbouring quality [20]
-Q INT required max mapping quality of the reads covering the SNP [40]
-q INT minimum consensus quality [20]
-w INT size of the window in which SNPs should be filtered out [5] (near an indel)
-S FILE splitread output [null]
-F FILE indelpe output [null]
-f FILE indelsoa output [null]
-s INT indelsoa score (= left_clip + right_clip - across) [3]
-m INT indelsoa: max number of reads mapped across the indel [1]
-W INT window size for filtering dense SNPs [10]
-N INT maximum number of SNPs in a window [2]
-a alternative filter for single end reads
Remember that we cannot use indel information because we do not have paired-end reads. We can use some of the rules suggested in the maq article:

maq.pl SNPfilter -d 4 -n 20 -Q 40 -q 40 -W 10 -N 3 cns.snp > filtered.snp

wc -l cns.snp filtered.snp
7489 cns.snp
256 filtered.snp
As you can see, a large number of SNPs can be eliminated this way.
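
To make the threshold part of the filtering concrete, here is a simplified Python sketch. It is not the logic of maq.pl SNPfilter: it only applies the per-site cutoffs for depth, neighbouring quality, maximum mapping quality, and consensus quality to the cns2snp fields, and it ignores the indel and dense-SNP window filters. Treating the cutoffs as inclusive is an assumption, and the function and key names are my own.

```python
def parse_snp_line(line):
    """Parse the first nine cns2snp fields of one line into a dict."""
    f = line.split()
    return {
        "chrom": f[0], "pos": int(f[1]), "ref": f[2], "cons": f[3],
        "cons_qual": int(f[4]), "depth": int(f[5]), "avg_hits": float(f[6]),
        "max_map_qual": int(f[7]), "min_neighbor_qual": int(f[8]),
    }


def passes_basic_filters(snp, min_depth=4, max_depth=256,
                         min_neighbor_qual=20, min_max_map_qual=40,
                         min_cons_qual=40):
    """Apply per-site cutoffs only (assumed inclusive); no window or indel filtering."""
    return (min_depth <= snp["depth"] <= max_depth
            and snp["min_neighbor_qual"] >= min_neighbor_qual
            and snp["max_map_qual"] >= min_max_map_qual
            and snp["cons_qual"] >= min_cons_qual)


lines = [
    "chr 12478 C T 255 39 1.00 63 62 C 255 Y",   # from the cns.snp output above
    "chr 17187 C T 255 33 1.00 63 62 N 255 N",   # from the cns.snp output above
    "chr 100 A G 30 2 1.00 63 62 N 255 N",       # synthetic low-depth, low-quality site
]
kept = [line for line in lines if passes_basic_filters(parse_snp_line(line))]
print(len(kept), "of", len(lines), "sites kept")
```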

Visualizing the mapping with maqview

At some point we will want to manually explore certain mapped regions, to better visualize the coverage or the mapping quality around a potential SNP. Programs such as maqview serve this purpose.

First we must index the mapping and consensus files:

maqindex -i -c consensus.cns all.map
Maqview is a program that uses OpenGL to provide a graphical interface to the mapping. It lets us navigate quickly through our reference genome, observing the sequences that have been mapped, error positions, sequence quality, mapping quality, SNPs, etc.

maqview -c consensus.cns all.map
An example showing a SNP

One of the graphical modes; the colors indicate the mapping strand

Exercises:

Write a script to work easily with maq and multiple input fastq files.
Process 10 of the fastq files, up to the point of predicting SNPs.
Compare your predictions with those in this file: filtered.snp
If you assume those are the true SNPs, what can you say about false/true positives/negatives?
Try a few filtering combinations to see how this changes.
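
For the comparison exercise, one possible starting point is to treat each SNP file as a set of (chromosome, position) pairs and count the overlaps. A minimal Python sketch with hypothetical function names; true negatives cannot be counted from the two call lists alone, since that would require the number of positions examined.

```python
def load_snp_positions(lines):
    """Collect (chromosome, position) pairs from cns2snp-style lines."""
    positions = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            positions.add((fields[0], int(fields[1])))
    return positions


def compare_calls(predicted, truth):
    """Count true positives, false positives and false negatives between two call sets."""
    return {
        "true_positives": len(predicted & truth),
        "false_positives": len(predicted - truth),
        "false_negatives": len(truth - predicted),
    }


# Small illustrative inputs; real use would read the two .snp files line by line.
predicted = load_snp_positions(["chr 12478 C T", "chr 17187 C T"])
truth = load_snp_positions(["chr 12478 C T", "chr 22623 T C"])
result = compare_calls(predicted, truth)
print(result)
```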