Thursday, December 17, 2009

8.11. run
8.11.1. run host
rocks run host [host...] {command} {managed} [command=string]

Run a command for each specified host.


arguments

[host]
Zero, one or more host names. If no host names are supplied, the command is run on all known hosts.

command
The command to run on the list of hosts.

managed
Run the command only on 'managed' hosts, that is, hosts that generally have an ssh login. Default is 'yes'.


parameters

[command=string]
Can be used in place of the 'command' argument.


examples

$ rocks run host compute-0-0 command="hostname"
Run the command 'hostname' on compute-0-0.

$ rocks run host compute "ls /tmp"
Run the command 'ls /tmp' on all compute nodes.

8.11.2. run roll
rocks run roll [roll...]

Installs a Roll on the fly


arguments

[roll]
List of rolls. This should be the roll base name (e.g., base, hpc, kernel).


examples

# rocks run roll viz
Installs the Viz Roll onto the current system.

add driver to nodes

1. Introduction

The Network File System (NFS) is certainly one of the most widely used network services. NFS is based on Remote Procedure Calls (RPC) and allows a client to mount (and automount) remote file systems and access them transparently over the network.
2. Scenario

In this scenario we are going to export the file system from the linuxconfig.org host (IP address 10.1.1.200) and mount it on linuxconfig.local (IP address 10.1.1.100).
3. Prerequisites

At this point, we assume that the NFS service daemon is already installed on your system, along with the portmap daemon on which the NFS setup depends. Moreover, your system needs to support the NFS file system; check for an nfs entry in the kernel's list of supported file systems:
$ cat /proc/filesystems

The NFS daemon should be listening on its standard port 2049, and portmap on port 111.

Another way to check whether NFS is functioning is to use the rpcinfo command:
# rpcinfo -p
The output should list the registered RPC programs, including portmapper, nfs, and mountd.

4. Server export file

All NFS server exports need to be defined in /etc/exports file.
4.1. Most common exports options

Here are the most common export options:
/home/nfs/ 10.1.1.100(rw,sync)
    export the /home/nfs directory to the host 10.1.1.100 with read/write permissions in synchronized mode
/home/nfs/ 10.1.1.0/24(ro,sync)
    export the /home/nfs directory to the network 10.1.1.0 (netmask 255.255.255.0) with read-only permissions in synchronized mode
/home/nfs/ 10.1.1.100(rw,sync) 10.1.1.10(ro,sync)
    export the /home/nfs directory to the host 10.1.1.100 with read/write permissions, and to the host 10.1.1.10 with read-only permissions, both in synchronized mode
/home/nfs/ 10.1.1.100(rw,sync,no_root_squash)
    export the /home/nfs directory to the host 10.1.1.100 with read/write permissions in synchronized mode; the remote root user keeps root privileges instead of being mapped to the anonymous nfs user
/home/nfs/ *(ro,sync)
    export the /home/nfs directory to any host with read-only permissions in synchronized mode
/home/nfs/ *.linuxconfig.org(ro,sync)
    export the /home/nfs directory to any host within the linuxconfig.org domain with read-only permissions in synchronized mode
/home/nfs/ foobar(rw,sync)
    export the /home/nfs directory to the host foobar with read/write permissions in synchronized mode
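As a quick sanity check, entries like the ones above can be written to a file and validated before touching the real configuration. The sketch below uses a temporary file as a stand-in for /etc/exports (the paths and addresses are the example values from this section), and simply checks that each entry has the expected directory client(options) shape:

```shell
#!/bin/sh
# Write an example exports file to a temporary location (NOT /etc/exports,
# so this is safe to run anywhere) using entries from the section above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/home/nfs/ 10.1.1.100(rw,sync)
/home/nfs/ 10.1.1.0/24(ro,sync)
/home/nfs/ *(ro,sync)
EOF

# Each non-comment line should look like: <directory> <client>(<options>)
result=well-formed
while read -r dir clients; do
  case $dir in ''|\#*) continue ;; esac
  echo "$clients" | grep -Eq '^[^ ]+\([a-z_,]+\)' || result="bad entry: $dir $clients"
done < "$tmp"
echo "exports file: $result"
rm -f "$tmp"
```

On a real server you would still rely on exportfs to report syntax errors; this check only catches obviously malformed lines before they reach /etc/exports.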
4.2. Edit exports file

Open up your favorite text editor (for example, vim), edit the /etc/exports file, and add the line /home/nfs/ *(ro,sync) to export the /home/nfs directory to any host with read-only permissions.
Be sure that the directory you export by NFS exists. You can also create a file inside the /home/nfs directory, which will help you troubleshoot once you mount this file system remotely.
# touch /home/nfs/test_file
4.3. Restart NFS daemon

Once you edit the /etc/exports file, you need to restart the NFS daemon to apply the changes. The restart command differs between Linux distributions. Debian users:
# /etc/init.d/nfs-kernel-server restart
Red Hat users:
# /etc/init.d/nfs restart
If you later decide to add more NFS exports to the /etc/exports file, you will need to either restart the NFS daemon or re-export with the exportfs command:
# exportfs -ra
5. Mount remote file system on client

First we need to create a mount point:
# mkdir /home/nfs_local
If you are sure that the NFS client and mount point are ready, you can run the mount command to mount exported NFS remote file system:
# mount 10.1.1.200:/home/nfs /home/nfs_local
If you need to specify the filesystem type explicitly, you can do so with:
# mount -t nfs 10.1.1.200:/home/nfs /home/nfs_local
You may get the error message:
mount: mount to NFS server failed: timed out (retrying).
This may mean that your server supports a higher NFS version than your client is requesting, so you need to pass an extra option to the NFS client. In this example we use NFS version 3:
# mount -t nfs -o nfsvers=3 10.1.1.200:/home/nfs /home/nfs_local
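The plain-mount-then-fallback sequence above can be wrapped in a small helper. This is only a sketch: the server path and mount point are the example values from this section, and on a real client the mount command must be run as root. The MOUNT variable is parameterised purely so the retry logic can be exercised without root privileges.

```shell
#!/bin/sh
# Try a plain NFS mount first, then retry with explicit NFS versions.
# On a real client, leave MOUNT as plain "mount".
MOUNT=${MOUNT:-mount}

mount_nfs() {
  server_export=$1 mnt=$2
  # First attempt: let the client negotiate the version itself.
  $MOUNT -t nfs "$server_export" "$mnt" 2>/dev/null && return 0
  # Fall back to pinning the version, newest first.
  for v in 4 3 2; do
    echo "plain mount failed, retrying with nfsvers=$v" >&2
    $MOUNT -t nfs -o "nfsvers=$v" "$server_export" "$mnt" 2>/dev/null && return 0
  done
  return 1
}

# Example invocation (uncomment on a real client):
# mount_nfs 10.1.1.200:/home/nfs /home/nfs_local
```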

Now you should be able to see that the file system is mounted. Notice that the mount command reports the filesystem as mounted "read and write", even though the server grants only "read only" access.
6. Configure automount

To make this completely transparent to end users, you can automount the NFS file system at boot time, or use PAM modules to mount it once a user logs in with a valid username and password. To mount the file system automatically during boot, edit /etc/fstab with your favorite editor and create a new line like this:
10.1.1.200:/home/nfs /home/nfs_local/ nfs defaults 0 0
in /etc/fstab or
# echo "10.1.1.200:/home/nfs /home/nfs_local/ nfs defaults 0 0" >> /etc/fstab
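One caveat with the echo append above: run twice, it leaves a duplicate fstab entry. Below is a sketch of an idempotent version, demonstrated against a temporary file rather than the real /etc/fstab (point FSTAB at /etc/fstab on an actual client):

```shell
#!/bin/sh
# Only append the NFS line if it is not already present.
# A temp file stands in for /etc/fstab so the sketch is safe to run.
FSTAB=${FSTAB:-$(mktemp)}
LINE="10.1.1.200:/home/nfs /home/nfs_local/ nfs defaults 0 0"

add_fstab_line() {
  # -x matches the whole line, -F treats the pattern as a fixed string.
  grep -qxF "$LINE" "$FSTAB" || echo "$LINE" >> "$FSTAB"
}

add_fstab_line
add_fstab_line   # second call is a no-op; the line is appended only once
```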

7. Conclusion

The Network File System comes with tons of export options, and what has been shown here barely scratches the surface. Please visit the Linux NFS-HOWTO, hosted by the Linux Documentation Project, or the NFS homepage for more details.
8. Appendix A

The following section of this NFS tutorial is devoted to Red Hat-like Linux systems, which by default block all incoming traffic to an NFS server with iptables firewall rules. For this reason, when the firewall is running on your NFS server, you might get this error when mounting an NFS filesystem: mount.nfs: mount to NFS server '10.1.1.13' failed: System Error: No route to host. This error message has nothing to do with your NFS configuration; all that needs to be done is to either turn off the firewall or add iptables rules to allow traffic on portmap port 111, NFS port 2049, and the ports used by the other NFS services.

There are two solutions to this problem: the easy solution is to turn off the firewall completely, and the right solution is to add appropriate iptables rules.
8.1. Turn off firewall on Redhat like systems:

The easiest solution is to just turn off the firewall. This automatically grants anyone access to the NFS daemon, so it is suggested only for testing your NFS configuration. Enter the following command to stop the firewall and flush all iptables rules:
# service iptables stop
Once your NFS settings are correct, you should now be able to mount the NFS filesystem from your client machine.
8.2. Add iptables rules to allow NFS communication

This is the more complex, but correct, solution to the problem. First we need to set static ports for the NFS services rquotad, mountd, statd, and lockd by editing the /etc/sysconfig/nfs file. Add or uncomment the following lines in /etc/sysconfig/nfs:
LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
MOUNTD_PORT=892
STATD_PORT=662

Restart your NFS daemons with the following commands:
# /etc/init.d/nfs restart
# /etc/init.d/nfslock restart
Use the rpcinfo command to confirm the validity of your new port settings:
# rpcinfo -p localhost
The output should be similar to the one below:
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100011 1 udp 999 rquotad
100011 2 udp 999 rquotad
100011 1 tcp 1002 rquotad
100011 2 tcp 1002 rquotad
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100021 1 udp 32769 nlockmgr
100021 3 udp 32769 nlockmgr
100021 4 udp 32769 nlockmgr
100021 1 tcp 32803 nlockmgr
100021 3 tcp 32803 nlockmgr
100021 4 tcp 32803 nlockmgr
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100005 1 udp 892 mountd
100005 1 tcp 892 mountd
100005 2 udp 892 mountd
100005 2 tcp 892 mountd
100005 3 udp 892 mountd
100005 3 tcp 892 mountd
100024 1 udp 662 status
100024 1 tcp 662 status
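To confirm the pinned ports programmatically rather than by eye, the rpcinfo output can be filtered with awk. This is a sketch: the heredoc reproduces a few lines of the sample output above, and on a live server you would capture `rpcinfo -p localhost` into the file instead.

```shell
#!/bin/sh
# Check that mountd, statd (status) and lockd (nlockmgr) landed on the
# ports configured in /etc/sysconfig/nfs.
sample=$(mktemp)
cat > "$sample" <<'EOF'
100005 3 tcp 892 mountd
100024 1 udp 662 status
100021 3 tcp 32803 nlockmgr
EOF

# rpcinfo columns: program vers proto port service
check_port() {
  awk -v svc="$2" -v port="$3" '$5 == svc && $4 == port { found = 1 }
      END { exit !found }' "$1"
}

ok=0
for spec in "mountd 892" "status 662" "nlockmgr 32803"; do
  set -- $spec
  if check_port "$sample" "$1" "$2"; then
    echo "$1 on port $2: OK"; ok=$((ok + 1))
  else
    echo "$1 NOT on expected port $2"
  fi
done
rm -f "$sample"
```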
Save your current iptables rules into iptables-rules-orig.txt:
# iptables-save > iptables-rules-orig.txt
Create file called iptables-nfs-rules.txt with the following content:
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2:200]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p esp -j ACCEPT
-A RH-Firewall-1-INPUT -p ah -j ACCEPT
-A RH-Firewall-1-INPUT -d 224.0.0.251 -p udp -m udp --dport 5353 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 2049 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 111 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 2049 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 32769 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 32769 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 32803 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 32803 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 662 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 662 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 892 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 892 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
Apply the new rules with iptables-restore, passing the iptables-nfs-rules.txt file as its single argument:
NOTE: this will create a new set of iptables rules. If you have already defined some iptables rules previously, you may want to edit iptables-rules-orig.txt and use that file with the iptables-restore command instead.
# iptables-restore iptables-nfs-rules.txt
Save these new rules, so you do not have to re-apply them for the NFS daemon the next time you restart your server:
# service iptables save
Now your server is ready to accept client NFS requests. Optionally, you may restart the iptables firewall with the following command:
# service iptables restart

Thursday, December 3, 2009

illumina2srf

${PipelineDir}/bin/illumina2srf -o lane_${lane_no}.srf
${BustardDir}/s_${lane_no}_*_qseq.txt

Thursday, November 26, 2009

real intent (真正用意)

The real point of this slide is just to impress upon you the large number of interactions.

Tuesday, November 24, 2009

drives to node

Add it to the NFS (Network File System) service on the cluster head node, and then mount it on every cluster node.

http://zh.wikipedia.org/wiki/NFS

Friday, October 30, 2009

abyss

qsub -pe mpi 8 -b y mpirun -np 8 /home/xusheng/bin/ABYSS-P -k40 -l50 /home/xusheng/s_1_1_sequence.fastq /home/xusheng/s_1_2_sequence.fastq -o test.fa

Thursday, October 8, 2009

conjecture / infer (推测)

1. confer

Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor–positive breast cancer.

All of these 11 R genes confer race-specific hypersensitive resistance.


2. conjecture

The evidence for the conjecture consists of several correspondences between the two theories.

A major result of this paper is to affirm a conjecture of Coase (1972) that states that the market will open at a price close to zero.

A complete proof for Holyer’s conjecture is the content of this paper.

Examination of Carnegie's conjecture allows us to examine this proposition from a new standpoint.



3. guess

4. presume

We presume that testosterone suppression was a result of GnRHa-induced gonadotropin suppression.


5. speculate

One can speculate that most of the anomalies encountered in this dialogue are perhaps a natural consequence of dealing with a theory which is so revolutionary

Therefore, it is tempting to speculate that the chloroplast import signal was added to an ancestor gene of endosymbiotic origin in the course of plant evolution

Wednesday, September 30, 2009

complicated

be complicated by
to be made more complicated by ... (因……变得复杂)

These estimates, however, were complicated by the whole-genome shotgun sequence assembly (WGSA) method, which cannot resolve large, highly identical duplications.

This study was complicated by the absence in this strain of the NK1.1 allele, the only one for which an antibody is available.

Tuesday, September 29, 2009

extreme

In two of the most extreme cases of discrepancy

Sunday, September 27, 2009

Thursday, September 24, 2009

key / crucial (关键)

1. crucial

A crucial question in the field of gene regulation is whether...

Chronic inflammation in fat plays a crucial role in the

has a crucial role in the


2. key

the following two key questions were identified

One of the key questions in climate change research relates to the future dynamics

3. critical

To resolve the critical question of accurate identification of the macrophages

When multiple outcomes are assessed, a critical question to be considered is whether these should be examined separately or somehow integrated,

reveal a critical role in

4. essential
The essential problem here is to find a feasible path from a source

So this raises the essential questions

The essential questions are as follows

5. vital


6. pivotal
This question is pivotal to the business-case issue


7. extremely important

8. fundamental

Friday, September 18, 2009

PDA

NCBI

xwang

monroe855

Thursday, September 17, 2009

especially / extremely (特别)

1. extremely costly

a. This can be extremely costly and ineffective.

b. Drug addiction is a common brain disorder that is extremely costly to the individual and to society.

c. This task is extremely difficult.

2. highly


3. exceedingly
The mix of science, epistemology, and politics is exceedingly novel, exceedingly interesting, and exceedingly important.

Wednesday, September 16, 2009

tend to (往往)

tend to be

Filtered reads tend to be shorter because a larger proportion of the long reads are instrument artifacts related to the base addition order.

sequencing process

We generated at least 6× human genome coverage of sequence per week-long run. Each full run consisted of 50 channels distributed across two flow cells. We combined data from four instrument runs, during which 172 of the 200 channels were loaded with P0 genomic DNA. Sequence data were mapped to the NCBI 36 reference human genome (hg18) using the open-source aligner IndexDP; 63% of the raw reads were aligned (Fig. 1a), yielding a total useful coverage of 28×.
Toward this end

Towards this end, the paper also briefly describes



Ultimately

Ultimately, understanding the details of these regulatory pathways will provide insights into the role of the


This question is important to the ultimate understanding of the causes of mass extinction and also influences the format of any statistical analysis.

Ultimately, understanding the behaviour of each mutation, and analysing it thoroughly for each patient, could allow us to develop sound correlations between ...


Eventually

Finally

Tuesday, September 15, 2009

pros and cons (优缺点)

1. strengths and weaknesses

a. will therefore have different strengths and weaknesses

b. Each approach has its strengths and weaknesses and may be suitable for different circumstances.


What are the strengths and weaknesses of these mechanisms?

2. merits and limits
In this paper, we evaluated the merits and limits of endoscopic neck surgery.


Finally, results from different methods are compared, and the merits and limits of various methodological choices are discussed.

3. opportunities and limitations
opportunities and limitations of drug therapy in prevention of sudden death



4. Promises and pitfalls


5. Challenges and Opportunities

Sunday, September 13, 2009

IP:

172.21.59.202

172.21.161.185

172.23.25.10

128.169.4.26

Tuesday, September 8, 2009

extent / degree (程度)

An issue of particular importance is to what extent the data exhibit a “lane effect,”

Friday, September 4, 2009

according to / based on / in accordance with (根据,依据,按照)

according to ....

According to this model one would expect to find differentiation into


According to our results this does not seem to occur frequently


No significant differences were observed when results were analyzed according to previous treatment.

As expected, these estimates of probability did not differ according to the results
of the stress test


in terms of

The results are analyzed in terms of


this data profile is described in terms of the parameters α and β in eq. (1)

evaluates groups of yeast genes in terms of their annotations in


in the light of

In the light of these changes, we must revise our plan.


on the basis of




in accordance with the above views

in accordance with
(in compliance with; consistent with; legal usage)

In accordance with your request, I am sending you sample pages of the dictionary.


All procedures were performed in accordance with specifications of institutionally approved animal protocols.

because of ... (由于)

as a result = as a matter of fact = as a consequence (therefore, so)

So, therefore, thus, hence, consequently, as a consequence, accordingly, as a result, because of this, as a result of this

as a result, finally, therefore, accordingly, in short, thus, consequently, in conclusion, so, in brief, in a word



as a result of == as a consequence of

due to

due to the fact that...

most likely (最有可能)

the list of most likely candidate genes ...

that are most likely to provide valid results

This effect is most likely to be mediated by a phosphorylation event

those factors that are most likely to be involved in the phenotype


The maximum range and the most possible value of ...

In this situation, the most possible mechanism seems to be occlusion.

The most possible mechanism for explaining this ... is ...

Thursday, September 3, 2009

Free MSVC tools + Activestate to compile CPAN Modules

by jZed on Aug 31, 2004 at 00:03 UTC (#387070=perlmeditation)
update In the YMMV vein, please see shay's Re: Building Perl with the free MSVC tools. The method below will apparently only work for some modules. I tried with DBI, DBD::SQLite, and Text::CSV_XS and they all work fine, so this method is useful for me regardless of what other modules it will or won't work with.

I got a new winXP box and wanted to be able to use perl (well, duh!) and to compile my own versions of CPAN modules without using cygwin bash or purchasing a compiler from Microsoft (not so duh, but not so hard either).

After attempting unsuccessfully to follow Building Perl with the free MSVC tools and through no fault of the excellent advice of corion++, intrepid++, demerphq++, I gave up and tried to just use the free tools to compile modules with activestate perl ... success!
Here's how:

Caution: these are humongous downloads (hundreds of MB). corion has a script which hopefully he'll share to wget just the needed portions. But here's how I did it with the full (*free* as in beer) downloads:

Download and install Microsoft Visual C++ Toolkit 2003
Download and install Microsoft SDK Platform Update
Download and install .NET redistributable
Download and install .NET SDK
Set environment to recognize the bin/lib/inc dirs for all of those (update these assume the default install paths, but see demerphq's points below for reasons you might want to use short path names instead and for alternate ways to set the environment):
@echo off
set MSVC=C:\Program Files\Microsoft Visual C++ Toolkit 2003
set MSSDK=C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7
set Mstools=C:\Program Files\Microsoft SDK

set INCLUDE=%MSVC%\include;%MSSDK%\include;%Mstools%\include
set LIB=%MSVC%\lib;%MSSDK%\lib;%Mstools%\Lib;
set PATH=%MSVC%\bin;%MSSDK%\bin;%Mstools%\Bin;%Mstools%\Bin\WinNT;%PATH%
Download and install the latest (*free as in speech*) perl binary
Download and install (*free as in speech*) cygwin (or just grab the needed tools - gzip, tar, cat, etc.)
You can now use PPM or CPAN.pm or CPANPLUS or nmake to compile and install (many) CPAN modules ... Enjoy!
update : made URLs into links per bart's suggestion

suggest / indicate (表明)

However, the findings of ... suggest that the connection may be more complex.

On the basis of our data, we suggest that

Taken together, these results suggest that

To accomplish this, we suggest that researchers must clearly address

whether ... (是否)

Thus, it is reasonable to ask whether ...

it seemed reasonable to ask whether

leading us to ask whether ...

it makes sense to ask whether ...


It may now be insufficient to ask whether

we might ask whether it might be ...

contradiction / discrepancy (矛盾)

This apparent paradox has been at least partially explained in recent years through ...

This paradox may be attributable in part to ...

One discrepancy between ... and ... remains unexplained.

This discrepancy might be explained by ... results

The reason for this discrepancy has never been explained adequately.

This discrepancy could be explained by either 1) ... or 2) ...

The inconsistent results of association studies in behavioral genetics are usually explained by the fact that ...

The reason is that the universality of contradiction can be explained more briefly

Monday, August 31, 2009

Analyzing mixed (nested) designs in R

I’m trying to understand the differences among the different ways of analyzing mixed models, both in theory and in how they are implemented in various R packages. Specifically, for a classic balanced design where the method of moments/F ratio approach works, what are the comparable answers from the model-fitting-and-comparison approach? How do the old (nlme) and new (lmer) approaches to this analysis compare? Quinn and Keough [3] have a good discussion of nested designs: they give a worked example based on Andrew and Underwood [1].

Reading data, analyzing it the wrong way (and fixing it by hand)

The following discussion and R code is modified from http://www.stat.sfu.ca/~thompson/stat403-650/complexdesigns.html, by Steve Thompson, Simon Fraser University. Grab data from [1] (from Quinn and Keough’s web site):

> datafile <- "http://www.zoology.unimelb.edu.au/qkstats/chpt9/andrew.csv"
The first three columns (TREAT, PATCH, QUAD) are categorical, the last (ALGAE) is numeric (percent algal cover):

> urchins <- read.csv(file = datafile, colClasses = c(rep("factor",
+ 3), "numeric"))
What do the data look like?

> summary(urchins)
TREAT PATCH QUAD ALGAE
con :20 1 : 5 1:16 Min. : 0.00
rem :20 10 : 5 2:16 1st Qu.: 0.00
t0.33:20 11 : 5 3:16 Median : 5.00
t0.66:20 12 : 5 4:16 Mean :20.26
13 : 5 5:16 3rd Qu.:41.00
14 : 5 Max. :83.00
(Other):50
(plot(urchins) actually does a not-half-bad job of summarizing the experimental design and results with a scatterplot matrix). Fit a two-way, fixed-effect ANOVA:

> lm1 = lm(ALGAE ~ TREAT * PATCH, data = urchins)
> a1 = anova(lm1)
> a1
Analysis of Variance Table

Response: ALGAE
Df Sum Sq Mean Sq F value Pr(>F)
TREAT 3 14429.1 4809.7 16.1075 6.579e-08
PATCH 12 21241.9 1770.2 5.9282 8.323e-07
Residuals 64 19110.4 298.6
The above analysis of variance table has the right mean squares and degrees of freedom, but the wrong F value for testing treatments, because it has used the residual (subsampling) mean square in the denominator. This gives a highly inflated F value and unrealistically small p-value because the variance among nearby quadrats within a patch is small compared to the variability between patches. The correct F value is obtained by dividing the mean square for treatments by the mean square for patches. It is calculated below together with the correct p-value. So the experiment has not shown strong evidence for an effect of sea urchin density on algae cover, even though the pattern of the results is suggestive of such an effect.

> msq <- a1$`Mean Sq`
> fratio <- msq[1]/msq[2]
> fratio
[1] 2.717102
> dfs <- a1$Df
> pf(fratio, dfs[1], dfs[2], lower.tail = FALSE)
[1] 0.091262
Test for significant effect of patches:

> fratio2 <- msq[2]/msq[3]
> fratio2
[1] 5.928207
> pf(fratio2, dfs[2], dfs[3], lower.tail = FALSE)
[1] 8.322613e-07
The "right" way: with ''aov''

Alternatively, we can do this directly with aov:

> a2 <- aov(ALGAE ~ TREAT + Error(PATCH), data = urchins)
> summary(a2)
Error: PATCH
Df Sum Sq Mean Sq F value Pr(>F)
TREAT 3 14429.1 4809.7 2.7171 0.09126
Residuals 12 21242.0 1770.2

Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 64 19110.4 298.6
In this data set PATCH is coded as 1–16, i.e., distinct values for each treatment type (1–4 are control, 5–8 are 66% density, etc.), so I don’t actually have to say that they are nested. If they were coded as 1–4 I would have to use Error(TREAT:PATCH) to specify that “patch 1” should not be treated as the same among treatments (if patch 1 were the same in each treatment, e.g. if patch 1 were always the northernmost patch, I would have to specify a crossed design instead). I can do the next level (patch effect) F test by hand:

> fratio = 1770.2/298.6
> pf(fratio, 12, 64, lower.tail = FALSE)
[1] 8.320035e-07
How do I get both levels of error analysis — i.e. the test of MS(patch)/MS(residual) as well — calculated automatically?

Checking assumptions etc.

Other notes: Quinn and Keough say: “there were large differences in within-cell variances. Even the variances among patch means within treatments varied, with very low variances among control patch means. These data are percentages, although an arcsin-[square root transformation] had no effect in improving variance homogeneity, nor did a log transformation. Like Andrew & Underwood (1993), we analyzed untransformed data, relying on the robustness of tests in balanced ANOVA designs.” Here are the residual values by treatment group and Q-Q plot (which should be a straight line if the residuals are normally distributed). (We don’t have to worry about error structure for these purposes, so we can use plain old lm, not aov.)

> par(mfrow = c(1, 2))
> plot(lm1, which = c(5, 2))
(Quinn and Keough also tried the analysis omitting the control group, and still got the same [non-significant] result for treatment effect, so felt better about the whole thing.) Here are the diagnostic plots if we arcsin-square-root transform:

> par(mfrow = c(1, 2))
> plot(lm(asin(sqrt(ALGAE/100)) ~ TREAT * PATCH, data = urchins),
+ which = c(5, 2))
Or if we log(1+x)-transform:

> par(mfrow = c(1, 2))
> plot(lm(log10(ALGAE + 1) ~ TREAT * PATCH, data = urchins), which = c(5,
+ 2))
Does transforming change the results?

> summary(aov(asin(sqrt(ALGAE/100)) ~ TREAT + Error(PATCH), data = urchins))[[1]]
Df Sum Sq Mean Sq F value Pr(>F)
TREAT 3 3.2830 1.0943 2.8525 0.08181
Residuals 12 4.6037 0.3836
> summary(aov(log(ALGAE + 1) ~ TREAT + Error(PATCH), data = urchins))[[1]]
Df Sum Sq Mean Sq F value Pr(>F)
TREAT 3 72.737 24.246 2.9028 0.07859
Residuals 12 100.229 8.352
Not much.

Using ''lme''

R has another way to do linear mixed models, which allows for more complex designs etc. Pinheiro and Bates recommend this approach, which gives the same result as above:

> library(nlme)
> lme1 = lme(ALGAE ~ TREAT, random = ~1 | PATCH, data = urchins,
+ method = "REML")
> anova(lme1)
numDF denDF F-value p-value
(Intercept) 1 64 18.555081 0.0001
TREAT 3 12 2.717102 0.0913
It is possible (but strongly discouraged) to compare such fits using a likelihood ratio test (from [2], p. 87–88: ‘Even though a likelihood ratio test for the ML fits of models with different fixed effects can be calculated, we do not recommend using such tests. Such likelihood ratio tests using the standard χ2 reference distribution tend to be “anticonservative”, sometimes quite badly so.’) For example:

> lme1A = lme(ALGAE ~ TREAT, random = ~1 | PATCH, data = urchins,
+ method = "ML")
> lme2A = lme(ALGAE ~ 1, random = ~1 | PATCH, data = urchins, method = "ML")
> anova(lme1A, lme2A)
Model df AIC BIC logLik Test L.Ratio p-value
lme1A 1 6 718.8312 733.1234 -353.4156
lme2A 2 3 721.1250 728.2711 -357.5625 1 vs 2 8.2938 0.0403
almost halves the p value.

With ''lmer''

> detach("package:nlme")
> library(lme4)
> lme1r = lmer(ALGAE ~ TREAT + (1 | PATCH), data = urchins, method = "REML")
> lme1r.ML = lmer(ALGAE ~ TREAT + (1 | PATCH), data = urchins,
+ method = "ML")
> anova(lme1r)
Analysis of Variance Table
Df Sum Sq Mean Sq
TREAT 3 2434.27 811.42
> anova(lme1r.ML)
Analysis of Variance Table
Df Sum Sq Mean Sq
TREAT 3 3245.3 1081.8
Don’t know why these sums of squares/mean square values are so different (even without having a p-value). To get the same values as listed above (but with a warning):

> anova(lmer(ALGAE ~ TREAT + (1 | TREAT), data = urchins, method = "ML"))
Analysis of Variance Table
Df Sum Sq Mean Sq
TREAT 3 14429.1 4809.7
From Doug Bates:

A "p-value" could be formulated from an MCMC sample if we assume
that the marginal distribution of the parameter estimates for beta_2
and beta_3 has roughly elliptical contours and you can evaluate that
by, say, examining a hexbin plot of the values in the MCMC sample.
One could take the ellipses as defined by the standard errors and
estimated correlation or, probably better, by the observed standard
deviations and correlations in the MCMC sample. Then determine the
proportion of (beta_2, beta_3) pairs in the sample that fall outside
the ellipse centered at the estimates and with that eccentricity and
scaling factors that passes through (0,0). That would be an
empirical p-value for the test.
> mcmcpvalue <- function(samp) {
+ std <- backsolve(chol(var(samp)), cbind(0, t(samp)) - colMeans(samp),
+ transpose = TRUE)
+ sqdist <- colSums(std * std)
+ sum(sqdist[-1] > sqdist[1])/nrow(samp)
+ }
> m1 = mcmcsamp(lme1r, 5000)
> mcmcpvalue(m1[, 2:4])
[1] 0.0848
In the same ballpark as the other methods.

languageR

The languageR package has similar methods — the aovlmer.fnc function will produce a (somewhat bogus) ANOVA table based only on the fixed effects degrees of freedom, as well as the MCMC sampling p-value.

> library(languageR)
> aovlmer.fnc(lme1r, noMCMC = TRUE)
Analysis of Variance Table
Df Sum Sq Mean Sq F Df2 p
TREAT 3 2434.27 811.42 2.7174 76.00 0.05
> aovlmer.fnc(lme1r, mcmc = m1, which = 2:4)
$MCMC
$MCMC$p
[1] 0.0848

$MCMC$which
[1] 2 3 4


$Ftests
Analysis of Variance Table
Df Sum Sq Mean Sq F Df2 p
TREAT 3 2434.27 811.42 2.7174 76.00 0.05
> detach("package:lme4")
Randomization tests

A function to permute the sampled data (ALGAE), leaving the experimental design — the first three columns of the data set — intact:

> permdat <- function() {
+ x = cbind(urchins[, 1:3], urchins[sample(nrow(urchins), replace = FALSE),
+ "ALGAE"])
+ names(x)[4] = "ALGAE"
+ x
+ }
The hardest part is digging inside the object for the F statistic:

> getFstat <- function(x) {
+ summary(x)[[1]][[1]]$"F value"[1]
+ }
Permute and run the analysis 5000 times:

> set.seed(1001)
> nsim = 5000
> Fdistrib <- replicate(nsim, getFstat(aov(ALGAE ~ TREAT + Error(PATCH),
+ data = permdat())))
Plot:

> trueFstat = getFstat(a2)
> plot(density(Fdistrib, from = 0), main = "", ylim = c(0, 0.7),
+ xlim = c(0, 10))
> curve(df(x, 3, 12), col = 2, add = TRUE)
> abline(v = qf(0.95, 3, 12), col = 2, lty = 2)
> abline(v = quantile(Fdistrib, 0.95), lty = 2)
> abline(v = trueFstat, lwd = 2)
> legend("topright", c("permutation distrib.", "theoretical F distrib.",
+ "perm. 95% cutoff", "theor. 95% cutoff", "observed F stat"),
+ lty = c(1, 1, 2, 2, 1), col = c(1, 2, 1, 2, 1))
Comparing p-values:

> sum(Fdistrib > trueFstat)/nsim
[1] 0.0882
> summary(a2)[[1]][[1]]$"Pr(>F)"[1]
[1] 0.091262
Bottom line: the ANOVA conclusion seems very robust in this case!

References

N. L. Andrew and A. J. Underwood. Density-dependent foraging in the sea urchin Centrostephanus rodgersii on shallow subtidal reefs in New South Wales, Australia. Marine Ecology Progress Series, 99:89–98, 1993.
José C. Pinheiro and Douglas M. Bates. Mixed-Effects Models in S and S-PLUS. Springer, New York, 2000.
Gerry P. Quinn and Michael J. Keough. Experimental Design and Data Analysis for Biologists. Cambridge University Press, Cambridge, England, 2002.

Tuesday, August 18, 2009

coverage 2

I would like to suggest the ShortRead and IRanges packages in R/Bioconductor. If you have enough memory to load a lane into memory (8 GB should be enough), they provide excellent functions to compute per-base coverage and many other things.

For example, for a Solexa export file:
Code:
require(ShortRead)
aln<-readAligned("/path/to/file","filename_export.txt",type="SolexaExport")
# Filtering of reads e.g.:
aln <- aln[alignData(aln)$filtering=="Y" & !is.na(strand(aln)) ]
#Remove duplicated reads
aln<-aln[!srduplicated(aln)]
#Coverage
cvg<-coverage(aln,extend=60L) #in this case reads are extended 60 bp 3'
One can then use the rtracklayer package to export it as a wig file:
Code:
require(rtracklayer)
export.ucsc(as(cvg,"RangedData"),"test.wig",subformat="wig")
You might need to change the chromosome names afterwards if your original names already contained "chr".
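Whether the "chr" prefix needs adding or stripping depends on your reference; sed can fix the wig file in place. A sketch — the file name and the name pattern here are illustrative, not from the original post:

```shell
# Add a "chr" prefix to bare chromosome names in a wig header line.
# (To strip it instead, reverse the substitution: s/chrom=chr/chrom=/)
printf 'fixedStep chrom=1 start=1 step=1\n5\n7\n' > test.wig
sed -i 's/chrom=\([0-9XYM]\)/chrom=chr\1/' test.wig
head -n 1 test.wig   # fixedStep chrom=chr1 start=1 step=1
```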

coverage

...
[maq-steps]
...
maq pileup -p [your bfa] [your map] > pileup.out

cut -f 2-4 pileup.out > croppedpileup.out

#then launch R
R
#following are R commands
data <- read.table(file="croppedpileup.out", sep="\t", header=FALSE)
colnames(data) <- c("pos","consensus","coverage")
depth <- mean(data[,"coverage"])
# depth now holds the mean (overall) coverage
# set the bin size (k for runmed must be odd)
window <- 101
rangefrom <- 1   # R indexing starts at 1
rangeto <- length(data[,"pos"])
data.smoothed <- runmed(data[,"coverage"], k=window)
png(file="cov_out.png",width=1900,height=1000)
plot(x=data[rangefrom:rangeto,"pos"],y=data.smoothed[rangefrom:rangeto],pch=".", cex=1,xlab="bp position",ylab="depth",type="l")
dev.off()
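The cut step above keeps columns 2-4 of the pileup (position, consensus, coverage). A quick sanity check on a toy pileup line — the column layout here is illustrative:

```shell
# Toy pileup row: chr, pos, consensus, coverage, plus trailing fields.
printf 'chr1\t100\tA\t7\textra\n' > pileup.out
cut -f 2-4 pileup.out   # keeps fields 2, 3 and 4: 100, A, 7
```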

Monday, August 17, 2009

assess_cluster_multiple

In the large-indel tool:

####################### Xusheng debug 2009/8/17 #######################
# Remove all clusters with < 2 clones
# for ($i = 0; $i < (scalar @clusters); $i++) {
#     if ($clusters[$i]{number_of_clones} < 2) {
#         splice(@clusters, $i, 1);
#         $i--;
#     }
# }

Thursday, August 6, 2009

asking letter

“Professor Smith,

As you might remember, after graduating, I moved to Far Away Land, where I have been working as a [Position] at [Organization]. Well, I’ve recently decided to take the plunge and apply to graduate school! The program(s) I am looking at complement both my undergraduate work and the things I have learned and done since then.

Your recommendation was vital in my getting my current position, and I was hoping you would be willing to do me that favor once again. The deadline is a month away. Let me know what you think, and I’ll send you all the details.


Read more: http://college-preparation.suite101.com/article.cfm/asking_for_a_letter_of_recommendation#ixzz0NRDHmdTM

Thank You Letter

Sending a Thank You Letter

Saying thank you for a letter of recommendation is important. First, it is the right thing to do; after all, the other person did not have to write a recommendation, but was doing you a favor. Second, you might need to ask that person for more than one letter, so it is a good idea to stay in their good graces. A thank-you letter should be sent promptly, but it need not be detailed; a couple of sentences should suffice:

“Dear NAME,

I wanted to let you know how much I appreciate the letter(s) you recently wrote for me. You have been a wonderful teacher/employer, and I am truly thankful for your support as I begin this next stage in my life.

Thank you once again.

Sincerely,

Name.”


Read more: http://college-preparation.suite101.com/article.cfm/asking_for_a_letter_of_recommendation#ixzz0NRCkGZOz

Monday, July 6, 2009

How to Password Protect a Directory on Your Website

Password protecting a directory on your site is actually fairly easy. Webmasters typically want to protect a directory if they have information that they want to make available only to a selected number of people. This guide teaches how you can make a folder on your website accessible only to people with the appropriate password.

If Your Web Host Has a Control Panel
Before you dive into the task of manually password-protecting a directory using Apache's built-in facilities, you might want to check out your web host's control panel to see if they already provide the facility for protecting directories. In my experience, many commercial web hosts already provide an easy way for you to password-protect your directories. If such facility is already available, it's probably best to use it since it will save you time, particularly if you are not familiar with shell command lines and editing of .htaccess files.

Otherwise, read on.

System Requirements
You will need the following before you can successfully password-protect anything.

Your website must be running on an Apache web server.

Your web host must have enabled .htaccess processing - that is, they allow you to customize your web server environment using localized configuration files called .htaccess files.

You must have shell access, either via telnet or Secure Shell (SSH). You should also know how to use telnet or SSH to connect to your web hosting account.

Steps to Protecting a Directory with a Password Using .htaccess on Apache
Create a .htaccess file
Use an ASCII text editor like Notepad to create a text file with the following contents:

AuthName "Secure Area"
AuthType Basic
AuthUserFile /path/to/your/directory/.htpasswd
require valid-user
Note that you will have to modify the above according to your situation. In particular, change:

AuthName
Change "Secure Area" to any name that you like. This name will be displayed when the browser prompts for a password. If, for example, that area is to be accessible only to members of your site, you can name it "Members Only" or the like.

AuthUserFile
You will later create a file containing passwords named .htpasswd. The "AuthUserFile" line tells the Apache web server where it can locate this password file.

Ideally, the password file should be placed outside any directory accessible by visitors to your website. For example, if the main page of your web site is physically located in "/home/your-account-name/public-html/", place your .htpasswd file in (say) /home/your-account-name/.htpasswd. That way, on the off-chance that your host misconfigures your server, your visitors cannot view the .htpasswd contents by simply typing http://www.example.com/.htpasswd.

Wherever you decide to place the file, put the full path of that file after "AuthUserFile". For example, if the directory where you placed the file is /home/your-account-name/.htpasswd, modify that name to "AuthUserFile /home/your-account-name/.htpasswd". Note that your password file need not be named .htpasswd either. It can be any name you wish. For ease of reference, however, this tutorial will assume that you chose ".htpasswd".

AuthType and require
You do not have to modify these. Just copy the lines as they are given above.

Save and Upload the .htaccess file
Save the .htaccess. If you are using Notepad, be sure to save the file as ".htaccess", including the quotes, otherwise Notepad will change the name to ".htaccess.txt" behind your back. Then upload the .htaccess file to the directory that you want to protect.

Set Up the Password File, .htpasswd
Use your telnet or SSH software and log into your shell account.

Be sure that you are in your home directory, not somewhere else. Note that your web directory is probably not your home directory on most commercial web hosts. On servers that use a Unix-type system (like Linux, FreeBSD and OpenBSD), you can usually go to your home directory by simply typing "cd" (without the quotes) followed by the ENTER key (or RETURN key on a Mac). This, by default, will switch you to your home directory. (Note for Windows users - this is different from the Windows/DOS shell, where "cd" only displays the current working directory.)

Then, type the following command:

htpasswd -c .htpasswd your-user-name
where your-user-name is the login name of the user you want to give access. The user name should be a single word without any intervening spaces. You will then be prompted to enter the password for that user. When this is done, the htpasswd utility creates a file called .htpasswd in your current directory (home directory). You can move the file to its final location later, according to where you set the AuthUserFile location in .htaccess.

If you have more than one user, create passwords for the others as well, using the following command for each subsequent user:

htpasswd .htpasswd another-user-name
Notice that this time, we did not use the "-c" option. When the "-c" option is not present, htpasswd will look for an existing file by the name given (.htpasswd in our case), and append the new user's password to that file. If you use "-c" for your second user, you will wipe out the first user's entry since htpasswd takes "-c" to mean create a new file, overwriting the existing file if present.

If you are curious about the contents of the file, you can take a look using the following command:

cat .htpasswd
Since the .htpasswd file is a plain text file, with a series of user name and encrypted password pairs, you might see something like the following:

sally:abcdefgHijK12
mary:34567890LMNop
This file has two users "sally" and "mary". The passwords you see will not be the same as the one you typed, since they are encrypted.
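For reference, the hashes htpasswd writes by default are in Apache's apr1 (MD5-based) format; if htpasswd is not installed, openssl can generate a compatible entry. A sketch — the user name, salt and password below are placeholders, not values from this tutorial:

```shell
# Append an entry for user "sally" with password "secret" (placeholder values).
hash=$(openssl passwd -apr1 -salt xyzzy secret)
echo "sally:$hash" >> .htpasswd
grep '^sally:\$apr1\$xyzzy\$' .htpasswd   # the entry we just appended
```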

Before you quit, you should make sure that permissions on the file are acceptable. To check the permissions, simply type the following on the shell command line:

ls -al .htpasswd
If you see the file with a listing like:

-rw-rw-rw- (...etc...) .htpasswd
it means that the .htpasswd can be read and written by everyone who has an account on the same server as you. The first "rw" means that the owner of the file (you) can read it and write to it. The next "rw" means everyone in the same group as you can read and write the file. The third "rw" means that everyone with an account on that machine can read and write the file.

You don't want anyone else to be able to write to the file except you, since they can then add themselves as a user with a password of their own choosing or other nefarious stuff. To remove the write permission from everyone except you, do this from the shell command line:

chmod 644 .htpasswd
This allows the file to be read and written by you, and only read by others. Depending on how your server is set up, it is probably too risky to remove read permission from your group or the world entirely, since if you do, the Apache web server will probably not be able to read the file either. In any case, the passwords are encrypted, so a cursory glance at the file will hopefully not give away the passwords.

If you have set a different directory for your password file in your .htaccess earlier, you will need to move it there. You can do this from the shell command line as follows:

mv .htpasswd final/location/of/the/file
Remember that your file does not even have to be called .htpasswd. You can name it anything you like. However, if you do, make sure that your AuthUserFile has the same directory and filename or Apache will not be able to locate it.

Testing Your Setup
Once you have completed the above, you should test your set up using your browser to make sure that everything works as intended. Upload a simple index.html file into your protected directory and use your web browser to view it. You should be greeted with a prompt for your user name and password. If you have set everything up correctly, when you enter that information, you should be able to view the index.html file, and indeed any other file in that directory.

A Word of Caution
You should note a few things though, before you go berserk password protecting directories and harbouring the illusion that they can safeguard your data:

The password protection only guards access through the web. You can still freely access your directories from your shell account. So can others on that server, depending on how the permissions are set up in the directories.

It protects directories and not files. Once a user is authenticated for that folder, he/she can view any file in that directory and its descendants.

Passwords and user names are transmitted in the clear by the browser, and so are vulnerable to being intercepted by others.

You should not use this password protection facility for anything serious, like guarding your customers' data, credit card information or any other valuable information. It is basically only good for things like keeping out search engine bots and casual visitors. Remember, your data isn't even encrypted in the directory with this method.

Congratulations
Congratulations. You have now successfully password-protected a directory on your website.

Copyright © 2007 by Christopher Heng. All rights reserved.
Get more free tips and articles like this, on web design, promotion, revenue and scripting, from http://www.thesitewizard.com/.


Wednesday, July 1, 2009

Next Generation Seq Tools

Something I came across.

Integrated solutions
* CLCbio Genomics Workbench - de novo and reference assembly of Sanger, 454, Solexa, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, browser and other features. Runs on Windows, Mac OS X and Linux.

* NextGENe - de novo and reference assembly of Illumina and SOLiD data. Uses a novel Condensation Assembly Tool approach where reads are joined via “anchors” into mini-contigs before assembly. Requires Win or MacOS.

* SeqMan Genome Analyser - Software for Next Generation sequence assembly of Illumina, 454 Life Sciences and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Early release commercial software. Compatible with Windows® XP X64 and Mac OS X 10.4.


Align/Assemble to a reference

* Bowtie - Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Link to discussion thread here. Written by Ben Langmead and Cole Trapnell.

* ELAND - Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.

* EULER - Short read assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research).

* Exonerate - Various forms of alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.

* GMAP - GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.

* MOSAIK - Reference guided aligner/assembler. Written by Michael Strömberg at Boston College.

* MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre.

* MUMmer - MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.

* Novocraft - Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.

* RMAP - Assembles 20 - 64 bp Solexa reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.

* SeqMap - Works like ELand, can do 3 or more bp mismatches and also INDELs. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS’s.

* SHRiMP - Assembles to a reference sequence. Developed with Applied Biosystem’s colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto.

* Slider - An application for the Illumina Sequence Analyzer output that uses the probability files instead of the sequence files as input for alignment to a reference sequence or a set of reference sequences. Authors are from BCGSC. Paper is here.

* SOAP - SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. Author is Ruiqiang Li at the Beijing Genomics Institute. C++ for Unix.

* SSAHA - SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.

* SXOligoSearch - SXOligoSearch is a commercial platform offered by the Malaysia-based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.

de novo Align/Assemble
* MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.

* SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.

* SSAKE - Version 2.0 of SSAKE (23 Oct 2007) can now handle error-rich sequences. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada’s Michael Smith Genome Sciences Centre. Perl/Linux.

* VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.

* Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).

SNP/Indel Discovery
* ssahaSNP - ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those kmer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac

* PolyBayesShort - A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.

* PyroBayes - PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.

Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
* STADEN - Includes GAP4. GAP5, once completed, will handle next-gen sequencing data. A partially implemented test version is available here.
* EagleView - An information-rich genome assembler viewer. EagleView can display a dozen different types of information including base quality and flowgram signal. Developers at Boston College.

* XMatchView - A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada’s Michael Smith Genome Sciences Centre. Python/Win or Linux.

* SAM - Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada’s Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.

CHiP-Seq/BS-Seq
* FindPeaks - Performs analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest. Original algorithm by Matthew Bainbridge, in collaboration with Gordon Robertson. Current code and implementation by Anthony Fejes. Authors are from Canada’s Michael Smith Genome Sciences Centre. Java/OS independent. Latest versions available as part of the Vancouver Short Read Analysis Package.

* CHiPSeq - Program used by Johnson et al. (2007) in their Science publication

* BS-Seq - The source code and data for the “Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning” Nature paper by Cokus et al. (Steve Jacobsen’s lab at UCLA). POSIX.

* SISSRs - Site Identification from Short Sequence Reads. BED file input. Raja Jothi @ NIH. Perl.

* QuEST - Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at Stanford. From the 2008 publication Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. (C++)

Alternate Base Calling
* Rolexa - R-based framework for base calling of Solexa data. Project publication

* Alta-cyclic - “a novel Illumina Genome-Analyzer (Solexa) base caller”

Tuesday, June 30, 2009

Monday, June 22, 2009

GenomicIOLib.pm

###################### Xusheng ############################
# eval($text); Line:1142

Sunday, June 21, 2009

generateStats.pl

"~/Biosoft/CASAVA_1_0/bin/generateStats.pl" 185

Friday, June 19, 2009

CASAVA

Biosoft/CASAVA_1_0/lib/Illumina/Common/TaskManager.pm

############### xusheng #########################
next if(!defined($taskDesc[0]));
################################################

Wednesday, June 10, 2009

rsync

rsync -azuv -e ssh useratoriginserver@xx.xx.xx.xx:public_html/* public_html/

Thursday, May 28, 2009

Wednesday, May 27, 2009

How to find and replace a string in multiple files from the shell

If you are at the shell command line and need a quick way to find and replace a string across multiple files, the following one-liner is worth memorizing or bookmarking; it will serve you well.

find . -name '*.html' -print0 | xargs -0 perl -pi -e 's/SEARCHSTRING/REPLACESTRING/g'

Notes:
*.html: matches every file with the .html extension under the current directory;
SEARCHSTRING: the string to search for;
REPLACESTRING: the replacement string.

Remember: if either string contains special characters such as ()[]/"'!? , you must escape them with a backslash \.

Example:

1. Create a new directory:

$ mkdir test
$ cd test

2. Create two files, 1.html and 2.html:

[root@localhost test]$ vi 1.html

Press i to enter insert mode and type:

i like china

Press Esc to leave insert mode, then type

:wq

to save and quit.

[root@localhost test]$ vi 2.html

Press i to enter insert mode and type:

you like china

Press Esc to leave insert mode, then type

:wq

to save and quit.

3. Run the replace command. "like" is not strong enough, heh, so let's change "like" to "love" in both files:

[root@localhost test]$ find . -name '*.html' -print0 | xargs -0 perl -pi -e 's/like/love/g'

4. Check the results:

[root@localhost test]$ cat 1.html
i love china

[root@localhost test]$ cat 2.html
you love china

(the end)

Monday, May 25, 2009

penguin

penguin.memphis.edu

xwang39

xshwang+78

Tuesday, May 12, 2009

sftp

root@172.21.162.205:/home/xusheng/sequencing_data/$FirstDir/$dir/matching_F3/' "$FirstDir/$dir/matching_F3/$FirstDir_$dir_F3.csfasta.ma.25.2"

Sunday, May 10, 2009

Getting Linux to support more than 4 GB of memory

CentOS 5.0 installed without problems, but top showed just under 3.3 GB of memory; the full 4 GB was not being recognized.

After reading through a lot of material, two settings turned out to be needed:
1. BIOS: enable large-memory support in the BIOS.
2. Install a kernel that supports large memory.

The default i386 kernel that CentOS 5.0 installs does not support 4 GB+ of memory. The fix used to be installing kernel-hugemem:
CODE: yum install kernel-hugemem
but CentOS no longer ships a kernel-hugemem rpm; it has been renamed kernel-PAE. Install it with yum:
CODE: yum install kernel-PAE

After installation you still need to edit the boot configuration by hand:
CODE: vi /boot/grub/grub.conf

If you see an entry like the following, the PAE kernel is installed:
title CentOS (2.6.18-8.1.4.el5PAE)
root (hd0,0)
kernel /vmlinuz-2.6.18-8.1.4.el5PAE ro root=LABEL=/
initrd /initrd-2.6.18-8.1.4.el5PAE.img
Make it the default boot entry:
CODE: default=0

Reboot the server:
init 6
Run top again: memory now shows 4.1 GB.

All of the above was done on CentOS 5.0. I hope this helps anyone who runs into the same problem later; I hit the same issue myself, found this article online, and am passing it along.

Monday, February 2, 2009

computer4zju

10.49.45.177

name:paper
pswang:paperwang

Saturday, January 17, 2009

perl: replace file contents in place

## perl: replace file contents in place

perl -i -p -e 's/text1/text2/g' filename