Thursday, November 27, 2008
Monday, November 24, 2008
Thursday, November 20, 2008
Our analysis is based on three sources of data: human genomic sequence assemblies (5), human ESTs from the UniGene database (31) and human EST library information. Human genomic assembly sequences (accession no. NT_XXXX) and ‘draft’ BAC clone sequences (accession nos ACXXXX, ALXXXXX) were downloaded from NCBI (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens and ftp://ftp.ncbi.nih.gov/genbank/gbhtgXX.seq.gz). Human ESTs and library information were downloaded from UniGene (ftp://ftp.ncbi.nih.gov/repository/UniGene). Additional EST library information about human tissue sources was obtained from the NCBI Library Browser, downloaded from www.ncbi.nlm.nih.gov/UniGene/lbrowse.cgi?ORG=Hs. The work described in this paper is based on the January 2002 release of the human genome and UniGene data.
1.
Alignment construction. The goal of the first step is to obtain a set of native EST alignments to the genomic template. After the genomic locus for a RefSeq transcript is identified, the genomic sequence containing the genic region and up to 20 kb extensions at both ends is extracted. This genomic template is searched against dbEST using WU-BLASTN (Gish 1996–2000), and sequences of high-scoring EST hits are aligned to the genomic template using sim4 (Florea et al. 1998).
2.
Gene structure prediction. In the second step, genomic EST alignments with near-identity are used to infer the exon/intron structures. Although TAP can predict genes simultaneously on both strands, for simplicity we herein describe the gene prediction results with respect to the plus strand for each RefSeq gene. First, splice pairs (the donor and acceptor splice junctions that define the boundaries of an intron) are inferred from segmentation patterns in EST alignments and screened according to splice site patterns. The test set contains 1007 multiexon genes with 8879 pairs of known splice junctions. TAP correctly identified 5111 known splice pairs, yielding a sensitivity of 58%. Separately, PASS scans the genomic sequence for poly-A sites by clustering 3′ ESTs. For 290 RefSeq sequences with known poly-A sites, PASS achieved a sensitivity of 84.5%. Second, mutually exclusive splicing patterns are resolved by selecting the “predominant” splice pairs, according to EST coverage, and a joint gene structure for the entire genomic region is assembled from individual splice pairs. The EST-based connectivity between two adjacent splice pairs is examined to define exons and to delineate gaps in EST coverage. Finally, the gene boundaries are defined by segmenting the joint gene structure into individual genes at inferred intergenic regions (Fig. 1).
3.
Evaluation. The predicted gene structures are compared with the known gene structures to evaluate the accuracy of TAP. (A Web-based interface to TAP and the reconstruction results for 1124 RefSeq genes are available at http://stl.wustl.edu/~zkan/TAP/.)
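As a quick check on the sensitivity figure quoted in step 2, the arithmetic can be written out. This assumes the standard definition of sensitivity (known features recovered, TP, over all known features, TP + FN), which the excerpt does not spell out:

$$
\mathrm{Sn} = \frac{TP}{TP + FN} = \frac{5111}{8879} \approx 0.576 \approx 58\%
$$

By the same definition, the 84.5% reported for PASS would correspond to roughly 245 of the 290 known poly-A sites. That count is inferred from the percentage and is not stated in the excerpt.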
Monday, November 17, 2008
Sunday, November 16, 2008
Tuesday, November 4, 2008
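Error: did not return a true value
The "did not return a true value" error usually means that a module file does not end with a true value. Perl requires the last statement of a file loaded with require or use to evaluate to true, which is why modules conventionally end with "1;".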
my own Perl package
[xusheng@b82570 UT_script_test]$ PERL5LIB=/home/xusheng/UTScript/
[xusheng@b82570 UT_script_test]$ export PERL5LIB