TXGP ens63 reference

From Marcotte Lab
= Overview =
One of the most interesting questions we can ask with the ''X. laevis'' genome is how many genes this frog has. To construct gene models, we are mainly taking a de novo transcriptome assembly approach with our RNA-seq data, using [http://www.ebi.ac.uk/~zerbino/velvet/ velvet]+[http://www.ebi.ac.uk/~zerbino/oases/ oases]. However, many de novo transcriptome assembly programs generate 'false positive' transcripts. Also, because ''X. laevis'' is allotetraploid, the transcriptome data may contain many transcript variants for each gene. To estimate gene models from transcriptome data, it would therefore help to group all transcripts derived from the same gene and analyze each group separately. Sequence-based clustering is a natural way to do this, but we need to optimize its parameters, such as the %identity threshold used to define a cluster.
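To make the role of the %identity threshold concrete, here is a minimal sketch of single-linkage clustering with a union-find structure. It assumes we already have pairwise alignment hits as (query, subject, percent identity) tuples, for example from an all-vs-all similarity search; the transcript IDs and identity values below are hypothetical toy data, and single linkage is one possible choice, not necessarily the method used for TXGP. A real run would usually also filter on alignment coverage.

```python
from collections import defaultdict


def cluster_transcripts(transcripts, hits, min_identity):
    """Single-linkage clustering: two transcripts end up in the same cluster
    if a chain of hits with pct_identity >= min_identity connects them."""
    parent = {t: t for t in transcripts}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, pct_identity in hits:
        # Skip self-hits and hits involving unknown IDs.
        if query == subject or query not in parent or subject not in parent:
            continue
        if pct_identity >= min_identity:
            union(query, subject)

    clusters = defaultdict(list)
    for t in transcripts:
        clusters[find(t)].append(t)
    # Deterministic output: members sorted, clusters sorted.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data, not real X. laevis transcripts.
TRANSCRIPTS = ["tx1", "tx2", "tx3", "tx4", "tx5"]
HITS = [
    ("tx1", "tx2", 99.0),
    ("tx2", "tx3", 93.5),
    ("tx4", "tx5", 88.0),
]

if __name__ == "__main__":
    for threshold in (95.0, 90.0, 85.0):
        n = len(cluster_transcripts(TRANSCRIPTS, HITS, threshold))
        print(f"{threshold:.1f}% identity -> {n} clusters")
```

With these toy hits, a 95% threshold gives 4 clusters, 90% gives 3 and 85% gives 2, which illustrates why the threshold has to be tuned: lowering it merges transcripts that may belong to distinct genes (or distinct homeologs in an allotetraploid).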
  
The goals of the TXGP RNA-seq analysis are (1) to construct gene models in ''X. laevis'' and (2) to estimate their expression levels in different tissues. Because we do not have a genome sequence yet, it is hard to derive gene models directly from transcripts, especially for an allotetraploid. So we are using [http://www.ensembl.org EnsEMBL] annotation as a reference for gene model construction.
To get some ideas for choosing the clustering parameters, such as the %identity threshold, we have looked at the genes and transcripts of several well-studied organisms.
  
 
= Genes & Transcripts =
 

Revision as of 11:10, 13 October 2011

[[File:ens63_gene_tx.small.png]]
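A summary like the genes vs. transcripts comparison above can be computed from a two-column table of gene and transcript IDs, for example an export from the EnsEMBL annotation. The sketch below assumes such (gene_id, transcript_id) pairs are already loaded; the IDs are hypothetical placeholders, and the function names are illustrative rather than part of any existing pipeline.

```python
from collections import Counter


def transcripts_per_gene(pairs):
    """Count distinct transcripts per gene from (gene_id, transcript_id) pairs.
    Duplicate rows are counted once because transcripts are stored in a set."""
    per_gene = {}
    for gene_id, tx_id in pairs:
        per_gene.setdefault(gene_id, set()).add(tx_id)
    return {gene: len(txs) for gene, txs in per_gene.items()}


def summarize(pairs):
    """Return (n_genes, n_transcripts, transcripts_per_gene_ratio, histogram),
    where histogram maps 'transcripts per gene' -> 'number of genes'."""
    counts = transcripts_per_gene(pairs)
    n_genes = len(counts)
    n_tx = sum(counts.values())
    ratio = n_tx / n_genes if n_genes else 0.0
    histogram = dict(sorted(Counter(counts.values()).items()))
    return n_genes, n_tx, ratio, histogram


# Hypothetical placeholder IDs, not real EnsEMBL stable IDs.
PAIRS = [
    ("geneA", "txA1"), ("geneA", "txA2"), ("geneA", "txA1"),  # repeated row
    ("geneB", "txB1"),
    ("geneC", "txC1"), ("geneC", "txC2"), ("geneC", "txC3"),
]

if __name__ == "__main__":
    n_genes, n_tx, ratio, hist = summarize(PAIRS)
    print(f"{n_genes} genes, {n_tx} transcripts, {ratio:.2f} transcripts/gene")
    print("histogram:", hist)
```

For the toy table this reports 3 genes, 6 transcripts and 2.00 transcripts per gene; applied per organism, the ratio and histogram give a rough expectation of how many transcripts a gene-level cluster should contain.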

= Clustering of transcripts to genes =