Xenopus Genome Project
Latest revision as of 09:06, 14 October 2014
Xenopus laevis is an essential model organism in several areas of biology (see Harland & Grainger, TIG (2011) for review). In addition to the key attributes of these embryos for in vivo imaging, cell-free extracts from Xenopus provide among the most powerful in vitro systems for studies of cell and molecular biology. A complete sequence of the X. laevis genome is an essential resource for accurate identification of peptides for mass-spec analyses, for cloning of an ORFeome, for identifying evolutionarily conserved regulatory regions, and for design of morpholino-oligonucleotides for gene knockdowns.
The Wallingford and Marcotte labs obtained funding from the Texas Institute for Drug and Diagnostic Development (TI3D), in conjunction with projects funded by the National Institutes of Health, to begin sequencing of the X. laevis genome. We began the project with Scott Hunicke-Smith at the University of Texas Genome Sequencing and Analysis facility, with funding sufficient for ~20x coverage of the X. laevis genome using ABI SOLiD next-generation sequencing.
The project rapidly expanded to include de novo reconstruction of X. laevis transcripts, in collaboration with groups around the world who donated Illumina HiSeq RNA sequencing datasets. We are coordinating these efforts with genome sequencing by the Harland and Rokhsar groups at UC Berkeley and by Taira and collaborators at the University of Tokyo, Japan. We're posting our intermediate datasets here in advance of publication for use by the wider community. See the Xenopus_Genome_Project_Consortium page for the members and contributors of the project.
If you have any questions about these data in general, please contact Taejoon Kwon.
You can download X. laevis genomes from the XenBase FTP site or, alternatively, from the UT Austin 'daudin' web repository. The current version is named 'JGI 7.1' at XenBase and 'JGIv7b' in the UT Austin web repository ('daudin'). See the XENLA_Genome page for detailed information about each version.
- The main difference between the XenBase/JGI genome and the UT Austin genome is the scaffold naming. For example, I renamed 'Scaffold102974' in the XenBase/JGI 7.1 genome to 'JGIv7b.000102974'. I changed the original scaffold names to prevent confusion when comparing different versions of the draft genome, because 'Scaffold102974' in the XenBase/JGI 6.1 genome is a different sequence from 'Scaffold102974' in the XenBase/JGI 7.2 genome.
XENLA_JGIv6a.seqlen:JGIv6a.000102974 325
XENLA_JGIv7b.seqlen:JGIv7b.000102974 21560636
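The renaming convention described above can be sketched in a few lines of Python. This is only an illustration, not the script actually used for the project: the function names `rename_scaffold` and `rename_fasta_headers` are hypothetical, and the nine-digit zero padding is inferred from the single example 'JGIv7b.000102974'.

```python
import re


def rename_scaffold(name, prefix="JGIv7b", width=9):
    """Turn e.g. 'Scaffold102974' into 'JGIv7b.000102974'.

    The zero-padding width (9) is an assumption inferred from the
    example name on this page.
    """
    match = re.fullmatch(r"Scaffold(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected scaffold name: {name!r}")
    return f"{prefix}.{int(match.group(1)):0{width}d}"


def rename_fasta_headers(lines, prefix="JGIv7b"):
    """Yield FASTA lines with the sequence ID in each header renamed.

    Any description after the first whitespace in a header is kept.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].rstrip("\n").split(None, 1)
            fields[0] = rename_scaffold(fields[0], prefix)
            yield ">" + " ".join(fields) + "\n"
        else:
            yield line
```

Because the version prefix is part of every name, IDs from different assemblies can be mixed in one table without collisions; for instance, `rename_scaffold("Scaffold102974", prefix="JGIv6a")` gives a name distinct from the 'JGIv7b' one.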
- Xenopus laevis - http://daudin.icmb.utexas.edu/XENLA_JGIv72/
- Xenopus tropicalis - http://daudin.icmb.utexas.edu/XENTR_JGIv80/
The official gene annotation and genome browser are served at XenBase.
- Xenopus laevis - http://gbrowse.xenbase.org/fgb2/gbrowse/xl7_1/
- Xenopus tropicalis - http://gbrowse.xenbase.org/fgb2/gbrowse/xt8_0/
Old websites for the UT Austin gene models
- http://daudin.icmb.utexas.edu/Mayball/ - UT Austin "MayBall" gene model
- http://daudin.icmb.utexas.edu/Oktoberfest/ - UT Austin "Oktoberfest" gene model
- XENLA_GeneModel2012 - raw sequences from individual projects (before releasing Oktoberfest)
- XENLA_Oktoberfest - released in October 2012 (code name "Oktoberfest")
- XENLA_Mayball - released in May 2013 (code name "MayBall")
CHORI-219 BAC sequencing
We started the first runs by sequencing 96 BACs from the CHORI-219 library (vector: pBACGK1.1) at ~100X coverage. The selected BACs include ~70 genes of interest (Shroom3, Wnt5a, Glypican-4, Noggin, Gremlin, Pax6, Formin, etc., as initially identified by the group of Jan-Fang Cheng via probing the CHORI-219 library), as well as 10 BACs that have already been sequenced by the DOE Joint Genome Institute/HudsonAlpha Genome Sequencing Center to serve as positive controls for the sequencing and assembly pipeline.
- CHORI-219 BACs: List of 96 test BACs (MS Excel file)
See /XENLA_SA09023 for more details. Three mate-paired libraries were sequenced:
- X_laevis_WG - the X. laevis whole genome library, with 5 kb insert size - about 4.4 GB raw data, 0.4 GB high-quality data
- X_laevis_2kb - the set of 96 BACs, with 2 kb insert size - about 3.6 GB raw data, 0.3 GB high-quality data
- X_laevis_5kb - the set of 96 BACs, with 5 kb insert size - about 2.8 GB raw data, 0.2 GB high-quality data
Very roughly, this corresponds to >600X coverage of the BAC set by raw data and ~50X coverage by high-quality data.
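The rough coverage figures can be reproduced with back-of-the-envelope arithmetic. The sketch below makes assumptions that are not stated on this page: it takes "GB" to mean roughly 10^9 sequenced bases, it counts only the two BAC libraries (X_laevis_2kb and X_laevis_5kb), and it assumes an average BAC insert of about 100 kb, which is a placeholder rather than a measured value for these clones.

```python
def fold_coverage(sequenced_bases, target_bases):
    """Fold coverage = total sequenced bases / size of the target."""
    return sequenced_bases / target_bases


N_BACS = 96
ASSUMED_BAC_INSERT_BP = 100_000  # assumption: average insert size, not from this page
target_bp = N_BACS * ASSUMED_BAC_INSERT_BP

# Assumption: 1 GB of data is treated as ~1e9 bases.
raw_bp = (3.6 + 2.8) * 1e9  # X_laevis_2kb + X_laevis_5kb, raw data
hq_bp = (0.3 + 0.2) * 1e9   # same libraries, high-quality data

raw_cov = fold_coverage(raw_bp, target_bp)
hq_cov = fold_coverage(hq_bp, target_bp)
print(f"raw: {raw_cov:.0f}X, high quality: {hq_cov:.0f}X")
```

Under these assumptions the result falls in the same range as the figures quoted above; a different average insert size would scale both numbers proportionally.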
- Given that we currently see better mapping of the shotgun SA09023 reads to X. tropicalis than to X. laevis (both to BACs and mRNAs), we're confirming the sample identity before continuing with whole genome sequencing. See the 'sanity check' /Species_Identification for details.
- TXGP_RNAseq_assembly - Current status of TXGP
- XENLA_Genome - Current status of Xenopus genome
- TXGP_reference - Public resources compiled to be used in TXGP.
- TXGP_ens63_reference - Some statistics derived from EnsEMBL-63 (used as a reference in TXGP).
- TXGP_Data_Description - Data collected in TXGP.