Difference between revisions of "Texas Xenopus Genome Project/Species Identification"
From Marcotte Lab
Revision as of 11:02, 9 December 2009
Select candidate sequences
- Download X. tropicalis mRNA sequences from XenBase (Nov. 27, 2009 version).
  - xdata:ID/XENTR_mRNA.xenbase20091127.fasta.gz 17 MB, gzipped.
- Download CHORI-216 sequences (from XenBase) and CHORI-219 sequences (from NCBI GenBank).
  - xdata:ID/XENTR_CH216.fasta.gz 1.2 MB, gzipped. (CHORI-216 sequences: 160 BAC sequences from the X. tropicalis genome.)
  - xdata:ID/XENLA_CH219.fasta.gz 6.5 MB, gzipped. (CHORI-219 sequences: 29 BAC sequences from the X. laevis genome.)
- Run BLAT (with default options) with the X. tropicalis mRNA sequences as queries against the known CHORI BAC sequences.
  - xdata:ID/XENTR_mRNA.XENLA_CH219.blat_pslx.gz 1.2 MB, gzipped.
  - xdata:XENTR_mRNA.XENTR_CH216.blat_pslx.gz 20 MB, gzipped.
- Parse the two BLAT output files with the following criteria.
  1. Among the X. tropicalis mRNA sequences, only RefSeq sequences (accessions starting with 'NM_') are considered.
  2. Select X. tropicalis mRNA sequences that hit both CHORI-219 and CHORI-216 (a match must be at least 200 bp long to be called a 'hit').
  3. Survey each hit block. If the same mRNA fragment hits both CHORI-219 and CHORI-216, report three sequences: the query sequence from the X. tropicalis mRNA, the target sequence from the CHORI-219 BACs (X. laevis), and the target sequence from the CHORI-216 BACs (X. tropicalis). ONE hit block is reported.
>XENTR_NM_001142220_0 gi|213983084|ref|NM_001142220|
ttatttgtgccctgggtacccctggaactatagcggggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaaggggg
cccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgggctaagggggcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtg
ccctgggtacccctggaactatagcagggtgac
>XENTR_CH216-2E23_0
tcaccccaaatccccccctaactggccttcaggctgggcccccttagctcataacaaggttacagatatatagaaacattggggtaacagtca
ccccgctatagttccaggggtacccagggcacaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccccttagccca
taacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggcacaaataagcactcaccccaa
atc
>XENLA_CH219-20I13_0
ttatttgtgccctggatacccctggaactatagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttattagctaaggggg
cccagtctgaaggtcagttagggggagatttggggtgagggcttatttgtaccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgagctaagggggcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtg
ccctggttacccctggaactatagcagggtgac
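
The selection criteria above could be sketched in Python roughly as follows. This is not the script that was actually used for this page; it is a minimal illustration under several assumptions: the BLAT output follows the standard tab-separated PSL column order (matches in column 1, qName in column 10, qStart/qEnd in columns 12 and 13, tName in column 14), the "match length" is taken to be the PSL `matches` value, a RefSeq query is recognized by an 'NM_' accession in any '|'-separated field of its name, and "the same mRNA fragment" is interpreted as overlapping query intervals. The names `Hit`, `parse_psl`, and `shared_fragments` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

MIN_MATCH = 200  # minimum matched bases for a 'hit' (criterion 2 above)


class Hit(NamedTuple):
    query: str    # mRNA name (PSL qName)
    target: str   # BAC name (PSL tName)
    matches: int  # number of matching bases
    q_start: int  # start of the aligned region on the mRNA
    q_end: int    # end of the aligned region on the mRNA


def is_refseq_mrna(query_name):
    """Assumption: a RefSeq mRNA has an 'NM_' accession in one '|' field."""
    return any(field.startswith("NM_") for field in query_name.split("|"))


def parse_psl(lines, min_match=MIN_MATCH):
    """Collect RefSeq hits per query from PSL/PSLX lines, skipping headers."""
    hits = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Header and separator lines have too few columns or a non-numeric start.
        if len(cols) < 21 or not cols[0].isdigit():
            continue
        matches, query = int(cols[0]), cols[9]
        if matches < min_match or not is_refseq_mrna(query):
            continue
        hits[query].append(
            Hit(query, cols[13], matches, int(cols[11]), int(cols[12]))
        )
    return hits


def shared_fragments(hits_219, hits_216):
    """Pair hits whose query intervals overlap in both BAC libraries."""
    pairs = []
    for query in sorted(set(hits_219) & set(hits_216)):
        for a in hits_219[query]:
            for b in hits_216[query]:
                if a.q_start < b.q_end and b.q_start < a.q_end:
                    pairs.append((a, b))
    return pairs
```

To run this on the files listed above, each gzipped BLAT output could be opened with `gzip.open(path, "rt")` and the file handle passed directly to `parse_psl`. Extracting the three reported sequences for each pair would additionally require reading the per-block sequences from the last two PSLX columns, which this sketch does not do.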