MSblender TACC

From Marcotte Lab

Revision as of 13:50, 14 February 2012

Before you start

  • To use this setup, your TACC account must be a member of our lab project ('A-cm10'). If you don't have an account, create one at https://portal.tacc.utexas.edu/, then ask Edward to add your account to the lab project.
  • 'longhorn' generally has a shorter queue than 'lonestar', so use 'longhorn' unless you need large memory (> 4GB).
  • Always work in your $SCRATCH directory, not in /corral or your $HOME.
  • All source code is available at /corral/utexas/A-cm10/src.MS/. Personally, I keep a symbolic link to this directory under $HOME so I can use the '~' shortcut.
$ ln -s /corral/utexas/A-cm10/src.MS/ ~/src.MS 
$ ls ~/src.MS 
  • All packages are installed at /corral/utexas/A-cm10/src.MS/local/.
  • To run InsPecT, you need to set LD_LIBRARY_PATH so that it can find the expat library. Run the command below before running InsPecT (or put it in '$HOME/.profile_user').
 $ export LD_LIBRARY_PATH="/corral/utexas/A-cm10/src.MS/local/lib/" 
  • The default Python interpreter at TACC is 2.4. Load 2.7.1 as shown below.
 $ module load python
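
The two environment settings above can be kept together in '$HOME/.profile_user'. The following is a minimal sketch, assuming a bash-compatible login shell. The `${VAR:+...}` expansion and the `command -v` guard are additions for illustration, not part of the original instructions: the first keeps any existing LD_LIBRARY_PATH entries instead of overwriting them, the second skips the module call on machines without the TACC 'module' command.

```shell
# Directory holding the expat library needed by InsPecT (path taken from this page).
MS_LIB="/corral/utexas/A-cm10/src.MS/local/lib/"

# Prepend it, keeping whatever LD_LIBRARY_PATH already contained.
export LD_LIBRARY_PATH="${MS_LIB}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Load python only where the 'module' command is available.
if command -v module >/dev/null 2>&1; then
    module load python
fi
```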

Currently installed packages

These packages are installed on lonestar.

  • TPP-4.5.1 + X!Tandem (2010.10.01.1)
    • /corral/utexas/A-cm10/src.MS/local/tppbin/xinteract (integrated wrapper for *Prophet)
    • /corral/utexas/A-cm10/src.MS/local/tppbin/tandem (X!Tandem with k-score support)
  • Crux 1.37
    • /corral/utexas/A-cm10/src.MS/local/bin/crux
  • Tide 1.0
    • /corral/utexas/A-cm10/src.MS/local/bin/tide-index
    • /corral/utexas/A-cm10/src.MS/local/bin/tide-msconvert
    • /corral/utexas/A-cm10/src.MS/local/bin/tide-search
  • MSGFDB (20120106)
    • /corral/utexas/A-cm10/src.MS/MSGFDB/current/MSGFDB.jar
  • InsPecT (20120109)
    • /corral/utexas/A-cm10/src.MS/local/bin/inspect

Install MS-toolbox & MSblender

  • I recommend installing MS-toolbox in your own home directory, because everyone may want to use different search parameters.
$ module load git
$ cd ~
$ mkdir git
$ cd git
$ git clone git@github.com:marcottelab/MS-toolbox.git
$ git clone git@github.com:marcottelab/MSblender.git
  • You don't need to compile the MSblender code under the 'src' directory. The executable is already available at /corral/utexas/A-cm10/src.MS/local/bin/msblender.

Let's start

$ module load python
$ cd $SCRATCH
$ mkdir my-project
$ cd my-project
$ python ~/git/MS-toolbox/bin/mstb-setup.py

This creates five directories (DB, mzXML, RAW, scripts, tmp) and one text file called 'mstb.conf'. Transfer your mzXML files to the 'mzXML' directory. I also keep RAW files in the same directory, but it is a good idea to transfer them to corral and ranch (tape storage) for archiving.

Setup mstb.conf

This is the master configuration file for all MS-toolbox runs. You will need to change the DB_* entries to point to your own DB files. The remaining entries can be copied as shown below.

DB_NAME         OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined
DB_FASTA        /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined.fasta
DB_FASTAPRO     /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined.fasta.pro
DB_TRIE         /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined.trie
DB_CRUX_INDEX   /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined.crux
DB_BLASTDB      /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/your_db.fa
DB_DECOY_PREFIX rv_

PATH_TPP        /corral/utexas/A-cm10/src.MS/local/tpp
PATH_XINTERACT  /corral/utexas/A-cm10/src.MS/local/tppbin/xinteract
PATH_MSCONVERT  /corral/utexas/A-cm10/src.MS/local/tppbin/msconvert

PATH_TANDEMK_EXE                /corral/utexas/A-cm10/src.MS/local/tppbin/tandem.exe
PATH_TANDEM2XML                 /corral/utexas/A-cm10/src.MS/local/tppbin/tandem.exe
PATH_TANDEMK_DEFAULT_PARAM      /corral/utexas/A-cm10/src.MS/local/tppbin/isb_default_input_kscore.xml

PATH_OMSSACL    /usr/local/bin/omssacl

DIR_INSPECT     /corral/utexas/A-cm10/src.MS/inspect/current
PATH_INSPECT    /corral/utexas/A-cm10/src.MS/local/bin/inspect
PATH_MSGFDB_JAR /corral/utexas/A-cm10/src.MS/MSGFDB/current/MSGFDB.jar

PATH_CRUX       /corral/utexas/A-cm10/src.MS/local/bin/crux
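
Before preparing a search, it can help to confirm that the paths in 'mstb.conf' actually exist. The following is a sketch only: the function names `conf_get` and `conf_check` are hypothetical helpers, not part of MS-toolbox, and the code assumes each line has the form "KEY, whitespace, VALUE" with no spaces inside the value, as in the listing above.

```shell
# Print the value stored for one key in an mstb.conf-style file.
# $1 = key, $2 = configuration file
conf_get() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Report every DB_*/PATH_*/DIR_* entry whose file or directory is missing.
# DB_NAME and DB_DECOY_PREFIX are skipped because they are not paths.
# $1 = configuration file
conf_check() {
    awk '$1 ~ /^(DB|PATH|DIR)_/ && $1 != "DB_NAME" && $1 != "DB_DECOY_PREFIX" { print $1, $2 }' "$1" |
    while read -r key path; do
        [ -e "$path" ] || echo "missing: $key $path"
    done
}

# Example (run inside your project directory):
#   conf_get DB_FASTA mstb.conf
#   conf_check mstb.conf
```

An entry reported as missing is only a problem if you plan to use that search engine. For example, PATH_OMSSACL can be ignored when OMSSA is not part of your run.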

Setup your database

Transfer your FASTA file to the 'DB' directory. You need a 'combined' database containing both target and decoy sequences. Reversed decoy sequences are recommended. The 'fasta-reverse.py' script in MS-toolbox generates reversed sequences with the 'rv_' prefix.

$ python ~/git/MS-toolbox/bin/fasta-reverse.py XENLA_prot_v4.fasta 
$ mv XENLA_prot_v4.fasta.target XENLA_prot_v4_combined.fasta
$ cat XENLA_prot_v4.fasta.reverse >> XENLA_prot_v4_combined.fasta
$ head -n 1 XENLA_prot_v4.fasta
>10a1.1|XB-GENE-6077477|AAH55957|33416620
$ head -n 1 XENLA_prot_v4.fasta.reverse 
>rv_nadkd1|XB-GENE-991229|AAI46629|148921623 
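
A quick sanity check on the combined file is to count target and decoy headers. The sketch below assumes decoy headers start with '>' followed directly by the decoy prefix, as in the output above. `count_target_decoy` is a hypothetical helper name, not an MS-toolbox script.

```shell
# Count target and decoy entries in a combined FASTA file.
# $1 = FASTA file, $2 = decoy prefix (defaults to rv_)
count_target_decoy() {
    fasta=$1
    prefix=${2:-rv_}
    # grep -c exits non-zero when nothing matches, so '|| true' keeps 'set -e' shells alive.
    total=$(grep -c '^>' "$fasta" || true)
    decoy=$(grep -c "^>$prefix" "$fasta" || true)
    echo "target=$((total - decoy)) decoy=$decoy"
}

# Example:
#   count_target_decoy XENLA_prot_v4_combined.fasta
```

For a database built as described above, the two counts should be equal, since every target sequence has exactly one reversed counterpart.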

DB setup for X!Tandem

 $ ~/src.MS/local/bin/fasta_pro.exe (my combined fasta file) 

This creates an index file named after your FASTA file, with a '.pro' suffix appended. For example:

 $ ~/src.MS/local/bin/fasta_pro.exe XENLA_prot_v4_combined.fasta 
fasta_pro file conversion utility, v. 2006.09.15
 input path = XENLA_prot_v4_combined.fasta
output path = XENLA_prot_v4_combined.fasta.pro
db type = plain

DB setup for Crux

 $ ~/src.MS/local/bin/crux create-index --enzyme trypsin --missed-cleavages 2 --peptide-list T --decoys none (my combined fasta file) (my index name)
  • If you want to use Crux on its own (or with its embedded post-processing tools, e.g. percolator or q-ranker), use a FASTA file with target sequences only and choose a decoy option (the default is protein-shuffle, but peptide-shuffle would be better).
  • 'peptide-list' is optional.
  • The trypsin digestion pattern in Crux is '[KR]|{P}', so it does not cut after K/R if the next amino acid is P. To ignore this proline constraint, use '--custom-enzyme "[KR]|[X]"' instead of '--enzyme trypsin'.
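
To see what the proline constraint changes, the two cleavage rules can be applied to a short sequence by hand. The following is an illustrative sketch only, not how Crux implements digestion. `digest` is a hypothetical helper, and it produces fully cleaved peptides without modelling missed cleavages.

```shell
# Split a protein sequence after every K or R.
# $1 = sequence, $2 = 1 to apply the '[KR]|{P}' rule (no cut before P),
#                     0 to apply '[KR]|[X]' (cut regardless of the next residue)
digest() {
    printf '%s\n' "$1" | awk -v pro="$2" '{
        n = length($0); pep = ""
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)        # current residue
            nxt = substr($0, i + 1, 1)  # following residue ("" at the end)
            pep = pep c
            if ((c == "K" || c == "R") && i < n && !(pro == 1 && nxt == "P")) {
                print pep
                pep = ""
            }
        }
        if (pep != "") print pep
    }'
}

# Example:
#   digest AKPRGK 1    # prints AKPR and GK
#   digest AKPRGK 0    # prints AK, PR and GK
```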

DB setup for InsPecT

 $ ~/src.MS/inspect/current/PrepDB.py FASTA (my fasta file)
  • This creates an index file named after your FASTA file, with a '.trie' suffix appended.

DB setup for MSGFDB

$ java -cp ~/src.MS/MSGFDB/current/MSGFDB.jar msdbsearch.BuildSA -d (my FASTA file) -tda 0
  • This generates .canno, .cnlcp, .csarr and .cseq files.
  • If you want to use MS-GFDB's native function, use -tda 2 (which generates target and combined databases) with a target-only FASTA file.

Prepare search

$ python ~/git/MS-toolbox/bin/prepare-tandemK.py 
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK/tandem-taxonomy.xml.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK/20110713_XENLA_Egg1_1.tandemK.xml
...

TandemK is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-tandemK.sh.

$ python ~/git/MS-toolbox/bin/prepare-inspect.py 
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect/20110713_XENLA_Egg1_1.inspect_in.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect/20110713_XENLA_Egg1_2.inspect_in.
...

InsPecT is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-inspect.sh.

$ python ~/git/MS-toolbox/bin/prepare-MSGFDB.py 
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/MSGFDB.
20110713_XENLA_Egg1_1.mzXML
20110713_XENLA_Egg1_2.mzXML
....

MSGFDB is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-MSGFDB.sh.

Run search

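On a standalone workstation, you can run ./scripts/run-(search_engine).sh directly. Do not do this on a TACC login node. Instead, put the following parameters at the top of each run-*.sh script, then submit it as a job with qsub.

  • If you use lonestar, replace '4way 8' with '8way 24'. See the Lonestar user guide (http://www.tacc.utexas.edu/user-services/user-guides/lonestar-user-guide) and the Longhorn user guide (http://www.tacc.utexas.edu/user-services/user-guides/longhorn-user-guide) for details.
  • Don't forget to put your email address after -M.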

#!/bin/bash
#$ -V                   # Inherit the submission environment
#$ -cwd                 # Start job in submission directory
#$ -j y                 # Combine stderr and stdout
#$ -o $JOB_NAME.o$JOB_ID
#$ -pe 4way 8
#$ -q long
#$ -l h_rt=24:00:00     # Run time (hh:mm:ss)
#$ -M (your email)
#$ -m be                # Email at Begin and End of job
#$ -P hpc
set -x

(Put the remaining part of your run-* script, i.e. everything after its own #!/bin/bash line, here.)
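
Editing each run-*.sh script by hand can be avoided by keeping the header in its own file and joining the two. The following is a minimal sketch, assuming the header above is saved in a file ending with a newline and that the first line of every run-*.sh script is '#!/bin/bash'. The function name `make_job_script` and the file names in the example are hypothetical.

```shell
# Write a submission script: the SGE header first, then the original run
# script without its own '#!/bin/bash' first line.
# $1 = file holding the header, $2 = run-*.sh script, $3 = output file
make_job_script() {
    cat "$1" > "$3"
    tail -n +2 "$2" >> "$3"
}

# Example:
#   make_job_script sge-header.txt scripts/run-tandemK.sh scripts/job-tandemK.sh
#   qsub scripts/job-tandemK.sh
```

Writing to a separate output file leaves the generated run-*.sh untouched, so it can still be run directly on a standalone workstation.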