Difference between revisions of "MSblender TACC"

From Marcotte Lab
(Before you start)
 
== Before you start ==
 
 
* To use this setup, your TACC account must be added to our lab project ('A-cm10'). If you don't have an account, create one at https://portal.tacc.utexas.edu/. Then ask Edward to add your account to the lab project.
 
* Generally, 'longhorn' has a shorter queue than 'lonestar', so use 'longhorn'.
+
* This document is for [https://portal.tacc.utexas.edu/user-guides/stampede 'stampede'].  
 
* Always work in your $SCRATCH directory, not in /corral or your $HOME.
 
* All source code is available at /corral/utexas/A-cm10/src.MS/. Personally, I keep a symbolic link to this directory under $HOME so I can use the '~' shortcut.
 
<pre>$ ln -s /corral/utexas/A-cm10/src.MS/ ~/src.MS
 
$ ls ~/src.MS </pre>
 
* All packages are installed at /corral/utexas/A-cm10/src.MS/local/.
 
* To run InsPecT, you need to set LD_LIBRARY_PATH for the expat library. Run the command below before running InsPecT (or put it in '$HOME/.profile_user').
 
<pre> $ export LD_LIBRARY_PATH="/corral/utexas/A-cm10/src.MS/local/lib/" </pre>
 
* The default Python interpreter at TACC is 2.4. You need a version above 2.6; load it as below.
 
<pre> $ module load python</pre>
 
  
== Currently installed packages ==
+
== Install MSblender (and comet, MSGFDB, X!Tandem) ==  
These packages are installed at lonestar.
+
<pre>$ cd ~
* TPP-4.5.1 + X!Tandem (2010.10.01.1)
+
* /corral/utexas/A-cm10/src.MS/local/tppbin/xinteract (integrated wrapper for *Prophet)
+
* /corral/utexas/A-cm10/src.MS/local/tppbin/tandem (X!Tandem with k-score support)
+
* Crux 1.37
+
** /corral/utexas/A-cm10/src.MS/local/bin/crux
+
* Tide 1.0
+
** /corral/utexas/A-cm10/src.MS/local/bin/tide-index
+
** /corral/utexas/A-cm10/src.MS/local/bin/tide-msconvert
+
** /corral/utexas/A-cm10/src.MS/local/bin/tide-search
+
* MSGFDB (20120106)
+
** /corral/utexas/A-cm10/src.MS/MSGFDB/current/MSGFDB.jar
+
* InsPecT (20120109)
+
** /corral/utexas/A-cm10/src.MS/local/bin/inspect
+
 
+
== Install MS-toolbox & MSblender ==
+
* I recommend installing MS-toolbox in your own home directory, because everyone may want to use different search parameters.
+
<pre>$ module load git
+
$ cd ~
+
 
$ mkdir git
 
 
$ cd git
 
$ git clone git@github.com:marcottelab/MS-toolbox.git
+
$ git clone https://github.com/marcottelab/MSblender.git</pre>
$ git clone git@github.com:marcottelab/MSblender.git</pre>
+
* You don't need to compile the MSblender code under the 'src' directory. An executable is already available at /corral/utexas/A-cm10/src.MS/local/bin/msblender.
+
  
== Let's start ==
+
== Prepare a working space ==
 
<pre>$ module load python
 
<pre>$ module load python
 
$ cd $SCRATCH
 
$ mkdir my-project
+
$ mkdir myProject
$ cd my-project
+
$ cd myProject
$ python ~/git/MS-toolbox/bin/mstb-setup.py</pre>
+
$ mkdir mzXML
 +
$ mkdir DB</pre>
  
This creates five directories (DB, mzXML, RAW, scripts, tmp) and one text file called 'mstb.conf'. Transfer your mzXML files to the 'mzXML' directory. I also keep the RAW files in the same place, but it is a good idea to transfer them to corral and ranch (tape storage) for archiving.
+
== Prepare database ==
 
+
* Copy your FASTA file to 'myProject/DB' directory.  
== Setup mstb.conf ==
+
This is the master configuration file for all MS-toolbox runs. You may need to change the DB_* entries to point to your own DB files. You can copy the remaining part as shown.
+
<pre>DB_NAME        OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined
+
DB_FASTA        /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined.fasta
+
DB_FASTAPRO    /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined.fasta.pro
+
DB_TRIE        /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined.trie
+
DB_CRUX_INDEX  /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/OMRF20110730_XENLA_EGG1_v4.mpep_trypsin_combined.crux
+
DB_BLASTDB      /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/DB/your_db.fa
+
DB_DECOY_PREFIX rv_
+
 
+
PATH_TPP        /corral/utexas/A-cm10/src.MS/local/tpp
+
PATH_XINTERACT  /corral/utexas/A-cm10/src.MS/local/tppbin/xinteract
+
PATH_MSCONVERT  /corral/utexas/A-cm10/src.MS/local/tppbin/msconvert
+
 
+
PATH_TANDEMK_EXE                /corral/utexas/A-cm10/src.MS/local/tppbin/tandem.exe
+
PATH_TANDEM2XML                /corral/utexas/A-cm10/src.MS/local/tppbin/tandem.exe
+
PATH_TANDEMK_DEFAULT_PARAM      /corral/utexas/A-cm10/src.MS/local/tppbin/isb_default_input_kscore.xml
+
 
+
PATH_OMSSACL    /usr/local/bin/omssacl
+
 
+
DIR_INSPECT    /corral/utexas/A-cm10/src.MS/inspect/current
+
PATH_INSPECT    /corral/utexas/A-cm10/src.MS/local/bin/inspect
+
PATH_MSGFDB_JAR /corral/utexas/A-cm10/src.MS/MSGFDB/current/MSGFDB.jar
+
 
+
PATH_CRUX      /corral/utexas/A-cm10/src.MS/local/bin/crux</pre>
+
 
+
== Setup your database ==
+
Transfer your FASTA file to the 'DB' directory. You need a 'combined' database containing both target and decoy sequences. Reversed decoy sequences are recommended. The 'fasta-reverse.py' script in MS-toolbox generates reversed sequences with an 'rv_' prefix.
+
  
 
<pre>$ python ~/git/MS-toolbox/bin/fasta-reverse.py XENLA_prot_v4.fasta  
 
<pre>$ python ~/git/MS-toolbox/bin/fasta-reverse.py XENLA_prot_v4.fasta  
 
* It generates .canno, .cnlcp, .csarr & .cseq files.
 
 
* If you want to use native MS-GFDB function, use -tda 2 (generate target & combined database) with target-only FASTA file.
 
 +
 +
 +
Copy your mzXML files to the mzXML directory ($SCRATCH/myProject/mzXML).
 +
  
 
== Prepare search ==
 

Revision as of 17:03, 2 March 2015


Before you start

  • To use this setup, your TACC account must be added to our lab project ('A-cm10'). If you don't have an account, create one at https://portal.tacc.utexas.edu/. Then ask Edward to add your account to the lab project.
  • This document is for 'stampede'.
  • Always work in your $SCRATCH directory, not in /corral or your $HOME.

Install MSblender (and comet, MSGFDB, X!Tandem)

$ cd ~
$ mkdir git
$ cd git
$ git clone https://github.com/marcottelab/MSblender.git

Prepare a working space

$ module load python
$ cd $SCRATCH
$ mkdir myProject
$ cd myProject
$ mkdir mzXML
$ mkdir DB
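
The workspace steps above can also be wrapped in a small helper. This is only a sketch: the function name make_workspace is made up for illustration, and it assumes bash. It uses 'mkdir -p', so re-running it on an existing project does not fail.

```shell
# Hypothetical helper (not part of MSblender or MS-toolbox).
# Creates <parent>/<project>/mzXML and <parent>/<project>/DB.
make_workspace() {
  # $1: parent directory (e.g. "$SCRATCH"); $2: project name
  local project_dir="$1/$2"
  mkdir -p "$project_dir/mzXML" "$project_dir/DB" || return 1
  # Print the project path so the caller can cd into it.
  printf '%s\n' "$project_dir"
}
```

For example, cd "$(make_workspace "$SCRATCH" myProject)" would leave you in the new project directory.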

Prepare database

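  • Copy your FASTA file to the 'myProject/DB' directory.
  • Build a combined target and decoy database. 'fasta-reverse.py' writes the reversed (decoy) sequences with an 'rv_' prefix.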
$ python ~/git/MS-toolbox/bin/fasta-reverse.py XENLA_prot_v4.fasta 
$ mv XENLA_prot_v4.fasta.target XENLA_prot_v4_combined.fasta
$ cat XENLA_prot_v4.fasta.reverse >> XENLA_prot_v4_combined.fasta
$ head -n 1 XENLA_prot_v4.fasta
>10a1.1|XB-GENE-6077477|AAH55957|33416620
$ head -n 1 XENLA_prot_v4.fasta.reverse 
>rv_nadkd1|XB-GENE-991229|AAI46629|148921623 
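
Before building any index, it may be worth checking that the combined file really contains as many decoys as targets. The following is a minimal sketch, assuming bash and the 'rv_' decoy prefix shown above; the function name count_fasta_entries is hypothetical.

```shell
# Hypothetical helper: count target and decoy headers in a FASTA file.
count_fasta_entries() {
  # $1: FASTA file; $2: decoy prefix (default "rv_", treated as a regex)
  local fasta="$1" prefix="${2:-rv_}"
  local total decoy
  [ -r "$fasta" ] || return 1
  # grep -c exits non-zero when the count is 0, so ignore its status.
  total=$(grep -c '^>' "$fasta" || true)
  decoy=$(grep -c "^>${prefix}" "$fasta" || true)
  printf 'target=%d decoy=%d\n' "$((total - decoy))" "$decoy"
}
```

Running count_fasta_entries XENLA_prot_v4_combined.fasta should report equal target and decoy counts if the two files were concatenated correctly.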

DB setup for X!tandem

 $ ~/src.MS/local/bin/fasta_pro.exe (my combined fasta file) 

This creates an index file named after your FASTA file, with a '.pro' suffix appended.

 $ ~/src.MS/local/bin/fasta_pro.exe XENLA_prot_v4_combined.fasta 
fasta_pro file conversion utility, v. 2006.09.15
 input path = XENLA_prot_v4_combined.fasta
output path = XENLA_prot_v4_combined.fasta.pro
db type = plain

DB setup for Crux

 $ ~/src.MS/local/bin/crux create-index --enzyme trypsin --missed-cleavages 2 --peptide-list T --decoys none (my combined fasta file) (my index name)
  • If you want to use Crux on its own (or with its embedded post-processing tools, e.g. percolator or q-ranker), use a FASTA file with target sequences only and choose a decoy option (the default is protein-shuffle, but peptide-shuffle would be better).
  • 'peptide-list' is optional.
  • The trypsin digestion pattern in Crux is '[KR]|{P}', so it does not cut after K/R if the next amino acid is P. If you want to ignore this 'Proline' constraint, you can use '--custom-enzyme "[KR]|[X]"' instead of '--enzyme trypsin'.
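
To see what the two cleavage rules mean in practice, here is a small illustration in shell and awk. It is not how Crux digests sequences; it is only a sketch of the rule, it ignores missed cleavages, and the function name digest_trypsin and its 'strict'/'any' modes are made up for this example.

```shell
# Hypothetical illustration of the cleavage rules (not Crux code).
# "strict" mimics [KR]|{P}: cut after K/R unless the next residue is P.
# "any"    mimics [KR]|[X]: cut after every K/R.
digest_trypsin() {
  # $1: protein sequence; $2: "strict" (default) or "any"
  printf '%s\n' "$1" | awk -v mode="${2:-strict}" '{
    n = length($0); out = ""
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1); nxt = substr($0, i + 1, 1)
      out = out c
      # Cut after K or R, except at the end of the sequence.
      if ((c == "K" || c == "R") && i < n && (mode == "any" || nxt != "P")) {
        print out; out = ""
      }
    }
    if (out != "") print out
  }'
}
```

With the sequence AKPCRDK, the strict mode gives AKPCR and DK, while the 'any' mode gives AK, PCR and DK.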

DB setup for InsPecT

 $ ~/src.MS/inspect/current/PrepDB.py FASTA (my fasta file)
  • This creates an index file named after your FASTA file, with a '.trie' suffix.

DB setup for MSGFDB

$ java -cp ~/src.MS/MSGFDB/current/MSGFDB.jar msdbsearch.BuildSA -d (my FASTA file) -tda 0
  • It generates .canno, .cnlcp, .csarr & .cseq files.
  • If you want to use native MS-GFDB function, use -tda 2 (generate target & combined database) with target-only FASTA file.


Copy your mzXML files to the mzXML directory ($SCRATCH/myProject/mzXML).
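
A quick way to confirm the copy worked before preparing the searches is to list the run names found in the mzXML directory. This is a sketch only, assuming bash; the function name list_mzxml is hypothetical.

```shell
# Hypothetical helper: print the base name of every mzXML file in a project.
# Returns non-zero when no mzXML file is found.
list_mzxml() {
  # $1: project directory (e.g. "$SCRATCH/myProject")
  local dir="$1/mzXML" f found=1
  for f in "$dir"/*.mzXML; do
    # If nothing matches, the pattern stays literal and is skipped here.
    [ -e "$f" ] || continue
    basename "$f" .mzXML
    found=0
  done
  return "$found"
}
```

For example, list_mzxml "$SCRATCH/myProject" || echo "no mzXML files yet" gives a warning when the directory is still empty.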


Prepare search

$ python ~/git/MS-toolbox/bin/prepare-tandemK.py 
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK/tandem-taxonomy.xml.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK/20110713_XENLA_Egg1_1.tandemK.xml
...

TandemK is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-tandemK.sh.
$ python ~/git/MS-toolbox/bin/prepare-inspect.py 
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect/20110713_XENLA_Egg1_1.inspect_in.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect/20110713_XENLA_Egg1_2.inspect_in.
...

InsPecT is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-inspect.sh.
$ python ~/git/MS-toolbox/bin/prepare-MSGFDB.py 
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/MSGFDB.
20110713_XENLA_Egg1_1.mzXML
20110713_XENLA_Egg1_2.mzXML
....

MSGFDB is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-MSGFDB.sh.

Run search

On a standalone workstation, you can run ./scripts/run-(search_engine).sh directly. Do not do this on a TACC login node. Instead, add the following parameters to each run-*.sh script, then submit the job with qsub.

  • If you use lonestar, replace '4way 8' with '8way 24'. See the Lonestar and Longhorn user guides for details.
  • Don't forget to put your email address after -M.
  • Use a short job name so you can check the status easily.
#!/bin/bash
#$ -V                   # Inherit the submission environment
#$ -cwd                 # Start job in submission directory
#$ -j y                 # Combine stderr and stdout
#$ -o $JOB_NAME.o$JOB_ID
#$ -pe 4way 8
#$ -q long
#$ -l h_rt=24:00:00     # Run time (hh:mm:ss)
#$ -M (your email)
#$ -m be                # Email at Begin and End of job
#$ -P hpc
#$ -N (job name)
set -x

(put the remaining part of run-* script after #!/bin/bash line)
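
Editing every run-*.sh by hand gets tedious, so the header can be added with a small function. This is a sketch under assumptions: bash, a run script whose first line is the '#!/bin/bash' shebang, and the same queue settings as listed above. The function name add_sge_header is made up, and the -N line is grouped here with the other directives.

```shell
# Hypothetical helper: print a run-*.sh script with the job header added.
add_sge_header() {
  # $1: run-*.sh script; $2: job name; $3: email address
  local script="$1" job_name="$2" email="$3"
  # \$ keeps a literal "$" in the unquoted here-document.
  cat <<EOF
#!/bin/bash
#\$ -V
#\$ -cwd
#\$ -j y
#\$ -N ${job_name}
#\$ -o \$JOB_NAME.o\$JOB_ID
#\$ -pe 4way 8
#\$ -q long
#\$ -l h_rt=24:00:00
#\$ -M ${email}
#\$ -m be
#\$ -P hpc
set -x
EOF
  # Drop the original shebang line and keep the rest of the script.
  tail -n +2 "$script"
}
```

For example, add_sge_header scripts/run-tandemK.sh tandemK (your email) > tandemK.job writes a new file that can be checked by eye and then submitted with qsub.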