This is a local copy of the iTAK v1.2 manual file "manual.txt"
iTAK (current version: v1.2 - 08/26/11)
Introduction
iTAK is a program that identifies plant transcription factors (TFs),
transcriptional regulators (TRs) and protein kinases (PKs) from protein or
nucleotide sequences and then classifies individual TFs, TRs and PKs into gene
families. Identification and classification of TFs and TRs are based on the
rules (required and forbidden Pfam protein domains for each gene family)
described in Perez-Rodriguez et al. (2010)
(http://nar.oxfordjournals.org/content/38/suppl_1/D822.full). More than sixty
families of TFs and twenty families of TRs have been characterized in plants. A
list of these families can be accessed from the PlnTFDB database
(http://plntfdb.bio.uni-potsdam.de/v3.0/).
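The required/forbidden domain rules can be illustrated with a short sketch. This is only an illustration in Python, not iTAK's own code (iTAK itself is a Perl script). The family names, the domain names and the function classify are hypothetical placeholders; real rules would have to be taken from Perez-Rodriguez et al. (2010).

```python
# Hypothetical rule table: each family lists required and forbidden domains.
# Family and domain names are placeholders, not real PlnTFDB rules.
RULES = {
    "FamilyA": {"required": {"domA"}, "forbidden": {"domX"}},
    "FamilyB": {"required": {"domA", "domB"}, "forbidden": set()},
}


def classify(domains, rules=RULES):
    """Return the sorted families whose rules the given domain set satisfies."""
    found = set(domains)
    matches = []
    for family, rule in rules.items():
        has_required = rule["required"] <= found         # all required domains present
        has_forbidden = bool(rule["forbidden"] & found)  # any forbidden domain present
        if has_required and not has_forbidden:
            matches.append(family)
    return sorted(matches)


print(classify(["domA"]))          # ['FamilyA']
print(classify(["domA", "domB"]))  # ['FamilyA', 'FamilyB']
print(classify(["domA", "domX"]))  # []
```

A sequence may satisfy more than one rule in this sketch. How such ties are resolved is not described here, so the sketch simply reports all matches.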
Plant protein kinases are identified if the sequences have a significant hit to
the protein kinase domain (PF00069) in the Pfam database (http://pfam.sanger.ac.uk/).
Protein kinase families were adopted from the PlantsP database
(http://plantsp.genomics.purdue.edu/html/family/class.html) with the addition of
several newly identified families/subfamilies in plants, e.g., WNK-like kinase
(with no lysine kinase) and male germ cell-associated kinase (MAK). Protein
sequences of each kinase family were obtained from the PlantsP database and used
to build Hidden Markov Models (HMMs). The identified plant protein kinases are
classified into gene families by comparing their sequences to these HMMs.
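One plausible way to turn such HMM comparisons into a family assignment is to keep the best-scoring significant hit. The sketch below assumes this best-hit strategy; the manual does not state how iTAK compares hits, and the function best_family, the family names and the E-value cutoff are all hypothetical.

```python
def best_family(hits, max_evalue=1e-10):
    """Pick the family whose HMM gives the highest score among significant hits.

    hits: list of (family, score, evalue) tuples, assumed to be already parsed
          from an HMM search result.
    max_evalue: assumed significance cutoff, not a value taken from iTAK.
    Returns the family name, or None if no hit passes the cutoff.
    """
    significant = [h for h in hits if h[2] <= max_evalue]
    if not significant:
        return None
    # The hit with the highest bit score wins.
    return max(significant, key=lambda h: h[1])[0]


# Made-up hits for one sequence: (family, bit score, E-value)
hits = [
    ("family_1", 120.5, 1e-35),
    ("family_2", 210.0, 1e-60),
    ("family_3", 15.2, 0.5),
]
print(best_family(hits))  # family_2
```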
System requirements and dependencies
Linux (required)
Perl version 5.10.0 or higher (required). Perl is installed by default on most
Linux systems.
BioPerl version 1.006 or higher (required). See http://www.bioperl.org and its
wiki page wiki/Installing_BioPerl for details on installing BioPerl.
HMMER3 (required). Provided in iTAK.
2.0 GB free disk space for installation.
Release notes
iTAK v1.2 - 06/26/11
iTAK v1.1 - 06/03/11
iTAK v1.0 - 10/10/10
Installation
Installation of iTAK is straightforward. First, download the latest version of
iTAK for your system and uncompress the downloaded file. This generates a
folder named "iTAK-1.0.x32" on a 32-bit machine or "iTAK-1.0.x64" on a 64-bit
machine (we call this folder the "iTAK home folder"). The iTAK home folder
includes three subfolders: a "bin" folder containing the HMMER3 executable, a
"database" folder containing the domain database files and a "doc" folder
containing the program documentation files. The home folder also contains a
Perl script, iTAK.pl, which is the core script that runs the whole iTAK
pipeline. Next, download the formatted domain database files from the download
page. Uncompress the database files and move them to the "database" folder.
Running iTAK
Quick Start
1. Put the protein or nucleotide sequence file in FASTA format in the iTAK home
folder.
2. Go to the iTAK home folder and run iTAK with the following command (assuming
the input file name is input_seq):
>perl iTAK.pl -i input_seq
3. The program will generate an output folder named input_seq_output which
contains all the output files. See below for the description of the output
files.
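For batch runs, the command in step 2 can be assembled by a small wrapper. The sketch below builds the argument list from the options listed under Parameters (-i, -s, -m, -a, -o). The function name build_itak_command is a hypothetical helper, not part of iTAK, and the sketch only prints the command rather than running it.

```python
def build_itak_command(input_file, seq_type="p", mode="b", cpus=1, out_dir=None):
    """Build the argument list for a 'perl iTAK.pl' call (hypothetical helper)."""
    if seq_type not in ("p", "n"):
        raise ValueError("seq_type must be 'p' (protein) or 'n' (nucleotide)")
    if mode not in ("t", "p", "b"):
        raise ValueError("mode must be 't' (TF), 'p' (PK) or 'b' (both)")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    cmd = ["perl", "iTAK.pl", "-i", input_file,
           "-s", seq_type, "-m", mode, "-a", str(cpus)]
    # Without -o, iTAK uses its default output directory name.
    if out_dir is not None:
        cmd += ["-o", out_dir]
    return cmd


cmd = build_itak_command("input_seq", mode="t", cpus=4)
print(" ".join(cmd))  # perl iTAK.pl -i input_seq -s p -m t -a 4
```

To actually launch the run from the iTAK home folder, the list can be passed to subprocess.run(cmd, check=True) from the Python standard library.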
Parameters
-i [String]  Name of the input sequence file in FASTA format (required)
-s [String]  Type of input sequence: 'p' for protein sequences | 'n' for
             nucleotide sequences (default = p)
-m [String]  Type of analysis: 't' for TF identification | 'p' for PK
             identification | 'b' for both (default = b)
-a [Integer] Number of CPUs used for hmmscan (default = 1)
-o [String]  Name of the output directory (default = "input file name" +
             "_output")
Output files
Six files will be generated in the output directory.
1. input_seq_tf_seq: transcription factor sequences (FASTA format).
2. input_seq_tf_family: transcription factor classification. A tab-delimited
txt file containing sequence IDs and their corresponding transcription
factor families.
3. input_seq_tf_align: A tab-delimited txt file containing the parsed hmmscan
results for transcription factors.
4. input_seq_pkseq: protein kinase sequences (FASTA format).
5. input_seq_pkcat: protein kinase classification. A tab-delimited txt file
containing sequence IDs and their corresponding protein kinase families.
6. input_seq_pkaln: A tab-delimited txt file containing the parsed hmmscan
results for protein kinases.
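The classification files lend themselves to simple downstream summaries. The sketch below counts sequences per family, assuming the first column holds the sequence ID and the second the family name; the exact column layout is an assumption and should be checked against a real output file. The function count_families and the example records are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_families(handle):
    """Count sequences per family in a tab-delimited ID<TAB>family file."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:  # skip blank or malformed lines
            continue
        counts[row[1]] += 1
    return counts


# Made-up example content standing in for a classification file.
example = io.StringIO("seq1\tFamilyA\nseq2\tFamilyB\nseq3\tFamilyA\n")
counts = count_families(example)
for family, n in counts.most_common():
    print(family, n)
```

With a real run, the handle would be opened on the file instead, e.g. open("input_seq_output/input_seq_tf_family", newline="").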
Performance
We ran iTAK on a single CPU of a 32-bit laptop with an Intel Core2 Duo P9400 @
2.40GHz and 4GB memory. It took about 3.5 hours to identify both transcription
factors and protein kinases from 33,410 Arabidopsis protein sequences (TAIR9
release).
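The benchmark gives a rough way to estimate run time for other data sets. The sketch below assumes that run time scales linearly with the number of sequences on comparable hardware with a single CPU; this is an assumption, not a measured property of iTAK, and the function estimate_hours is hypothetical.

```python
BENCH_SEQUENCES = 33410  # Arabidopsis protein sequences in the benchmark
BENCH_HOURS = 3.5        # approximate run time reported for the benchmark


def estimate_hours(n_sequences):
    """Estimate run time in hours, assuming linear scaling from the benchmark."""
    return n_sequences * BENCH_HOURS / BENCH_SEQUENCES


# Average time per sequence in the benchmark, in seconds.
print(round(BENCH_HOURS * 3600 / BENCH_SEQUENCES, 2))  # 0.38
# Rough estimate for a data set of 10,000 sequences, in hours.
print(round(estimate_hours(10000), 2))                 # 1.05
```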
Download
The current version of iTAK is v1.2. It is available for both 32- and 64-bit
Linux systems. iTAK can be downloaded from the FTP server:
ftp://bioinfo.bti.cornell.edu/pub/program/itak/
Contact
For questions and suggestions, please contact us at bioinfo@cornell.edu