This is a local copy of the iTAK v1.2 manual file "manual.txt"
Return
iTAK (current version: v1.2 -08/26/11) Introduction iTAK is a program to identify plant transcription factors (TFs), transcriptional regulators (TRs) and protein kinases (PKs) from protein or nucleotide sequences and then classify individual TFs, TRs and PKs into gene families. Identification and classification of TFs and TRs are based on the rules (required and forbidden pfam protein domains of each gene family) described in Perez-Rodriguez et al (2010) (http://nar.oxfordjournals.org/content/38/suppl_1/D822.full). More than sixty family of TFs and twenty families of TRs have been characterized in plants. A list of these families can be accessed from the PlnTFDB database (http://plntfdb.bio.uni-potsdam.de/v3.0/). Plant protein kinases are identified if the sequences have significant hit to the protein kinase domain (PF00069) in the Pfam database (http://pfam.sanger.ac.uk/). Protein kinase families were adopted from the PlantsP database (http://plantsp.genomics.purdue.edu/html/family/class.html) with additions of several newly identified families/subfamilies in plants, e.g., WNK like kinase - with no lysine kinase, Male germ cell-associated kinase (mak). Protein sequences of each kinase family were obtained from the PlantsP database and used to build Hidden Markov Models (HMMs). The identified plant protein kinases are classified into gene families based by comparing their sequences to these HMMs. System requirement and dependencies Linux (required) Perl version 5.10.0 or higher (required). Perl was installed by default on most Linux systems BioPerl version 1.006 or higher (required). Please check http://www.bioperl.org and wiki/Installing_BioPerl for more details on installation of BioPerl. HMMER3 (required). Provided in iTAK. 2.0 GB free disk space for installation. Release notes iTAK v1.2 - 06/26/11 iTAK v1.1 - 06/03/11 iTAK v1.0 - 10/10/10 Installation Installation of iTAK is straightforward. First download the latest version of iTAK for your system and uncompress the downloaded file. It will generate a folder named "iTAK-1.0.x32" on a 32-bit machine or "iTAK-1.0.x64" on a 64-bit machine (we call this folder "iTAK home folder"). iTAK home folder includes three subfolders, a "bin" folder containing the HMMER3 executable, a "database" folder containing the domain database files and a "doc" folder containing the program documentation files. The home folder also contains a perl script, iTAK.pl, which is the core script to run the whole iTAK pipeline. Next download the formatted domain database files from the download page. Uncompress and move the database files to the "database" folder. Running iTAK Quick Start 1. Put the protein or nucleotide sequence file in FASTA format in iTAK home folder 2. Go to iTAK home folder and run iTAK with the following command (assuming the input file name is input_seq) >perl iTAK.pl -i input_seq 3. The program will generate an output folder named input_seq_output which contains all the output files. See below for the description of the output files. Parameters -i [String] Name of the input sequence file in FASTA format (required) -s [String] Type of input sequence. 'p' for protein sequences | 'n' for nucleotide sequences. (default = p) -m [String] Type of analysis ('t' for TF identification | 'p' for PK identification | 'b' for both) (default = b) -a [Integer] number of CPUs used for hmmscan (default = 1) -o [String] Name of the output directory (default = "input file name" + "_output") Output files Six files will be generated in the output directory. 1. input_seq_tf_seq: transcription factor sequences (FASTA format). 2. input_seq_tf_family: transcription factor classificcation. A tab-delimited txt file containing sequences IDs and their corresponding transcription factor families. 3. input_seq_tf_align: A tab-delimited txt file containing parsed hmmscan result of transcription factors. 4. input_seq_pkseq: protein kinase sequences (FASTA format). 5. input_seq_pkcat: protein kinases classification. A tab-delimited txt file containing sequence IDs and their corresponding protein kinase families. 6. input_seq_pkaln: A tab-delimited txt file containing parsed hmmscan result of protein kinases. Performance We run iTAK on a single CPU on a 32-bit laptop with Intel Core2 Duo P9400 @ 2.40GHz and 4GB memory. It took aout 3.5 hours to identify both transcription factors and protein kinases from 33,410 Arabidopsis protein sequences (TAIR9 release). Download Current version of iTAK is v1.2. It's available for both 32- and 64-bit linux systems. iTAK can be downloaded from the ftp server: ftp://bioinfo.bti.cornell.edu/pub/program/itak/ Contact For questions and suggestions, please contact us at bioinfo@cornell.edu