VCRU Bioinformatics, USDA ARS Vegetable Crops Research Unit

This page was last updated on Friday, 10-Mar-2023 19:33:40 CST

Installation notes for RepeatExplorer version 2016-08-22 fd28551

Home Page

Installation Instructions http://repeatexplorer.umbr.cas.cz/static/html/help/manual.html#installation

Another person's installation notes
http://williams-lab.wikidot.com/repeatexplorer

Prerequisites

Installation

  1. $ cd /programinstallers
  2. Download CDD database
    $ wget -N ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl.gz
  3. $ wget -N ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Cdd_LE.tar.gz
  4. $ cd /usr/local/bin
  5. $ hg clone https://bitbucket.org/repeatexplorer/repeatexplorer
  6. Compile louvain
    $ cd repeatexplorer/louvain && make && cd ..
  7. Install CDD database
    $ mkdir cdd && cd cdd
  8. $ tar -zxvf /programinstallers/Cdd_LE.tar.gz
  9. $ cp -puv /programinstallers/cddid.tbl.gz . && gunzip cddid.tbl.gz
  10. $ cd ..
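The two archives downloaded in steps 2 and 3 can be checked for corruption before they are unpacked in steps 8 and 9. The following is only a minimal sketch and not part of the RepeatExplorer instructions: `check_archives` is a hypothetical helper name, and it relies on nothing more than `gzip -t`, which tests a compressed file without extracting it.

```shell
# Hypothetical helper: test each gzip archive for integrity.
# Prints "ok <file>" or "bad <file>" for every argument and
# returns 1 if at least one file failed the test.
check_archives() {
    status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "ok $f"
        else
            echo "bad $f"
            status=1
        fi
    done
    return "$status"
}
```

For the files fetched above it could be called as `check_archives /programinstallers/cddid.tbl.gz /programinstallers/Cdd_LE.tar.gz`.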
  11. Edit configuration
    $ chmod +x config.sh
  12. $ nano config.sh
    Edit paths
    Despite the comment below saying the line can be commented out, REPEAT_MASKER= must be defined
    GALAXY_DIR=/dev/null   #  this variable is not necessary for command line version 
    # USE ABSOLUTE PATHS
    
    # directory with RepeatMasker installation 
    export REPEAT_MASKER=/usr/local/bin/RepeatMasker/RepeatMasker                              # set according to your local installation; if RepeatMasker is in path you can comment the line out
    # Conserved domain database files location:
    export RPSBLAST_DATABASE=/usr/local/bin/repeatexplorer/cdd                            # set according to your local installation
    export RPSBLAST_DATABASE_ANNOTATION=/usr/local/bin/repeatexplorer/cdd/cddid.tbl       # set according to your local installation
    
    
    PATH=${ROOT}/parallel/src:$PATH
    export PATH
    export TGICL=${ROOT}/tgicl_linux 
    # directory with louvain clustering executables:
    export PROG_COMMUNITY=${ROOT}/louvain      		# make sure that you have compiled source using make!
    export OGDF=${ROOT}/OGDF/runOGDFlayout
    export JSLIB=$ROOT/umbr_programs/interactive_graph/js             # DO NOT MODIFY
    export DOMAINDATABASE=$ROOT/tool-data/domains/TE_domains_newest.fasta   # DO NOT MODIFY!
    export DOMAIN_TYPES=$ROOT/tool-data/domains/classification_newest.csv   # DO NOT MODIFY!
    export MIN_MINCL=20  #limit for minimal size of the MINCL, DO NOT MODIFY
    export MAXEDGES=350000000   # depends on computer memory; determines the maximum amount of data that can be processed. DO NOT MODIFY
    export MAXEDGES_FOR_LAYOUT=25000000 # adjusted down for fmmm layout
    export MAXNODES_FOR_LAYOUT=50000
    export PROC_LAYOUT=4
    # recommendation: with 16G RAM, MAXEDGES~341576829
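The configuration asks for absolute paths, and a mistyped path may only show up later as a confusing pipeline error. As a rough sketch, the edited values can be checked before running config.sh. `check_paths` is a hypothetical helper name introduced here for illustration; it is not part of RepeatExplorer.

```shell
# Hypothetical helper: verify that every argument is an absolute path
# that exists. Prints one line per problem and returns 1 if any was found.
check_paths() {
    rc=0
    for p in "$@"; do
        case $p in
            /*) ;;                                   # absolute, keep checking
            *) echo "not absolute: $p"; rc=1; continue ;;
        esac
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            rc=1
        fi
    done
    return "$rc"
}
```

With the values from the listing above, a call might look like `check_paths /usr/local/bin/RepeatMasker/RepeatMasker /usr/local/bin/repeatexplorer/cdd /usr/local/bin/repeatexplorer/cdd/cddid.tbl`. No output and a zero exit status mean all three paths were found.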
  13. $ ./config.sh
  14. If the output contains this error:
    pipeline version:
    failed to get version number!
    This is clustering pipeline

    write the version string to version.log:
    $ echo "2016-08-22 - fd28551" > /usr/local/bin/repeatexplorer/umbr_programs/version.log
  15. The file .../tool-data/dna_database/dna_database.fasta is missing. It is available at https://bitbucket.org/repeatexplorer/dna_database/src
    Obtain it with:
    $ cd /usr/local/bin/repeatexplorer/tool-data && hg clone https://bitbucket.org/repeatexplorer/dna_database/src dna_database
    requesting all changes
    adding changesets
    adding manifests
    adding file changes
    added 4 changesets with 11 changes to 7 files                                                        
    updating to branch default
    6 files updated, 0 files merged, 0 files removed, 0 files unresolved
  16. $ ls -l dna_database/
    total 142780
    -rw-rw-r-- 1 admin admin       156 Nov 21 10:14 README.txt
    -rw-rw-r-- 1 admin admin 117057027 Nov 21 10:14 dna_database.fasta
    -rw-rw-r-- 1 admin admin     10772 Nov 21 10:14 dna_database.fasta.nin
    -rw-rw-r-- 1 admin admin    102307 Nov 21 10:14 dna_database.fasta.nhr
    -rw-rw-r-- 1 admin admin  28903195 Nov 21 10:14 dna_database.fasta.nsq
    -rw-rw-r-- 1 admin admin    122733 Nov 21 10:14 dna_database_codes.csv
  17. Run the test scripts in the tests directory.
    Create this script to run all the tests:
    #!/bin/bash
    
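    # stop if the tests directory is missing, so the rm below cannot run elsewhere
    cd /usr/local/bin/repeatexplorer/tests || exit 1
    # clear output left over from earlier test runs
    rm -rf ../test_data/test_dir/*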
    
    ./test1.sh &> test1.log
    ./test2.sh &> test2.log
    ./test3.sh &> test3.log
    ./test4.sh &> test4.log
    ./test5_interlacers.sh &> test5.log
    ./test6_interlacers.sh &> test6.log
    ./test7.sh &> test7.log
    ./test8_fastq_filtering_and_clustering.sh &> test8.log
    ./test9_fastq_filtering.sh &> test9.log
    ./test10_fastq_filtering_and_clustering_example.sh &> test10.log
    ./test11_gl_layouts.sh &> test11.log
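The same idea can be written as a loop that also reports each script's exit status directly. This is only a sketch under a few assumptions: `run_all_tests` is a hypothetical name, the scripts are executable, and the order in which they run does not matter (the glob sorts names lexically, so test10 runs before test2). It does not clear test_data/test_dir first, and it names each log after the full script name rather than test1.log, test2.log and so on.

```shell
# Hypothetical helper: run every test*.sh in the given directory,
# write each script's output to <script name without .sh>.log,
# and print one "<script> <exit status>" line per script.
run_all_tests() {
    dir=$1
    (
        cd "$dir" || exit 1
        for t in test*.sh; do
            [ -f "$t" ] || continue      # nothing matched the pattern
            "./$t" > "${t%.sh}.log" 2>&1
            echo "$t $?"
        done
    )
}
```

It would be called as `run_all_tests /usr/local/bin/repeatexplorer/tests`. The subshell keeps the `cd` from changing the caller's working directory.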
  18. Results of the tests. test5 should report a non-zero exit status for three of its runs: those runs use the wrong input files, so a non-zero result is expected.
    test7 fails in its last test, run13 (a typo in the echo statement says run12), because the file test_data/cluster2merge3.txt is missing and is not in the repository. Is this a bug? I suspect the "3" in the file name should not be there, since a file test_data/cluster2merge.txt does exist.
    Correcting this error changes the exit status to 0.
    $ grep 'exit status' *.log
    10_fastq_filtering_and_clustering_example.sh.log: exit status : 0
    test_test10_fastq_filtering_and_clustering_example.sh.log: exit status : 0
    test_test10_fastq_filtering_and_clustering_example.sh.log: exit status : 0
    test_test10_fastq_filtering_and_clustering_example.sh.log: exit status of clustering: 0
    test_test11_gl_layouts.sh.log: exit status : 0
    test_test1.sh.log: exit status of run1: 0
    test_test1.sh.log: exit status of run2: 0
    test_test2.sh.log: exit status of run1: 0
    test_test2.sh.log: exit status of run2: 0
    test_test3.sh.log: exit status of run1: 0
    test_test3.sh.log: exit status of run2: 0
    test_test4.sh.log: exit status of run7: 0
    test_test4.sh.log: exit status of run8: 0
    test_test4.sh.log: exit status of run9: 0
    test_test5_interlacers.sh.log: exit status: 0
    test_test5_interlacers.sh.log: exit status: 0
    test_test5_interlacers.sh.log: exit status: 0
    test_test5_interlacers.sh.log: exit status: 1
    test_test5_interlacers.sh.log: exit status: 1
    test_test5_interlacers.sh.log: exit status: 1
    test_test6_interlacers.sh.log: exit status: 0
    test_test7.sh.log: exit status of run10: 0
    test_test7.sh.log: exit status of run11: 0
    test_test7.sh.log: exit status of run12: 0
    test_test7.sh.log:exit status:1
    test_test7.sh.log: exit status of run13: 0
    test_test8_fastq_filtering_and_clustering.sh.log: exit status : 0
    test_test8_fastq_filtering_and_clustering.sh.log: exit status : 0
    test_test8_fastq_filtering_and_clustering.sh.log: exit status : 0
    test_test8_fastq_filtering_and_clustering.sh.log: exit status : 0
    test_test8_fastq_filtering_and_clustering.sh.log: exit status : 0
    test_test8_fastq_filtering_and_clustering.sh.log: exit status : 0
    test_test8_fastq_filtering_and_clustering.sh.log: exit status of clustering: 0
    test_test8_fastq_filtering_and_clustering.sh.log: exit status of merging: 0
    test_test9_fastq_filtering.sh.log: exit status : 0
    test_test9_fastq_filtering.sh.log: exit status : 0
    test_test9_fastq_filtering.sh.log: exit status : 0
    test_test9_fastq_filtering.sh.log: exit status : 0
    test_test9_fastq_filtering.sh.log: exit status : 0
    test_test9_fastq_filtering.sh.log: exit status : 0
    test_test9_fastq_filtering.sh.log: exit status : 0
    test_test9_fastq_filtering.sh.log: exit status : 0
    test_test9_fastq_filtering.sh.log: exit status : 0
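With this many lines it is easy to overlook a failure. As a sketch, the listing can be narrowed to the lines whose final number is non-zero. `nonzero_statuses` is a hypothetical helper name, and the pattern assumes every status line ends in a colon followed by the number, as in the output above. `grep -H` (print the file name even for a single file) is available in GNU, BSD and BusyBox grep but is not part of POSIX.

```shell
# Hypothetical helper: print only the "exit status" lines that end in a
# non-zero number. Returns 0 when no such line exists, 1 otherwise.
nonzero_statuses() {
    if grep -H -E 'exit status.*:[[:space:]]*[1-9][0-9]*[[:space:]]*$' "$@"; then
        return 1
    fi
    return 0
}
```

Run as `nonzero_statuses *.log` in the tests directory. According to the notes above, the three non-zero lines from test5 are expected and anything else deserves a closer look.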