USDA ARS VCRU Data Server. This web site, https://www.vcru.wisc.edu/sdata, contains data from the laboratory of Philipp W. Simon, USDA-ARS Vegetable Crops Research Unit.
Programs in the bb project are now stored on GitHub at https://github.com/dsenalik/bb
Reference. If you use this software, you may cite using this reference:
Massimo Iorizzo, Douglas Senalik, Marek Szklarczyk, Dariusz Grzebelus, David Spooner and Philipp Simon
De novo assembly of the carrot mitochondrial genome using next generation sequencing of whole genomic DNA
provides first evidence of DNA transfer into an angiosperm plastid genome
BMC Plant Biology 2012, 12:61
Download. Download bb.454contignet here - current version 1.0.7, May 4, 2012
Overview. This is a Perl program that will take an assembly of Roche 454 sequences generated by the Roche newbler/gsAssembler, and use the connection information to link generated contigs into a graphical map.
Description. A large amount of information about connections between various contigs in the gsAssembler assembly is contained in the 454ContigGraph.txt file generated by gsAssembler. I suggest looking at Lex Nederbragt's excellent description of the 454ContigGraph.txt file for more information about this file.
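As a rough illustration of the kind of connection information involved, the sketch below collects contig-end links from edge records. It is not taken from bb.454contignet. It assumes that edge records are tab-separated lines beginning with "C" followed by contig number, end, contig number, end, and read count; check that layout against the description linked above before relying on it. The sample lines are invented for the example.

```python
from collections import defaultdict

def parse_edges(lines):
    """Collect contig-end connections from 'C' lines.

    Assumed layout (tab-separated, an assumption of this sketch):
    C <contigA> <endA> <contigB> <endB> <read count>
    """
    edges = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6 or fields[0] != "C":
            continue  # skip contig lines and all other record types
        _, a, end_a, b, end_b, reads = fields
        # Store the link in both directions so either end can be looked up.
        edges[(int(a), end_a)].append((int(b), end_b, int(reads)))
        edges[(int(b), end_b)].append((int(a), end_a, int(reads)))
    return dict(edges)

# Invented sample: two contig lines and one edge line.
sample = [
    "1\tcontig00001\t5120\t31.2\n",
    "2\tcontig00002\t48\t60.5\n",
    "C\t1\t3'\t2\t5'\t27\n",
]
edges = parse_edges(sample)
print(edges[(1, "3'")])
```

Keying the dictionary on (contig, end) pairs keeps the 5' and 3' ends of a contig separate, which is what a graph of contig linkages needs.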
Thanks to Simon Gladman for adapting bb.454contignet to handle paired-end runs.
Use the parameters
A prerequisite for this program is the availability of the
This is probably already installed on a standard Linux installation, but if not, the graphviz web site is
http://www.graphviz.org/
On Fedora you would just type
or on Ubuntu
Important. An important point during the gsAssembler assembly is to save
all contigs, no matter how small. Sometimes a very small contig,
even as small as 1 b.p., can be found connecting two larger contigs, so discarding small contigs
could generate unnecessary gaps. Small indels between alleles can also show up as very
small contigs. So, when creating your assembly in gsAssembler, make sure to set the
minimum contig size to
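The effect of discarding small contigs can be seen in a toy example. The sketch below is only an illustration: the contig numbers, lengths, links, and the connected() helper are all hypothetical and are not part of gsAssembler or bb.454contignet. It checks whether two large contigs remain linked once contigs below a length cutoff are dropped.

```python
from collections import deque

def connected(lengths, links, start, goal, min_len=0):
    """Breadth-first search using only contigs of at least min_len b.p."""
    kept = {c for c, n in lengths.items() if n >= min_len}
    if start not in kept or goal not in kept:
        return False
    neighbors = {c: set() for c in kept}
    for a, b in links:
        if a in kept and b in kept:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

# Hypothetical assembly: a 1 b.p. contig joins two large contigs.
lengths = {1: 52000, 2: 1, 3: 38000}
links = [(1, 2), (2, 3)]
print(connected(lengths, links, 1, 3))               # True
print(connected(lengths, links, 1, 3, min_len=100))  # False
```

With every contig kept, contigs 1 and 3 are joined through contig 2; with a 100 b.p. cutoff the bridge disappears and the map shows a gap that is not really there.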
Example. Here is an example of a de novo chloroplast and mitochondrial genome assembly
from a single region (half of a plate) of 454 whole-genome shotgun sequence:
This assembly and image were used, after some manual enhancements, in the publication
cited above.
Example data files and commands:
Color names are listed at http://www.graphviz.org/doc/info/colors.html
Here is the full program syntax, which you can obtain by typing the name of the program with no parameters:
bb.454contignet version 1.0.7

Required parameters:
  --indir=xxx
      path to 454 assembly directory
  --outfile=xxx
      output text file of results
  --contig=xxx[,xxx]...
      one or more starting contig numbers, separated by comma, or multiple --contig parameters may be used. Use just the numeric portion of the contig

Optional parameters:
  --type=xxx
      output file format, default is "png" ( anything besides "png" is experimental )
  --cmdfile=xxx
      graphviz command file in .dot language will be created using this name. If not specified, a temporary command file will be created, and it will be deleted when done
  --imgfile=xxx
      graph image file will be created with this name. If not specified, will be --outfile with .png extension added
  --fastaout=xxx
      create a FASTA file of all contigs in the output, save in this file
  --abyssexplorer=xxx
      Generate a .dot file that can be used for visualization with ABySS-Explorer 1.3.0, http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer
  --flowthrough
      include connection information derived from reads that flow through more than two contigs
  --flowbetween[=x]
      include connection information derived from reads that flow from one contig into another
      by default, if the distance value is zero, it will not be shown, the optional value for this parameter is a minimum distance, defaulting to 1, set to --flowbetween=0 to show these links also
  --pairlinks
      include connection information derived from paired end reads, only applicable for assemblies containing paired end reads
  --alllinks
      sets --flowthrough, --flowbetween, and --pairlinks
  --tag=tagname,contig[,contig]...
      list of 1 or more contigs will be given this tag. Multiple --tag allowed. tagname is a text label that will be shown in the final image, e.g. --tag="ATP1,14,34"
  --label
      a synonym for --tag
  --showbp
      show length in b.p. in graph
  --shownt
      a synonym for --showbp
  --showcoverage
      show average contig read coverage in graph
  --color=colorname,contig[,contig]...
      like --tag, but color the contig. for list of valid color names see http://www.graphviz.org/doc/info/colors.html
  --forcelink=xxx-5:yyy-3
      force a link where none exists between specified ends, xxx and yyy are contig numbers
  --level=xxx
      maximum recursion level, default=2
  --boldabove=xxx
      lines with read coverage >= this value will be drawn in bold. no default value
  --exclude=xxx[,xxx]...
      one or contigs to never traverse past, for example a repeated region contig
  --listexcluded
      print out a list of which excluded contigs are being ignored
  --invert=xxx[,xxx]...
      one or more contigs to plot backwards on the graph, i.e. 3' to 5' direction
  --extend=xxx
      auto extension for the single best path, value is maximum steps, default=0
  --lowlimit=xxx
      ignore connections < this number of reads
  --highlimit=xxx
      ignore connections > this number of reads
  --len=xxx
      len parameter to neato, default=1
  --nolabel
      disable highlighting of dead ends, and limit of recursion contigs
  --overlapmode
      neato paramter, default is false, one of none, true, scale
  --nospline
      disable spline when edges would overlap
  --help
      print this screen
  --quiet
      only print error messages
  --debug
      print extra debugging information

In place of lists of contigs, you can use @filename to read in values for that parameter from a file, e.g. --exclude=@excl.txt

This program requires that the graphviz program "neato" be available in the default PATH. The graphviz web site is http://www.graphviz.org/
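When many runs are needed, the command line can be assembled from a script. The sketch below is one possible way to do that, using only options listed in the help text above. The contignet_args() helper, the directory name "assembly", the output name "net.txt", and the contig numbers are placeholders chosen for this example, not values from a real assembly.

```python
import shlex

def contignet_args(indir, outfile, contigs, tags=None, colors=None,
                   level=None, all_links=False, show_bp=False):
    """Assemble an argument list from options documented in the help text."""
    args = [
        "bb.454contignet",
        f"--indir={indir}",
        f"--outfile={outfile}",
        "--contig=" + ",".join(str(c) for c in contigs),
    ]
    if all_links:
        args.append("--alllinks")
    if show_bp:
        args.append("--showbp")
    if level is not None:
        args.append(f"--level={level}")
    # --tag and --color share the form name,contig[,contig]...
    for option, groups in (("--tag", tags), ("--color", colors)):
        for name, members in (groups or {}).items():
            args.append(f"{option}=" + ",".join([name, *map(str, members)]))
    return args

# Placeholder values for illustration only.
args = contignet_args("assembly", "net.txt", [14],
                      tags={"ATP1": [14, 34]}, colors={"red": [34]},
                      level=3, all_links=True)
print(shlex.join(args))
```

The returned list can be handed to subprocess.run(args, check=True) once bb.454contignet and neato are available in the PATH.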
Version 1.0.7 adds experimental support for generation of a
Download bb.454contiginfo here - current version 1.0, March 21, 2012
This is a Perl program that will take an assembly of Roche 454 sequences generated by the Roche newbler/gsAssembler and display all information for one or more specified contigs, in particular the connection and read flowthrough information.
Here is the full program syntax, which you can obtain by typing the name of the program with no parameters:
bb.454contiginfo version 1.0

This program analyzes some of the output files from a 454 assembly to find out everything available for a particular contig. This information is all contained in the 454ContigGraph.txt file in the assembly directory.

Required parameters:
  --infile=xxx
      input 454 assembly directory, or path to 454ContigGraph.txt file
  --contig=xxx
      contig to analyze ( multiple allowed ) use just the number e.g. --contig=123 or multiple numbers with , or ; as separator, e.g. --contig=123,16389;599
  --outfile=xxx
      output file name, use "-" for stdout

Optional parameters:
  --showscaffold
      if contig is part of a scaffold, list all contigs and gaps in that scaffold
  --help
      print this screen
  --quiet
      only print error messages
  --debug
      print extra debugging information
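The --contig value accepts either "," or ";" as a separator. A minimal sketch of splitting such a value is shown here; parse_contigs() is a hypothetical helper written for this illustration and is not code from bb.454contiginfo.

```python
import re

def parse_contigs(value):
    """Split a --contig value on ',' or ';' and return contig numbers."""
    return [int(token) for token in re.split(r"[,;]", value) if token.strip()]

print(parse_contigs("123,16389;599"))  # [123, 16389, 599]
```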
Download bb.motif here - current version 1.0, June 14, 2010
This program was used to generate a supplemental file for the publication:
Marina Iovene, Pablo F. Cavagnaro, Douglas Senalik, C. Robin Buell, Jiming Jiang and Philipp W. Simon
Comparative FISH mapping of Daucus species (Apiaceae family)
Chromosome Research Volume 19, Number 4, 493-506, DOI: 10.1007/s10577-011-9202-y
A copy of the output file from the above publication: 10577_2011_9202_MOESM2_ESM.txt
This program will take one or more sequences in a FASTA file, and look for your specified motif sequence in them.

Required parameters:
  --motif=xxx
      nucleotide sequence of the motif to find
  --infile=xxx
      name of input FASTA file, multiple allowed
  --outfile=xxx
      name of summary file to create

Optional parameters:
  --tbl2asnfile=xxx
      create a feature table for tbl2asn import
  --tempdir=xxx
      save intermediate files in this directory. If not specified, temporary files are not kept
  --expect=xxx
      expect value for blast, default = 10.0
  --debug
      debugging mode=extra info printed, keep temp files
  --help
      print this screen
Download bb.orffinder here - current version 1.3.0 - Apr 1, 2013
This is a Perl program that will computationally detect
open reading frames in DNA or RNA sequences in FASTA format.
This is computationally similar to the NCBI program at
http://www.ncbi.nlm.nih.gov/gorf/orfig.cgi,
but allows command-line automation of the process, as well as a few additional features.
This program will detect open reading frames in FASTA DNA or RNA sequences. This is similar to the NCBI program at http://www.ncbi.nlm.nih.gov/gorf/orfig.cgi

Required parameters:
  --infile=xxx
      input file name
  --outfile=xxx
      output file name, use "-" for stdout

Optional parameters:
  --fullstart
      use full set of start codons: ATG GTG CTG TTG the default is to only use ATG
  --anystart
      start of sequence is also a valid orf start
  --minlen=xxx
      minimum orf length in b.p., default=100
  --guessorientation
      guess orientation based on strand with the most total orfs, this data will be be saved instead of the list of orfs
  --fasta
      output file is in FASTA format, each orf is a separate sequence, information is in header
  --nonorffasta
      second FASTA file with all sequence not in the first one. File name is --outfile name with "nonorf" inserted
  --fastacollapse
      combine overlapping sequence in the FASTA file
  --fastalargest
      if two orfs overlap, keep only the larger one
  --trimheader
      remove any text in the FASTA header after the first occurrence of white space
  --origorder
      return list in sequence order instead of the default which is sorted by size
  --origorder=s
      same, but do + and - strands separately
  --nsequence
      include a column with nucleotide sequence
  --psequence
      include a column with protein sequence
  --gffformat
      generate output in gff3 format. This also enables --trimheader
  --featureid=xxx
      column 3 of gff file, default is "CDS"
  --non
      do not allow any "N"s in an orf
  --help
      print this screen
  --quiet
      only print error messages
  --debug
      print extra debugging information
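For readers unfamiliar with open reading frame detection, the general idea can be sketched in a few lines of Python. This is not the algorithm used by bb.orffinder, whose internals are not described here. It is a simplified scan of all six reading frames that assumes the standard stop codons TAA, TAG and TGA, counts the stop codon in the ORF length, and reports 0-based positions on the strand being scanned; all of those choices are assumptions of this sketch.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumed: standard stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (N is left as is)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, starts=("ATG",), min_len=100):
    """Return (strand, begin, end) for each ORF of at least min_len b.p.

    begin and end are 0-based, end-exclusive offsets on the scanned strand.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            begin = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if begin is None and codon in starts:
                    begin = i  # first start codon opens the ORF
                elif begin is not None and codon in STOPS:
                    if i + 3 - begin >= min_len:
                        orfs.append((strand, begin, i + 3))
                    begin = None
    return orfs

# Invented 21 b.p. test sequence with one 18 b.p. ORF on the + strand.
seq = "CC" + "ATG" + "GCA" * 4 + "TAA" + "G"
print(find_orfs(seq, min_len=10))  # [('+', 2, 20)]
```

Passing starts=("ATG", "GTG", "CTG", "TTG") corresponds to the start codon set listed for --fullstart, and the default min_len of 100 matches the documented --minlen default.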
Download bb.fastareorder here - current version 1.0, September 3, 2011
This program will allow changing the order or orientation of multiple sequences in FASTA format, or extraction of a subset of sequences. The resulting sequences can optionally be concatenated into a single sequence.
bb.fastareorder Version 1.0

Rearrange the order of sequences in a FASTA file based on your specified contigs and orientations

Required parameters:
  --infile=xxx
      input FASTA file name with multiple sequences
  --outfile=xxx
      output file name, or "-" for stdout
  --seq=xxx
      sequences to keep, multiple allowed, a plus "+" for forward orientation is optional, or use "-" anywhere to indicate reverse complement. Use ".." to indicate a range. Use "," to separate entries.
      Examples: --seq=contig45 --seq=46,49,-21..23 --seq=+32 --seq=-65 --seq=76- --seq=00021..45- -seq=45+..47
      or use "s" for a spacer of 20 Ns e.g. --seq=00021+,S,45-
      The --seq parameter may be omitted if --exclude or --random is used instead

Optional parameters:
  --exclude=xxx
      use this in place of the --seq parameter to output all contigs except these. Order will be unchanged from the original file in this case.
  --random=xxx
      return this many sequences selected at random and placed in random order
  --coordinates
      create this output file, which will store the starting and ending position of each contig
  --onesequence
      concatenate all sequences into one
  --blankline
      for --onesequence mode, put a blank line between each sequence
  --prefix=xxx
      if using --onesequence, use this prefix, default = "concatenated"
  --append
      append to existing --outfile
  --startstop
      add starting and ending base position to FASTA headers
  --noqual
      if a .qual file is present, a corresponding output .qual file is created. This flag turns off this quality file processing
  --help
      print this screen
  --quiet
      only print error messages
  --debug
      print extra debugging information
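The combination of reordering, reverse complementing, and concatenating can be pictured with a short sketch. It mirrors the idea behind --seq=00021+,S,45- together with --onesequence, but it is an independent illustration rather than the program's own code. The concatenate() helper and the two short sequences are invented, and the parsing of the --seq syntax itself is left out.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]

def concatenate(records, order, spacer="N" * 20):
    """Join sequences in the requested order and orientation.

    records maps a sequence name to its sequence; order is a list of
    (name, strand) tuples, with the string "S" standing for a spacer.
    """
    parts = []
    for item in order:
        if item == "S":
            parts.append(spacer)
            continue
        name, strand = item
        sequence = records[name]
        parts.append(reverse_complement(sequence) if strand == "-" else sequence)
    return "".join(parts)

# Invented sequences for illustration only.
records = {"00021": "AACG", "45": "TTGA"}
joined = concatenate(records, [("00021", "+"), "S", ("45", "-")])
print(joined)
```

Here contig 00021 is kept in forward orientation, a spacer of 20 Ns follows, and contig 45 is appended as its reverse complement.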
MITOFY was not created by us, but we provide a public web server
that can be used to run a MITOFY analysis.
This page can be accessed at
VCRU MITOFY Public Web Server
A download link may be found on that page.
The MITOFY home page is
http://dogma.ccbb.utexas.edu/mitofy/
This page last modified Monday, 11-Aug-2014 20:16:55 CDT