Zorro: The Masked Assembler

INTRODUCTION
Zorro is based on the minimus2 pipeline (AMOS package) and uses MUMmer, 
AMOS and bowtie internally. Zorro takes two contig FASTA files as 
input (each representing the assembled contigs from a whole genome 
assembly) and one FASTA file containing a subset of the reads used for 
the assembly (10X coverage is enough; more will slow down the pipeline 
and consume more resources).

Zorro's initial phase detects inconsistencies in the assemblies and 
splits the contigs where they occur. Next, Zorro counts k-mers (default 
k=22) in the reads and uses the k-mer count table to detect and mask 
repeats in both assembly1 and assembly2. After repeat masking, Zorro 
uses nucmer to detect overlaps between assembly1 and assembly2 (overlaps 
between contigs from the same assembly are not allowed). All overlaps 
found in this phase are expected to be between unique regions, because 
repeats are masked. The overlaps are used to lay out the merged contigs 
and generate their consensus with AMOS tools. The merged contigs are 
built from the unmasked contigs, so the final merged assembly should 
include the repeat regions.
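
The repeat-masking idea above can be illustrated with a short Python 
sketch. This is not Zorro's own code: the function names, the count 
threshold (max_count), the use of "N" as the mask character and the 
forward-strand-only counting are all assumptions made for illustration. 
A small k is used in the example so the result is easy to follow.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(contig, counts, k, max_count, mask_char="N"):
    """Mask every base covered by a k-mer seen more than max_count times.

    max_count and mask_char are hypothetical parameters, not Zorro options.
    """
    contig = contig.upper()
    masked = list(contig)
    for i in range(len(contig) - k + 1):
        if counts[contig[i:i + k]] > max_count:
            masked[i:i + k] = mask_char * k
    return "".join(masked)


if __name__ == "__main__":
    # Toy data: "GATTACA" is over-represented in the reads, so its k-mers
    # are treated as repeats and masked in the contig.
    reads = ["GATTACA"] * 5 + ["CCGGTTAA"]
    table = count_kmers(reads, k=4)
    print(mask_repeats("CCGGTTACAG", table, k=4, max_count=2))
```

With real data, a sensible threshold would depend on the read coverage, 
since k-mers from unique regions are expected to occur roughly as often 
as the coverage depth.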

A second, less stringent round of assembly tries to merge contigs that 
were not included in the first Zorro phase. All contigs are written 
to <prefix>.ZORRO.fasta. We recommend using SSPACE to scaffold the 
ZORRO contigs.

QUICK GUIDE
Usage:
perl zorro.pl -1 assembly1.fasta -2 assembly2.fasta -r reads.fasta

Both assembly FASTA files should contain contigs (not scaffolds). Use split_at_Ns.pl to break scaffolds into contigs before running Zorro.
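
For readers who want to see what breaking scaffolds into contigs involves, here is a minimal Python sketch. It is not the split_at_Ns.pl script and may behave differently from it: the function name, the min_gap parameter and the "_1", "_2" naming of the pieces are assumptions made for illustration.

```python
import re


def split_at_ns(name, sequence, min_gap=1):
    """Split a scaffold into contigs at every run of at least min_gap Ns.

    Returns a list of (contig_name, contig_sequence) tuples. Empty pieces
    (for example from leading or trailing Ns) are dropped.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces = [piece for piece in gap.split(sequence) if piece]
    return [(f"{name}_{i}", piece) for i, piece in enumerate(pieces, start=1)]


if __name__ == "__main__":
    for contig_name, contig_seq in split_at_ns("scaf1", "ACGTNNNNGGCCNTT"):
        print(f">{contig_name}\n{contig_seq}")
```

Raising min_gap keeps isolated ambiguous bases inside a contig and splits only at longer gaps; whether that is desirable depends on how the scaffolder marked its gaps.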

The reads FASTA file should contain a subset of the reads used for the assembly (10X coverage is enough). We typically use a subset of the 454 reads, but Illumina reads are also OK.
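
One way to pick a read subset of roughly 10X coverage is sketched below in Python. This is not part of Zorro: the function names are hypothetical, and the genome size must be supplied by the user (the total length of one of the assemblies is a possible rough estimate, which is an assumption here).

```python
import random


def sampling_fraction(read_lengths, genome_size, coverage=10.0):
    """Fraction of reads to keep so the kept bases total ~coverage x genome."""
    total_bases = sum(read_lengths)
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("read lengths and genome size must be positive")
    return min(1.0, coverage * genome_size / total_bases)


def subsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return [read for read in reads if rng.random() < fraction]


if __name__ == "__main__":
    # Toy numbers: 1000 reads of 400 bp over a 10 kb genome is 40X,
    # so about a quarter of the reads gives roughly 10X.
    reads = ["A" * 400] * 1000
    fraction = sampling_fraction([len(r) for r in reads], genome_size=10_000)
    kept = subsample(reads, fraction)
    print(fraction, len(kept))
```

Because each read is kept independently, the resulting coverage is only approximately 10X, which is sufficient for the purpose described above.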

BUGS
Please report problems to the ZORRO discussion list on Google Groups:
http://groups.google.com/group/zorro-assembler?hl=en

AUTHORS
Gustavo Gilson Lacerda Costa
Ramon Oliveira Vidal
Marcelo Falsarella Carazzolle