Coral 1.3
Contact: leena.salmela@cs.helsinki.fi

--------
Overview
--------

Coral is a program that corrects sequencing errors in short reads
produced by high-throughput sequencers, such as the Illumina Genome
Analyzer and the Roche 454 Genome Sequencer.

---------
Reference
---------

L. Salmela and J. Schröder: "Correcting Errors in Short Reads by
Multiple Alignments", To appear in Bioinformatics (also in HiTSeq
2011).

-------------------
System Requirements
-------------------

Coral has been tested on systems running Linux on an x86_64
architecture. Compiling the program requires gcc.

------------
Installation
------------

Unpack coral-1.3.tar.gz.
Run make in the coral-1.3 directory.

-----
Usage
-----

Usage: ./coral -f[q,s] <input file> -o <output file> [options]
Required parameters:
-f or -fq or -fs <file> Use -f for a fasta file, -fq for a standard fastq
                        file and -fs for a Solexa fastq file.
-o <file>               Output file for corrected reads. Format is fasta if
                        the input file is fasta and fastq if the input file
                        is fastq.

Quick options for sequencing technologies:
-454                    Equal to: -mr 2 -mm 2 -g 3
-illumina               Equal to: -mr 1 -mm 1 -g 1000


Other options:
-k <int>                The length of a k-mer for indexing the reads. [21]
-e <float>              Maximum allowed error rate of a read in the multiple
                        alignment. [0.07]
-t <float>              The minimum proportion of agreeing reads to consider
                        the consensus of a multiple alignment trustworthy.
                        [0.75]
-q <float>              A threshold for proportion of quality values for bases
                        agreeing with the consensus to consider the consensus
                        trustworthy. Not used if input is fasta. [0.75]
-cq [<int>]             If the integer parameter is not present, old quality
                        scores are retained for corrected mismatches. If the
                        integer parameter is present, the quality scores of
                        all corrected bases are set to that value. By default
                        quality scores of corrected bases are computed as
                        explained in the paper.
-a <int>                Maximum number of reads to compute multiple alignments
                        for. [1000]
-p <int>                Number of threads to use. [8]
-r <int>                The number of times the k-mer index is built during a
                        correction run.
-s <file>               Write statistics of computed alignments to a file. By
                        default statistics are not written.
-c <file>               Write the consensus sequences of computed alignments
                        to a file. By default the consensus sequences are not
                        written.
-mr <int>               Reward for matching bases when computing alignments.
                        [2]
-mm <int>               Penalty for mismatches when computing alignments. [2]
-g <int>                Gap penalty when computing alignments. If gap penalty
                        is higher than 100 times mismatch penalty, it is
                        assumed that no gaps will occur. [3]

Options are processed in the order they are given, and the last value
given for a parameter is the one used. For example, the options
-454 -g 2 result in a gap penalty of 2, because -g 2 overrides the gap
penalty of 3 set by -454.
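
The ordering rule and the gap rule of the -g option can be sketched in a
few lines of Python. This is only an illustration of the behaviour
described above, not Coral's own option parser: the names
resolve_scoring and gaps_allowed are hypothetical, and the defaults and
preset values are copied from the option list.

```python
# Defaults and presets copied from the option list in this README.
DEFAULTS = {"mr": 2, "mm": 2, "g": 3}
PRESETS = {
    "-454": {"mr": 2, "mm": 2, "g": 3},
    "-illumina": {"mr": 1, "mm": 1, "g": 1000},
}


def resolve_scoring(args):
    """Apply options left to right so that the last value given wins."""
    scores = dict(DEFAULTS)
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in PRESETS:
            # A quick option overwrites all three scoring parameters.
            scores.update(PRESETS[arg])
            i += 1
        elif arg in ("-mr", "-mm", "-g"):
            # An explicit option overwrites one parameter.
            scores[arg[1:]] = int(args[i + 1])
            i += 2
        else:
            # Options unrelated to scoring are skipped in this sketch.
            i += 1
    return scores


def gaps_allowed(scores):
    """Gaps are ruled out when the gap penalty exceeds 100 x mismatch penalty."""
    return scores["g"] <= 100 * scores["mm"]


print(resolve_scoring(["-454", "-g", "2"]))          # {'mr': 2, 'mm': 2, 'g': 2}
print(gaps_allowed(resolve_scoring(["-illumina"])))  # False
```

Reversing the order (-g 2 -454) would leave the gap penalty at 3, since
the quick option is then the last one to set it.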

-------
Example
-------

./coral -fq reads.fastq -o corrected-reads.fastq
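
When Coral is run from a script, the command line can be assembled
programmatically. The following Python sketch uses only options listed
in the Usage section. The helper name coral_command and its parameters
are hypothetical, and the binary path ./coral assumes the script runs in
the directory where Coral was compiled.

```python
def coral_command(reads, output, fmt="fq", preset=None, threads=None,
                  coral="./coral"):
    """Build an argument list from options documented in the Usage section."""
    if fmt not in ("f", "fq", "fs"):
        raise ValueError("fmt must be 'f', 'fq' or 'fs'")
    cmd = [coral, "-" + fmt, reads, "-o", output]
    if preset is not None:
        if preset not in ("454", "illumina"):
            raise ValueError("preset must be '454' or 'illumina'")
        cmd.append("-" + preset)
    if threads is not None:
        cmd += ["-p", str(threads)]
    return cmd


cmd = coral_command("reads.fastq", "corrected-reads.fastq",
                    preset="illumina", threads=4)
print(" ".join(cmd))
# ./coral -fq reads.fastq -o corrected-reads.fastq -illumina -p 4

# To run the command (requires a compiled coral binary):
#   import subprocess
#   subprocess.run(cmd, check=True)
```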

------------------
New in Version 1.3
------------------

The comments in the output fasta/fastq file now include the
comments of the original file.

The comments in the consensus sequence fasta file have been
reformatted.

The base read of an alignment is now limited to 50 k-mers. If the
read is longer, it is split into length/50 parts to be used as base
reads.

Some types have been changed from 32-bit integers to 64-bit integers
to avoid potential overflow with very large read sets.

------------------
New in Version 1.2
------------------

The hash table construction now adapts to the size of the data. Thus
the -m option is no longer needed.

The comment lines starting with a + in fastq files are now read
correctly.

------------------
New in Version 1.1
------------------

The thresholds -t and -q now specify proportions of reads or quality
scores rather than absolute values as in version 1.0.

With the -r option, the k-mer index can be rebuilt a given number of
times during a correction run.

The reads are now read in two passes: the first pass counts them and
the second reads them into memory. Thus the -n option is no longer
needed.

The defaults for k-mer length and maximum error rate have been
changed.