Coral 1.3

Contact: leena.salmela@cs.helsinki.fi

--------
Overview
--------

Coral is a program to correct sequencing errors in short-read
high-throughput data, such as those generated by Illumina Genome Analyzer
and Roche 454 Genome Sequencer.

---------
Reference
---------

L. Salmela and J. Schröder: "Correcting Errors in Short Reads by Multiple
Alignments", to appear in Bioinformatics (also in HiTSeq 2011).

-------------------
System Requirements
-------------------

Coral has been tested on systems running Linux on an x86_64 architecture.
Compiling the program requires gcc.

------------
Installation
------------

Unpack coral-1.3.tar.gz. Run make in directory coral-1.3.

-----
Usage
-----

Usage: ./coral -f[q,s] <input file> -o <output file> [options]

Required parameters:
-f or -fq or -fs <file>  Use -f for a fasta file, -fq for a standard fastq
                         file and -fs for a Solexa fastq file.
-o <file>                Output file for corrected reads. The format is
                         fasta if the input file is fasta and fastq if the
                         input file is fastq.

Quick options for sequencing technologies:
-454                     Equal to: -mr 2 -mm 2 -g 3
-illumina                Equal to: -mr 1 -mm 1 -g 1000

Other options:
-k <int>    The length of a k-mer for indexing the reads. [21]
-e <float>  Maximum allowed error rate of a read in the multiple
            alignment. [0.07]
-t <float>  The minimum proportion of agreeing reads to consider the
            consensus of a multiple alignment trustworthy. [0.75]
-q <float>  Threshold for the proportion of quality values of bases
            agreeing with the consensus for the consensus to be considered
            trustworthy. Not used if the input is fasta. [0.75]
-cq [<int>] If the integer parameter is not present, old quality scores
            are retained for corrected mismatches. If the integer
            parameter is present, the quality scores of all corrected
            bases are set to that value. By default (-cq not given),
            quality scores of corrected bases are computed as explained
            in the paper.
-a <int>    Maximum number of reads to compute multiple alignments
            for. [1000]
-p <int>    Number of threads to use. [8]
-r <int>    The number of times the k-mer index is built during a
            correction run.
-s <file>   Write statistics of computed alignments to a file. By default
            statistics are not written.
-c <file>   Write the consensus sequences of computed alignments to a
            file. By default the consensus sequences are not written.
-mr <int>   Reward for matching bases when computing alignments. [2]
-mm <int>   Penalty for mismatches when computing alignments. [2]
-g <int>    Gap penalty when computing alignments. If the gap penalty is
            higher than 100 times the mismatch penalty, it is assumed
            that no gaps will occur. [3]

Options are processed in the order they are given, and the last value
given for a parameter is the one used. For example, the options
-454 -g 2 result in a gap penalty of 2.

-------
Example
-------

coral -fq reads.fastq -o corrected-reads.fastq

------------------
New in Version 1.3
------------------

The comments in the output fasta/fastq file now include the comments of
the original file.

The comments in the consensus sequence fasta file have been reformatted.

The base read of an alignment is now limited to 50 k-mers. If the read is
longer, it is split into length/50 parts to be used as base reads.

Some types have been changed from 32-bit integers to 64-bit integers to
avoid potential overflow with very large read sets.

------------------
New in Version 1.2
------------------

The hash table construction now adapts to the size of the data. Thus the
-m option is no longer needed.

The comment lines starting with a + in fastq files are now read correctly.

------------------
New in Version 1.1
------------------

The thresholds -t and -q now specify proportions of reads or quality
scores rather than absolute values as in version 1.0.

With the -r option, the k-mer index can be rebuilt a given number of times
during a correction run.

The reads are now read in two passes, first counting them and then
reading them into memory. Thus the -n option is no longer needed.

The defaults for k-mer length and maximum error rate have been changed.
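
---------------------------------
Illustration: option order (sketch)
---------------------------------

The option-ordering rule from the Usage section (quick options expand to
-mr/-mm/-g values, and the last value given wins) together with the gap
penalty rule can be sketched in a few lines of Python. This is only an
illustration of the documented behaviour, not Coral's own option parser.
The names PRESETS, DEFAULTS, resolve_scoring and gaps_allowed are
hypothetical and chosen for this example; the numeric values are the ones
listed under Usage.

```python
# Illustrative sketch only; this is not Coral's actual option parser.
# Preset and default values are copied from the Usage section.
PRESETS = {
    "-454": {"mr": 2, "mm": 2, "g": 3},
    "-illumina": {"mr": 1, "mm": 1, "g": 1000},
}
DEFAULTS = {"mr": 2, "mm": 2, "g": 3}


def resolve_scoring(args):
    """Apply options left to right so that later values override earlier ones."""
    params = dict(DEFAULTS)
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in PRESETS:
            # A quick option sets all three scoring parameters at once.
            params.update(PRESETS[arg])
            i += 1
        elif arg in ("-mr", "-mm", "-g"):
            # An explicit option overrides whatever was set before it.
            params[arg[1:]] = int(args[i + 1])
            i += 2
        else:
            # All other options are ignored in this sketch.
            i += 1
    return params


def gaps_allowed(params):
    """Gaps are assumed absent when g is higher than 100 times mm."""
    return params["g"] <= 100 * params["mm"]


if __name__ == "__main__":
    print(resolve_scoring(["-454", "-g", "2"]))
    print(gaps_allowed(resolve_scoring(["-illumina"])))
```

Under these assumptions, -454 -g 2 yields a gap penalty of 2, while
-g 2 -454 yields 3 because the quick option comes last. With -illumina
the gap penalty of 1000 exceeds 100 times the mismatch penalty of 1, so
alignments are treated as gap-free.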