NAME
    blasr - Map SMRT Sequences to a reference genome.

SYNOPSIS
    blasr reads.fasta genome.fasta
    blasr reads.fasta genome.fasta -sa genome.fasta.sa
    blasr reads.bas.h5 genome.fasta [-sa genome.fasta.sa]
    blasr reads.bas.h5 genome.fasta -sa genome.fasta.sa -maxScore -100 -minMatch 15 ...
    blasr reads.bas.h5 genome.fasta -sa genome.fasta.sa -nproc 24 -out alignment.out ...

DESCRIPTION
blasr is a read mapping program that maps reads to positions in a genome by clustering short exact matches between the read and the genome, and scoring clusters using alignment. The matches are generated by searching all suffixes of a read against the genome using a suffix array. Global chaining methods are used to score clusters of matches.

The only required inputs to blasr are a file of reads and a reference genome. It is extremely useful to have read filtering information, and mapping runtime may decrease substantially when a precomputed suffix array index on the reference sequence is specified.

Although reads may be input in FASTA format, the recommended input is HDF bas.h5 and pls.h5 files, because these contain quality value information that is used in the alignment and leads to higher-quality variant detection.

Read filtering information is contained in the .bas.h5 input files. It may also be generated by other post-processing programs that analyze pulse files, in which case it is read from a separate .region.h5 file. The filters currently applied to reads are high-quality region filtering and adapter filtering. Regions outside high-quality regions are ignored in mapping. Reads that contain regions annotated as adapter are split into non-adapter (template) regions, which are mapped separately.

When a suffix array index of the genome is not specified, the suffix array is built before alignments are produced. This may be prohibitively slow when the genome is large (e.g. human).
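
The suffix-array matching described above can be illustrated with a small Python sketch. This is not blasr's implementation: the suffix array is built by naively sorting all suffixes, only exact matches of a fixed length `min_match` are looked up (a simplification of searching each read suffix for a sufficiently long match), and clustering, chaining, and alignment scoring are omitted. All function names are hypothetical.

```python
def build_suffix_array(genome: str) -> list[int]:
    """Naive construction: sort all suffix start positions lexicographically.

    Real indexes use far more efficient algorithms; this is only a toy.
    """
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def find_exact_matches(genome: str, sa: list[int], query: str) -> list[int]:
    """Return the sorted genome positions where `query` occurs exactly."""
    k = len(query)
    lo, hi = 0, len(sa)
    # Binary search for the first suffix whose k-base prefix is >= query.
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:sa[mid] + k] < query:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes that start with `query` are contiguous in the suffix array.
    hits = []
    while lo < len(sa) and genome[sa[lo]:sa[lo] + k] == query:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


def anchor_read(genome: str, sa: list[int], read: str, min_match: int):
    """Collect (read_pos, genome_pos) anchors for every read suffix."""
    anchors = []
    for r in range(len(read) - min_match + 1):
        for g in find_exact_matches(genome, sa, read[r:r + min_match]):
            anchors.append((r, g))
    return anchors


genome = "GATTACAGATTTCA"
sa = build_suffix_array(genome)  # done once per genome
print(anchor_read(genome, sa, "ACAGAT", 4))  # [(0, 4), (1, 5), (2, 6)]
```

In this example all three anchors lie on the same diagonal (genome position minus read position is 4), which is the kind of consistent group a clustering step would pick up. Even in this toy form, construction has to process every suffix of the genome, while each lookup is only a binary search.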
It is best to precompute the suffix array of a genome using the program sawriter, and then specify the suffix array on the command line using -sa genome.fa.sa.

The optional parameters are roughly divided into three categories: control over anchoring, alignment scoring, and output. The default anchoring parameters are optimal for small genomes and samples with up to 5% divergence from the reference genome. The main parameter governing speed and sensitivity is the -minMatch parameter. For human genome alignments, a value of 11 or higher is recommended.

Several methods may be used to speed up alignments, at the expense of possibly decreasing sensitivity. Regions that are too repetitive may be ignored during mapping by limiting the number of positions a read maps to with the -maxAnchorsPerPosition option. Values between 500 and 1000 are effective in the human genome.

For small genomes such as bacterial genomes or BACs, the default parameters are sufficient for maximal sensitivity and good speed.
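
The trade-off controlled by -minMatch and -maxAnchorsPerPosition can be sketched in Python as follows. This is a rough illustration, not blasr's code: it uses a dictionary of fixed-length k-mers in place of a suffix array, and it assumes that a read position matching more genome locations than the limit contributes no anchors at all. The function and parameter names are hypothetical stand-ins for the command-line options.

```python
from collections import defaultdict


def index_kmers(genome: str, k: int) -> dict:
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for g in range(len(genome) - k + 1):
        index[genome[g:g + k]].append(g)
    return index


def anchors_with_limit(genome, read, min_match, max_anchors_per_position=None):
    """Return (read_pos, genome_pos) anchors of length `min_match`.

    Read positions that hit more than `max_anchors_per_position` genome
    locations are treated as too repetitive and skipped entirely
    (an assumption of this sketch).
    """
    index = index_kmers(genome, min_match)
    anchors = []
    for r in range(len(read) - min_match + 1):
        hits = index.get(read[r:r + min_match], [])
        if (max_anchors_per_position is not None
                and len(hits) > max_anchors_per_position):
            continue  # repetitive seed: ignore it
        anchors.extend((r, g) for g in hits)
    return anchors


genome = "ACACACACTTGCA"  # starts with a short AC repeat
read = "ACACTTG"

# The repeat seed "ACAC" produces three anchors on its own.
print(anchors_with_limit(genome, read, 4))
# With a limit of 2, only the unique seeds remain: [(1, 5), (2, 6), (3, 7)]
print(anchors_with_limit(genome, read, 4, max_anchors_per_position=2))
```

Lowering `min_match` in this sketch (for example to 2) yields many more anchors, most of them spurious, which mirrors why longer minimum matches are faster but can miss anchors in divergent reads.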