NAME
         blasr - Map SMRT Sequences to a reference genome.

SYNOPSIS
         blasr reads.fasta genome.fasta 

         blasr reads.fasta genome.fasta -sa genome.fasta.sa

         blasr reads.bas.h5 genome.fasta [-sa genome.fasta.sa] 

         blasr reads.bas.h5 genome.fasta -sa genome.fasta.sa -maxScore -100 -minMatch 15 ... 

         blasr reads.bas.h5 genome.fasta -sa genome.fasta.sa -nproc 24 -out alignment.out ... 

DESCRIPTION 
  blasr is a read mapping program that maps reads to positions 
  in a genome by clustering short exact matches between the read and
  the genome, and scoring clusters using alignment. The matches are
  generated by searching all suffixes of a read against the genome
  using a suffix array. Global chaining methods are used to score 
  clusters of matches.
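
  The anchoring idea described above can be illustrated with a toy
  sketch.  This is not blasr's implementation: the function names are
  hypothetical, the suffix array is built naively, and a simple vote
  over diagonals stands in for the global chaining step.

```python
from collections import Counter


def build_suffix_array(genome):
    """Naive suffix array: start positions sorted by suffix (toy sizes only)."""
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def find_anchors(read, genome, sa, min_match):
    """Return (read_pos, genome_pos) pairs of exact matches of length min_match.

    Every read suffix is searched by its first min_match bases, using a
    binary search over the suffix array.
    """
    anchors = []
    for r in range(len(read) - min_match + 1):
        kmer = read[r:r + min_match]
        lo, hi = 0, len(sa)
        # Find the first suffix whose min_match-prefix is >= kmer.
        while lo < hi:
            mid = (lo + hi) // 2
            if genome[sa[mid]:sa[mid] + min_match] < kmer:
                lo = mid + 1
            else:
                hi = mid
        # Collect the contiguous run of suffixes that start with kmer.
        while lo < len(sa) and genome[sa[lo]:sa[lo] + min_match] == kmer:
            anchors.append((r, sa[lo]))
            lo += 1
    return anchors


def best_diagonal(anchors):
    """Group anchors by diagonal (genome_pos - read_pos); return (diagonal, count).

    Assumption: this crude clustering replaces global chaining for illustration.
    """
    counts = Counter(g - r for r, g in anchors)
    return counts.most_common(1)[0] if counts else None


genome = "ACGTTGCAAGGCTTACGATCGA"
read = genome[5:15]
sa = build_suffix_array(genome)
anchors = find_anchors(read, genome, sa, min_match=5)
print(best_diagonal(anchors))
```

  In this sketch, a larger min_match yields fewer anchors per read, which
  mirrors the speed and sensitivity trade-off of the -minMatch option.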

  The only required inputs to blasr are a file of reads and a
  reference genome.  It is extremely useful to have read filtering
  information, and mapping runtime may decrease substantially when a
  precomputed suffix array index on the reference sequence is
  specified.
  
  Although reads may be input in FASTA format, the recommended input is HDF
  bas.h5 and pls.h5 files, because these contain quality value
  information that is used in the alignment and leads to higher quality
  variant detection.
  
  Read filtering information is contained in the .bas.h5 input files.
  It may also be generated by other post-processing programs that
  analyze pulse files, in which case it is read from a separate
  .region.h5 file.  The filters currently applied to reads are
  high-quality region filtering and adapter filtering.  Regions outside
  high-quality regions are ignored in mapping.  Reads that contain
  regions annotated as adapter are split into non-adapter (template)
  regions, which are mapped separately.
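
  The effect of the two filters on a single read can be sketched as
  follows.  The function name and the half-open (start, end) interval
  representation are assumptions made for illustration; they do not
  reflect the actual layout of .bas.h5 or .region.h5 files.

```python
def template_subreads(read_length, hq_region, adapters):
    """Return the (start, end) template intervals that would be mapped.

    hq_region -- (start, end) of the high-quality region
    adapters  -- list of (start, end) adapter annotations
    All intervals are half-open and in read coordinates (an assumption).
    """
    hq_start = max(0, hq_region[0])
    hq_end = min(read_length, hq_region[1])
    subreads = []
    cursor = hq_start
    for a_start, a_end in sorted(adapters):
        # Clip each adapter to the high-quality region.
        a_start = max(a_start, hq_start)
        a_end = min(a_end, hq_end)
        if a_start >= a_end:
            continue  # adapter lies entirely outside the high-quality region
        if a_start > cursor:
            subreads.append((cursor, a_start))
        cursor = max(cursor, a_end)
    if cursor < hq_end:
        subreads.append((cursor, hq_end))
    return subreads


# A 1000-base read with two adapters inside the high-quality region.
print(template_subreads(1000, (50, 900), [(300, 345), (600, 645)]))
```

  Each returned interval corresponds to one template region that would
  be mapped on its own.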
  
  When a suffix array index of the genome is not specified, the suffix
  array is built before any alignments are produced.  This may be
  prohibitively slow when the genome is large (e.g. human).  It is best
  to precompute the suffix array of a genome using the program sawriter,
  and then specify the suffix array on the command line using
  -sa genome.fasta.sa.
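
  A wrapper script might add the -sa option only when a precomputed
  index is present.  The sketch below only assembles the argument list
  and does not run blasr; the helper name is hypothetical, and the file
  names follow the SYNOPSIS.

```python
import os


def blasr_command(reads, genome, suffix_array=None, extra_args=()):
    """Assemble a blasr argument list, adding -sa only if the index exists."""
    cmd = ["blasr", reads, genome]
    if suffix_array is not None and os.path.exists(suffix_array):
        cmd += ["-sa", suffix_array]
    cmd += list(extra_args)
    return cmd


# Without an index on disk, blasr would build the suffix array itself.
print(blasr_command("reads.bas.h5", "genome.fasta",
                    suffix_array="genome.fasta.sa",
                    extra_args=["-nproc", "24", "-out", "alignment.out"]))
```

  The resulting list can be passed to subprocess.run once blasr is
  available on the PATH.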
  
  The optional parameters are roughly divided into three categories:
  control over anchoring, alignment scoring, and output. 
  
  The default anchoring parameters are optimal for small genomes and
  samples with up to 5% divergence from the reference genome.  The main
  parameter governing speed and sensitivity is the -minMatch parameter.
  For human genome alignments, a value of 11 or higher is recommended.  
  Several methods may be used to speed up alignments, at the expense of
  possibly decreasing sensitivity.  
  
  Regions that are too repetitive may be ignored during mapping by
  limiting the number of positions a read maps to with the
  -maxAnchorsPerPosition option.  Values between 500 and 1000 are effective
  in the human genome.
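
  One way to picture this option is as a filter over anchors, sketched
  below.  The function name is hypothetical, and the sketch assumes that
  a read position with too many matches is dropped entirely; blasr's
  exact behavior may differ.

```python
from collections import defaultdict


def drop_repetitive_anchors(anchors, max_anchors_per_position):
    """Keep anchors only for read positions with few enough genome matches.

    anchors -- list of (read_pos, genome_pos) exact-match pairs
    """
    by_read_pos = defaultdict(list)
    for read_pos, genome_pos in anchors:
        by_read_pos[read_pos].append(genome_pos)
    kept = []
    for read_pos in sorted(by_read_pos):
        hits = by_read_pos[read_pos]
        # Assumption: positions over the limit contribute no anchors at all.
        if len(hits) <= max_anchors_per_position:
            kept.extend((read_pos, g) for g in hits)
    return kept


# Read position 0 matches three genome positions, position 1 only one.
anchors = [(0, 10), (0, 50), (0, 90), (1, 11)]
print(drop_repetitive_anchors(anchors, max_anchors_per_position=2))
```

  Under this assumption, lowering the limit removes more repeat-derived
  anchors, which saves time but may lose alignments that lie in
  repetitive sequence.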
  
  For small genomes such as bacterial genomes or BACs, the default parameters 
  are sufficient for maximal sensitivity and good speed.