GASV and GASVPro Release October 1, 2013 ================= * Corrected a bug in GASVPro-HQ.sh and GASVPro.sh so that the default behavior is to ignore translocations. * Added support for Illumina Mate Pair orientation to BAM file processing. GASV and GASVPro Release June 27, 2013 ================= GASVPro (Version 1.2.1) * Corrected a problem with GASVPro coverage model for reciprocal inversions (IR) and reciprocal translocations (TR+ and TR-). GASV and GASVPro Release October 12, 2012 ================= GASVPro (Version 1.2): Modifications to GASVPro-HQ.sh and GASVPro.sh scripts: * Added Translocation Flag scripts Now users more clearly specify if translocations should be analyzed. * Updated GASVPro.sh script to clean-up temporary files created as default behavior. * Added OUTPUT format option, users can now specify the same clusters formats as GASV (clusters, reads, intervals) (As for GASV, intervals is the default.) scripts/: * Updated GASV Pruning code (GASVPruneClusters.pl) to also handle translocations * Added convertClusters program, which carries out the final conversion of GASVPro output to appropriate GASV formats. GASV and GASVPro Release August 16, 2012 ================= * Updated documentation and work flow description of GASV software. GASVPro (Version 1.1): * Corrected a problem with updating read depth counts. GASV and GASVPro Release August 2, 2012 ================= GASVPro (Version 1.0): * Sripts for simplified execution of standard GASVPro pipelines: bin/GASVPro-HQ.sh bin/GASVPro.sh * Documentation detailing GASVPro options and parameters. BAMToGASV (Version 2.0.1): * GASVPro flag, automatically creates input parameters for GASVPro BAMToGASV_Ambig (Version 1.0): * BAM file preprocessor for "low-quality" mappings. That is, reads may have more than one reported mapping. RELEASE 2.0 ============================================ GASV: * Simplified GASV usage and execution by defaulting to clustering mode. * Default clustering now reports variants with at least 4 supporting mappings. Using "--minClusterSize 1" option when running GASV.jar will revert to previous behavior. * Changed GASV translocation reporting. * Revised/expanded GASV documentation. * Throughout the term ESP (end-sequenced pair) has been replaced by PR (paired-read) * Although still functional, we have deprecated the following modes: --nocluster, --filter BAMToGASV: * Replaced BAM Preprocessor by "BAMToGASV.jar" * More robust handling of BAM files. * If paired reads are not detected in a BAM file, a Fixmate() function is called automatically and a new BAM file is written. * Coordinates in Paired-Read files are now reported as alignment start and alignment end, rather than alignment start and (alignment end + 1) * Lmin must be non-negative and smaller than Lmax, but it is allowed to be smaller than 2*(read length). * New beta options are available, including writing low-quality pairs and concordant pairs NEW IN RELEASE 1.5.2 =================================== *Corrected typo in BAM Preprocessor & Updated Documentation. NEW IN RELEASE 1.5.1 =================================== *Corrected typo in BAM Preprocessor & Updated Documentation. NEW IN RELEASE 1.5 ===================================== * Added support for SOLiD sequencing format. * Added command-line specification of Lmin/Lmax for individual libraries. (See BAM_preprocessor.txt for more information.) * Corrected bug in BAM_preprocessor.pl NEW IN RELEASE 1.4.1 ==================================== * Added the --noreciprocal option. NEW IN RELEASE 1.4 ==================================== * Simplified BAM_preprocessor.pl execution. - USAGE: % perl BAM_preprocessor.pl * Added support for remote BAM files (http:// or ftp://) (See BAM_preprocessor.txt for more information.) * Added additional error checking to BAM Preprocessor: - Checks for read pairing information in BAM file. To fix BAM file, run samtools fixmate command. - Checks if four types of structural variants (inversion, divergent, deletion, translocation). If an SV type is not found, a warning is printed but the program continues. * Added support for alternate chromosome naming (See BAM_preprocessor.txt for more information.) * Added maximum length of mapped ends "-PROPER_LENGTH " to BAM_preprocessor.pl in computation of standard deviation to avoid biasing Lmin/Lmax by extreme outliers. (See BAM_preprocessor.txt for more information.) * Fixed two critical translocation bugs, one of which arises when references in the BAM files are described as chr3, chr4, etc. rather than just plain 3, 4 etc. The other arises when X and Y are used rather than 23 and 24. * Changed the GASV clusters type output for translocations: - Reciprocal translocations "TR" have two distinct types "TR" to represent classic reciprocal translocations (+- and -+) "TO" to represent Robertsonian-type translocations (++ and --) - Non-reciprocal translocations "T" type have four distinct types: "TNR+" to represent (+-) classic non-reciprocal translocations "TNR-" to represent (-+) classic non-reciprocal translocations "TNO+" to represent (++) Robertsonian-type translocations "TNO-" to represent (--) Robertsonian-type translocations NEW IN RELEASE 1.3.1 ==================================== * Fixed BAM_preprocessor.pl bugs which arise when running multiple instances of the BAM_preprocessor.pl concurrently and within the same directory. * Now the *.discordant file originally output by BAM_preprocessor.pl is automatically removed to conserve disk space NEW IN RELEASE 1.3 ==================================== * New --cluster and --nocluster output modes as determined by the --output [mode] option: [mode] = standard (which is the default) Cluster_ID: LeftChr: LeftBreakPoint: RightChr: RightBreakPoint: Num PES: Localization: Type: [mode] = reads Cluster_ID: LeftChr: LeftBreakPoint: RightChr: RightBreakPoint: Num PES: Localization: Type: List of PES: [mode] = regions Cluster_ID: Num PES: Localization: Type: List of PES: LeftChr: RightChr: Boundary Points: Note that the "--output regions" mode closely matches the output from release 1.2, except for the new "Type:" field, which is common to all three output modes. This new Type field indicates the type of variant predicted by the cluster, according to the following key: D = Deletion IR = Reciprocal Inversion (both ++ and --) I+ = Inverted Orientation (++ side only) I- = Inverted Orientation (-- side only) T = Translocation TR = Reciprocal Translocation V = Divergent DV = Deletion and Divergent * Improved BAM_preprocessor.pl functionality - Fixed bug with translocation processing and other misc. bugs - Changed "non-library" mode to "all" - Renamed BAM_Preprocessor.pl to BAM_preprocessor.pl - Easier installation (just run install.sh, then use) - Now allows user to specify lmin/lmax values directly - Sorting of ESP files now integrated, so no longer need to run sorting scripts separately * Improved, more detailed documentation and examples. See README.txt GASV_Quickstart_Guide.txt MANUAL.txt FAQ.txt BAM_preprocessor.txt RELEASE_NOTES.txt * New --maxPairedEndsPerWin Option Allows user to run GASV quickly by skipping extremely dense genomic regions. * New --numChrom Option Users can specify the number of chromosomes in the genome (default is 24). NEW IN RELEASE 1.2 ==================================== *Additional Speed Increases Further Optimizations allow GASV clustering to run even faster with fewer memory problems *New Simplified BAM_preprocessor interface Compute the min and max fragment sizes (Lmin and Lmax) from BAM files and generate GASV input files all with a single command. See BAM_preprocessor.txt for details. *New --maxCliqueSize Option This new option is a compromise between using --maxClusterSize to screen out extremely large clusters of ESPs and the default setting of always printing out all info for all clusters of ESPs. (See OPTIONS in MANUAL). NEW IN RELEASE 1.1 ==================================== *Speed increase Optimizations allow GASV clustering to now run significantly faster *New Options in BAM Preprocessor for Finding Lmin/Lmax Can now find Lmin/Lmax either based on percentiles or standard deviations. Also must now specify the number of reads to use in making the Lmin/Lmax determination. *--maxClusterSize Option For clustering modes, can now avoid processing excessively large clusters. This option can save time by simply skipping over nebulous, hard-to-explain clusters with hundreds or thousands of ESP's. (See OPTIONS in MANUAL) NEW IN RELEASE 1.0 ==================================== *SAM/BAM Format Compatibility: A conversion tool for SAM/BAM alignment files is provided. *Memory Efficient Mode: Ability to process larger data sets (See OPTIONS in MANUAL) *Maximal Model: Outputs candidate maximal breakpoint regions as described in our ISMB publication. (See OPTIONS in MANUAL) *Array Comparative Genomics Hybridization Data: GASV can compute breakpoint regions from both mapped paired ends sequences and array CGH data. This method can also be used to compute candidate fusion genes. (See part III in MANUAL) * Split Read Mode: For a given structural variant we identify candidate "split reads" containing the breakpoint of the variant. (See OPTIONS in MANUAL.)