PRICE Documentation: Sample Job

PRICE Sample Overview

This document walks you through the sample job provided with PRICE, which assembles the genome of an isolate of parainfluenza 4. The data is real, but it has been filtered, truncated, and supplemented for the purpose of this demonstration.

A total of 288,456 actual Solexa paired-end reads are provided in two files. A small number of "seeds" are also provided, from which the assembler will build the entire parainfluenza 4 (Para-4) genome. Read coverage across the Para-4 genome is highly uneven, which is typical of the metagenomic viral data we see in many other samples.
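If you want to confirm that the read files arrived intact, a quick record count is one way to do it. The sketch below is not part of PRICE. It assumes the two read files are FASTQ-style text with exactly four lines per read, and the function name count_fastq_reads is hypothetical.

```python
def count_fastq_reads(path):
    """Count reads in a FASTQ-style file.

    Assumption: every record occupies exactly four non-blank lines
    (header, sequence, separator, qualities).
    """
    with open(path) as handle:
        # Count non-blank lines so a trailing empty line is ignored.
        line_count = sum(1 for line in handle if line.strip())
    if line_count % 4 != 0:
        raise ValueError(
            f"{path}: {line_count} non-blank lines is not a multiple of 4"
        )
    return line_count // 4
```

Summing the result over s_2_1_sequence.txt and s_2_2_sequence.txt should give the total quoted above, provided the four-line assumption holds for these files.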

For more general information on how to use PRICE, consult the README.txt file.

Sample PRICE Command

To be used in conjunction with the included read files:
./PriceTI -fpp s_2_1_sequence.txt s_2_2_sequence.txt 300 95 -icf sangerReads.fasta 1 1 5 -nc 30 -dbmax 72 -mol 30 -tol 20 -mpi 80 -target 90 2 1 1 -o practice.fasta

(A more complete explanation of the command-line options can be found in the User Manual.)

Run Notes

If your compiled PRICE is only locally executable, use "./PriceTI" instead of just "PriceTI".

The target is one contig of almost 17.4 kb. If you get more than one contig, try supplementing the input contigs using -picf (see below) or running the job for additional cycles.

Speed up the assembly by using the -a flag along with the number of CPU cores that your computer has (counting hyperthreads). For example, for an 8-core Mac Pro with hyperthreading (2 threads per core): -a 16

Note that the small size of this sample job makes threading less effective than it would be for a real job, as there is a trade-off between the additional set-up steps needed to ensure thread safety and the actual wall-clock advantage gained by threading. The utility of threading increases with job size.
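As an illustration of picking the -a value automatically, here is a minimal Python sketch that appends -a to the sample command using the number of logical CPUs the operating system reports. BASE_COMMAND and with_threads are hypothetical names, and the sketch assumes that PriceTI does not care where -a appears among the options. It only builds and prints the command; it does not run PRICE.

```python
import os

# The sample command from this page, split into an argument list.
BASE_COMMAND = [
    "./PriceTI",
    "-fpp", "s_2_1_sequence.txt", "s_2_2_sequence.txt", "300", "95",
    "-icf", "sangerReads.fasta", "1", "1", "5",
    "-nc", "30", "-dbmax", "72", "-mol", "30", "-tol", "20", "-mpi", "80",
    "-target", "90", "2", "1", "1",
    "-o", "practice.fasta",
]


def with_threads(command, threads=None):
    """Return a copy of command with '-a <threads>' appended.

    Assumption: the position of -a among the options does not matter.
    """
    if threads is None:
        # os.cpu_count() reports logical CPUs (hyperthreads included)
        # and may return None if the count cannot be determined.
        threads = os.cpu_count() or 1
    if threads < 1:
        raise ValueError("thread count must be at least 1")
    return list(command) + ["-a", str(threads)]


if __name__ == "__main__":
    print(" ".join(with_threads(BASE_COMMAND)))
```

The returned list can be passed to subprocess.run if you prefer to launch the job from Python rather than pasting the printed command into a shell.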

What to Expect

The assembly will generally result in a single contig of ~17,362 nt, representing the full genome of Para-4. Try BLASTing it against GenBank to check the assembly.
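Before BLASTing, it can be handy to check how many contigs came out and how long each one is. The following is a minimal sketch of a FASTA length summary, not a PRICE utility. The function name contig_lengths is hypothetical, and you pass it the path of whichever FASTA output file you want to inspect.

```python
def contig_lengths(fasta_path):
    """Return a list of (header, length) pairs, one per FASTA record."""
    lengths = []
    header, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record begins; store the previous one, if any.
                if header is not None:
                    lengths.append((header, length))
                header, length = line[1:], 0
            elif header is not None:
                # Sequence may be wrapped over several lines.
                length += len(line)
    if header is not None:
        lengths.append((header, length))
    return lengths
```

A successful run of the sample job should yield a list with a single entry whose length is close to 17,362.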

Due to non-deterministic aspects of PRICE (the order in which discordant sequences are collapsed, for instance), the output may vary slightly from run to run or machine to machine.

TIME TO RUN    System description
3m 24s         Ubuntu Linux, Intel X7542, 2.26 GHz, running on a single core
5m 29s         MacOS 10.6.8, Intel Xeon 2.26 GHz, running on a single core

