Tutorial on Gene Prediction with AUGUSTUS

UCSC, June 24th, 2015, Mario Stanke

If you want to follow along in real time tomorrow, download the data and install the software beforehand. In this lab session we practice the most common steps in predicting the protein-coding genes of a eukaryotic genome with AUGUSTUS. We assume the case of a "new" genome, for which AUGUSTUS has not been trained before, but use a well-studied species as an example because example data is readily available and visualization is easier.

Styles

Assignments are in this color. The lazy ones can get through this tutorial very quickly by just reading these assignments and (more or less) cutting and pasting the commands that follow them.

Results are in this color.

[+] Details are hidden...

Example Input Data

All example files are in the data directory. I recommend you work directly in this directory.

Drosophila melanogaster

For Cheaters: Result Files

You can use the files in the results directory to catch up if you fall behind or to compare your results.

Software

To run these examples, you will need to have the software below installed. As all important results are in the results folder, you can skip any step or program.

Exercise 1: Compile a Training Set

There are several typical options for creating a training set from which to estimate the parameters of a gene finder. Here we will go through option 6. We assume that we have RNA-Seq data only and no substantial homology data. We will reuse an existing parameter set for AUGUSTUS.
  1. Follow the tutorial on "Iterative Training Set Construction" and create a training set genes.gb.
  2. Partition genes.gb into a training set and a holdout test set as described in 1.2 Split gene structure set....
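
To make the partitioning step concrete, here is a minimal Python sketch of what a random split of a GenBank file into a training set and a holdout test set amounts to. It is not the script used in the linked tutorial section; the function names, the default seed and the output suffixes .train and .test are assumptions made for illustration. It relies only on the GenBank convention that each record ends with a line containing just "//".

```python
import random


def read_genbank_records(text):
    """Split GenBank flat-file text into records.

    Each record is assumed to end with a line containing only '//'.
    Trailing text without a closing '//' is ignored.
    """
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    return records


def random_split(records, n_test, seed=0):
    """Return (train, test), with n_test records drawn at random for the test set."""
    if not 0 <= n_test <= len(records):
        raise ValueError("n_test must be between 0 and the number of records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    return shuffled[n_test:], shuffled[:n_test]


def split_file(path, n_test, seed=0):
    """Write path + '.train' and path + '.test' (hypothetical output names)."""
    with open(path) as handle:
        records = read_genbank_records(handle.read())
    train, test = random_split(records, n_test, seed)
    with open(path + ".train", "w") as out:
        out.writelines(train)
    with open(path + ".test", "w") as out:
        out.writelines(test)
    return len(train), len(test)
```

For example, split_file("genes.gb", 100) would set aside 100 randomly chosen gene structures for testing and keep the rest for training. The holdout genes must not be used for parameter estimation, otherwise the accuracy measured on them is overly optimistic.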

Exercise 2: Train the Coding Regions of AUGUSTUS

Let's name our species "bug". Pretending that there is not already an AUGUSTUS parameter set for Drosophila (named "fly"), we will estimate the parameters from the training set.
  1. Create a meta parameters file for bug as described in 2. CREATE A META PARAMETERS FILE...
  2. Estimate the parameters using your training set as described in 3. MAKE AN INITIAL TRAINING
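
The two steps above can be sketched as a small Python wrapper. It assumes that the AUGUSTUS programs new_species.pl and etraining are on your PATH, that AUGUSTUS_CONFIG_PATH is set, and that the training part of the split is called genes.gb.train (an assumed file name). The linked tutorial sections remain the authoritative description of the commands. The helper names training_commands and run_training are made up for this sketch.

```python
import shutil
import subprocess


def training_commands(species, train_file):
    """Build the command lines for the two steps; nothing is executed here."""
    return [
        # Step 1: create the meta parameters files for the new species.
        ["new_species.pl", f"--species={species}"],
        # Step 2: estimate the parameters from the training set.
        ["etraining", f"--species={species}", train_file],
    ]


def run_training(species, train_file):
    """Run both steps in order and stop at the first failure."""
    for argv in training_commands(species, train_file):
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        subprocess.run(argv, check=True)
```

Calling run_training("bug", "genes.gb.train") would then run both steps for the species "bug". Building the command lists separately from running them makes it easy to print and inspect the commands first.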

For further tutorial parts on prediction, hint preparation, and homology-based training set construction and prediction, see the lab session tutorial.