Ceres and SCINet

What is SCINet?

ARS is building a next-generation network as the backbone of SCINet. It connects six ARS locations (Ames IA, Stoneville MS, Albany CA, Beltsville MD, Clay Center NE, Fort Collins CO) and will serve as a research data conduit among agency locations. ARS is also creating large-capacity storage and high-performance computing (HPC) infrastructure in Ames, IA. The HPC system is a cluster of 64 computers that supports a variety of scientific software and has access to 1.3 petabytes of storage space. In addition, the SCINet HPC includes five high-memory nodes, each with 1.5 TB of shared memory and 60 processing cores. These resources will be integrated with Amazon Web Services so that certain workloads can run, and their data can reside, in the public cloud.

ARS is deploying a “virtual” core group of personnel with diverse skills in scientific computing to provide support or training to ARS researchers who want to expand their research into big data techniques. This virtual core group will serve as a resource for computing tasks, or as a mentoring resource for ARS scientists integrating cutting-edge technology into their research.

msn compute nodes

The Madison location has purchased several dedicated compute nodes, and we have first priority on them. To be added to the group that has access to these nodes, contact derek.bickhart@usda.gov.

We have four nodes in the “msn” partition, each with 72 threads, 350+ GB of RAM and a 14-day time limit. With our forthcoming purchase in 2019, capacity will more than triple with 12 additional nodes and one “high mem” node (72 threads, 1.5 TB of RAM).
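SLURM writes walltimes as `days-hours:minutes:seconds`, so the 14-day msn limit is `-t 14-00:00:00`, or 336 hours. When a workflow generates its own job scripts, it can help to check a requested time against a limit before submitting. The helper below is a sketch, not a SCINet tool: `slurm_time_to_hours` and `fits_limit` are hypothetical names, only two of the time formats SLURM accepts are parsed, and the 336-hour figure is taken from this page, so confirm the current limit with `sinfo -p msn`.

```shell
#!/bin/bash
# Sketch: convert a SLURM walltime to whole hours, rounding any partial hour up.
# Only the "D-HH:MM:SS" and "HH:MM:SS" forms are handled; other forms SLURM
# accepts (e.g. "MM" or "D-HH") are rejected with an error.
slurm_time_to_hours() {
  local spec="$1" days=0 rest total_s
  local re='^([0-9]+):([0-9]{2}):([0-9]{2})$'
  if [[ "$spec" == *-* ]]; then
    days="${spec%%-*}"   # part before the first dash
    rest="${spec#*-}"    # part after the first dash
  else
    rest="$spec"
  fi
  if [[ ! "$days" =~ ^[0-9]+$ ]] || [[ ! "$rest" =~ $re ]]; then
    echo "unsupported time spec: $spec" >&2
    return 1
  fi
  local h=${BASH_REMATCH[1]} m=${BASH_REMATCH[2]} s=${BASH_REMATCH[3]}
  # 10# forces base 10 so fields such as "08" are not read as octal
  total_s=$(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
  echo $(( (total_s + 3599) / 3600 ))
}

# Succeed if walltime $1 fits within a limit of $2 hours (hypothetical helper).
fits_limit() {
  local hours
  hours=$(slurm_time_to_hours "$1") || return 1
  (( hours <= $2 ))
}
```

For example, `fits_limit 7-00:00:00 336` succeeds (7 days is 168 hours), while `fits_limit 15-00:00:00 336` fails. Rounding partial hours up keeps the check conservative: `02:30:00` counts as 3 hours.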

We have preferred access, which means our location is the only one that can run jobs of up to 14 days on these nodes. Other SCINet users can use the nodes while they are idle by queueing jobs of at most 2 hours on the “brief-low” partition, but those jobs must wait for ours to finish first.
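Putting the msn numbers above into an sbatch header gives something like the following fragment. It is a sketch, assuming you are already in the msn access group: `-n 72` and `-t 14-00:00:00` request a whole node for the full 14 days, so trim both to what the job actually needs. The commented brief-low lines show what a user outside the group would request instead.

```shell
#!/bin/bash
# Sketch of a job header for the Madison "msn" partition.
#SBATCH -p msn                 # Madison's dedicated partition
#SBATCH -N 1                   # one node
#SBATCH -n 72                  # all 72 threads on one msn node
#SBATCH -t 14-00:00:00         # the 14-day maximum (days-hours:mins:seconds)
#SBATCH -o "%j.%N.stdout"      # %j = job number, %N = node name
#SBATCH -e "%j.%N.stderr"
#
# Outside the msn group, the same nodes can be borrowed while idle through
# brief-low, with a walltime of at most 2 hours:
#   #SBATCH -p brief-low
#   #SBATCH -t 02:00:00
#
# ... job commands go here ...
```

Submit the finished script with `sbatch`; `squeue -u $USER` shows whether it is pending or running, and `sinfo -p msn` shows the partition's current state and time limit.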

How do I get started?
  • If you have a lot of data, you will need to request a project directory (e.g., we have one for Carrot at /scinet01/gov/usda/ars/scinet/project/carrot_genome) by filling out the project request form on the SharePoint site.
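Once the project directory exists, a job that writes into it can check access before doing any work, rather than failing partway through. A minimal sketch; `require_writable_dir` is a hypothetical helper, not a SCINet command.

```shell
#!/bin/bash
# Hypothetical helper (not a SCINet command): stop early if a project
# directory is missing or not writable by the current user.
require_writable_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "ERROR: directory '$dir' does not exist" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "ERROR: directory '$dir' is not writable by $(id -un)" >&2
    return 1
  fi
  return 0
}
```

In a job script, call it before the first `cd`, e.g. `require_writable_dir /scinet01/gov/usda/ars/scinet/project/carrot_genome || exit 1`.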
Examples

A few examples of the programs that can be run are Structure, RAxML and bwa. Here is an example of a RAxML script:

#!/bin/bash
#SBATCH -p medium             #name of the queue (partition) you are submitting the job to
#SBATCH -N 1                  #number of nodes in this job
#SBATCH -n 40                 #number of cores/tasks in this job; there are 20 physical cores on a node * 2 = 40 logical cores
#SBATCH -t 7-00:00:00         #time allocated for this job days-hours:mins:seconds 168h=7d
#SBATCH --mail-user=drpeaks@gmail.com   #enter your email address to receive emails
#SBATCH --mail-type=BEGIN,END,FAIL #will receive an email when job starts, ends or fails
#SBATCH -o "%j.%N.stdout"     # standard output; %j adds the job number to the file name and %N adds the node name
#SBATCH -e "%j.%N.stderr"     # optional; standard error goes to this file



echo "+++ $0 Starting  $(date)"
set -o pipefail
set -u # error if an uninitialized variable is referenced
shopt -s nullglob
set -e



# modules
module load raxml/gcc/64/8.2.3
raxmlbin='/software/apps/raxml/gcc/64/8.2.3/bin/raxmlHPC-PTHREADS-AVX'



# configuration
randomseed=18295
ntrees=100
cpu=40
outdir="/scinet01/gov/usda/ars/scinet/project/carrot_genome/job0003"
phylipfile="$outdir/Cult1.v13.PhylipTree.phy"  # generated locally and previously uploaded here
outroot="run4.$randomseed.$ntrees.pthreads"  # cannot have path here
outconsensusroot="$outroot.consensus"


# RAxML parameters used:
# -s sequenceFileName -n outputFileName -m substitutionModel
# -J MR|MR_DROP|MRE|STRICT|STRICT_DROP|T_<PERCENT> Compute majority rule consensus tree with "-J MR"
# -p parsimonyRandomSeed
# -#|-N numberOfRuns|autoFC|autoMR|autoMRE|autoMRE_IGN
# -T numberOfThreads
# -x rapidBootstrapRandomNumberSeed
# -f a rapid bootstrap analysis and search for the best-scoring ML tree in one program run
# -f b draw bipartition information on a tree provided with -t (typically the best-known
#      ML tree) based on multiple trees (e.g., from a bootstrap) in a file specified by -z
# -f c check if the alignment can be properly read by RAxML



cd "$outdir"
if [ ! -s RAxML_bootstrap."$outroot" ] || [ RAxML_bootstrap."$outroot" -ot "$phylipfile" ]; then

  echo "+++ Check input file \"$phylipfile\"  $(date)"
  $raxmlbin -m GTRGAMMA -s "$phylipfile" -T 2 -n filetest -f c
  # output file can be deleted
  cat RAxML_info.filetest
  rm -f RAxML_info.filetest

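  # produce $ntrees trees
  # RAxML will not reuse a run name whose output files already exist,
  # so clear any earlier output for this run before regenerating it
  rm -f RAxML_*."$outroot"
  echo "+++ Generating $ntrees trees  $(date)"
  $raxmlbin -x $randomseed -p $randomseed -T $cpu -N $ntrees -m GTRGAMMA -s "$phylipfile" -n "$outroot" -f a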

fi



if [ ! -s RAxML_MajorityRuleConsensusTree."$outconsensusroot".tree ] || [ RAxML_MajorityRuleConsensusTree."$outconsensusroot".tree -ot "$phylipfile" ]; then

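  # make a consensus tree of the $ntrees trees
  # clear output from any earlier consensus run, since RAxML will not reuse an existing run name
  rm -f RAxML_*."$outconsensusroot"
  echo "+++ Generating consensus tree  $(date)"
  $raxmlbin -T $cpu -f b -m GTRGAMMA -J MR -z RAxML_bootstrap."$outroot" -n "$outconsensusroot"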
  r=$?; if [ $r != 0 ]; then echo "Error code $r line $LINENO script $0, halting"; exit $r; fi

  # run Holly's correction Perl command
  # e.g. "(43048,53002):1.0[100]" becomes "(43048,53002)100:1.0"
  echo "+++ Holly's correction method  $(date)"
  perl -p -e 's|([:]\d+[.]\d+)[[](\d+)[]]|$2$1|g' \
    RAxML_MajorityRuleConsensusTree."$outconsensusroot" \
    > RAxML_MajorityRuleConsensusTree."$outconsensusroot".tree
  r=$?; if [ $r != 0 ]; then echo "Error code $r line $LINENO script $0, halting"; exit $r; fi

  #rm -f RAxML_MajorityRuleConsensusTree."$outconsensusroot"

fi



echo "+++ $0 Done  $(date)"
exit 0
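The Perl substitution in the script above ("Holly's correction") moves each bracketed support value that follows a branch length to the node-label position just before the colon. The same rewrite can be written with `sed -E`; this is an equivalent sketch, not the original command, and `move_support_to_labels` is a hypothetical name. Like the Perl version, it only matches decimal branch lengths, so a value in exponent notation such as `1e-05` is left untouched.

```shell
#!/bin/bash
# sed -E equivalent of the Perl one-liner in the script above:
#   ":1.0[100]"  ->  "100:1.0"
# A bracketed support value after a decimal branch length becomes the
# node label in front of that branch length. Reads stdin, writes stdout.
move_support_to_labels() {
  sed -E 's/(:[0-9]+\.[0-9]+)\[([0-9]+)]/\2\1/g'
}
```

It would replace the `perl -p -e ...` call as `move_support_to_labels < RAxML_MajorityRuleConsensusTree."$outconsensusroot" > RAxML_MajorityRuleConsensusTree."$outconsensusroot".tree`.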
scinet.txt · Last modified: 2021/12/09 14:15 by dsenalik
