===== Ceres and SCINet =====

== Useful Links ==

  * [[https://forum.scinet.usda.gov/|SCINet Forum]] (requires eauth)
  * [[http://axon.ars.usda.gov/Science Links/Pages/Big-Data.aspx|SCINet on axon]] (requires eauth)
  * [[http://www.ars.usda.gov/SCINet|Publicly accessible SCINet page]]
  * [[https://e.arsnet.usda.gov/sites/OCIO/scinet/default.aspx|SharePoint SCINet page]]
  * [[scinet_rclone_box|Transferring files between a USDA Box account and SCINet]]
  * Support email: scinet_vrsc@usda.gov

== What is SCINet? ==

ARS is building a next-generation network as the backbone of SCINet. It connects six ARS locations (Ames IA, Stoneville MS, Albany CA, Beltsville MD, Clay Center NE, Fort Collins CO) and will serve as a research data conduit among agency locations.

ARS is also creating a large-capacity storage and HPC infrastructure deployed in Ames, IA. The HPC is a cluster of 64 computers that supports a variety of scientific software and has access to 1.3 petabytes of storage space. In addition, the SCINet HPC includes five high-memory nodes, each with 1.5 TB of shared memory and 60 processing cores. These resources will be integrated with Amazon Web Services so that certain workloads can run, and certain data can reside, in the public cloud.

ARS is deploying a "virtual" core group of personnel with diverse skills in scientific computing to provide support or training to ARS researchers who want to expand their research to big data techniques. This virtual core group will serve as a resource for computing tasks, or as a mentoring resource for ARS scientists integrating cutting-edge technology into their research.

== msn compute nodes ==

The Madison location has purchased several dedicated compute nodes. We have first priority for these nodes. To be added to the group that has access to them, contact [[derek.bickhart@usda.gov]].

We have four nodes in the "msn" partition, each with 72 threads, 350+ GB of RAM, and a 14-day time limit.
With our forthcoming purchase in 2019, that capacity will more than triple, with 12 additional nodes and one "high mem" node (72 threads, 1.5 TB of RAM). We have preferred access, which means that our location is the only one that can run jobs on these nodes for up to 14 days at a time. Other SCINet users can use the nodes while they are idle by queuing jobs of at most 2 hours on the "brief-low" partition, but they must wait for our jobs to finish first.

== How do I get started? ==

  * Visit the [[http://www.ars.usda.gov/SCINet|SCINet home page]] and at the bottom click on [[https://scinet.usda.gov/signup|Get an account]]
  * For help in setting up job scripts, you can use the [[https://e.arsnet.usda.gov/sites/OCIO/scinet/accounts/ceres_job_script_generator/Home.aspx|Ceres Job Script Generator]] on the SharePoint site
  * If you have a lot of data, you will need to request a project directory (//e.g.// we have one for Carrot at ''/scinet01/gov/usda/ars/scinet/project/carrot_genome'') by filling out [[https://e.arsnet.usda.gov/sites/OCIO/scinet/accounts/SitePages/Project_Allocation_Request.aspx|this form]] on the SharePoint site

== Examples ==

Programs that can be run on Ceres include Structure, RAxML, and bwa. Here is an example of a RAxML job script:

<code bash>
#!/bin/bash
#SBATCH -p medium                    # name of the queue you are submitting the job to
#SBATCH -N 1                         # number of nodes in this job
#SBATCH -n 40                        # number of cores/tasks in this job; there are 20 physical cores on a node * 2 = 40 logical cores
#SBATCH -t 7-00:00:00                # time allocated for this job, days-hours:mins:seconds (168h = 7d)
#SBATCH --mail-user=drpeaks@gmail.com   # enter your email address to receive emails
#SBATCH --mail-type=BEGIN,END,FAIL   # send an email when the job starts, ends, or fails
#SBATCH -o "%j.%N.stdout"            # standard output; %j adds the job number to the output file name and %N adds the node name
#SBATCH -e "%j.%N.stderr"            # optional; captures standard error

echo "+++ $0 Starting $(date)"
set -o pipefail
set -u  # error if an uninitialized variable is referenced
shopt -s nullglob
set -e

# modules
module load raxml/gcc/64/8.2.3
raxmlbin='/software/apps/raxml/gcc/64/8.2.3/bin/raxmlHPC-PTHREADS-AVX'

# configuration
randomseed=18295
ntrees=100
cpu=40
outdir="/scinet01/gov/usda/ars/scinet/project/carrot_genome/job0003"
phylipfile="$outdir/Cult1.v13.PhylipTree.phy"  # generated locally and previously uploaded here
outroot="run4.$randomseed.$ntrees.pthreads"    # cannot have a path here
outconsensusroot="$outroot.consensus"

# RAxML parameters used:
# -s sequenceFileName -n outputFileName -m substitutionModel
# -J MR|MR_DROP|MRE|STRICT|STRICT_DROP|T_  Compute majority rule consensus tree with "-J MR"
# -p parsimonyRandomSeed
# -#|-N numberOfRuns|autoFC|autoMR|autoMRE|autoMRE_IGN
# -T numberOfThreads
# -x rapidBootstrapRandomNumberSeed
# -f a  rapid Bootstrap analysis and search for best-scoring ML tree in one program run
# -f b  draw bipartition information on a tree provided with -t (typically the best-known
#       ML tree) based on multiple trees (e.g., from a bootstrap) in a file specified by -z
# -f c  check if the alignment can be properly read by RAxML

cd "$outdir"

# Run the bootstrap only if its output is missing, empty, or older than the input
if [ ! -s RAxML_bootstrap."$outroot" ] || [ RAxML_bootstrap."$outroot" -ot "$phylipfile" ]; then
  echo "+++ Check input file \"$phylipfile\" $(date)"
  "$raxmlbin" -m GTRGAMMA -s "$phylipfile" -T 2 -n filetest -f c
  # output file can be deleted
  cat RAxML_info.filetest
  rm -f RAxML_info.filetest

  # produce $ntrees trees
  echo "+++ Generating $ntrees trees $(date)"
  "$raxmlbin" -x "$randomseed" -p "$randomseed" -T "$cpu" -N "$ntrees" -m GTRGAMMA -s "$phylipfile" -n "$outroot" -f a
fi

if [ ! -s RAxML_MajorityRuleConsensusTree."$outconsensusroot".tree ] || [ RAxML_MajorityRuleConsensusTree."$outconsensusroot".tree -ot "$phylipfile" ]; then
  # make a consensus tree of the $ntrees trees
  echo "+++ Generating consensus tree $(date)"
  "$raxmlbin" -T "$cpu" -f b -m GTRGAMMA -J MR -z RAxML_bootstrap."$outroot" -n "$outconsensusroot"
  r=$?; if [ $r != 0 ]; then echo "Error code $r line $LINENO script $0, halting"; exit $r; fi

  # run Holly's correction Perl command
  # e.g. "(43048,53002):1.0[100]" becomes "(43048,53002)100:1.0"
  echo "+++ Holly's correction method $(date)"
  perl -p -e 's|([:]\d+[.]\d+)[[](\d+)[]]|$2$1|g' \
    RAxML_MajorityRuleConsensusTree."$outconsensusroot" \
    > RAxML_MajorityRuleConsensusTree."$outconsensusroot".tree
  r=$?; if [ $r != 0 ]; then echo "Error code $r line $LINENO script $0, halting"; exit $r; fi
  #rm -f RAxML_MajorityRuleConsensusTree."$outconsensusroot"
fi

echo "+++ $0 Done $(date)"
exit 0
</code>
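
Both steps in the RAxML script above are guarded by the same test: a step runs only when its output file is missing, empty, or older than the input alignment, so a resubmitted job skips work that is already done. As a minimal sketch, that test could be factored into one helper function. The name ''needs_rebuild'' is hypothetical and is not part of the original script.

```bash
#!/bin/bash

# needs_rebuild OUTPUT INPUT
# Hypothetical helper (assumed name, not from the original script).
# Succeeds (exit status 0) when OUTPUT is missing, empty, or older than INPUT,
# which is the same condition the RAxML script tests before each step.
needs_rebuild() {
    local output="$1"
    local input="$2"
    # -s : file exists and has a size greater than zero
    # -ot: left-hand file has an older modification time than the right-hand file
    [ ! -s "$output" ] || [ "$output" -ot "$input" ]
}
```

With this helper, the first guard in the script would read ''if needs_rebuild RAxML_bootstrap."$outroot" "$phylipfile"; then''. Because the test runs inside an ''if'' condition, a non-zero status from the function does not trigger ''set -e''.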