Skip to content

Installing TrioTrain

Warning

If you're unfamiliar with SLURM, navigating your computing cluster, or what shared software is available to you, reach out to your HPC system's Cluster Administrator for trouble-shooting these dependencies.

Prerequisites

  • Unix-like operating system (cannot run on Windows)
  • Python 3.8
  • Access to a SLURM-based High Performance Computing Cluster
    • TrioTrain uses both CPU and GPU resources
    • You must have at least 2 GPU cards on a single compute node to execute re-training effectively!

System Requirements

Warning

Deviating from these versions will likely cause errors with TrioTrain. Proceed with caution!

Warning

If you have to manually install any tools, be sure to add export PATH=/cluster/path/to/local/software/bin/:$PATH to your edited modules.sh file.

TrioTrain expects the software listed below to be pre-built, and available locally on your SLURM-based HPC cluster. The following minimum software versions are required:

Tool Version Tool Version
DeepVariant 1.4.0
gcc 10.2.0 cuda 11.1.0
curl 7.72.0 picard 2.26.10
minicoda3 4.9 bcftools 1.14
java/openjdk 17.0.3 htslib 1.14
apptainer 1.1.7-1.el7 samtools 1.14

TrioTrain assumes that the default modules.sh script works for your cluster, as this is how the pipeline finds all required software. YOU WILL NEED TO EDIT THIS FILE TO MATCH YOUR CLUSTER, or specify an alternative helper script by adding the following flag with your cluster-specific script.

Add the following flag whenever you run TrioTrain:
python3 triotrain/model_train/run_trio_train.py --modules </path/to/your/module.sh> 
Example | modules.sh
#!/usr/bin/bash
## scripts/setup/modules.sh

echo "=== scripts/setup/modules.sh start > $(date)"

echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Wiping modules... "
module purge
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Done wipe modules"

echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Loading modules... "

# Enable loading of pkgs from prior manager
module load rss/rss-2020

# Update to a newer, but still old, version of Curl
module load curl/7.72.0

# Update to a newer version of git,
# Required for Git extensions on VSCode
module load git/2.29.0

# Enable "conda activate" rather than,
# using "source activate"
module load miniconda3/4.9
export CONDA_BASE=$(conda info --base)

# System Requirement to use 'conda activate' 
source ${CONDA_BASE}/etc/profile.d/conda.sh
conda deactivate

# Modules required for re-training
module load java/openjdk/java-1.8.0-openjdk
module load singularity/singularity
module load picard/2.26.10

# Modules required for post-procesing variants
module load cuda/11.1.0
module load bcftools/1.14
module load htslib/1.14
module load samtools/1.14
module load gcc/10.2.0

echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Done Loading Modules"

echo -e "$(date '+%Y-%m-%d %H:%M:%S') INFO: Conda Base Environment:\n${CONDA_BASE}"
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Python Version:"
python3 --version
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Java Version:"
java -version
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Apptainer Version:"
apptainer --version

# Source DeepVariant version and CACHE Dir
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Adding Apptainer variables... "
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: This step is required to build DeepVariant image(s)"

if [ -z "$1" ]
then
    echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Using defaults, DeepVariant version 1.4.0"
    export BIN_VERSION_DV="1.4.0"
    export BIN_VERSION_DT="1.4.0"
else
    echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Using inputs, DeepVariant version $1"
    export BIN_VERSION_DV="$1"
    export BIN_VERSION_DT="$1"
fi

export APPTAINER_CACHEDIR="${PWD}/APPTAINER_CACHE"
export APPTAINER_TMPDIR="${PWD}/APPTAINER_TMPDIR"
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Done adding Apptainer variables"

# Confirm that it worked
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: DeepVariant Version: ${BIN_VERSION_DV}"
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Apptainer Cache: ${APPTAINER_CACHEDIR}"
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Apptainer Tmp: ${APPTAINER_TMPDIR}"

# Activating the Bash Sub-Routine to handle errors
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Loading bash helper functions... "
source scripts/setup/helper_functions.sh
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Done Loading bash helper functions"

echo "=== scripts/setup/modules.sh > end $(date)"

Install TrioTrain

Run the following at the command line:
git clone git@github.com:jkalleberg/DV-TrioTrain.git

Configure TrioTrain


Last update: March 8, 2024