Using Mendelian Inheritance Expectations to Assess Models
If you have trio-binned test genomes, TrioTrain can help calculate the Mendelian Inheritance Error rate using rtg-tools mendelian. However, you must first create a Sequence Data File (SDF) for each reference genome. The SDF must live in a sub-directory called rtg_tools/ within the same directory as the reference genome. Additional details about rtg-tools can be found on GitHub, or by reviewing the PDF documentation.
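The directory convention described above can be sketched as follows. This is an illustrative sketch, not TrioTrain's own code: the helper name `sdf_dir_for` and the file `trio.vcf.gz` are hypothetical, and the `rtg mendelian` call only runs if `rtg` is on your `PATH` and both inputs exist. It also assumes the pedigree is available to rtg-tools, for example in the VCF header.

```shell
#!/bin/bash
# Hypothetical helper: derive the SDF directory expected next to a reference FASTA.
sdf_dir_for() {
    local ref_fasta="$1"
    echo "$(dirname "${ref_fasta}")/rtg_tools"
}

# Reference path taken from the Human GRCh38 example in this guide.
REF="./triotrain/variant_calling/data/GIAB/reference/GRCh38_no_alt_analysis_set.fasta"
SDF="$(sdf_dir_for "${REF}")"
echo "Expected SDF directory: ${SDF}"

# Hypothetical multi-sample trio VCF; replace with your own file.
TRIO_VCF="trio.vcf.gz"

# Only attempt the check when rtg is installed and both inputs are present.
if command -v rtg >/dev/null 2>&1 && [ -d "${SDF}" ] && [ -f "${TRIO_VCF}" ]; then
    # -i: input VCF, -t: reference SDF
    rtg mendelian -i "${TRIO_VCF}" -t "${SDF}"
fi
```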
Create a Reference Sequence Data File
Warning
This step is specific to the Human reference genome GRCh38. Cattle-specific input files are packaged with TrioTrain. If you are working with a new species, you will need to create this file for your reference genome.
After completing the tutorial walk-through, create the Human reference SDF by running the following at the command line:
source ./scripts/start_conda.sh # Ensure the previously built conda env is active
bash scripts/setup/setup_rtg_tools.sh
For other species, use the following template:
Example | Creating the SDF
#!/bin/bash
# scripts/setup/build_rtg_tools.sh
echo -e "=== scripts/setup/build_rtg_tools.sh > start $(date)"
##======= Create RTG-TOOLS SDF ======================================##
# required for using rtg-tools 'mendelian'
if [ ! -f ./triotrain/variant_calling/data/GIAB/reference/rtg_tools/reference.txt ]; then
    rtg format -o ./triotrain/variant_calling/data/GIAB/reference/rtg_tools/ \
        ./triotrain/variant_calling/data/GIAB/reference/GRCh38_no_alt_analysis_set.fasta
else
    echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: RTG-TOOLS SDF already exists... SKIPPING AHEAD"
fi
echo -e "=== scripts/setup/build_rtg_tools.sh > end $(date)"
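For a new species, the same template can be parameterized by the reference path. The following is a minimal sketch under stated assumptions rather than a script shipped with TrioTrain: the function name `build_sdf` is hypothetical, and it tests for the rtg_tools/ directory itself instead of reference.txt, a choice that may need adjusting for your setup.

```shell
#!/bin/bash
# Hypothetical generalization of the template above for an arbitrary reference FASTA.
build_sdf() {
    local ref_fasta="$1"
    local sdf_dir
    sdf_dir="$(dirname "${ref_fasta}")/rtg_tools"

    # Fail early if the reference genome is missing.
    if [ ! -f "${ref_fasta}" ]; then
        echo "ERROR: reference FASTA not found: ${ref_fasta}" >&2
        return 1
    fi

    # Assumption: an existing rtg_tools/ directory means the SDF was already built.
    if [ -d "${sdf_dir}" ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: RTG-TOOLS SDF already exists... SKIPPING AHEAD"
        return 0
    fi

    # rtg must be available, e.g. from the previously built conda env.
    if ! command -v rtg >/dev/null 2>&1; then
        echo "ERROR: rtg not found on PATH" >&2
        return 1
    fi

    # Create the SDF next to the reference genome.
    rtg format -o "${sdf_dir}" "${ref_fasta}"
}
```

With the function defined, a call such as `build_sdf /path/to/your_reference.fasta` would create the SDF in `/path/to/rtg_tools/` (the path here is a placeholder).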