Dear GATK team,
I recently started my PhD and am working with large Illumina datasets (250-300 million HiSeq 150 bp paired-end reads) of pooled samples (10-12 genomes per pool). After alignment against a reference genome, indel realignment, and duplicate marking, I started variant calling with the UnifiedGenotyper (command below). The estimated runtime is more than 6 weeks per pool. The senior bioinformatics scientist in my working group said that this is unusually long and that she has never seen such a runtime, even for projects of similar size.
Now to my question: is this runtime due to my UG settings, due to the size of my pools, expected for UG in general, or did I make a crucial mistake?
Some technical properties:
reference genome
ARS1 is the newest and "best" goat reference genome: 29 chromosomes and 29,000 unplaced scaffolds, 2.9 Gb in length
working environment
I am working on an HPC cluster with SGE as the batch system. Depending on the node, ~252 GB of RAM is available.
command
java -Djava.io.tmpdir=tmp -jar GenomeAnalysisTK/3.7/bin/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 8 -nct 4 -glm SNP -stand_call_conf 20 -ploidy 24 -out_mode EMIT_VARIANTS_ONLY -R ASR1.fa -I INPUTFILE -o OUTPUTFILE >>LOGFILE 2>&1
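In case it helps with diagnosing this, here is a minimal sketch of how the same command could be restricted to a single contig with the -L interval option, so that the runtime can be measured on a small part of the genome first. It is a dry run that only prints the command. The contig name CONTIG and the per-contig output naming are placeholders I made up for illustration, not values from my actual run.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the UnifiedGenotyper command for one interval.
# All settings are copied from the command above; only -L and the output suffix are new.

build_cmd() {
    # $1: contig name (placeholder); it must match a sequence name in the reference FASTA
    contig="$1"
    echo "java -Djava.io.tmpdir=tmp -jar GenomeAnalysisTK/3.7/bin/GenomeAnalysisTK.jar" \
         "-T UnifiedGenotyper -nt 8 -nct 4 -glm SNP -stand_call_conf 20 -ploidy 24" \
         "-out_mode EMIT_VARIANTS_ONLY -R ASR1.fa -I INPUTFILE" \
         "-L ${contig} -o OUTPUTFILE.${contig}.vcf"
}

# Print the command for one hypothetical contig.
build_cmd "CONTIG"
```

If one contig finishes in a reasonable time, the same function could be called once per contig from an SGE array job, although I have not tested whether that is the recommended approach for pooled data.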
Hopefully someone can help me.