Channel: Recent Discussions — GATK-Forum

GATK 3.7 and GATK 4 beta2


Dear team,

I am testing HaplotypeCaller from GATK 4 Beta2 for our NGS workflow.

The command I used is:

time -p /gpfs/software/genomics/GATK/4b.2/gatk/gatk-launch HaplotypeCaller \
--reference /gpfs/data_jrnas1/ref_data/Hsapiens/hs37d5/hs37d5.fa \
--input NA12892.recal.bam \
--dbsnp /gpfs/data_jrnas1/ref_data/Hsapiens/GRCh37/variation/dbsnp_138.vcf.gz \
--emitRefConfidence GVCF \
--readValidationStringency LENIENT \
--nativePairHmmThreads 32 \
--createOutputVariantIndex true \
--output NA12892.raw.snps.indels.g.vcf

The execution time for GATK 4 Beta2 was 51 hours 32 minutes.

For comparison, I ran the same sample (NA12892) with GATK 3.7 using the following command:

time -p java -XX:+UseParallelGC -XX:ParallelGCThreads=32 -Xmx128g \
-jar /gpfs/software/genomics/GATK/3.7/base/GenomeAnalysisTK.jar -T HaplotypeCaller \
-nct 8 -pairHMM VECTOR_LOGLESS_CACHING \
-R /gpfs/data_jrnas1/ref_data/Hsapiens/hs37d5/hs37d5.fa \
-I NA12892.realigned.recal.bam \
--emitRefConfidence GVCF \
--variant_index_type LINEAR \
--variant_index_parameter 128000 \
--dbsnp /gpfs/data_jrnas1/ref_data/Hsapiens/GRCh37/variation/dbsnp_138.vcf.gz \
-o NA12892.raw.snps.indels.g.vcf

The execution time for GATK 3.7 was 18 hours 12 minutes.
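To put the two wall-clock times on a common scale, here is a small Python sketch that converts them to minutes and computes how much longer the GATK 4 Beta2 run took. The figures come from the two timings above; the helper name `to_minutes` is only for illustration.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'H hours M minutes' wall-clock time to total minutes."""
    return hours * 60 + minutes


gatk4_beta2 = to_minutes(51, 32)  # GATK 4 Beta2 HaplotypeCaller, --nativePairHmmThreads 32
gatk3_7 = to_minutes(18, 12)      # GATK 3.7 HaplotypeCaller, -nct 8

# Ratio of the two wall-clock times (values above 1 mean GATK 4 Beta2 was slower).
slowdown = gatk4_beta2 / gatk3_7

print(f"GATK 4 Beta2: {gatk4_beta2} min, GATK 3.7: {gatk3_7} min")
print(f"GATK 4 Beta2 took {slowdown:.2f}x as long")
```

This comparison assumes the two inputs are equivalent. The GATK 3.7 run read `NA12892.realigned.recal.bam`, while the GATK 4 run read `NA12892.recal.bam`.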

I don't know how to use multithreading (e.g. -nct) in GATK 4 to reduce the execution time on a single node. We have 32 cores per node and 512 GB of memory available for benchmarking. To parallelize the GATK 4 workload, I also tried the Spark version.

I ran the GATK 4 Beta2 Spark job on a cluster of 32 nodes (32 nodes x 32 cores, 1024 cores in total). The execution time was almost the same as the single-node GATK 4 Beta2 run (50 hours 21 minutes).
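A quick way to quantify "almost the same" is to compare the speedup against the increase in cores. The sketch below uses only the timings and core counts stated above; "parallel efficiency" here means speedup divided by the core ratio, where 100% would be perfect linear scaling.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'H hours M minutes' wall-clock time to total minutes."""
    return hours * 60 + minutes


single_node = to_minutes(51, 32)    # GATK 4 Beta2 on one 32-core node
spark_cluster = to_minutes(50, 21)  # GATK 4 Beta2 Spark job on 32 nodes x 32 cores

cores_single = 32
cores_cluster = 32 * 32  # 1024 cores in total

speedup = single_node / spark_cluster  # how much faster the cluster run finished
core_ratio = cores_cluster / cores_single
efficiency = speedup / core_ratio      # fraction of ideal linear scaling achieved

print(f"speedup {speedup:.3f}x with {core_ratio:.0f}x the cores")
print(f"parallel efficiency {efficiency:.1%}")
```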

How can I reduce the execution time of GATK 4 Beta2 HaplotypeCaller?

Please see the Spark logs below:

  • /gpfs/software/spark/spark-2.1.0-bin-hadoop2.7//bin/spark-submit --master spark://nsnode11:6311 --driver-java-options -Dsamjdk.use_async_io_read_samtools=false,-Dsamjdk.use_async_io_write_samtools=true,-Dsamjdk.use_async_io_write_tribble=false,-Dsamjdk.compression_level=1 --conf spark.io.compression.codec=snappy --conf spark.yarn.executor.memoryOverhead=6000 --conf spark.kryoserializer.buffer.max=512m --conf spark.driver.userClassPathFirst=true --conf spark.driver.maxResultSize=0 --conf spark.executor.cores=1024 --conf spark.reducer.maxSizeInFlight=100m --conf spark.shuffle.file.buffer=512k --conf spark.akka.frameSize=512 --conf spark.akka.threads=10 --conf spark.executor.memory=50g --conf spark.driver.memory=150g --conf spark.local.dir=/gpfs/projects/NAGA/naga/NGS/pipeline/GATK_Best_Practices/GATK4b2Spark/1024cores/tmp --class org.broadinstitute.hellbender.Main /gpfs/software/genomics/GATK/4b.2/gatk/build/libs/hellbender-spark.jar HaplotypeCaller --reference /gpfs/data_jrnas1/ref_data/Hsapiens/hs37d5/hs37d5.fa --input /gpfs/projects/NAGA/naga/NGS/pipeline/GATK_Best_Practices/GATK4b2/bam//NA12892.recal.bam --dbsnp /gpfs/projects/NAGA/naga/SparkTest/SPARKCALLER/REF/dbsnp_138.vcf --emitRefConfidence GVCF --readValidationStringency LENIENT --nativePairHmmThreads 1024 --createOutputVariantIndex true --output NA12892.raw.snps.indels.g.vcf
    [August 9, 2017 10:13:02 AM AST] HaplotypeCaller --nativePairHmmThreads 1024 --dbsnp /gpfs/projects/NAGA/naga/SparkTest/SPARKCALLER/REF/dbsnp_138.vcf --emitRefConfidence GVCF --output NA12892.raw.snps.indels.g.vcf --input /gpfs/projects/NAGA/naga/NGS/pipeline/GATK_Best_Practices/GATK4b2/bam//NA12892.recal.bam --readValidationStringency LENIENT --reference /gpfs/data_jrnas1/ref_data/Hsapiens/hs37d5/hs37d5.fa --createOutputVariantIndex true --group StandardAnnotation --group StandardHCAnnotation --GVCFGQBands 1 --GVCFGQBands 2 --GVCFGQBands 3 --GVCFGQBands 4 --GVCFGQBands 5 --GVCFGQBands 6 --GVCFGQBands 7 --GVCFGQBands 8 --GVCFGQBands 9 --GVCFGQBands 10 --GVCFGQBands 11 --GVCFGQBands 12 --GVCFGQBands 13 --GVCFGQBands 14 --GVCFGQBands 15 --GVCFGQBands 16 --GVCFGQBands 17 --GVCFGQBands 18 --GVCFGQBands 19 --GVCFGQBands 20 --GVCFGQBands 21 --GVCFGQBands 22 --GVCFGQBands 23 --GVCFGQBands 24 --GVCFGQBands 25 --GVCFGQBands 26 --GVCFGQBands 27 --GVCFGQBands 28 --GVCFGQBands 29 --GVCFGQBands 30 --GVCFGQBands 31 --GVCFGQBands 32 --GVCFGQBands 33 --GVCFGQBands 34 --GVCFGQBands 35 --GVCFGQBands 36 --GVCFGQBands 37 --GVCFGQBands 38 --GVCFGQBands 39 --GVCFGQBands 40 --GVCFGQBands 41 --GVCFGQBands 42 --GVCFGQBands 43 --GVCFGQBands 44 --GVCFGQBands 45 --GVCFGQBands 46 --GVCFGQBands 47 --GVCFGQBands 48 --GVCFGQBands 49 --GVCFGQBands 50 --GVCFGQBands 51 --GVCFGQBands 52 --GVCFGQBands 53 --GVCFGQBands 54 --GVCFGQBands 55 --GVCFGQBands 56 --GVCFGQBands 57 --GVCFGQBands 58 --GVCFGQBands 59 --GVCFGQBands 60 --GVCFGQBands 70 --GVCFGQBands 80 --GVCFGQBands 90 --GVCFGQBands 99 --indelSizeToEliminateInRefModel 10 --useAllelesTrigger false --dontTrimActiveRegions false --maxDiscARExtension 25 --maxGGAARExtension 300 --paddingAroundIndels 150 --paddingAroundSNPs 20 --kmerSize 10 --kmerSize 25 --dontIncreaseKmerSizesForCycles false --allowNonUniqueKmersInRef false --numPruningSamples 1 --recoverDanglingHeads false --doNotRecoverDanglingBranches false --minDanglingBranchLength 4 --consensus false --maxNumHaplotypesInPopulation 128 --errorCorrectKmers false --minPruning 2 --debugGraphTransformations false --kmerLengthForReadErrorCorrection 25 --minObservationsForKmerToBeSolid 20 --likelihoodCalculationEngine PairHMM --base_quality_score_threshold 18 --gcpHMM 10 --pair_hmm_implementation FASTEST_AVAILABLE --pcr_indel_model CONSERVATIVE --phredScaledGlobalReadMismappingRate 45 --useDoublePrecision false --debug false --useFilteredReadsForAnnotations false --bamWriterType CALLED_HAPLOTYPES --disableOptimizations false --justDetermineActiveRegions false --dontGenotype false --dontUseSoftClippedBases false --captureAssemblyFailureBAM false --errorCorrectReads false --doNotRunPhysicalPhasing false --min_base_quality_score 10 --useNewAFCalculator false --annotateNDA false --heterozygosity 0.001 --indel_heterozygosity 1.25E-4 --heterozygosity_stdev 0.01 --standard_min_confidence_threshold_for_calling 10.0 --max_alternate_alleles 6 --max_genotype_count 1024 --sample_ploidy 2 --genotyping_mode DISCOVERY --contamination_fraction_to_filter 0.0 --output_mode EMIT_VARIANTS_ONLY --allSitePLs false --readShardSize 5000 --readShardPadding 100 --minAssemblyRegionSize 50 --maxAssemblyRegionSize 300 --assemblyRegionPadding 100 --maxReadsPerAlignmentStart 50 --activeProbabilityThreshold 0.002 --maxProbPropagationDistance 50 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --disableToolDefaultReadFilters false --minimumMappingQuality 20
    [August 9, 2017 10:13:02 AM AST] Executing as nkathiresan@nsnode11 on Linux 3.10.0-229.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Version: 4.beta.2-14-g4229219-SNAPSHOT
    [INFO] Available threads: 32
    [INFO] Requested threads: 1024
    [WARNING] Using 32 available threads, but 1024 were requested
    log4j:WARN No appenders could be found for logger (org.broadinstitute.hellbender.utils.MathUtils$Log10Cache).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    [August 11, 2017 12:34:22 PM AST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 3,021.34 minutes.
    Runtime.totalMemory()=57773916160

  • /gpfs/software/spark/spark-2.1.0-bin-hadoop2.7//sbin/stop-master.sh
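The log itself can be cross-checked against the timing reported above. The Python sketch below parses the thread counts and the elapsed time from the log lines quoted in this post. Treating the effective thread count as `min(requested, available)` is an assumption based on the wording of the WARNING line, not on GATK source code.

```python
import re

# Lines copied from the Spark run log above.
log_lines = [
    "[INFO] Available threads: 32",
    "[INFO] Requested threads: 1024",
    "[August 11, 2017 12:34:22 PM AST] org.broadinstitute.hellbender.tools.walkers"
    ".haplotypecaller.HaplotypeCaller done. Elapsed time: 3,021.34 minutes.",
]


def find_number(pattern: str) -> float:
    """Return the first number captured by `pattern` in the log, with commas removed."""
    for line in log_lines:
        match = re.search(pattern, line)
        if match:
            return float(match.group(1).replace(",", ""))
    raise ValueError(f"pattern not found: {pattern}")


available = int(find_number(r"Available threads: (\d+)"))
requested = int(find_number(r"Requested threads: (\d+)"))
elapsed_min = find_number(r"Elapsed time: ([\d,.]+) minutes")

# Assumption: "Using 32 available threads, but 1024 were requested" is read as
# the PairHMM thread count being capped at the threads available on the node.
used = min(requested, available)

hours, minutes = divmod(elapsed_min, 60)
print(f"threads used: {used} of {requested} requested")
print(f"elapsed: {int(hours)} h {minutes:.2f} min")
```

The parsed elapsed time (just over 50 hours 21 minutes) matches the Spark timing given earlier, and the 32 available threads reported in the log equal the core count of a single node rather than the 1024 cores of the cluster.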

Thanks a lot,
With Regards,
Naga
