Hello,
I'm trying to combine 6 GVCF files into a single VCF file using GenomicsDBImport + GenotypeGVCFs with GATK4. I'm still using GATK4.beta5, since I'm not able to find the link to GATK4.beta6.
My target.interval file looks like this:
1:8022745-8023035
1:8025283-8025585
1:8029304-8029564
If I run the following command:
gatk-launch GenomicsDBImport \
-V sample_001.gvcf.gz \
-V sample_002.gvcf.gz \
-V sample_003.gvcf.gz \
-V sample_004.gvcf.gz \
-V sample_005.gvcf.gz \
-V sample_006.gvcf.gz \
--genomicsDBWorkspace testDB \
--L 1:8022745-8023035
GATK will work and generate a database with genotypes for this specific interval.
However, if I run the following script:
#!/bin/bash
while read in; do \
gatk-launch GenomicsDBImport \
-V sample_001.gvcf.gz \
-V sample_002.gvcf.gz \
-V sample_003.gvcf.gz \
-V sample_004.gvcf.gz \
-V sample_005.gvcf.gz \
-V sample_006.gvcf.gz \
--genomicsDBWorkspace testDB \
--L "$in"; done < target.interval
GATK runs, but I get the following error for every single interval and nothing is added to the database:
: Problem parsing start/end value in interval string. Value was: 8023035arse Genome Location string: 1:8022745-8023035
: Problem parsing start/end value in interval string. Value was: 8025585arse Genome Location string: 1:8025283-8025585
I understand that the software is recognizing each line of my file correctly but, for some reason, it cannot parse the start and end position of each interval.
I have already looked for this error in the forum, but I haven't found any posts that answer my question.
I also tried to use a regular bed file as input file with the following format:
1 8022745 8023035
1 8025283 8025585
1 8029304 8029564
In that case the error was the following one:
A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig '1 8022745 8023035' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
Can anybody help me with this issue?
Thank you very much,
Here you have the entire error message for the first two intervals:
09:50:28.057 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/opt/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/inte l/gkl/native/libgkl_compression.so
[December 15, 2017 9:50:27 AM PST] GenomicsDBImport --genomicsDBWorkspace SAmerican_panel --variant 17-50618.gatk.gvcf.gz --variant 17-50619.gatk.gvcf.gz --varian t 17-50620.gatk.gvcf.gz --variant 17-50621.gatk.gvcf.gz --variant 17-50622.gatk.gvcf.gz --variant 17-50623.gatk.gvcf.gz --variant 17-50624.gatk.gvcf.gz --variant 1 7-50625.gatk.gvcf.gz --variant 17-50626.gatk.gvcf.gz --variant 17-50627.gatk.gvcf.gz --variant 17-50628.gatk.gvcf.gz --variant 17-50629.gatk.gvcf.gz --variant 17-5 0630.gatk.gvcf.gz --variant 17-50631.gatk.gvcf.gz --variant 17-50632.gatk.gvcf.gz --variant 17-50633.gatk.gvcf.gz --variant 17-50634.gatk.gvcf.gz --variant 17-5063 --genomicsDBSegmentSize 1048576 --genomicsDBVCFBufferSize 16384 --overwriteExistingGenomicsDBWorkspace false --batchSize 0 --consolidate false --validateSampleNa meMap false --readerThreads 1 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVarian tIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 0 --cloudIndexPref etchBuffer 0 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_infla ter false --gcs_max_retries 20 --disableToolDefaultReadFilters false
[December 15, 2017 9:50:27 AM PST] Executing as user@server on Linux 3.10.0-693.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_151-b12; Version: 4.beta .5
09:50:28.443 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 5
09:50:28.443 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:50:28.443 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
09:50:28.444 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:50:28.444 INFO GenomicsDBImport - Deflater: IntelDeflater
09:50:28.444 INFO GenomicsDBImport - Inflater: IntelInflater
09:50:28.444 INFO GenomicsDBImport - GCS max retries/reopens: 20
09:50:28.444 INFO GenomicsDBImport - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree /dr_all_nio_fixes
09:50:28.444 INFO GenomicsDBImport - Initializing engine
09:50:38.228 INFO GenomicsDBImport - Shutting down engine
[December 15, 2017 9:50:38 AM PST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=1368915968
***********************************************************************
: Problem parsing start/end value in interval string. Value was: 8023035arse Genome Location string: 1:8022745-8023035
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--javaOptions '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
09:50:47.419 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/opt/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/inte l/gkl/native/libgkl_compression.so
[December 15, 2017 9:50:47 AM PST] GenomicsDBImport --genomicsDBWorkspace SAmerican_panel --variant 17-50618.gatk.gvcf.gz --variant 17-50619.gatk.gvcf.gz --varian t 17-50620.gatk.gvcf.gz --variant 17-50621.gatk.gvcf.gz --variant 17-50622.gatk.gvcf.gz --variant 17-50623.gatk.gvcf.gz --variant 17-50624.gatk.gvcf.gz --variant 1 7-50625.gatk.gvcf.gz --variant 17-50626.gatk.gvcf.gz --variant 17-50627.gatk.gvcf.gz --variant 17-50628.gatk.gvcf.gz --variant 17-50629.gatk.gvcf.gz --variant 17-5 0630.gatk.gvcf.gz --variant 17-50631.gatk.gvcf.gz --variant 17-50632.gatk.gvcf.gz --variant 17-50633.gatk.gvcf.gz --variant 17-50634.gatk.gvcf.gz --variant 17-5063 --genomicsDBSegmentSize 1048576 --genomicsDBVCFBufferSize 16384 --overwriteExistingGenomicsDBWorkspace false --batchSize 0 --consolidate false --validateSampleNa meMap false --readerThreads 1 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVarian tIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 0 --cloudIndexPref etchBuffer 0 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_infla ter false --gcs_max_retries 20 --disableToolDefaultReadFilters false
[December 15, 2017 9:50:47 AM PST] Executing as user@server on Linux 3.10.0-693.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_151-b12; Version: 4.beta .5
09:50:47.976 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 5
09:50:47.976 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:50:47.976 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
09:50:47.976 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:50:47.976 INFO GenomicsDBImport - Deflater: IntelDeflater
09:50:47.976 INFO GenomicsDBImport - Inflater: IntelInflater
09:50:47.976 INFO GenomicsDBImport - GCS max retries/reopens: 20
09:50:47.976 INFO GenomicsDBImport - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree /dr_all_nio_fixes
09:50:47.976 INFO GenomicsDBImport - Initializing engine
09:50:53.175 INFO GenomicsDBImport - Shutting down engine
[December 15, 2017 9:50:53 AM PST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.10 minutes.
Runtime.totalMemory()=1389887488
***********************************************************************
: Problem parsing start/end value in interval string. Value was: 8025585arse Genome Location string: 1:8025283-8025585
***********************************************************************