Channel: Recent Discussions — GATK-Forum

Problem parsing interval list file to GenomicsDBImport

Hello,

I'm trying to combine 6 GVCF files into a single VCF file using GenomicsDBImport + GenotypeGVCFs with GATK4. I'm still using GATK4.beta5, since I'm not able to find the link to GATK4.beta6.

My target.interval file looks like this:
1:8022745-8023035
1:8025283-8025585
1:8029304-8029564

If I run the following command:

gatk-launch GenomicsDBImport \
-V sample_001.gvcf.gz \
-V sample_002.gvcf.gz \
-V sample_003.gvcf.gz \
-V sample_004.gvcf.gz \
-V sample_005.gvcf.gz \
-V sample_006.gvcf.gz \
--genomicsDBWorkspace testDB \
--L 1:8022745-8023035

GATK runs successfully and generates a database with genotypes for this specific interval.

However, if I run the following script:

    #!/bin/bash
    while read in; do
        gatk-launch GenomicsDBImport \
            -V sample_001.gvcf.gz \
            -V sample_002.gvcf.gz \
            -V sample_003.gvcf.gz \
            -V sample_004.gvcf.gz \
            -V sample_005.gvcf.gz \
            -V sample_006.gvcf.gz \
            --genomicsDBWorkspace testDB \
            --L "$in"
    done < target.interval

GATK runs, but I get the following error for every single interval and nothing is added to the database:

: Problem parsing start/end value in interval string. Value was: 8023035arse Genome Location string: 1:8022745-8023035

: Problem parsing start/end value in interval string. Value was: 8025585arse Genome Location string: 1:8025283-8025585

It appears that GATK reads each line of my file correctly but, for some reason, cannot parse the start and end positions of the interval.
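
The garbled message itself may hold a clue. `8023035arse Genome Location string` looks like two pieces of text printed on top of each other, which is what a terminal shows when the printed string contains a carriage return (`\r`). That would happen if `target.interval` had Windows-style (CRLF) line endings, because `read` keeps the `\r` at the end of `$in`. This is an assumption and is not confirmed anywhere in the post. A minimal sketch for checking and cleaning the file follows; `has_cr` and `clean_intervals` are hypothetical helper names, not part of GATK.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Hypothetical helper: prints the file with carriage returns removed
# and blank lines dropped, one interval per line.
clean_intervals() {
    tr -d '\r' < "$1" | grep -v '^[[:space:]]*$'
}
```

If `has_cr target.interval` succeeds, the cleaned lines could feed the same loop, for example `clean_intervals target.interval | while IFS= read -r in; do ...; done`, with the GenomicsDBImport command unchanged inside the loop.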

I have already looked for this error in the forum, but I haven't found any posts that answer my question.

I also tried using a regular BED file as input, in the following format:
1 8022745 8023035
1 8025283 8025585
1 8029304 8029564

In that case, the error was:

A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig '1  8022745 8023035' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?
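
The quoted contig name `'1  8022745 8023035'` suggests that the whole BED line was passed as a single interval string, so the parser never saw a `contig:start-end` form. A minimal sketch of converting whitespace-separated BED lines to that form is shown here. `bed_to_intervals` is a hypothetical helper name. The `+ 1` assumes the file follows the BED convention of 0-based, half-open coordinates, whereas interval strings are 1-based and inclusive; if the numbers were simply copied from the interval list above, the `+ 1` should be dropped.

```shell
#!/bin/sh
# Hypothetical helper: convert BED lines (contig, 0-based start, end)
# to contig:start-end strings with a 1-based inclusive start.
# Carriage returns are stripped first; lines with fewer than 3 fields are skipped.
bed_to_intervals() {
    tr -d '\r' < "$1" | awk 'NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}
```

Each output line could then be passed to `--L` one at a time, in the same way as the lines of `target.interval`.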

Can anybody help me with this issue?

Thank you very much.

Here is the full output for the first two intervals:

09:50:28.057 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/opt/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_compression.so
[December 15, 2017 9:50:27 AM PST] GenomicsDBImport  --genomicsDBWorkspace SAmerican_panel --variant 17-50618.gatk.gvcf.gz --variant 17-50619.gatk.gvcf.gz --variant 17-50620.gatk.gvcf.gz --variant 17-50621.gatk.gvcf.gz --variant 17-50622.gatk.gvcf.gz --variant 17-50623.gatk.gvcf.gz --variant 17-50624.gatk.gvcf.gz --variant 17-50625.gatk.gvcf.gz --variant 17-50626.gatk.gvcf.gz --variant 17-50627.gatk.gvcf.gz --variant 17-50628.gatk.gvcf.gz --variant 17-50629.gatk.gvcf.gz --variant 17-50630.gatk.gvcf.gz --variant 17-50631.gatk.gvcf.gz --variant 17-50632.gatk.gvcf.gz --variant 17-50633.gatk.gvcf.gz --variant 17-50634.gatk.gvcf.gz --variant 17-5063  --genomicsDBSegmentSize 1048576 --genomicsDBVCFBufferSize 16384 --overwriteExistingGenomicsDBWorkspace false --batchSize 0 --consolidate false --validateSampleNameMap false --readerThreads 1 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 0 --cloudIndexPrefetchBuffer 0 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false
[December 15, 2017 9:50:27 AM PST] Executing as user@server on Linux 3.10.0-693.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_151-b12; Version: 4.beta.5
09:50:28.443 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 5
09:50:28.443 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:50:28.443 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
09:50:28.444 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:50:28.444 INFO  GenomicsDBImport - Deflater: IntelDeflater
09:50:28.444 INFO  GenomicsDBImport - Inflater: IntelInflater
09:50:28.444 INFO  GenomicsDBImport - GCS max retries/reopens: 20
09:50:28.444 INFO  GenomicsDBImport - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
09:50:28.444 INFO  GenomicsDBImport - Initializing engine
09:50:38.228 INFO  GenomicsDBImport - Shutting down engine
[December 15, 2017 9:50:38 AM PST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=1368915968
***********************************************************************

: Problem parsing start/end value in interval string. Value was: 8023035arse Genome Location string: 1:8022745-8023035

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--javaOptions '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
09:50:47.419 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/opt/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_compression.so
[December 15, 2017 9:50:47 AM PST] GenomicsDBImport  --genomicsDBWorkspace SAmerican_panel --variant 17-50618.gatk.gvcf.gz --variant 17-50619.gatk.gvcf.gz --variant 17-50620.gatk.gvcf.gz --variant 17-50621.gatk.gvcf.gz --variant 17-50622.gatk.gvcf.gz --variant 17-50623.gatk.gvcf.gz --variant 17-50624.gatk.gvcf.gz --variant 17-50625.gatk.gvcf.gz --variant 17-50626.gatk.gvcf.gz --variant 17-50627.gatk.gvcf.gz --variant 17-50628.gatk.gvcf.gz --variant 17-50629.gatk.gvcf.gz --variant 17-50630.gatk.gvcf.gz --variant 17-50631.gatk.gvcf.gz --variant 17-50632.gatk.gvcf.gz --variant 17-50633.gatk.gvcf.gz --variant 17-50634.gatk.gvcf.gz --variant 17-5063  --genomicsDBSegmentSize 1048576 --genomicsDBVCFBufferSize 16384 --overwriteExistingGenomicsDBWorkspace false --batchSize 0 --consolidate false --validateSampleNameMap false --readerThreads 1 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 0 --cloudIndexPrefetchBuffer 0 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false
[December 15, 2017 9:50:47 AM PST] Executing as user@server on Linux 3.10.0-693.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_151-b12; Version: 4.beta.5
09:50:47.976 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 5
09:50:47.976 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:50:47.976 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
09:50:47.976 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:50:47.976 INFO  GenomicsDBImport - Deflater: IntelDeflater
09:50:47.976 INFO  GenomicsDBImport - Inflater: IntelInflater
09:50:47.976 INFO  GenomicsDBImport - GCS max retries/reopens: 20
09:50:47.976 INFO  GenomicsDBImport - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
09:50:47.976 INFO  GenomicsDBImport - Initializing engine
09:50:53.175 INFO  GenomicsDBImport - Shutting down engine
[December 15, 2017 9:50:53 AM PST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.10 minutes.
Runtime.totalMemory()=1389887488
***********************************************************************

: Problem parsing start/end value in interval string. Value was: 8025585arse Genome Location string: 1:8025283-8025585

***********************************************************************
