Channel: Recent Discussions — GATK-Forum

VCF file is malformed at approximately line number - GATK ASEReadCounter


Dear GATK Team,

While running GATK ASEReadCounter with versions 3.4 and 3.7, I am getting an error about the known-sites VCF file.

I did the alignment and processed the BAM file as per this article:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0762-6

Here is my command:

java -jar GATK/3.4/GenomeAnalysisTK.jar \
-T ASEReadCounter \
-I ERR188021.rg.md.bam \
-R hs37d5.fa \
-sites ALL.phase1_release_v3.20101123.snps_indels_sv.sites.gdid.gdannot.v2.vcf.gz \
-o ERR188021.ASEReadCounter_results_ver2.csv \
-U ALLOW_N_CIGAR_READS

I downloaded the Geuvadis genotype data from the Geuvadis browser and indexed the ALL sites file.

Error:

ERROR MESSAGE: The provided VCF file is malformed at approximately line number 512: The VCF specification does not allow for whitespace in the INFO field. Offending field value was "AA=.;AC=51;AF=0.02;AFR_AF=0.02;ALLELE=A;AMR_AF=0.02;AN=2184;AVGPOST=0.9975;DAF_GLOBAL=.;ERATE=0.0004;EUR_AF=0.04;GENE_TRCOUNT_AFFECTED=1;GENE_TRCOUNT_TOTAL=1;GERP=.;LDAF=0.0238;RSQ=0.9610;SEVERE_GENE=ENSG00000197049;SEVERE_IMPACT=NON_SYNONYMOUS_CODON;SNPSOURCE=LOWCOV;THETA=0.0007;TR_AFFECTED=FULL;VT=SNP;ANNOTATION_CLASS=NON_SYNONYMOUS_CODON,ACTIVE_CHROM,NC_TRANSCRIPT_VARIANT&INTRON_VARIANT;A_A_CHANGE=F/I,.,.;A_A_LENGTH=169,.,.;A_A_POS=118,.,.;CELL=.,GM12878,.;CHROM_STATE=.,11,.;EXON_NUMBER=1/1,.,.;GENE_ID=ENSG00000197049,.,ENSG00000237491;GENE_NAME=AL669831.1,.,RP11-206L10.6;HGVS=c.352N>A,.,n.37+7285N>A;INTRON_NUMBER=.,.,1/2;POLYPHEN=probably damaging:0.982,.,-:-;SIFT=-:-,.,-:-;TR_BIOTYPE=PROTEIN_CODING,.,PROCESSED_TRANSCRIPT;TR_ID=ENST00000358533,.,ENST00000429505;TR_LENGTH=1194,.,441;TR_POS=438,.,.;TR_STRAND=1,.,1", for input source: ALL.phase1_release_v3.20101123.snps_indels_sv.sites.gdid.gdannot.v2.vcf.gz

I tried removing the spaces from the INFO field of the VCF and reran the command, but with no success.
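For reference, what I mean by removing the spaces is roughly this (a sketch; it rewrites spaces inside the INFO column, field 8, to underscores, since the offending value contains `probably damaging`):

```shell
# Strip spaces from the INFO column (field 8); header lines pass through untouched.
# The result then needs to be re-bgzipped and re-indexed before use.
zcat ALL.phase1_release_v3.20101123.snps_indels_sv.sites.gdid.gdannot.v2.vcf.gz \
  | awk 'BEGIN{FS=OFS="\t"} /^#/ {print; next} {gsub(/ /, "_", $8); print}' \
  | bgzip > fixed.vcf.gz
```

I am not sure whether underscores are safe for downstream annotation parsers, but the VCF specification itself only forbids whitespace in INFO.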

Could you please help me resolve this issue?

Thanks in Advance
Fazulur Rehaman


MergeBamAlignment Failure


Hello,

I'm trying to use Picard 2.9.0 MergeBamAlignment, but get this error:


Command: $ java -jar /data/murphy/shared_bins/picard-tools-2.9.0/picard.jar MergeBamAlignment REFERENCE_SEQUENCE=/data/murphy/home/skim/Olga/Vera/mm10/mm10.fasta UNMAPPED_BAM=/data/murphy/home/skim/Olga/Vera/unaligned_mc_tagged_polyA_filtered_P5.bam ALIGNED_BAM=/data/murphy/home/skim/Olga/Vera/Aligned.out_P5.sorted.bam OUTPUT=merged_P5.bam INCLUDE_SECONDARY_ALIGNMENTS=false PAIRED_RUN=false
Picked up JAVA_TOOL_OPTIONS: -Xmx16g -Xss2560k
[Wed Nov 22 17:52:01 CET 2017] picard.sam.MergeBamAlignment UNMAPPED_BAM=/data/murphy/home/skim/Olga/Vera/unaligned_mc_tagged_polyA_filtered_P5.bam ALIGNED_BAM=[/data/murphy/home/skim/Olga/Vera/Aligned.out_P5.sorted.bam] OUTPUT=merged_P5.bam REFERENCE_SEQUENCE=/data/murphy/home/skim/Olga/Vera/mm10/mm10.fasta PAIRED_RUN=false INCLUDE_SECONDARY_ALIGNMENTS=false CLIP_ADAPTERS=true IS_BISULFITE_SEQUENCE=false ALIGNED_READS_ONLY=false MAX_INSERTIONS_OR_DELETIONS=1 ATTRIBUTES_TO_REVERSE=[OQ, U2] ATTRIBUTES_TO_REVERSE_COMPLEMENT=[E2, SQ] READ1_TRIM=0 READ2_TRIM=0 ALIGNER_PROPER_PAIR_FLAGS=false SORT_ORDER=coordinate PRIMARY_ALIGNMENT_STRATEGY=BestMapq CLIP_OVERLAPPING_READS=true ADD_MATE_CIGAR=true UNMAP_CONTAMINANT_READS=false MIN_UNCLIPPED_BASES=32 MATCHING_DICTIONARY_TAGS=[M5, LN] UNMAPPED_READ_STRATEGY=DO_NOT_CHANGE VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Wed Nov 22 17:52:01 CET 2017] Executing as skim@sl-rajew-p-cs1 on Linux 3.10.0-514.21.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_141-b16; Picard version: 2.9.0-1-gf5b9f50-SNAPSHOT
INFO 2017-11-22 17:52:01 SamAlignmentMerger Processing SAM file(s): [/data/murphy/home/skim/Olga/Vera/Aligned.out_P5.sorted.bam]
[Wed Nov 22 17:52:01 CET 2017] picard.sam.MergeBamAlignment done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Do not use this function to merge dictionaries with different sequences in them. Sequences must be in the same order as well. Found [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1_GL456210_random, chr1_GL456211_random, chr1_GL456212_random, chr1_GL456213_random, chr1_GL456221_random, chr2, chr3, chr4, chr4_GL456216_random, chr4_JH584292_random, chr4_GL456350_random, chr4_JH584293_random, chr4_JH584294_random, chr4_JH584295_random, chr5, chr5_JH584296_random, chr5_JH584297_random, chr5_JH584298_random, chr5_GL456354_random, chr5_JH584299_random, chr6, chr7, chr7_GL456219_random, chr8, chr9, chrM, chrX, chrX_GL456233_random, chrY, chrY_JH584300_random, chrY_JH584301_random, chrY_JH584302_random, chrY_JH584303_random, chrUn_GL456239, chrUn_GL456367, chrUn_GL456378, chrUn_GL456381, chrUn_GL456382, chrUn_GL456383, chrUn_GL456385, chrUn_GL456390, chrUn_GL456392, chrUn_GL456393, chrUn_GL456394, chrUn_GL456359, chrUn_GL456360, chrUn_GL456396, chrUn_GL456372, chrUn_GL456387, chrUn_GL456389, chrUn_GL456370, chrUn_GL456379, chrUn_GL456366, chrUn_GL456368, chrUn_JH584304] and [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1, 1_GL456210_random, 1_GL456211_random, 1_GL456212_random, 1_GL456213_random, 1_GL456221_random, 2, 3, 4, 4_GL456216_random, 4_GL456350_random, 4_JH584292_random, 4_JH584293_random, 4_JH584294_random, 4_JH584295_random, 5, 5_GL456354_random, 5_JH584296_random, 5_JH584297_random, 5_JH584298_random, 5_JH584299_random, 6, 7, 7_GL456219_random, 8, 9, MT, Un_GL456239, Un_GL456359, Un_GL456360, Un_GL456366, Un_GL456367, Un_GL456368, Un_GL456370, Un_GL456372, Un_GL456378, Un_GL456379, Un_GL456381, Un_GL456382, Un_GL456383, Un_GL456385, Un_GL456387, Un_GL456389, Un_GL456390, Un_GL456392, Un_GL456393, Un_GL456394, Un_GL456396, Un_JH584304, X, X_GL456233_random, Y, Y_JH584300_random, Y_JH584301_random, Y_JH584302_random, Y_JH584303_random].
at htsjdk.samtools.SAMSequenceDictionary.mergeDictionaries(SAMSequenceDictionary.java:305)
at picard.sam.SamAlignmentMerger.getDictionaryForMergedBam(SamAlignmentMerger.java:197)
at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:346)
at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:181)
at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:282)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)


The headers of both BAM files and the FASTA dict are below:

samtools view -H unaligned_mc_tagged_polyA_filtered_P5.bam
@HD VN:1.5 SO:queryname
@RG ID:A SM:P5

samtools view -H Aligned.out_P5.sorted.bam
@HD VN:1.5 SO:queryname
@SQ SN:chr1 LN:195471971
@SQ SN:chr10 LN:130694993
@SQ SN:chr11 LN:122082543
@SQ SN:chr12 LN:120129022
@SQ SN:chr13 LN:120421639
@SQ SN:chr14 LN:124902244
@SQ SN:chr15 LN:104043685
@SQ SN:chr16 LN:98207768
@SQ SN:chr17 LN:94987271
@SQ SN:chr18 LN:90702639
@SQ SN:chr19 LN:61431566
@SQ SN:chr1_GL456210_random LN:169725
@SQ SN:chr1_GL456211_random LN:241735
@SQ SN:chr1_GL456212_random LN:153618
@SQ SN:chr1_GL456213_random LN:39340
@SQ SN:chr1_GL456221_random LN:206961
@SQ SN:chr2 LN:182113224
@SQ SN:chr3 LN:160039680
@SQ SN:chr4 LN:156508116
@SQ SN:chr4_GL456216_random LN:66673
@SQ SN:chr4_JH584292_random LN:14945
@SQ SN:chr4_GL456350_random LN:227966
@SQ SN:chr4_JH584293_random LN:207968
@SQ SN:chr4_JH584294_random LN:191905
@SQ SN:chr4_JH584295_random LN:1976
@SQ SN:chr5 LN:151834684
@SQ SN:chr5_JH584296_random LN:199368
@SQ SN:chr5_JH584297_random LN:205776
@SQ SN:chr5_JH584298_random LN:184189
@SQ SN:chr5_GL456354_random LN:195993
@SQ SN:chr5_JH584299_random LN:953012
@SQ SN:chr6 LN:149736546
@SQ SN:chr7 LN:145441459
@SQ SN:chr7_GL456219_random LN:175968
@SQ SN:chr8 LN:129401213
@SQ SN:chr9 LN:124595110
@SQ SN:chrM LN:16299
@SQ SN:chrX LN:171031299
@SQ SN:chrX_GL456233_random LN:336933
@SQ SN:chrY LN:91744698
@SQ SN:chrY_JH584300_random LN:182347
@SQ SN:chrY_JH584301_random LN:259875
@SQ SN:chrY_JH584302_random LN:155838
@SQ SN:chrY_JH584303_random LN:158099
@SQ SN:chrUn_GL456239 LN:40056
@SQ SN:chrUn_GL456367 LN:42057
@SQ SN:chrUn_GL456378 LN:31602
@SQ SN:chrUn_GL456381 LN:25871
@SQ SN:chrUn_GL456382 LN:23158
@SQ SN:chrUn_GL456383 LN:38659
@SQ SN:chrUn_GL456385 LN:35240
@SQ SN:chrUn_GL456390 LN:24668
@SQ SN:chrUn_GL456392 LN:23629
@SQ SN:chrUn_GL456393 LN:55711
@SQ SN:chrUn_GL456394 LN:24323
@SQ SN:chrUn_GL456359 LN:22974
@SQ SN:chrUn_GL456360 LN:31704
@SQ SN:chrUn_GL456396 LN:21240
@SQ SN:chrUn_GL456372 LN:28664
@SQ SN:chrUn_GL456387 LN:24685
@SQ SN:chrUn_GL456389 LN:28772
@SQ SN:chrUn_GL456370 LN:26764
@SQ SN:chrUn_GL456379 LN:72385
@SQ SN:chrUn_GL456366 LN:47073
@SQ SN:chrUn_GL456368 LN:20208
@SQ SN:chrUn_JH584304 LN:114452
@PG ID:STAR PN:STAR VN:STAR_2.5.3a CL:STAR --runThreadN 12 --genomeDir star_Index --readFilesIn /data/murphy/home/skim/Olga/Vera/unaligned_mc_tagged_polyA_filtered_P5.fastq --sjdbGTFfile mm10_transcriptome/gencode.vM15.annotation.gtf --sjdbOverhang 74
@CO user command line: STAR --runThreadN 12 --genomeDir star_Index --sjdbGTFfile mm10_transcriptome/gencode.vM15.annotation.gtf --sjdbOverhang 74 --readFilesIn /data/murphy/home/skim/Olga/Vera/unaligned_mc_tagged_polyA_filtered_P5.fastq

vi mm10.dict
@HD VN:1.4 SO:unsorted
@SQ SN:10 LN:130694993 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:7831ecda5dd6bcf838e2452ea0139eac
@SQ SN:11 LN:122082543 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:e168c7a3194813f597181f26bb1bd13f
@SQ SN:12 LN:120129022 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:671f85bb54a6e097d631e2e2dd813ad4
@SQ SN:13 LN:120421639 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:7f9b9fa3fbd9a38634107dfdc7fd8dc8
@SQ SN:14 LN:124902244 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:bf4e1efa25a8fd23b41c91f9bcb86388
@SQ SN:15 LN:104043685 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:106358dace00e5825ae337c1f1821b5e
@SQ SN:16 LN:98207768 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:5482110a6cedd3558272325eee4c5a17
@SQ SN:17 LN:94987271 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:0d21e8edbfcd8410523b2b94e6dae892
@SQ SN:18 LN:90702639 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:46fda2f7e6dbf91bff91d6703e004afb
@SQ SN:19 LN:61431566 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:7d363594531514ce41dfacfd97bc995d
@SQ SN:1 LN:195471971 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:c4ec915e7348d42648eefc1534b71c99
@SQ SN:1_GL456210_random LN:169725 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:0cc560d98f9f22f4385397db82e1c108
@SQ SN:1_GL456211_random LN:241735 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:36e85680c669756c9a1554cf31c9de03
@SQ SN:1_GL456212_random LN:153618 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:4c4dc3bfe987e3bc4ef4756bef269373
@SQ SN:1_GL456213_random LN:39340 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:d4cb9051fe171205dd39980e110bf63e
@SQ SN:1_GL456221_random LN:206961 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:e21da65d7276b256b8edf92660a928b0
@SQ SN:2 LN:182113224 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:fe020a692e23f8468b376e14e304a10f
@SQ SN:3 LN:160039680 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:50f9385167e70825931231ddf1181b80
@SQ SN:4 LN:156508116 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:e7bdfb3ce7f54d2720b0718ed2ea063c
@SQ SN:4_GL456216_random LN:66673 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:7960016dce00dda7d58501e8f5799ec4
@SQ SN:4_GL456350_random LN:227966 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:5749f57d6c9e7ffbb4f82294d28598ba
@SQ SN:4_JH584292_random LN:14945 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:c2ff41899e0f684fd93b28c58756e02f
@SQ SN:4_JH584293_random LN:207968 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:dcd15cdff49363080fd1a719fd03d69b
@SQ SN:4_JH584294_random LN:191905 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:6c5948764eea003ab2f734ecd1f8295f
@SQ SN:4_JH584295_random LN:1976 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:ebc2f8438cbd080b53dc1cf528bf070e
@SQ SN:5 LN:151834684 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:095f3d4ebe1f0bafff057cc9b130186d
@SQ SN:5_GL456354_random LN:195993 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:61643b629b3105fd2f32cc82871ca8e0
@SQ SN:5_JH584296_random LN:199368 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:9b5b5f3af54ac1c2e91964a2c8b3f9ee
@SQ SN:5_JH584297_random LN:205776 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:efb1b00ffad6dd710ffd5d46ce94a25c
@SQ SN:5_JH584298_random LN:184189 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:1910644b4393b414d16774d3a1b73c49
@SQ SN:5_JH584299_random LN:953012 UR:file:/broad/mccarroll/software/metadata/individual_reference/mm10/mm10.fasta M5:b6bc88bfe26ef155b5fe2a7b90830ca5
..................................................................
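For what it's worth, here is how I compared the two sets of sequence names (a sketch; the awk fragment just pulls the SN: values out of the @SQ lines):

```shell
# List contig names from the aligned BAM header and from the dict, then diff them.
samtools view -H Aligned.out_P5.sorted.bam \
  | awk -F'\t' '/^@SQ/ {sub(/^SN:/, "", $2); print $2}' | sort > bam_names.txt
awk -F'\t' '/^@SQ/ {sub(/^SN:/, "", $2); print $2}' mm10.dict | sort > dict_names.txt
diff bam_names.txt dict_names.txt
```

The diff makes the mismatch plain: the STAR index uses chr-prefixed names (chr1, chrM) while mm10.dict uses plain names (1, MT).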

Could you help me figure out the problem?

Thank you

MuTect2 -> Reference name for '1283' not found in sequence dictionary


Hi, I am trying to run:
java -jar GenomeAnalysisTK.jar -T MuTect2 -R ucsc.hg19.fasta -I:tumor tumor_rg.bam -I:normal normal_rg.bam -o output.vcf

I am using ucsc.hg19.fasta as a reference and I used:
1. samtools faidx to create the ucsc.hg19.fasta.fai file
2. java -jar picard.jar CreateSequenceDictionary REFERENCE=ucsc.hg19.fasta OUTPUT=ucsc.hg19.fasta.dict

I added a ReadGroup to my two BAM Files:
java -jar picard.jar AddOrReplaceReadGroups I=normal.bam O=normal_rg.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=20

I also validated the BAM files

Now I get the following error:

##### ERROR MESSAGE: SAM/BAM/CRAM file /analysis/tumnormalor.bam is malformed. Error details: Reference name for '1283' not found in sequence dictionary.

If I exchange the tumor and normal .bam files as command line arguments I get the error
##### ERROR MESSAGE: SAM/BAM/CRAM file /analysis/tumor.bam is malformed. Error details: Reference name for '1283' not found in sequence dictionary.

I tried to google it and found nothing. Can someone give me a clue where to look?

Is the file ucsc.hg19.fasta.dict the sequence dictionary?
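In case it helps, this is how I looked for '1283' in the BAM header versus the dictionary (a sketch; the awk fragment strips the SN: prefix from @SQ lines):

```shell
# Does the BAM header declare a contig literally named 1283?
samtools view -H tumor_rg.bam \
  | awk -F'\t' '/^@SQ/ {sub(/^SN:/, "", $2); print $2}' | grep -x '1283'
# And is any such sequence name present in the dictionary built from ucsc.hg19.fasta?
grep -w 'SN:1283' ucsc.hg19.fasta.dict
```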

GATK Runtime Error on GenotypeGVCFs: java.lang.Double cannot be cast to java.lang.Integer


Hi GATK Team,
I've run into the following error when trying to genotype ~1200 GVCFs:

java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer

I've replicated this error on both version nightly-2017-11-22-1 and v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50 using OpenJDK 64-Bit Server VM 1.8.0_65-b17 running on CentOS7.

This error has only occurred on chromosome 10; all others have run correctly or are still pending (chr14). All input files are gzipped and tabix-indexed, and I've verified that I can properly access all 1202 using tabix.

My best guess is that there's some unexpected input buried in a line in one of the GVCFs. I've seen something similar on a small subset of these data, where I ran into tabix issues due to an occasional malformed line (which I then fixed), but I have no idea how I'd easily determine the culprit here.
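One crude idea I had for narrowing it down (a sketch; it flags INFO values containing a decimal point for keys the header declares as Type=Integer, run per input file):

```shell
# Heuristic scan: report record lines whose INFO has a floating-point value
# for a key declared Type=Integer in the ##INFO header lines.
zcat SL102359_10.g.vcf.gz | awk 'BEGIN{FS="\t"}
  /^##INFO=<ID=/ {
    if ($0 ~ /Type=Integer/) { id=$0; sub(/.*ID=/, "", id); sub(/,.*/, "", id); intkey[id]=1 }
    next }
  /^#/ { next }
  { n = split($8, kv, ";")
    for (i = 1; i <= n; i++) {
      split(kv[i], p, "=")
      if (p[1] in intkey && p[2] ~ /\./) print FNR ": " kv[i] } }'
```

I don't know whether this is actually what trips the Integer cast, so treat it as a guess at a diagnostic.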

GATK input and output below (trimmed to remove repeated input flags and warnings).

Any guidance would be much appreciated.
Thank you!

Sender: LSF System <lsfadmin@hpc0010>
Subject: Job 388878: <batchcall.vcf_10> in cluster <helion-poc> Exited

Job <batchcall.vcf_10> was submitted from host <login01> by user <jlawlor> in cluster <helion-poc>.
Job was executed on host(s) <16*hpc0010>, in queue <c7normal>, as user <jlawlor> in cluster <helion-poc>.
</gpfs/gpfs1/home/jlawlor> was used as the home directory.
</gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710> was used as the working directory.
Started at Wed Nov 22 11:28:18 2017
Results reported on Wed Nov 22 11:32:14 2017

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
java -Xmx385g -Xms385g -jar /gpfs/gpfs2/cooperlab/test_batch/inadvisable_batch_call/nightly_1122/GenomeAnalysisTK.jar   -T GenotypeGVCFs    -R /gpfs/gpfs1/myerslab/reference/genomes/bwa-0.7.8/GRCh37.fa   -nt 16 -L 10 -o batchcall.vcf_10.vcf -V /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL102359_10.g.vcf.gz -V /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL102360_10.g.vcf.gz [ REPEATED FOR 1200 MORE SAMPLES ]

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   573.35 sec.
    Max Memory :                                 131697 MB
    Average Memory :                             31937.71 MB
    Total Requested Memory :                     460800.00 MB
    Delta Memory :                               329103.00 MB
    Max Processes :                              3
    Max Threads :                                89

The output (if any) follows:

INFO  11:28:22,217 HelpFormatter - --------------------------------------------------------------------------------------
INFO  11:28:22,221 HelpFormatter - The Genome Analysis Toolkit (GATK) vnightly-2017-11-22-1, Compiled 2017/11/22 00:01:18
INFO  11:28:22,224 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO  11:28:22,224 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO  11:28:22,225 HelpFormatter - [Wed Nov 22 11:28:22 CST 2017] Executing on Linux 3.10.0-327.3.1.el7.x86_64 amd64
INFO  11:28:22,226 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_65-b17
INFO  11:28:22,231 HelpFormatter - Program Args: -T GenotypeGVCFs -R /gpfs/gpfs1/myerslab/reference/genomes/bwa-0.7.8/GRCh37.fa -nt 16 -L 10 -o batchcall.vcf_10.vcf -V /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL102359_10.g.vcf.gz -V /gpfs/gpfs2/cooperlab/CSER_batch_calls/cser_wgs_batch_cumulative_201710/SL102360_10.g.vcf.gz [ REPEATED FOR 1200 MORE SAMPLES ]
INFO  11:28:22,241 HelpFormatter - Executing as jlawlor@hpc0010 on Linux 3.10.0-327.3.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_65-b17.
INFO  11:28:22,242 HelpFormatter - Date/Time: 2017/11/22 11:28:22
INFO  11:28:22,242 HelpFormatter - --------------------------------------------------------------------------------------
INFO  11:28:22,243 HelpFormatter - --------------------------------------------------------------------------------------
INFO  11:29:15,805 NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs/gpfs2/cooperlab/test_batch/inadvisable_batch_call/nightly_1122/GenomeAnalysisTK.jar!/com/intel/gkl/native/libgkl_compression.so
INFO  11:29:15,830 GenomeAnalysisEngine - Deflater: IntelDeflater
INFO  11:29:15,830 GenomeAnalysisEngine - Inflater: IntelInflater
INFO  11:29:15,831 GenomeAnalysisEngine - Strictness is SILENT
INFO  11:29:16,120 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  11:30:40,514 IntervalUtils - Processing 135534747 bp from intervals
WARN  11:30:40,515 IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation
WARN  11:30:40,515 IndexDictionaryUtils - Track variant2 doesn't have a sequence dictionary built in, skipping dictionary validation

[ REPEATED FOR 1200 MORE SAMPLES ]

INFO  11:30:40,654 MicroScheduler - Running the GATK in parallel mode with 16 total threads, 1 CPU thread(s) for each of 16 data thread(s), of 64 processors available on this machine
INFO  11:30:40,701 GenomeAnalysisEngine - Preparing for traversal
INFO  11:30:40,702 GenomeAnalysisEngine - Done preparing for traversal
INFO  11:30:40,702 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  11:30:40,703 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  11:30:40,703 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
WARN  11:30:42,986 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN  11:30:42,988 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO  11:30:42,988 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
INFO  11:31:10,708 ProgressMeter -        10:26601         0.0    30.0 s      49.6 w        0.0%    42.5 h      42.5 h
WARN  11:31:42,051 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not GenotypeGVCFs
INFO  11:32:11,121 ProgressMeter -        10:64601         0.0    90.0 s     149.5 w        0.0%    52.5 h      52.4 h
##### ERROR --
##### ERROR stack trace
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer
    at java.lang.Integer.compareTo(Integer.java:52)
    at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:262)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:207)
    at java.util.Arrays.sort(Arrays.java:1312)
    at java.util.Arrays.sort(Arrays.java:1506)
    at java.util.ArrayList.sort(ArrayList.java:1454)
    at java.util.Collections.sort(Collections.java:141)
    at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:1010)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:84)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:206)
    at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:303)
    at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version nightly-2017-11-22-1):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.Double cannot be cast to java.lang.Integer
##### ERROR ------------------------------------------------------------------------------------------

What is the significance of "Depth across all samples" (DP) in INFO?


Hi,

Although I have read through the related topics, I'm still quite confused about the meaning of "Depth across all samples" (DP) in the INFO field of the VCF file. Does "across all samples" mean the read depths of all the samples are added together, or is it a mean over all the samples?
In my VCF file (after joint genotyping in GVCF mode), the INFO DP values are between 30 and 99, while the per-sample DP values are generally much lower.
I think the INFO DP is a sum of depths; am I right?
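To make the question concrete, here is a sketch of the comparison I'm doing, on a made-up two-sample record where INFO says DP=55:

```shell
# Made-up record: FORMAT is GT:DP, two samples with DP 30 and 25, INFO DP=55.
# Sum the per-sample FORMAT DP values to compare against the INFO DP.
printf '1\t100\t.\tA\tG\t50\tPASS\tDP=55\tGT:DP\t0/1:30\t1/1:25\n' |
awk 'BEGIN{FS="\t"}
  { n = split($9, f, ":"); for (i = 1; i <= n; i++) if (f[i] == "DP") di = i
    sum = 0
    for (s = 10; s <= NF; s++) { split($s, v, ":"); sum += v[di] }
    print "sum of sample DPs:", sum }'
# → sum of sample DPs: 55
```

On my real data the INFO DP sometimes exceeds the sum of sample DPs, which (as I understand it) could be because it also counts reads filtered from the individual genotype calls.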

Questions about BQSR (2014-2016)

Bug in HaplotypeCaller: GT is called "./." but AD and DP aren't 0



Hi, I'd like to report a weird result from HaplotypeCaller.
We have a patient sequenced by targeted sequencing. We expected no heterozygous variant to be called at this locus; HaplotypeCaller did find the insertion, but the GT is "./." and the rest of the genotype information is missing.
So I'm really confused about why HaplotypeCaller calls "./." here and why the call passed the filter.

Many Thanks
Minghui

libVectorLoglessPairHMM is not present in GATK 3.8 - HaplotypeCaller is slower than 3.4-46!


We are running GATK on a multi-core Intel Xeon that does not have AVX. We have just upgraded from running 3.4-46 to running 3.8, and HaplotypeCaller runs much more slowly. I noticed that our logs used to say:

Using SSE4.1 accelerated implementation of PairHMM
INFO 06:18:09,932 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
INFO 06:18:09,933 VectorLoglessPairHMM - Using vectorized implementation of PairHMM

But now they say:

WARN 07:10:21,304 PairHMMLikelihoodCalculationEngine$1 - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
WARN 07:10:21,310 PairHMMLikelihoodCalculationEngine$1 - AVX-accelerated native PairHMM implementation is not supported. Falling back to slower LOGLESS_CACHING implementation

I'm guessing the newfangled Intel GKL isn't working so well for us. Note that I had a very similar problem with GATK 3.4-0, in http://gatk.vanillaforums.com/entry/passwordreset/21436/OrxbD0I4oRDaj8y1hDSE and this was resolved in GATK 3.4-46.
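For context, this is how I checked what the CPU actually supports on our Linux box (a sketch reading /proc/cpuinfo; flag names as the kernel reports them):

```shell
# Show which of sse4_1 / avx / avx2 the first CPU core advertises (Linux only).
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -x -E 'sse4_1|avx|avx2' | sort -u
```

On our machine this prints only sse4_1, consistent with the old "SSE4.1 accelerated implementation" log line.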


How to create an indexed VCF for BaseRecalibrator?


First, I searched the forum for answers to my question, but none of them seem to work. I have a non-model organism and I'm trying to use BaseRecalibrator as recommended in the manual: run HaplotypeCaller, filter for the most robust variants, and apply BaseRecalibrator using those variants as the known-sites database.

Let's say I select the most robust variants based on QUAL>10000 (an arbitrary limit). I use bcftools to do this:

bcftools filter --include 'QUAL>10000' --output-type v --output qual.vcf combined.vcf

This creates a new uncompressed VCF file with filtered variants. When I try it with BaseRecalibrator:

BaseRecalibrator -nt 1 -nct 8 -R genome/A.pisum_genome_AphidBase_fixed.fasta -I bam/F1avr.bam -knownSites vcf/qual.vcf -o bam/F1avr.recal.table

I get an error:

ERROR MESSAGE: An index is required, but none found., for input source: .../vcf/qual.vcf

I tried creating an index with bcftools, but BaseRecalibrator does not recognise a csi index. Renaming .csi to .idx threw an error about an incompatible index.

I tried following one of the suggestions from the forum and used a .bgzip compressed VCF. Again, "no index" error. When I created a bcf file with bcftools, it was not recognised.
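To be precise, the bgzip suggestion I followed was, as I understood it, this combination (a sketch assuming htslib's bgzip and tabix; plain gzip output is not accepted):

```shell
bgzip -c qual.vcf > qual.vcf.gz   # BGZF-compress (gzip-compatible, but block-structured)
tabix -p vcf qual.vcf.gz          # writes the qual.vcf.gz.tbi index
```

If this pairing is not what was meant, that may be where I'm going wrong.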

I'm running out of options. What do I do?

MuTect2 not found using GATK version 3.8 and Java 1.8


Hi,

I receive an error saying MuTect2 cannot be found when I try to run it with GATK version 3.8 and Java 1.8. Please find the Java version below:
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)

The error message:

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Invalid command line: Multiple values associated with given definition, but this argument expects only one: analysis_type
ERROR ------------------------------------------------------------------------------------------

And this was the call that I made:
java -Xmx512g -jar ${GATK_TO_USE} -T MuTect2 -R $ReferenceGenome -I:normal $MuTect21I -T:tumor $MuTect22I -ploidy 2 --max_alt_alleles_in_normal_count 5 --min_base_quality_score 20 -o $Mutect2O
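(Re-reading my own call while posting: the tumor BAM is passed with `-T:tumor` rather than `-I:tumor`, so `-T` (analysis_type) is given two values, which would match the error message. A corrected sketch of the call:)

```shell
java -Xmx512g -jar ${GATK_TO_USE} -T MuTect2 -R $ReferenceGenome \
    -I:normal $MuTect21I -I:tumor $MuTect22I -ploidy 2 \
    --max_alt_alleles_in_normal_count 5 --min_base_quality_score 20 -o $Mutect2O
```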

Am I missing something here? I understood that Java 1.8 is required for GATK 3.8 and that MuTect2 is included in this GATK version. Could you please help?

best regards,

Paula

Service note: forum on break, support resumes Nov 27


Today in the US we're celebrating the national "Stuffing Your Face" holiday known as Thanksgiving, and we get the day off tomorrow to recover, so the forum is going to be unattended until Monday Nov 27.

So whether you're in the US and looking for an escape from your in-laws, or you're in some other part of the world and waiting for an answer to your pressing GATK question... why not take a break, slip away for a bit, and read the HaplotypeCaller paper, which is finally out in preprint form on bioRxiv under the title "Scaling accurate genetic variant discovery to tens of thousands of samples".

Or do both us and yourself a favor by filling in our GATK survey and winning one of the 100 prizes we're giving away! Seriously, we have a whole bunch of $50 Amazon gift cards (which you can get in your local currency if you live outside the USA) and prizes of up to $500 compute credits to spend in FireCloud, our cloud-based analysis platform. You can read more about the goal of the survey here.

Why do half of my variants get rejected in picard LiftoverVcf?


Dear team,

You may not be providing support for Picard, but the LiftoverVcf tool is at least referred to in several places on this forum. I am working on two batches of BAM-level data from the same project that have unhappily been aligned to b37 and hg19, respectively. My plan has been to lift the hg19 files over to b37 at the g.vcf stage and run GenotypeGVCFs on the full dataset. This is crucial for obtaining a VQSR-worthy dataset downstream, which will then be 50-60 exomes. However, when I use Picard LiftoverVcf with hg19tob37.chain, about 50% of variants are rejected due to mismatching reference alleles:

INFO 2016-03-14 13:58:26 LiftoverVcf Processed 11354836 variants.
INFO 2016-03-14 13:58:26 LiftoverVcf 0 variants failed to liftover.
INFO 2016-03-14 13:58:26 LiftoverVcf 5319041 variants lifted over but had mismatching reference alleles after lift over.
INFO 2016-03-14 13:58:26 LiftoverVcf 46.8438% of variants were not successfully lifted over and written to the output.

Any comments as to what might be the problem would be deeply appreciated!
It should be possible to lift over and then run GenotypeGVCFs on the combined set, right?

Best regards,
Lasse

t_lod_fstar big value


Is t_lod_fstar a probability value? If so, why are my values above a thousand?

Example:
t_lod_fstar
982.172809
2812.706107
1818.586417
899.623243
736.025622
3239.922082
3242.985246

Question when I use FastqToSam to convert my FASTQ files to SAM files


Hi, I have a problem when I use FastqToSam to convert my FASTQ files to SAM files. I am using the latest Picard version.

I use the command line below:
java -jar picard.jar FastqToSam F1=94210_CGTACTAG_S2_L001_R1_001.fastq F2=94210_CGTACTAG_S2_L001_R2_001.fastq O=fastq_to_bam.bam USE_SEQUENTIAL_FASTQS=true QUALITY_FORMAT=Illumina ALLOW_AND_IGNORE_EMPTY_LINES=true USE_JDK_DEFLATER=true USE_JDK_INFLATER=true SM=for_tool_testing

And it fails with an error like this:
[Fri Nov 24 19:10:16 EST 2017] picard.sam.FastqToSam done. Elapsed time: 19.69 minutes.
Runtime.totalMemory()=3359113216
Exception in thread "Thread-106" Exception in thread "Thread-140" Exception in thread "Thread-83" Exception in thread "Thread-22" Exception in thread "Thread-20" Exception in thread "Thread-100" Exception in thread "Thread-14" Exception in thread "Thread-107" Exception in thread "Thread-51" Exception in thread "Thread-35" Exception in thread "Thread-125" Exception in thread "Thread-50" Exception in thread "Thread-94" Exception in thread "Thread-135" Exception in thread "Thread-31" Exception in thread "Thread-129" Exception in thread "Thread-136" htsjdk.samtools.util.RuntimeIOException: java.nio.file.NoSuchFileException: /tmp/yingma/sortingcollection.6119188463332115860.tmp
at htsjdk.samtools.util.IOUtil$DeletePathThread.run(IOUtil.java:374)
Caused by: java.nio.file.NoSuchFileException: /tmp/yingma/sortingcollection.6119188463332115860.tmp
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)

Half of my variants get rejected by using picard LiftoverVcf


Hi,
I want to include samples aligned to b37 and to hg19 in GenotypeGVCFs and VQSR on b37. I have used Picard LiftoverVcf on the GVCF files with the hg19tob37.chain file, but 60% of variants are rejected due to mismatching reference alleles. In this scenario, should I realign the samples to b37? Please give some suggestions in this regard.


I also have an exome BAM file. Is preprocessing required, or can I run variant calling on the BAM directly?

VQSR: Bad input: Values for DP annotation not detected for ANY training variant in the input callset


Hi team,
I have a VCF callset file generated using HaplotypeCaller in --emitRefConfidence GVCF mode, with subsequent GenotypeGVCFs.
I used the generated output.vcf file as input for VariantRecalibrator.
The command:
java -jar $GATK_HOME/GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R $REFERENCE \
    -input exome_set_output.vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $GOLD_STANDARD_HAPMAP \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 $GOLD_STANDARD_OMNI \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 $GOLD_STANDARD_1000G \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 $GOLD_STANDARD_DBSNP \
    -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -an InbreedingCoeff \
    -mode SNP \
    -tranche 100.0 -tranche 99.9 \
    -recalFile exome_set_output_SNP.recal \
    -tranchesFile recal_SNP.tranches \
    -rscriptFile recal_SNP_plots.R

But I get the following error:
##### ERROR MESSAGE: Bad input: Values for DP annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

My input vcf file looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RTN005 RTN007 RTN009 RTN024 RTN028 RTN038 RTN039 RTN045 RTN051 RTN097 RTN102 RTN108 RTN122 RTN126 RTN127 RTN133
1 762273 . G A 23942.06 . AC=31;AF=0.969;AN=32;BaseQRankSum=-9.420e-01;ClippingRankSum=-6.500e-02;DP=743;FS=2.053;GQ_MEAN=139.56;GQ_STDDEV=44.68;InbreedingCoeff=-0.0323;MLEAC=31;MLEAF=0.969;MQ=42.48;MQ0=0;MQRankSum=-2.517e+00;NCC=0;QD=32.27;ReadPosRankSum=-1.485e+00 GT:AD:DP:GQ:PL 1/1:0,42:42:99:1426,126,0 1/1:0,28:28:84:945,84,0 1/1:0,69:69:99:2430,208,0 1/1:0,38:38:99:1295,114,0 1/1:0,54:54:99:1876,162,0 1/1:0,28:28:84:977,84,0 1/1:0,37:37:99:1282,111,0 1/1:0,65:65:99:2207,195,0 1/1:0,46:46:99:1572,138,0 1/1:0,45:45:99:1565,135,0 1/1:0,60:60:99:2019,180,0 1/1:0,52:52:99:1758,156,0 1/1:0,69:69:99:2404,208,0 1/1:0,19:19:57:657,57,0 1/1:0,41:41:99:1413,123,0 0/1:40,9:49:99:152,0,1320
1 762353 . G C 59.07 . AC=1;AF=0.031;AN=32;BaseQRankSum=0.111;ClippingRankSum=-5.560e-01;DP=321;FS=0.000;GQ_MEAN=51.81;GQ_STDDEV=19.63;InbreedingCoeff=-0.0328;MLEAC=1;MLEAF=0.031;MQ=42.47;MQ0=0;MQRankSum=0.779;NCC=0;QD=1.97;ReadPosRankSum=0.501 GT:AD:DP:GQ:PL 0/0:22,0:22:63:0,63,945 0/0:23,0:23:60:0,60,832 0/0:15,0:15:23:0,23,574 0/0:22,0:22:60:0,60,900 0/0:26,0:26:63:0,63,945 0/0:8,0:8:21:0,21,303 0/0:15,0:15:42:0,42,489 0/0:26,0:26:60:0,60,900 0/0:11,0:11:30:0,30,450 0/0:23,0:23:63:0,63,945 0/1:25,5:30:95:95,0,783 0/0:23,0:23:60:0,60,900 0/0:29,0:29:63:0,63,945 0/0:20,0:20:60:0,60,685 0/0:15,0:15:36:0,36,540 0/0:13,0:13:30:0,30,450
1 861630 . G A 1958.44 . AC=22;AF=0.688;AN=32;BaseQRankSum=-1.380e+00;ClippingRankSum=0.198;DP=88;FS=7.101;GQ_MEAN=24.38;GQ_STDDEV=23.34;InbreedingCoeff=0.0754;MLEAC=25;MLEAF=0.781;MQ=60.00;MQ0=0;MQRankSum=0.00;NCC=0;QD=25.43;ReadPosRankSum=0.720 GT:AD:DP:GQ:PL 0/0:3,0:3:0:0,0,51 0/1:4,2:6:46:46,0,146 1/1:0,4:4:12:133,12,0 1/1:0,4:4:12:133,12,0 1/1:0,6:6:18:197,18,0 1/1:0,2:2:6:64,6,0 1/1:0,7:7:21:209,21,0 1/1:0,8:8:24:264,24,0 1/1:0,7:7:21:232,21,0 0/1:4,3:7:76:76,0,137 0/0:4,0:4:0:0,0,93 0/1:2,4:6:64:113,0,64 0/1:2,5:7:51:135,0,51 1/1:0,10:10:30:321,30,0 1/1:0,3:3:9:100,9,0 0/0:4,0:4:0:0,0,88

Can someone point me to what I'm doing wrong? I can see that there are DP values in the VCF file, so I don't understand why it complains that there aren't any annotations.

Thanks very much,
Tesa

IMPORTANT -- Bug alert for GATK4 GenomicsDBImport


We have identified a major bug in the GenomicsDBImport tool that affects all released beta versions of GATK4 up to 4.beta.5 (inclusive). The bug occurs under specific conditions (detailed below) and causes the output of joint calling to be scrambled across samples, i.e. the sample names will not be associated with the correct sample data. For example, the data for sample1 may be labeled as sample3. The good news is that the results are recoverable as long as you have a record of the exact parameters used in the original command.

So if you have used this tool, please read the detailed description of the bug conditions and recovery procedure below. We apologize for any inconvenience this may cause you.

Going forward, everyone who plans to use GenomicsDBImport should upgrade to GATK4 version 4.beta.6 or later.


Conditions under which the bug occurs

The bug occurs when all of the following conditions are met:

  • You used a version of GenomicsDBImport from before GATK4 version 4.beta.6 (for nightlies, up to 4.beta.5-66-g9609cb3-SNAPSHOT);
  • You used the --batchSize argument with a setting other than 0;
  • You imported multiple batches, meaning you had more input GVCFs than what your batch size was set to;
  • The input GVCFs were not sorted according to Java’s natural sorting of strings (see details below).

That last point applies if you specified files using the -V argument or if you used a sampleNameMap file. Note that Java sorts strings lexicographically, not in a numerically aware fashion, so sample1, sample2, …, sample9, sample10, sample11 would be considered "Out of order".
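A short illustration may help here. Python's default string sort matches Java's natural (lexicographic) ordering for names like these, so this sketch (not part of any GATK tooling) shows why sample10 lands before sample2:

```python
# Lexicographic sorting compares character by character, so "sample10"
# sorts before "sample2" ("1" < "2" at the first differing position).
samples = [f"sample{i}" for i in range(1, 12)]  # sample1 .. sample11
print(sorted(samples))
# ['sample1', 'sample10', 'sample11', 'sample2', 'sample3', 'sample4',
#  'sample5', 'sample6', 'sample7', 'sample8', 'sample9']
```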

Concrete example

Given the following samples.txt file:

HG00096 HG00096.g.vcf.gz
NA19625 NA19625.g.vcf.gz
HG00268 HG00268.g.vcf.gz

This command using sampleNameMap:

 ./gatk-launch GenomicsDBImport --sampleNameMap samples.txt --batchSize 2 --genomicsDBWorkspace workspace -L chr21

or this one using -V:

./gatk-launch GenomicsDBImport -V HG00096.g.vcf.gz -V NA19625.g.vcf.gz -V HG00268.g.vcf.gz --batchSize 2 --genomicsDBWorkspace workspace -L chr21

would be affected by the bug and produce corrupted output.
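Before importing, you can check for yourself whether the sample names in your map are already in Java's natural order. This is an informal sketch (the `is_naturally_sorted` helper is made up for illustration, not a GATK tool):

```python
# Check whether sample names appear in Java's natural (lexicographic) order.
def is_naturally_sorted(names):
    return list(names) == sorted(names)

# Order taken from the samples.txt in the concrete example above.
names = ["HG00096", "NA19625", "HG00268"]
print(is_naturally_sorted(names))  # False -> meets the "unsorted input" bug condition
```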


Recovery procedure

We have a new tool in GATK4 version 4.beta.6 called FixCallSetSampleOrdering. This tool will take a corrupted callset and reassign the correct names. It requires that you use the EXACT SAME settings you used in your GenomicsDBImport command for the --batchSize argument and for the method and order you used to specify input files. For the examples shown above, the recovery command would be:

./gatk-launch FixCallSetSampleOrdering --sampleNameMap samples.txt --batchSize 2 -O fixedCallSet.vcf

To be clear, if you specified the input GVCFs with -V originally, you’ll need to create a sample name map with the ordering you used in your command in order to run the fixup tool.

Note that the settings aren’t recorded anywhere in the final output, so if you didn't keep a record of what those settings were, you may need to re-call your dataset from the original BAMs. However, please contact us before doing so. We are working on a new fingerprinting tool that should be able to recover the correct names in many cases without recalling. It will also be able to validate that the sample names are correct.

We apologize again for any disruption this may cause to your work.

GATK 3.8 log4j error


I just upgraded from GATK 3.7 to the newly released GATK 3.8 (3.8-0-ge9d806836) and I am getting a StatusLogger error:

ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/path/GenomeAnalysisTK-3.8-0/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

Despite the error message, the tools seem to work just fine as far as I can tell.

Is this really an error? Is there a way to fix it?

When there is a GATK update, when do you know if you should rerun old data?


When there is a new version of GATK released, how do you know whether it is necessary to rerun all of your old data for an ongoing project? I realize there is probably not one good answer to this question because it would depend on the updates involved but I am just looking for some general recommendations. For instance, my last batch of sequencing data was done under GATK v3.4 and an update was just released (v3.6). I am currently using the workflow where I generate a GVCF for each sample and then every new batch of exome data I get, I use the previous GVCF's and the new GVCF's and do variant calling (GenotypeGVCFs) on all of them at once. So the next time I run, I would plan on using v3.6 (because it is the most updated), but does that negate the usefulness of the other files generated with older versions? From 3.4 to 3.6 the only major workflow change appears to be the removal of "local realignment around indels" from the workflow according to the release notes but there are also some bug fixes it appears.

I am just trying to get some recommendations for dealing with program updates because I have a sequencing project that will go on for a few years and I am sure there will be countless upgrades to make the GATK workflow more accurate and this concern will just keep rearing its ugly head.

Thank you for your program and your help,

Annie
