Channel: Recent Discussions — GATK-Forum
Viewing all 12345 articles

Missing or Inconsistent call between single-sample and multi-sample SNP calling


Dear all,

I generated a gVCF file using HaplotypeCaller (v3.7) and searched for a specific variant of interest, which looks like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT    N417
 chr1   627033  .   C   <NON_REF>   .   .   END=627033  GT:DP:GQ:MIN_DP:PL  0/0:4:0:4:0,0,0

The same gVCF was then used in joint genotyping (GenotypeGVCFs) across multiple samples, and the site looks like this (the genotype is shown only for this sample):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  N417
chr1    627033  .   C   T   101.69  .   AC=2;AF=0.011;AN=182;DP=500;ExcessHet=0.0127;FS=0.000;InbreedingCoeff=0.1696;MLEAC=2;MLEAF=0.011;MQ=51.17;QD=12.71;SOR=1.179    GT:AD:DP:GQ:PL  ./.:4,0:4

A bamout was generated (shown below) which supports an alternate allele "T".

The IGV/bamout result is supported by PCR. However, the call is not made by GATK. Could someone comment on this behaviour and on best practices for rescuing such variants?
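For context, the single-sample gVCF record above carries PL 0,0,0 and GQ 0, meaning no genotype is preferred over any other at that site, which is consistent with GenotypeGVCFs emitting a no-call (./.). A quick sketch of the standard GQ-from-PL convention (second-lowest PL minus lowest, conventionally capped at 99):

```python
# GQ is the Phred-scaled confidence in the called genotype: the gap
# between the lowest and second-lowest PL values, capped at 99.
def gq_from_pl(pls):
    lowest, second = sorted(pls)[:2]
    return min(second - lowest, 99)

print(gq_from_pl([0, 0, 0]))     # 0 -> the site carries no genotype information
print(gq_from_pl([0, 12, 180]))  # 12
```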


QD Annotation error in GATK in VariantRecalibration step


Hi all, I downloaded a BAM file from 1000 Genomes to work on, converted it to FASTQ, and aligned it with bwa mem to hg38; this worked fine.

Later I wanted to run VariantRecalibrator, and the following commands for SNPs worked fine as well:

java -Xmx8g -jar algorithms/gatk3/gatk.jar -T VariantRecalibrator -R references/hg38gatkbundle/Homo_sapiens_assembly38.fasta -input data/HG100/HG100.output.raw.combined.vcf -mode SNP -resource:hapmap,known=false,training=true,truth=true,prior=15.0 references/hg38gatkbundle/hapmap_3.3.hg38.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 references/hg38gatkbundle/1000G_omni2.5.hg38.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 references/hg38gatkbundle/1000G_phase1.snps.high_confidence.hg38.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 references/hg38gatkbundle/Homo_sapiens_assembly38.dbsnp138.vcf -an DP -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum --maxGaussians 4 -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile data/HG100/HG100.recalibrate.SNP.recal -tranchesFile data/HG100/HG100.recalibrate.SNP.tranches -rscriptFile data/HG100/HG100.recalibrate.SNP.plots.R

and

java -Xmx8g -jar algorithms/gatk3/gatk.jar -T ApplyRecalibration -R references/hg38gatkbundle/Homo_sapiens_assembly38.fasta -input data/HG100/HG100.output.raw.combined.vcf -mode SNP --ts_filter_level 99.5 -recalFile data/HG100/HG100.recalibrate.SNP.recal -tranchesFile data/HG100/HG100.recalibrate.SNP.tranches -o data/HG100/HG100.recalibrated.snp.vcf
Next I wanted to perform variant recalibration for indels, so I used the following command:

java -Xmx8g -jar algorithms/gatk3/gatk.jar -T VariantRecalibrator -R references/hg38gatkbundle/Homo_sapiens_assembly38.fasta -input data/HG100/HG100.output.raw.combined.vcf -mode INDEL -resource:mills,known=false,training=true,truth=true,prior=12.0 references/hg38gatkbundle/Mills_and_1000G_gold_standard.indels.hg38.vcf -resource:dnsnp,known=true,training=false,truth=false,prior=2.0 references/hg38gatkbundle/Homo_sapiens_assembly38.dbsnp138.vcf -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum -an InbreedingCoeff --maxGaussians 4 -recalFile data/HG100/HG100.recalibrate.INDEL.recal -tranchesFile data/HG100/HG100.recalibrate.INDEL.tranches -rscriptFile data/HG100/HG100.recalibrate.INDEL.plots.R
The issue I am facing is that when I run the above command I get an annotation-related error, e.g. "QD annotation is not found on any input callsets". I have confirmed that the input VCF has the annotations, since they were specified when running UnifiedGenotyper.

Any ideas why this is happening? I am trying to detect variants from a single BAM file.

Any help will be highly appreciated.
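One thing worth checking: VariantRecalibrator in INDEL mode only looks at indel sites, so the question is whether the indel records themselves carry QD. A minimal sketch of such a scan, assuming a plain-text VCF (the example records are made up):

```python
# List indel records that lack the QD key in their INFO field.
def indels_missing_qd(vcf_lines):
    missing = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.split("\t")[:8]
        is_indel = any(len(a) != len(ref) for a in alt.split(","))
        keys = {kv.split("=")[0] for kv in info.split(";")}
        if is_indel and "QD" not in keys:
            missing.append((chrom, pos))
    return missing

records = [
    "1\t30\t.\tT\tC\t36\tPASS\tQD=5.17;FS=0.000",   # SNP, has QD
    "1\t53\t.\tC\tCA\t24\tPASS\tFS=0.000;SOR=1.2",  # indel, missing QD
]
print(indels_missing_qd(records))  # [('1', '53')]
```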

Spanning or overlapping deletions (* allele)


We use the term spanning deletion or overlapping deletion to refer to a deletion that spans a position of interest.

The presence of a spanning deletion affects how we can represent genotypes at any site(s) that it spans for those samples that carry the deletion, whether in heterozygous or homozygous variant form. Page 8, item 5 of the VCF v4.3 specification reserves the * allele to reference overlapping deletions. This is not to be confused with the bracketed asterisk <*> used to denote symbolic alternate alleles.


image

Here we illustrate with four human samples. Bob and Lian each have a heterozygous A-to-T single-nucleotide polymorphism at position 20, our position of interest. Kyra has a 9 bp deletion from position 15 to 23 on both homologous chromosomes that extends across position 20. Lian and Omar are each heterozygous for the same 9 bp deletion. Omar's and Bob's other allele is the reference A.

What are the genotypes for each individual at position 20? For Bob, the reference A and variant T alleles are clearly present for a genotype of A/T.

What about Lian? Lian has a variant T allele plus a 9 bp deletion overlapping position 20. To notate the deletion as we do single nucleotide deletions is technically inaccurate. We need a placeholder notation to signify absent sequence that extends beyond the position of interest and that is listed for an earlier position, in our case position 14. The solution is to use a star or asterisk * at position 20 to refer to the spanning deletion. Using this convention, Lian's genotype is T/*.

At the sample-level, Kyra and Omar would not have records for position 20. However, we are comparing multiple samples and so we indicate the spanning deletion at position 20 with *. Omar's genotype is A/* and Kyra's is */*.


image

In the VCF, depending on the format used by tools, positions equivalent to our example position 20 may or may not be listed. If listed, as in the first example VCF shown, the spanning deletion is noted with the asterisk * under the ALT column, and the genotypes (GT) for Kyra, Lian and Omar then refer to it. Alternatively, a VCF may avoid referencing the spanning deletion altogether by listing the SNP together with the deletion in a single record; this is shown in the second example VCF at position 14.
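A minimal sketch of the two representations (the reference sequence and values are illustrative, not taken from a real callset):

```
# Representation 1: position 20 listed, spanning deletion marked with *
#CHROM  POS  ID  REF         ALT  QUAL  FILTER  INFO  FORMAT  Bob  Lian  Kyra  Omar
1       14   .   ATTCATATTC  A    .     PASS    .     GT      0/0  0/1   1/1   0/1
1       20   .   A           T,*  .     PASS    .     GT      0/1  1/2   2/2   0/2

# Representation 2: the SNP and the deletion listed together at position 14
1       14   .   ATTCATATTC  A,ATTCATTTTC  .  PASS  .  GT     0/2  1/2   1/1   0/1
```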

Missing field QD in vc variant (ERROR)


Hello!

I'm trying to extract SNPs of interest from my filtered VCF file of passing SNPs. I just used SelectVariants to create that file, and am now trying to use VariantsToTable to put them into a table format for analysis in R. Unfortunately, I am getting this error:

The help pages for this issue seem to be a bit scarce, and I'm not sure why the run terminates. I can see that the command is running, and it does create a table, but it stops prematurely. What should I do to fix this issue?

Thank you!!

Differences between CollectFragmentCounts and CalculateTargetCoverage


Good afternoon,

I was hoping somebody could help illuminate the differences/similarities between CollectFragmentCounts (4.0 official release) and CalculateTargetCoverage (4.3 Beta). Are these the same tool, and do they use the same algorithm?

Also, is the Picard-style header produced in CollectFragmentCounts necessary for downstream tools now?

Thank you so much!

Ploidy level in HaplotypeCaller in GATK 4.0


Hi,

Thanks for the new version of GATK (GATK4.0).

We have a pool of 48 samples and the organism is diploid, so we are using a ploidy of 96 (48x2=96). When I used HaplotypeCaller for variant calling in older versions of GATK, I got the error "not enough memory to run this program", so I was unable to run HaplotypeCaller earlier. Now, when I try it with GATK 4.0, I no longer get this error, but I do get the warning below:

12:40:23.159 WARN HaplotypeCallerGenotypingEngine - Removed alt alleles where ploidy is 96 and original allele count is 3, whereas after trimming the allele count becomes 2. Alleles kept are:[T*, C]

The command line we used is below:

java -jar -Xmx64g gatk-package-4.0.0.0-local.jar HaplotypeCaller -R tilling.fa -I C1_S1.sorted.bam -O C1_S1.vcf -stand-call-conf 20.0 -ploidy 96

Can you please help us understand what the warning means, whether the command and options I am using are right, or whether I need to include more options for efficient variant calling?
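For what it's worth, the trimming in the warning is consistent with HaplotypeCaller's cap on the number of genotypes it will consider per site (the --max-genotype-count argument, which defaults to 1024 in GATK 4; worth verifying against your version's docs). The number of unordered genotypes for ploidy P and A alleles is C(P+A-1, P), which a quick sketch makes concrete:

```python
from math import comb

def genotype_count(ploidy, n_alleles):
    # Unordered genotypes = multisets of size `ploidy` drawn from `n_alleles`.
    return comb(ploidy + n_alleles - 1, ploidy)

print(genotype_count(96, 3))  # 4753, over the default cap of 1024
print(genotype_count(96, 2))  # 97, fits, so the site is trimmed to 2 alleles
```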

Thanks in advance.

Regards,
Prateek

Picard Sort Vcf Error


Hello.

I am using GATK version 3.6, picard-2.8.2.jar

I downloaded hapmap_3.3.hg38.vcf from the GATK resource bundle. I then used the command below to remove the chr notation.
awk '{gsub(/^chr/,""); print}' hapmap_3.3.hg38.vcf > no_chr_hapmap_3.3.hg38.vcf.vcf

Before (hapmap_3.3.hg38.vcf)
chr1 2242065 rs263526 T C . PASS AC=724;AF=0.259;AN=2792
chr1 2242417 rs16824926 C . . PASS AN=530
chr1 2242880 rs11581436 A . . PASS AN=540

After (no_chr_hapmap_3.3.hg38.vcf.vcf)
1 6421563 rs4908891 G A . PASS AC=1086;AF=0.389;AN=2792
1 6421782 rs4908892 A G . PASS AC=1692;AF=0.606;AN=2792
1 6421856 rs12078257 T C . PASS AC=368;AF=0.132;AN=2790

Then I used Picard SortVcf to sort no_chr_hapmap_3.3.hg38.vcf.vcf:
java -jar picard-2.8.2.jar SortVcf I=removedChr_HapMap.vcf O=sortedHapMap.vcf SEQUENCE_DICTIONARY=hg38.dict

hg38.dict
@SQ SN:1 LN:248956422 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:2648ae1bacce4ec4b6cf337dcae37816
@SQ SN:10 LN:133797422 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:907112d17fcb73bcab1ed1c72b97ce68
@SQ SN:11 LN:135086622 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:1511375dc2dd1b633af8cf439ae90cec
@SQ SN:12 LN:133275309 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:e81e16d3f44337034695a29b97708fce

I have then encountered this error:

Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=20) was found when SAMSequenceRecord(name=1,length=248956422,dict_index=0,assembly=null) was expected.
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:126)
at picard.vcf.SortVcf.doWork(SortVcf.java:95)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=20) was found when SAMSequenceRecord(name=1,length=248956422,dict_index=0,assembly=null) was expected.
at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:170)
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:124)
... 4 more

I have tried many times but still get the same error. Could you please advise how I can solve this problem?
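One thing worth checking: the error still reports name=chr1, which suggests the VCF's own ##contig header lines kept the chr prefix (the awk pattern /^chr/ only matches lines that start with chr, and header lines start with ##). A hedged sketch that also rewrites the contig IDs in the header, using a made-up miniature VCF:

```shell
# Build a tiny VCF with a chr-prefixed contig header and data line.
cat > hapmap_chunk.vcf <<'EOF'
##fileformat=VCFv4.2
##contig=<ID=chr1,length=248956422>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr1	2242065	rs263526	T	C	.	PASS	AC=724;AF=0.259;AN=2792
EOF

# Strip "chr" from data lines AND from ##contig header IDs, so the VCF's
# embedded sequence dictionary matches a no-chr .dict.
awk '{sub(/^chr/, ""); sub(/ID=chr/, "ID="); print}' hapmap_chunk.vcf > no_chr.vcf
cat no_chr.vcf
```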

I would then like to run SelectVariants to extract variants that are missing from HapMap but present in my dataset.

Thank you so much in advance.

Cheers,
Moon

picard ./gradlew test error


I got an error when I tried to run ./gradlew test on my system.
I am running Linux Mint 18. Could you please let me know what I should do?
Thank you.

$ java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
OpenJDK Server VM (build 25.151-b12, mixed mode)

$ git --version
git version 2.7.4

$ ll
total 108
drwxrwxr-x 10 usser usser 4096 Feb 20 21:50 ./
drwxr-xr-x 23 usser usser 4096 Feb 20 21:56 ../
drwxrwxr-x 3 usser usser 4096 Feb 20 21:51 build/
-rw-rw-r-- 1 usser usser 12763 Feb 20 20:52 build.gradle
-rw-rw-r-- 1 usser usser 1098 Feb 20 20:52 build.xml
-rw-rw-r-- 1 usser usser 358 Feb 20 20:52 .classpath
-rw-rw-r-- 1 usser usser 869 Feb 20 20:52 Dockerfile
-rw-rw-r-- 1 usser usser 179 Feb 20 20:52 .dockerignore
drwxrwxr-x 3 usser usser 4096 Feb 20 20:52 etc/
drwxrwxr-x 8 usser usser 4096 Feb 20 20:52 .git/
drwxrwxr-x 2 usser usser 4096 Feb 20 20:52 .github/
-rw-rw-r-- 1 usser usser 190 Feb 20 20:52 .gitignore
drwxrwxr-x 3 usser usser 4096 Feb 20 20:52 gradle/
-rwxrwxr-x 1 usser usser 5046 Feb 20 20:52 gradlew*
-rw-rw-r-- 1 usser usser 1072 Feb 20 20:52 LICENSE.txt
drwxrwxr-x 2 usser usser 4096 Feb 20 20:52 project/
-rw-rw-r-- 1 usser usser 356 Feb 20 20:52 .project
-rw-rw-r-- 1 usser usser 6045 Feb 20 20:52 README.md
-rw-rw-r-- 1 usser usser 56 Feb 20 20:52 settings.gradle
drwxrwxr-x 4 usser usser 4096 Feb 20 20:52 src/
drwxrwxr-x 3 usser usser 4096 Feb 20 20:52 testdata/
-rw-rw-r-- 1 usser usser 601 Feb 20 20:52 .travis.yml

$ ./gradlew test

FAILURE: Build failed with an exception.

  • Where:
    Build file '/home/usser/picard/build.gradle' line: 55

  • What went wrong:
    A problem occurred evaluating root project 'picard'.

    Cannot invoke method getURLs() on null object

  • Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 1.695 secs


[ERROR] GATK4 Mutect2


Hi,

I want to call mutations in my WGS data, but the workflow for one of my pairs failed. I got the following ERROR message in the MergeVcfs task. It seems that one of the VCF files is not in the correct format, but I checked it and did not find any format issue. Could you please help me? Thanks a lot!

Workspace ID: 8d717792-41af-4339-ac2b-634540370e63
Submission ID: 26cd48f6-5815-4472-a185-c9e38b2fa2c7

00:00:04s. Time for last 10,000: 0s. Last read position: 18:19,501,893
INFO 2018-02-20 19:36:15 MergeVcfs Processed 320,000 records. Elapsed time: 00:00:04s. Time for last 10,000: 0s. Last read position: 19:43,221,725
[Tue Feb 20 19:36:15 UTC 2018] picard.vcf.MergeVcfs done. Elapsed time: 0.08 minutes.
Runtime.totalMemory()=886046720
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.tribble.TribbleException: Line 5829: there aren't enough columns for line 20 57842652 . A AT . . DP=68;ECNT=1;NLOD=3.53;N_ART_LOD=1.58;P (we expected 9 tokens, and saw 8 ), for input source: file:///cromwell_root/fc-8d717792-41af-4339-ac2b-634540370e63/26cd48f6-5815-4472-a185-c9e38b2fa2c7/Mutect2/27df3a6c-a00c-4cd0-a3e3-81ff18676b25/call-M2/shard-45/output.vcf.gz
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:281)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:262)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:64)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:372)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:353)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:314)
at htsjdk.samtools.util.PeekableIterator.advance(PeekableIterator.java:71)
at htsjdk.samtools.util.PeekableIterator.next(PeekableIterator.java:57)
at htsjdk.samtools.util.MergingIterator.next(MergingIterator.java:107)
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:221)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:24)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:277)

The line:
20 57842652 . A AT . . DP=68;ECNT=1;NLOD=3.53;N_ART_LOD=1.58;POP_AF=1.000e-03;P_GERMLINE=-2.786e+00;RPA=11,12;RU=T;STR;TLOD=3.44 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:24,3:0.206:9,2:13,1:24:272,216:60:33:0.111,0.00,0.111:7.290e-03,0.069,0.923 0/0:28,2:0.116:17,1:9,1:29:245,231:60:48

[ERROR] make_acnv_pon_config


Hi,

I want to build a CNV PoN with "make_acnv_pon_config", but I got the message below. Any suggestions? Thanks a lot; I will really appreciate your help.

Failures:
message: Task make_acnv_pon_workflow.make_acnv_pon:NA:1 failed. JES error code 10. Message: 11: Docker run failed: command failed: /cromwell_root/exec.sh: line 12: syntax error near unexpected token echo' /cromwell_root/exec.sh: line 12: echo "writing $F to agg.pcovs"' . See logs at gs://fc-b9cc3b48-69c9-4013-8eba-ff7973fc7bc9/599d74a4-5a7e-4b33-823c-0e3ad1f87f5e/make_acnv_pon_workflow/e8fd4bdb-6e0b-4003-b413-cfea7a535ec2/call-make_acnv_pon/

JES log:
2017/09/11 02:00:15 E: command failed: /cromwell_root/exec.sh: line 12: syntax error near unexpected token echo' /cromwell_root/exec.sh: line 12: echo "writing $F to agg.pcovs"'
(exit status 2)

submission id: 599d74a4-5a7e-4b33-823c-0e3ad1f87f5e
link: https://portal.firecloud.org/#workspaces/nci-cbao-bi-org/reference_hg19_cbao/monitor/599d74a4-5a7e-4b33-823c-0e3ad1f87f5e/e8fd4bdb-6e0b-4003-b413-cfea7a535ec2

Best,
Chunyang

HaplotypeCaller warnings DepthPerSampleHC


Hi, I'm trying to do a multi-sample variant call using several BAM files with the following command:

/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk HaplotypeCaller -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24150_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24144_2#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_2#1.bam -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/raw_variants.vcf

Using GATK jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar HaplotypeCaller -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24150_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24144_2#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_2#1.bam -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/mshef7_wt_vs_ips_raw_variants.vcf
10:26:29.719 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:26:29.935 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.935 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.1.0
10:26:29.935 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
10:26:29.935 INFO HaplotypeCaller - Executing as md1jale@sharc-node122.shef.ac.uk on Linux v3.10.0-693.11.6.el7.x86_64 amd64
10:26:29.936 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
10:26:29.936 INFO HaplotypeCaller - Start Date/Time: 14 February 2018 10:26:29 GMT
10:26:29.936 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.936 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.936 INFO HaplotypeCaller - HTSJDK Version: 2.14.1
10:26:29.936 INFO HaplotypeCaller - Picard Version: 2.17.2
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:26:29.937 INFO HaplotypeCaller - Deflater: IntelDeflater
10:26:29.937 INFO HaplotypeCaller - Inflater: IntelInflater
10:26:29.937 INFO HaplotypeCaller - GCS max retries/reopens: 20
10:26:29.937 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
10:26:29.937 INFO HaplotypeCaller - Initializing engine
10:26:30.520 INFO HaplotypeCaller - Done initializing engine
10:26:30.528 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
10:26:31.119 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
10:26:31.154 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
10:26:31.259 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
10:26:31.259 INFO IntelPairHmm - Available threads: 16
10:26:31.259 INFO IntelPairHmm - Requested threads: 4
10:26:31.259 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
10:26:31.298 INFO ProgressMeter - Starting traversal
10:26:31.298 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
10:26:33.832 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.865 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.880 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.911 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:34.733 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:41.497 INFO ProgressMeter - 1:15485 0.2 80 470.6

I had slight memory issues running the above, but the command now runs when given a large amount of memory, although I do get lots of DepthPerSampleHC warnings. Is this normal?

-nct not present in GATK 4.0.0.0


Hi,

Is --native-pair-hmm-threads in GATK 4.0.0.0 the same as -nct in older GATK versions?

no. of cores utilization in haplotypcaller in GVCF mode


Hi,

I am running HaplotypeCaller (v4.0.1.2) (not the Spark version) on some WGS samples on an SGE (Sun Grid Engine) cluster. When I submit a job, I ask for 1 core (on an 8-core processor with 1 thread per core). I am aware that in native HaplotypeCaller I cannot specify the number of cores it should use for parallelization, and can only use --native-pair-hmm-threads to make that step faster (the default is 4).

Does HaplotypeCaller utilize cores according to availability? That is, if I assign 1 core to the job, will it still try to utilize other cores on that processor?

Kindly let me know if you need any more information for clarity.

Variant calls in mixed samples - HaplotypeCaller GVCF


I have a question about the behavior of HaplotypeCaller when used incorrectly :( Specifically, how would it call variants from a diploid sequence if the ploidy was set to 1? Would it simply call the most abundant base call at a given position or would it do something else?

We have been investigating a number of bacterial isolates and it has recently come to our attention that some of the samples are co-infected with more than one strain. These will likely be discarded but I was wondering how the pipeline would behave in case there is a chance that the data for these samples can be salvaged.

unmapped BAM from Ion 16s Metagenomics Kit using MergeBamAlignment


I have sequenced 16S rRNA from waste-water samples using the Ion 16S Metagenomics Kit. After paired-end sequencing on an Ion S5 machine, I received unmapped BAM files for all samples; each uBAM contains both forward and reverse reads, unaligned. I want to map those unmapped BAM files (uBAM -> BAM) against the reference (16S gene) before converting them to FASTQ for analysis in QIIME 2.0. However, I am stuck on how to map the uBAMs. I am new to bioinformatics, so any help will be appreciated! Do I have to sort the uBAMs in some order before mapping them? I have installed Picard, but I am not sure which tool I should use to end up with the FASTQ files I need for QIIME analysis.
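For reference, the usual GATK pattern for aligning a uBAM is SamToFastq into bwa mem, then MergeBamAlignment to restore the uBAM's read metadata. The sketch below uses placeholder file names and a generic 16S reference FASTA, so treat it as a starting point rather than a tested pipeline; if QIIME only needs FASTQ, the SamToFastq step alone may be enough.

```shell
# Convert the uBAM to interleaved FASTQ (file names are placeholders).
java -jar picard.jar SamToFastq \
    I=sample.unmapped.bam FASTQ=sample.fq INTERLEAVE=true

# Align the interleaved FASTQ (-p tells bwa the reads are interleaved).
bwa mem -p 16s_reference.fasta sample.fq > sample.aligned.sam

# Merge alignments back with the uBAM's read metadata.
java -jar picard.jar MergeBamAlignment \
    UNMAPPED=sample.unmapped.bam ALIGNED=sample.aligned.sam \
    R=16s_reference.fasta O=sample.merged.bam
```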


GATK4 Realign around indels


Hi,

Looking through the GATK4 Best Practices for pre-processing FASTQ files, I do not see a "realign around indels" step (in GATK3: java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator). Should I use GATK3 RealignerTargetCreator in my analysis pipeline? I guess not, as it's no longer in GATK4.

Thanks for the great job on GATK4 though !

VariantsToBinaryPed java.lang.ArrayIndexOutOfBoundsException: -1


Hello, can you please help me sort out the following error when running VariantsToBinaryPed:

java -jar /sb/project/fkr-592-aa/data/GalWaRat/bin/third/gatk-3.7/GenomeAnalysisTK.jar -T VariantsToBinaryPed -R /sb/project/fkr-592-aa/genomes/CfloGapsClosed6/Cflo_3.3_gaps_closed6.fasta -V /sb/project/fkr-592-aa/Danzqianqi/Cflo/WGS/filteredSNPss.vcf -m sample_phenotypeinfo2.fam --minGenotypeQuality 0 --bed filteredSNPss.bed --bim filteredSNPss.bim --fam filteredSNPss.fam
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/gs/scratch/zqianqi
INFO 19:31:00,898 HelpFormatter - ----------------------------------------------------------------------------------
INFO 19:31:00,901 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 19:31:00,902 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 19:31:00,902 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 19:31:00,902 HelpFormatter - [Tue Sep 05 19:31:00 EDT 2017] Executing on Linux 2.6.32-642.13.1.el6.x86_64 amd64
INFO 19:31:00,902 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02
INFO 19:31:00,906 HelpFormatter - Program Args: -T VariantsToBinaryPed -R /sb/project/fkr-592-aa/genomes/CfloGapsClosed6/Cflo_3.3_gaps_closed6.fasta -V /sb/project/fkr-592-aa/Danzqianqi/Cflo/WGS/filteredSNPss.vcf -m sample_phenotypeinfo2.fam --minGenotypeQuality 0 --bed filteredSNPss.bed --bim filteredSNPss.bim --fam filteredSNPss.fam
INFO 19:31:00,910 HelpFormatter - Executing as zqianqi@lg-1r17-n03 on Linux 2.6.32-642.13.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02.
INFO 19:31:00,911 HelpFormatter - Date/Time: 2017/09/05 19:31:00
INFO 19:31:00,911 HelpFormatter - ----------------------------------------------------------------------------------
INFO 19:31:00,911 HelpFormatter - ----------------------------------------------------------------------------------
INFO 19:31:00,922 GenomeAnalysisEngine - Strictness is SILENT
INFO 19:31:47,656 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 19:32:39,018 GenomeAnalysisEngine - Preparing for traversal
INFO 19:32:39,044 GenomeAnalysisEngine - Done preparing for traversal
INFO 19:32:39,045 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 19:32:39,045 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 19:32:39,046 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime

ERROR --
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -1
at htsjdk.variant.variantcontext.GenotypeLikelihoods.getGQLog10FromLikelihoods(GenotypeLikelihoods.java:220)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.checkGQIsGood(VariantsToBinaryPed.java:442)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.getStandardEncoding(VariantsToBinaryPed.java:406)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.getEncoding(VariantsToBinaryPed.java:398)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.writeIndividualMajor(VariantsToBinaryPed.java:282)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.map(VariantsToBinaryPed.java:267)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.map(VariantsToBinaryPed.java:103)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: -1
ERROR ------------------------------------------------------------------------------------------

My .vcf file was made with HaplotypeCaller/GenotypeGVCFs/SelectVariants/VariantFiltration. I used ValidateVariants as well.

This is a snapshot of the .vcf file:

reference=file:///sb/project/fkr-592-aa/genomes/CfloGapsClosed6/Cflo_3.3_gaps_closed6.fasta

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1 12 13 15 2 9

1 30 . T C 36.19 PASS AC=1;AF=0.100;AN=10;BaseQRankSum=0.712;ClippingRankSum=0.00;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.100;MQ=30.46;MQRankSum=1.98;QD=5.17;ReadPosRankSum=0.303;SOR=0.892 GT:AD:DP:GQ:PGT:PID:PL ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,37 0/0:2,0:2:6:.:.:0,6,74 0/0:4,0:4:9:.:.:0,9,135 0/1:5,2:7:66:0|1:30_T_C:66,0,246 0/0:2,0:2:6:.:.:0,6,49
1 45 . A G 33.97 PASS AC=1;AF=0.100;AN=10;BaseQRankSum=1.09;ClippingRankSum=0.00;DP=23;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.100;MQ=30.65;MQRankSum=2.20;QD=3.77;ReadPosRankSum=0.765;SOR=1.179 GT:AD:DP:GQ:PGT:PID:PL ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,37 0/0:5,0:5:15:.:.:0,15,157 0/0:6,0:6:1:.:.:0,1,155 0/1:7,2:9:63:0|1:30_T_C:63,0,288 0/0:2,0:2:6:.:.:0,6,49
1 53 . C CA 24.57 PASS AC=1;AF=0.083;AN=12;BaseQRankSum=1.09;ClippingRankSum=0.00;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.083;MQ=30.65;MQRankSum=2.20;QD=2.73;ReadPosRankSum=0.765;SOR=1.179 GT:AD:DP:GQ:PGT:PID:PL 0/0:1,0:1:3:.:.:0,3,37 0/0:1,0:1:3:.:.:0,3,37 0/0:5,0:5:15:.:.:0,15,157 0/0:6,0:6:1:.:.:0,1,169 0/1:7,2:9:63:0|1:30_T_C:63,0,288 0/0:2,0:2:6:.:.:0,6,49

My .fam file looks like this
Cflo 1 0 0 0 5047.16
Cflo 12 0 0 0 6249.9
Cflo 13 0 0 0 6007.21
Cflo 15 0 0 0 7123.6
Cflo 2 0 0 0 5581.36
Cflo 9 0 0 0 7462.87

Thank you! Please let me know if you require more information!

Need clarification on Picard CollectHsMetrics (ZERO_CVG_TARGETS_PCT)


I used Picard CollectHsMetrics on whole exome data of 96 individuals. We used xGen® Exome Research Panel v1.0 exome capture kit.

1) In the output file ZERO_CVG_TARGETS_PCT ranges from 37-41. The definition of "ZERO_CVG_TARGETS_PCT" is "The fraction of targets that did not reach coverage=1 over any base". Does that mean that I do not have any coverage in approximately 40% of my intended target bases? Is that normal?

2) PCT_TARGET_BASES_20X ranges from 0.92-0.95. The definition of "PCT_TARGET_BASES_20X" is "The fraction of all target bases achieving 20X or greater coverage". How is it possible to have at least 20X coverage in 93% of all target bases while having zero coverage in 40% of targets?

I think I have misinterpreted something here; I was wondering if someone could shed light on it. Thanks in advance.
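The two metrics have different denominators: ZERO_CVG_TARGETS_PCT counts targets (intervals), while PCT_TARGET_BASES_20X counts target bases, so many short zero-coverage targets can coexist with a high per-base figure. A toy illustration with made-up target lengths and coverages:

```python
# Toy data: 4 short targets with zero coverage, 6 long targets at 25x.
targets = [(50, 0)] * 4 + [(1000, 25)] * 6   # (length_bp, uniform_coverage)

zero_cvg_targets_pct = sum(1 for _, cov in targets if cov == 0) / len(targets)
bases_20x = sum(length for length, cov in targets if cov >= 20)
pct_target_bases_20x = bases_20x / sum(length for length, _ in targets)

print(zero_cvg_targets_pct)             # 0.4  -> 40% of targets, zero coverage
print(round(pct_target_bases_20x, 3))   # 0.968 -> yet ~97% of bases are >=20x
```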

Recommendation Interval length for genomicsdb import


Dear GATK Team

Are there any recommendations for the length of a single interval specified in GenomicsDBImport?
I am running GenomicsDBImport for about 50 samples (cattle) over the whole of chromosome 20 (length: 72,042,655). It takes forever to run, and after it finishes, joint genotyping also does not really work (the progress meter does not advance).

I used GATK3 on the same data and it ran without any problems: I input all of the sample gVCFs and ran GenotypeGVCFs on the chromosome 20 interval.
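A common workaround is to import smaller intervals and run GenomicsDBImport per chunk; here is a small helper to generate GATK-style -L interval strings (the 10 Mb chunk size is an illustrative choice, not an official recommendation):

```python
def split_intervals(contig, length, chunk_bp=10_000_000):
    # Return "contig:start-end" strings covering 1..length in fixed-size chunks.
    return [f"{contig}:{start + 1}-{min(start + chunk_bp, length)}"
            for start in range(0, length, chunk_bp)]

ivals = split_intervals("20", 72042655)
print(len(ivals), ivals[0], ivals[-1])  # 8 20:1-10000000 20:70000001-72042655
```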

Any help will be much appreciated.

Regards

VariantAnnotator is not annotating variants with InbreedingCoeff


Hi,

I am using GATK VariantAnnotator to annotate my VCF with the InbreedingCoeff, but when I check the output VCF I see that no variant was annotated with the InbreedingCoeff.

I've used a pedigree file in the .ped format and my VCF has more than 10 samples.

You can see here the command line I've used:

java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R ${ref.fa} -V ${input.vcf.gz} --annotation Coverage --annotation QualByDepth --annotation FisherStrand --annotation StrandOddsRatio --annotation RMSMappingQuality --annotation MappingQualityRankSumTest --annotation ReadPosRankSumTest --annotation InbreedingCoeff --annotation ChromosomeCounts -I bamfiles.list -L ${input.vcf.gz} -ped ${file.ped} |bgzip -c > ${out.vcf.gz}

And the first lines of the ${file.ped}:
$ head -n 10 ${file.ped}
HG00096 HG00096 0 0 1 0
HG00097 HG00097 0 0 2 0
HG00098 HG00098 0 0 1 0
HG00099 HG00099 0 0 2 0
HG00100 HG00100 0 0 2 0
HG00101 HG00101 0 0 1 0
HG00102 HG00102 0 0 2 0
HG00103 HG00103 0 0 1 0
HG00104 HG00104 0 0 2 0
HG00105 HG00105 0 0 1 0

Also, I've attached the VCF I have used as ${input.vcf.gz}

GATK version: version 3.7-0-gcfedb67

Best,

ernesto


