MuTect2 for amplicon did not call some variants

December 24, 2018, 11:21 pm

Hi GATK team,

I am new to GATK. I performed amplicon-based targeted sequencing and then used GATK4 Mutect2 to call variants. However, when we compared the Mutect2 calls with those from VariantCaller on the Ion Torrent Server, we found some inconsistencies. I therefore generated the bamout and found that some variants appear to have been realigned and, as a result, were not called (see figure chr17:50196078).

In another case, the variant looks homozygous in both the input BAM and the bamout, but the allele fraction (AF) reported in the VCF looks heterozygous, as shown in the record below and in figure chr17:50188065.

chr17 50188065 . A G . clustered_events DP=6046;ECNT=18;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-2.846e+01;TLOD=1518.06 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:447,1238:0.544:1685:174,937:273,301:32,36:0,0:60:12:0.737,0.525,0.735:0.00,1.00,0
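
To put numbers on the discrepancy described above, the read-count ratio implied by the AD field can be recomputed and set beside the AF value in the same record. The sketch below is only an illustration: `ad_fraction` is a hypothetical helper (not part of GATK), it assumes a single ALT allele, and it says nothing about why the two values differ.

```python
# Hypothetical helper, not part of GATK: compare the read-count ratio implied
# by the AD field with the AF value written to the same sample column.

def ad_fraction(format_keys: str, sample_values: str) -> tuple[float, float]:
    """Return (alt fraction computed from AD, reported AF) for a single-ALT record."""
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    ref_count, alt_count = (int(value) for value in fields["AD"].split(","))
    from_ad = alt_count / (ref_count + alt_count)
    return from_ad, float(fields["AF"])

# FORMAT and sample columns copied from the chr17:50188065 record above.
FORMAT = "GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB"
SAMPLE = "0/1:447,1238:0.544:1685:174,937:273,301:32,36:0,0:60:12:0.737,0.525,0.735:0.00,1.00,0"

from_ad, reported = ad_fraction(FORMAT, SAMPLE)
print(f"AD-based alt fraction: {from_ad:.3f}; reported AF: {reported:.3f}")
# prints: AD-based alt fraction: 0.735; reported AF: 0.544
```

Note that the two AD counts sum to 1685, the sample-level DP, whereas the INFO-level DP in the record is 6046.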

The following are my parameters, IGV screenshots, and VCF data:
date;/share/app/GATK/gatk-4.0.9.0/gatk --java-options "-Xmx256g" Mutect2 -R grch38.p2_rmsk.fasta -I 18060114.bam -tumor 18060114 -O 8060114_Mutect2_tumor_maxaf1_mrra0.vcf.gz --max-population-af 1 --max-reads-per-alignment-start 0 --min-base-quality-score 0

Your advice is highly appreciated, and I look forward to your reply.

Thank you!

Respectfully yours,
Ching-Yuan Wang

↧

Goodbye note to the GATK community

March 31, 2019, 11:44 pm

The other day I decided to disassemble the bathroom doorknob. Efforts included chipping away layers of paint and recruiting some muscle to remove the screws the chipping had revealed. When I levered out the latch system and took it apart, I noticed two things. First, parts of it had a beautiful copper color. Second, the internal spring was broken into three parts. The latter explained the sticky latch you had to jiggle to stay closed and a door that had started to pop open randomly as if possessed.

I made the fix with a new spring.

It may be a compulsion to want to fix broken things. I think it stems from the same curiosity that makes you want to take things apart to understand how they work [1]. When I joined the GATK some 3.5 years ago as a technical writer, this compulsion surfaced and drove me to the effort that resulted in the pieces I wrote. Below, at the end of this post, is a sampling of my most viewed articles.

I would like to thank you for allowing me to serve you these past years. I have learned much in the process. The knowledge I have gained in genomics comes not only from these writing projects but also just as much from answering your questions on the forum. From holding to answering just one forum question a day, I am proud to have earned over 250 likes and points aplenty for the forum’s five-star ranking.

My last day with the DSP Communications Team is April 1, which is today [2]. Rest assured, my teammates and our wonderful methods developers will continue to take excellent care of you.

Looking back, 2018 was a busy year. Geraldine asked I help out at the July Cambridge UK workshop and also at the December Taiwan workshop [3]. Each workshop brings with it a torrent of activity creating and updating materials. It is always insightful and rewarding to interact firsthand with researchers, to hear about sticking points and to see reactions to the tutorials I develop and write.

Since returning from the December workshop, I have been submarined pouring effort into finalizing the gCNV tutorial in time for my departure. I hope you find it useful. This tutorial has been the most challenging to develop so far in that exploring the results involved more creative solutions than usual, as you will see in the tutorial’s companion Jupyter Notebook reports here and here [4].

Before I start searching for a new job, this month I will spend some time visiting friends and family and remembering my Ph.D. advisor at his memorial. If you would like to lend your support, I would love to have your endorsement on LinkedIn [5]. If you need to get in touch with me, please ping me on GitHub, in the broadinstitute/gatk repository. My handle is @sooheelee and I will be checking in intermittently.

It has been a privilege.

Yours truly,

Soo Hee

Footnotes

[1] This curiosity should not be surprising in someone who once walked the life of a Ph.D. biochemist. And it should be expected from someone whose folks include a plant pathologist (Dad studied in North Dakota) and a WWII pilot turned aeronautical engineer (Mr. Cummings served in the Army Air Corps; he is turning 95 this May and I will be seeing him for his birthday). Each of my families tells me I’m molded from the same clay as my fathers.

[2] No, this is not an April Fools' joke.

[3] There were two Taiwan workshops in 2018. The video footage of the December 2018 Taiwan workshop is not posted anywhere else, and so here is the link: https://drive.google.com/drive/folders/1-uMoz-ui5IteriKngee7Vic9AWAcnfcL.

[4] I have become a fan of pandas the software but also the animal.

[5] Connect with me, and, if you feel like it, please endorse my skills in Genomics.

A sampling of my most popular articles grouped by year and sorted by number of views

Year	Views	Article# and link	Title
2015	26.1K	6484	(How to) Generate an unmapped BAM from FASTQ or aligned BAM
	17.2K	6483	(How to) Map and clean up short read sequence data efficiently
2016	17.9K	6747	(How to) Mark duplicates with MarkDuplicates or MarkDuplicatesWithMateCigar
	8.2K	7857	Reference Genome Components
	7.8K	8017	(How to) Map reads to a reference with alternate contigs like GRCh38
	4.9K	7847	Changing workflows around calling SNPs and indels
	4.4K	7156	(howto) Perform local realignment around indels
	3.5K	7899	Reference implementation: PairedEndSingleSampleWf pipeline
	2.2K	6926	Spanning or overlapping deletions (* allele)
	2.0K	8180	9 Takeaways to help you get started with GRCh38
	1.9K	7859	(How to) Simulate reads using a reference genome ALT contig
	1.1K	7019	Sam flags down a boat
2017	22.7K	9143*	(How to) Call somatic copy number variants using GATK4 CNV
	2.9K	9183*	(How to) Call somatic SNVs and indels using MuTect2
	2.2K	10172	(How to) Run the GATK4 Docker locally and take a look inside
	1.7K	10911	Differences between GATK3 MuTect2 and GATK4 Mutect2
	1.1K	10060	(How to) Run FlagStatSpark on a cloud Spark cluster
2018	18.5K	11136	(How to) Call somatic mutations using GATK4 Mutect2
	3.0K	11682	(How to part I) Sensitively detect copy ratio alterations and allelic segments
	2.6K	11127	Somatic calling is NOT simply a difference between two callsets
	2.0K	11683	(How to part II) Sensitively detect copy ratio alterations and allelic segments
	938	12350	(How to) Filter on genotype using VariantFiltration
	740	11315	Off-label workflow to simply call differences in two samples
	~	23216	(How to) Filter variants either with VQSR or by hard-filtering
2019	~	11684	(How to) Call common and rare germline copy number variants
	~	11685	(Notebook) Concordance of NA19017 chr20 gCNV calls
	~	11686	(Notebook) Correlate gCNV callset metrics and annotations

*Uses older versions of tools that have been replaced.
~Published in the last three months.

↧

Found annotations with zero variance PROBLEM, Need some help

April 1, 2019, 1:09 am

Hi, I have a problem with VariantRecalibrator in GATK3.

I tried to run the following command, but I got an error:

java -jar ../tool/gatk/GenomeAnalysisTK.jar -T VariantRecalibrator -R ../genome/chr20.fa -input family.raw.snps.indels.vcf -AS -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ../RB/hapmap_3.3.hg38.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 ../RB/1000G_omni2.5.hg38.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 ../RB/1000G_phase1.snps.high_confidence.hg38.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ../RB/dbsnp_146.hg38.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP -recalFile output.AS.recal -tranchesFile output.AS.tranches -rscriptFile output.plots.AS.R

ERROR

ERROR MESSAGE: Bad input: Found annotations with zero variance. They must be excluded before proceeding.

ERROR ------------------------------------------------------------------------------------------

More information from the log:

VariantDataManager - QD: mean = 12.99 standard deviation = 4.76
VariantDataManager - MQ: mean = 28.06 standard deviation = 17.02
VariantDataManager - MQRankSum: mean = -0.67 standard deviation = 0.00
VariantDataManager - ReadPosRankSum: mean = -0.67 standard deviation = 0.00
VariantDataManager - FS: mean = 0.00 standard deviation = 0.01
VariantDataManager - SOR: mean = 1.82 standard deviation = 0.60

Can anyone help me?
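
The VariantDataManager lines quoted above already show which annotations have a reported standard deviation of 0.00. As a minimal sketch, they can be picked out programmatically. `zero_variance_annotations` is a hypothetical helper, and it relies on the two-decimal values printed in the log, so an annotation with a tiny but nonzero spread could be flagged as well.

```python
import re

# Log lines copied from the post above.
LOG = """\
VariantDataManager - QD: mean = 12.99 standard deviation = 4.76
VariantDataManager - MQ: mean = 28.06 standard deviation = 17.02
VariantDataManager - MQRankSum: mean = -0.67 standard deviation = 0.00
VariantDataManager - ReadPosRankSum: mean = -0.67 standard deviation = 0.00
VariantDataManager - FS: mean = 0.00 standard deviation = 0.01
VariantDataManager - SOR: mean = 1.82 standard deviation = 0.60
"""

PATTERN = re.compile(
    r"VariantDataManager - (\w+): mean = (-?[\d.]+) standard deviation = (-?[\d.]+)"
)

def zero_variance_annotations(log_text, tolerance=0.0):
    """Return annotation names whose printed standard deviation is <= tolerance."""
    flagged = []
    for match in PATTERN.finditer(log_text):
        name, _mean, std = match.groups()
        if float(std) <= tolerance:
            flagged.append(name)
    return flagged

print(zero_variance_annotations(LOG))
# prints: ['MQRankSum', 'ReadPosRankSum']
```

These two annotations are the candidates that the "zero variance" error message appears to refer to in this run.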

↧

Asterisk in some lines of my VCF file

April 1, 2019, 6:17 am

Dear all,

I ran HaplotypeCaller to find germline variants in my samples (808 samples). In the ALT column of some records I found "*", and I don't know what it means. (I followed the Best Practices: I ran GATK, used dbimport to merge the samples, and finally ran the VQSR steps. For SNPs I also calculated genotype posteriors.) Here is an example:

1 10119 . CT *,C 39.21 . AC=1,2;AF=6.369e-04,1.274e-03;AN=1570;AS_FilterStatus=NA,NA;AS_VQSLOD=NaN,NaN;AS_culprit=NA,NA;BaseQRankSum=0.00;ClippingRankSum=0.00;DP=104112;ExcessHet=3.0186;FS=1.725;InbreedingCoeff=-0.0013;MLEAC=1,1;MLEAF=6.378e-04,6.378e-04;MQ=17.57;MQRankSum=-3.250e-01;PG=0,0,0,0,0,0;QD=1.87;ReadPosRankSum=0.482;SOR=1.127 GT:AD:DP:GQ:PGT:PID:PL:PP 0/0:130,0,0:130:81:.:.:0,81,1215,81,1215,1215:0,81,1215,81,1215,1215 0/1:6,3,0:9:52:.:.:52,0,159,71,168,239:52,0,159,71,168,239 0/0:132,0,0:132:81:.:.:0,81,1215,81,1215,1215:0,81,1215,81,1215,1215

1 66240 . T *,A 16289.75 VQSRTrancheSNP99.90to100.00 AC=80,2;AF=0.051,1.274e-03;AN=1570;AS_FilterStatus=NA,VQSRTrancheSNP99.90to100.00;AS_VQSLOD=NaN,-4.4852;AS_culprit=NA,DP;BaseQRankSum=1.04;ClippingRankSum=0.00;DP=7862;ExcessHet=-0.0000;FS=1.321;InbreedingCoeff=0.3618;MLEAC=123,3;MLEAF=0.107,2.604e-03;MQ=6.93;MQRankSum=-1.151e+00;PG=0,6,18,22,32,49;QD=20.04;ReadPosRankSum=-4.250e-01;SOR=0.841 GT:AD:DP:GQ:PGT:PID:PL:PP 0/0:4,0,0:4:18:.:.:0,12,181,12,181,181:0,18,199,34,213,230 0/0:18,0,0:18:42:.:.:0,36,540,36,540,540:0,42,558,58,572,589

1 66390 . T *,A 7475.63 VQSRTrancheSNP99.00to99.90 AC=36,3;AF=0.023,1.911e-03;AN=1570;AS_FilterStatus=NA,VQSRTrancheSNP99.00to99.90;AS_VQSLOD=NaN,2.9789;AS_culprit=NA,DP;BaseQRankSum=0.180;ClippingRankSum=0.00;DP=10958;ExcessHet=0.0000;FS=7.105;InbreedingCoeff=0.1627;MLEAC=39,3;MLEAF=0.025,1.961e-03;MQ=23.91;MQRankSum=0.00;PG=0,13,32,24,40,53;QD=18.37;ReadPosRankSum=0.119;SOR=0.474 GT:AD:DP:GQ:PGT:PID:PL:PP 0/1:1,9,0:10:34:0|1:66349_TA_T:375,0,15,378,42,420:362,0,34,389,69,460 0/0:12,0,0:12:43:.:.:0,30,450,30,450,450:0,43,482,54,490,503 0/0:1,0,0:1:16:.:.:0,3,27,3,27,27:0,16,59,27,67,80

The vast majority of these records do not pass the tranche filters, but why does * appear? This is only an excerpt; I have 808 samples.

Thanks for your time.

Jordi
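
For context, the VCF specification reserves the `*` ALT allele for an allele that is missing because of an upstream (spanning or overlapping) deletion. A small sketch for locating such alleles in a record follows. `spanning_deletion_alleles` is a hypothetical helper, and the example line is a shortened copy of the position 66240 record above (sample columns and most INFO fields dropped).

```python
def spanning_deletion_alleles(vcf_line: str) -> list[int]:
    """Return the 1-based ALT allele indices that are '*' in one VCF data line."""
    # split() accepts both the tabs of a real VCF and the spaces of a forum paste.
    columns = vcf_line.rstrip("\n").split()
    alt_alleles = columns[4].split(",")
    return [index for index, allele in enumerate(alt_alleles, start=1) if allele == "*"]

# Shortened copy of the position 66240 record from the post.
record = "1 66240 . T *,A 16289.75 VQSRTrancheSNP99.90to100.00 AC=80,2;AF=0.051,1.274e-03;AN=1570"
print(spanning_deletion_alleles(record))
# prints: [1]
```

The returned index matches the allele numbering used in GT, so an index of 1 here corresponds to genotypes that contain allele 1.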

↧

How to merge the sample_X_genotyped_intervals.vcf files created by PostprocessGermlineCNVCalls?

March 18, 2019, 6:15 am

How to merge the sample_X_genotyped_intervals.vcf files created by PostprocessGermlineCNVCalls to a multi-sample VCF file?

The files all have the same bins/records, so it should be easy to create a multi-sample VCF from them.

I normally use bcftools to merge VCF files. However, bcftools merge gives the following error when I try to merge the (bgzipped, tabix-indexed) sample_X_genotyped_intervals.vcf files created by PostprocessGermlineCNVCalls:

Incorrect number of FORMAT/CNLP values at Chr_01:1001, cannot merge. The tag is defined as Number=A, but found
6 values and 3 alleles. See also http://samtools.github.io/bcftools/howtos/FAQ.html#incorrect-nfields

Can you check whether the FORMAT declaration of CNLP is correct?

Please also advise whether GATK has a tool to merge single-sample VCF files (created by PostprocessGermlineCNVCalls) into a multi-sample VCF file.

For the time being, I wrote my own Python text-parsing script to create the multi-sample VCF file.
But this seems like something that should be possible with GATK or bcftools.

Thank you.
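
The poster's script is not shown. The following is only a sketch of what a column-pasting merge could look like, under the assumption stated in the post that all files contain the same records. `merge_single_sample_vcfs` is a hypothetical helper, it keeps only the first file's `##` meta lines, and the toy records below are invented for illustration rather than taken from real PostprocessGermlineCNVCalls output.

```python
# Hypothetical helper, not a GATK or bcftools feature. Assumes every input is
# a single-sample VCF whose first nine columns (CHROM..FORMAT) are identical
# across files, and keeps only the first file's "##" meta lines.

def merge_single_sample_vcfs(vcf_texts):
    """Paste the sample columns of identically structured VCFs side by side."""
    if not vcf_texts:
        raise ValueError("no input VCFs given")
    meta_lines, fixed_columns = None, None
    sample_names, sample_columns = [], []
    for text in vcf_texts:
        lines = text.strip().splitlines()
        meta = [line for line in lines if line.startswith("##")]
        header = next(line for line in lines if line.startswith("#CHROM")).split("\t")
        records = [line.split("\t") for line in lines if not line.startswith("#")]
        fixed = [record[:9] for record in records]
        if fixed_columns is None:
            meta_lines, fixed_columns = meta, fixed
        elif fixed != fixed_columns:
            raise ValueError("records differ between inputs; cannot paste columns")
        sample_names.append(header[9])
        sample_columns.append([record[9] for record in records])
    header_line = "\t".join(
        ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
        + sample_names
    )
    body = [
        "\t".join(fixed + [column[i] for column in sample_columns])
        for i, fixed in enumerate(fixed_columns)
    ]
    return "\n".join(meta_lines + [header_line] + body) + "\n"

# Toy inputs (invented values), one record each.
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
RECORD = "Chr_01\t1001\t.\tN\t<DEL>,<DUP>\t.\t.\tEND=2000\tGT:CN\t"
vcf_a = HEADER + "sample_1\n" + RECORD + "0:2\n"
vcf_b = HEADER + "sample_2\n" + RECORD + "1:1\n"

merged = merge_single_sample_vcfs([vcf_a, vcf_b])
print(merged)
```

The function raises an error as soon as the fixed columns of two inputs disagree, which is the case where a plain column paste would silently produce a wrong file.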

↧

Case Study: 120 base deletion

April 1, 2019, 6:50 am

How should I interpret this? I am totally puzzled. Thanks a lot.

↧

Hard-filtering variants from gene panel sequencing (GPS)

April 1, 2019, 7:50 am

Hi, I'd like to use the hard-filtering guidance for my GPS variants.

Could anyone tell me whether those hard filters are based on --newQual in GenotypeGVCFs?

Thanks!

↧

GATK4-mutect2 how to or should I use a newer gnomad r2.1 as germline-resource

March 11, 2019, 12:39 am

Dear GATK Team,

We have a program to detect somatic mutations in tumor-versus-normal samples. We have read the Mutect2 guide, the Best Practices tutorial for Mutect2 (GATK post #11136), but we are still unsure how to proceed.

gnomAD has released a newer version, r2.1, but the GATK bundle holds an older version; the b37 files in particular are dated 2017.

We don't know whether we should use r2.1 as the germline resource, given that the newer version contains more allele frequencies.

If we want to use the newer gnomAD as a resource, how should we create an 'af-only-gnomad_hg19.vcf'? (We use hg19, not GRCh38.) The apple is too big to bite, and so is the gnomAD resource. By the way, we want to detect mutations across the whole genome.
We look forward to your advice.
Thank you.

↧

GATK 4.1.0.0 Mutect2 error with gnomAD AF file

March 7, 2019, 12:56 pm

I'm running into an error when running GATK 4.1.0.0 with the following call:

java -Xmx16g -jar ${gatkDir}/GATK.jar Mutect2 -R ${GRC}.fa -I ${TU}.recal.bam -tumor TU -I ${NM}.recal.bam -normal NM --native-pair-hmm-threads $threads --germline-resource $gnomad --af-of-alleles-not-in-resource 0.0000025 -O ${sampleName}.mutect.UF.vcf --tmp-dir temp

The error is as follows:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.broadinstitute.hellbender.tools.walkers.mutect.GermlineProbabilityCalculator.lambda$getGermlineAltAlleleFrequencies$3(GermlineProbabilityCalculator.java:55)
at java.util.stream.ReferencePipeline$6$1.accept(ReferencePipeline.java:244)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.DoublePipeline.toArray(DoublePipeline.java:506)
at org.broadinstitute.hellbender.tools.walkers.mutect.GermlineProbabilityCalculator.getGermlineAltAlleleFrequencies(GermlineProbabilityCalculator.java:57)
at org.broadinstitute.hellbender.tools.walkers.mutect.GermlineProbabilityCalculator.getNegativeLog10PopulationAFAnnotation(GermlineProbabilityCalculator.java:29)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:165)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:233)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:232)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:291)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:267)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)

I have seen errors like this listed before on the forums relating to the AF file. I removed the file, and it was able to successfully run. The AF file is af-only-gnomad.filtered.hg38.vcf.gz

However, the above function call with the AF file runs correctly on GATK 4.0.10.1 with no errors and completes successfully.

The AF file is formatted as follows:

#CHROM POS ID REF ALT QUAL FILTER INFO
1 10067 . T TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC 30.35 PASS .
1 10108 . CAACCCT C 46514.3 PASS .
1 10109 . AACCCT A 89837.3 PASS .
1 10114 . TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTA CAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTA,T 36729 PASS .
1 10119 . CT C 251.23 PASS .
1 10120 . T C 14928.7 PASS .
1 10128 . ACCCTAACCCTAACCCTAAC A 285.71 PASS .
1 10131 . CT C 378.93 PASS .
...

Any thoughts as to what the error could be?

Thank you!
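
One thing that stands out in the excerpt above is that the INFO column is "." on every line shown, so those records carry no AF value. Whether that explains the exception is not established in this thread, but it is easy to check a resource file for it. The sketch below uses a hypothetical helper, `records_missing_af`, that works on plain text lines.

```python
def records_missing_af(vcf_lines):
    """Yield (chrom, pos) for data lines whose INFO column has no AF key."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        # split() accepts both real tabs and the spaces of a forum paste.
        columns = line.split()
        info = columns[7]
        keys = {entry.split("=", 1)[0] for entry in info.split(";")}
        if "AF" not in keys:
            yield columns[0], int(columns[1])

# Three lines copied from the excerpt in the post.
excerpt = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO",
    "1 10119 . CT C 251.23 PASS .",
    "1 10120 . T C 14928.7 PASS .",
]
print(list(records_missing_af(excerpt)))
# prints: [('1', 10119), ('1', 10120)]
```

For a bgzipped resource, the same function can be fed from `gzip.open(path, "rt")`, since bgzip output is readable as ordinary gzip.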

↧

Oncotator for build hg38

April 1, 2019, 10:30 am

The current version of Oncotator on the Broad servers is v1.9.0.0 and indicates that only hg19 is supported. Are there plans to extend the tool to hg38? If not, are there other recommended tools to convert VCF to MAF?

↧

Is it necessary to sort the bam files by coordinate before executing MarkDuplicatesSpark?

April 1, 2019, 8:21 pm

Is it necessary to sort the BAM files by coordinate before executing MarkDuplicatesSpark?

↧

Picard RevertSam java.nio.file.NoSuchFileException

November 18, 2017, 1:43 pm

Hi,

I'm starting to process a set of bams following the best practices and beginning from bams that were processed by someone else. Thus, I'm attempting to generate unmapped BAMs following this post, and using the latest version of Picard (2.15.0). Unfortunately, Picard gives an exception that shows it is unable to find temporary files it is writing. I know there's space for these files and in fact, I now have version 1.141 of Picard running without issue. The output from version 2.15.0 is below.

15:34:14.012 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/REDACTED/bin/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Nov 18 15:34:14 CST 2017] RevertSam INPUT=/REDACTED.bam OUTPUT=/REDACTED/808302_LP6008048-DNA_B02.bam SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=true REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true ATTRIBUTE_TO_CLEAR=[NM, UQ, PG, MD, MQ, SA, MC, AS, XT, XN, AS, OC, OP] SANITIZE=true MAX_DISCARD_FRACTION=0.005 TMP_DIR=[/REDACTED/tmp] VALIDATION_STRINGENCY=LENIENT OUTPUT_BY_READGROUP=false OUTPUT_BY_READGROUP_FILE_FORMAT=dynamic VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Nov 18 15:34:14 CST 2017] Executing as awilliams@REDACTED on Linux 3.10.0-229.7.2.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12; Deflater: Intel; Inflater: Intel; Picard version: 2.15.0-SNAPSHOT
[Sat Nov 18 15:34:30 CST 2017] picard.sam.RevertSam done. Elapsed time: 0.27 minutes.
Runtime.totalMemory()=1272971264
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: java.nio.file.NoSuchFileException: /REDACTED/tmp/awilliams/sortingcollection.728972638772980431.tmp
at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:246)
at htsjdk.samtools.util.SortingCollection.add(SortingCollection.java:166)
at picard.sam.RevertSam$RevertSamSorter.add(RevertSam.java:637)
at picard.sam.RevertSam.doWork(RevertSam.java:260)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
Caused by: java.nio.file.NoSuchFileException: /REDACTED/tmp/awilliams/sortingcollection.728972638772980431.tmp
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.createFile(Files.java:632)
at java.nio.file.TempFileHelper.create(TempFileHelper.java:138)
at java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:161)
at java.nio.file.Files.createTempFile(Files.java:852)
at htsjdk.samtools.util.IOUtil.newTempPath(IOUtil.java:316)
at htsjdk.samtools.util.SortingCollection.newTempFile(SortingCollection.java:255)
at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:220)
... 6 more
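
The trace above fails while creating a temporary file under a per-user subdirectory of TMP_DIR. A quick, tool-independent sanity check of such a directory can be sketched as below. `check_tmp_dir` is a hypothetical helper; it only tests whether the current user can create a file in the given directory, and it does not reproduce what Picard or htsjdk do internally.

```python
import os
import tempfile

def check_tmp_dir(path):
    """Return a list of problems found with a candidate temporary directory."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
        return problems
    try:
        # Create and immediately remove a scratch file, as a sorting spill would.
        with tempfile.NamedTemporaryFile(dir=path, prefix="sortingcollection.", suffix=".tmp"):
            pass
    except OSError as error:
        problems.append(f"could not create a file in {path}: {error}")
    return problems

print(check_tmp_dir(tempfile.gettempdir()))
```

Running it on both the configured TMP_DIR and the user-named subdirectory that appears in the exception message would show whether either one is missing.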

↧

CollectF1R2Counts and LearnReadOrientationModel vs CollectSequencingArtifactMetrics

April 2, 2019, 2:57 am

@davidben
I cannot post links.

We have not seen the documentation for 4.1.1.0, but the 4.1.0.0 tool documentation (org_broadinstitute_hellbender_tools_walkers_mutect_Mutect2.php) still lists --af-of-alleles-not-in-resource.

I also want to ask whether --af-of-alleles-not-in-resource is needed in tumor-only mode. Should the command be written like this?

gatk Mutect2 \
-R reference.fa \
-I tumor.bam \
-tumor tumor_sample_name \
--germline-resource af-only-gnomad.vcf.gz \
--af-of-alleles-not-in-resource 0.00003125 \
--panel-of-normals pon.vcf.gz \
-O somatic.vcf.gz

In how-to-call-somatic-mutations-using-gatk4-mutect2_p1, things are different. It is unclear whether the --disable-read-filter argument should be set, and the value of --af-of-alleles-not-in-resource changes with the version of the reference:

gatk --java-options "-Xmx2g" Mutect2 \
-R hg38/Homo_sapiens_assembly38.fasta \
-I tumor.bam \
-I normal.bam \
-tumor HCC1143_tumor \
-normal HCC1143_normal \
-pon resources/chr17_pon.vcf.gz \
--germline-resource resources/chr17_af-only-gnomad_grch38.vcf.gz \
--af-of-alleles-not-in-resource 0.0000025 \
--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
-L chr17plus.interval_list \
-O 1_somatic_m2.vcf.gz \
-bamout 2_tumor_normal_m2.bam

I also think there is some contradiction.

Another very important question: in tumor-only mode, is CollectSequencingArtifactMetrics needed for the tumor BAM?

Also, in paired mode, is CollectSequencingArtifactMetrics needed for the tumor BAM? I think it is needed only for the normal BAM.

Finally, 4.1.1.0 has CollectF1R2Counts and LearnReadOrientationModel but no clear documentation. Can these new tools completely replace CollectSequencingArtifactMetrics?
Thanks a lot.

↧

Randomness in .vcf.idx output of GATK IndexFeatureFile

April 2, 2019, 10:52 am

When I run GATK IndexFeatureFile on the same .vcf from two different locations, I get two different .idx files.
Why does this randomness exist, and is it necessary? I suppose the tool adds the .vcf path or the command line to the index. Is it possible to eliminate the randomness somehow?
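
The supposition that a path ends up inside the index can be tested without knowing the index format, by extracting the printable strings from both .idx files and listing the ones that differ. The sketch below uses two hypothetical helpers, `printable_strings` and `strings_only_in_one`, and invented byte strings in place of real index files.

```python
import re

def printable_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_length long, like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_length).encode("ascii") + rb",}")
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

def strings_only_in_one(first: bytes, second: bytes):
    """Return (strings only in first, strings only in second), each sorted."""
    a, b = set(printable_strings(first)), set(printable_strings(second))
    return sorted(a - b), sorted(b - a)

# Invented stand-ins for the contents of two index files.
idx_a = b"\x00\x01/data/run1/calls.vcf\x00\x02chr1\x00"
idx_b = b"\x00\x01/home/user/calls.vcf\x00\x02chr1\x00"
print(strings_only_in_one(idx_a, idx_b))
# prints: (['/data/run1/calls.vcf'], ['/home/user/calls.vcf'])
```

With real files, the bytes would come from `open(path, "rb").read()`. If the only differing strings are the two input paths, that supports the guess made in the post.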

↧

Germline copy number variant discovery (CNVs)

January 7, 2018, 1:08 am

Purpose

Identify germline copy number variants.

Diagram is not available

Reference implementation is not available

This workflow is in development; detailed documentation will be made available when the workflow is considered fully released.

↧

UmiAwareMarkDuplicatesWithMateCigar "does not contain a UMI with the RX attribute"

April 2, 2019, 9:54 pm

Hello, I'm using Picard's duplicate marking on a sample prepared with the Qiagen myeloid panel (amplicon-based, single-end primer extension, paired-end reads with UMIs). However, I get the following error. Do I need any information about the proprietary UMI to use the duplicate-marking function, or is the error due to some other problem? Thank you very much!

INFO 2019-03-30 10:55:14 UmiAwareMarkDuplicatesWithMateCigar

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
**********
********** The command line looks like this in the new syntax:
**********
********** UmiAwareMarkDuplicatesWithMateCigar -I 09H787BM.aligned.sorted.bam -O 09H787BM.aligned.md.bam -M output_duplicate_metrics.txt -UMI_METRICS output_umi_metrics.txt
**********

10:55:14.729 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/operation/RedCellNGS/tools/picard/picard/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Mar 30 10:55:14 HKT 2019] UmiAwareMarkDuplicatesWithMateCigar UMI_METRICS_FILE=output_umi_metrics.txt INPUT=[09H787BM.aligned.sorted.bam] OUTPUT=09H787BM.aligned.md.bam METRICS_FILE=output_duplicate_metrics.txt MAX_EDIT_DISTANCE_TO_JOIN=1 UMI_TAG_NAME=RX ALLOW_MISSING_UMIS=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=UmiAwareMarkDuplicatesWithMateCigar READ_NAME_REGEX= OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Mar 30 10:55:14 HKT 2019] Executing as nelson@Ubuntu-testing on Linux 4.15.0-46-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_192-b01; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.26-SNAPSHOT
INFO 2019-03-30 10:55:33 SortingCollection Creating merging iterator from 4 files
[Sat Mar 30 10:55:33 HKT 2019] picard.sam.markduplicates.UmiAwareMarkDuplicatesWithMateCigar done. Elapsed time: 0.32 minutes.
Runtime.totalMemory()=5966921728
Exception in thread "main" picard.PicardException: Read M01772:224:000000000-CC9BK:1:1108:22328:6034 does not contain a UMI with the RX attribute.
at picard.sam.markduplicates.UmiGraph.<init>(UmiGraph.java:85)
at picard.sam.markduplicates.UmiAwareDuplicateSetIterator.process(UmiAwareDuplicateSetIterator.java:137)
at picard.sam.markduplicates.UmiAwareDuplicateSetIterator.next(UmiAwareDuplicateSetIterator.java:119)
at picard.sam.markduplicates.UmiAwareDuplicateSetIterator.next(UmiAwareDuplicateSetIterator.java:53)
at picard.sam.markduplicates.SimpleMarkDuplicatesWithMateCigar.doWork(SimpleMarkDuplicatesWithMateCigar.java:133)
at picard.sam.markduplicates.UmiAwareMarkDuplicatesWithMateCigar.doWork(UmiAwareMarkDuplicatesWithMateCigar.java:138)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
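
The log above shows UMI_TAG_NAME=RX and ALLOW_MISSING_UMIS=false, and the exception names a read without an RX attribute. To see how many reads lack the tag, the alignments can be scanned in SAM text form (for example, the output of `samtools view -h`). The sketch below uses a hypothetical helper, `reads_missing_tag`. The two toy alignment lines are invented, and only the read name is taken from the error message.

```python
def reads_missing_tag(sam_lines, tag="RX"):
    """Return the names of SAM records that do not carry the given optional tag."""
    missing = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blanks
        fields = line.rstrip("\n").split("\t")
        optional_fields = fields[11:]  # the first 11 SAM columns are mandatory
        if not any(field.startswith(tag + ":") for field in optional_fields):
            missing.append(fields[0])
    return missing

# Toy SAM text (invented alignments).
sam_text = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read_with_umi\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRX:Z:ACGTAC",
    "M01772:224:000000000-CC9BK:1:1108:22328:6034\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
]
print(reads_missing_tag(sam_text))
# prints: ['M01772:224:000000000-CC9BK:1:1108:22328:6034']
```

If every read turns out to be missing the tag, the UMIs were probably never transferred into the BAM. If only a few are missing, the gap is more likely confined to specific reads.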

↧

define java to use in GATK4?

January 18, 2018, 7:11 am

Hi,

Today I tried to run GATK4, but I ran into an issue. Just calling "gatk" looks fine, but running "gatk --list" produces the following output.

Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -jar /dsk/data1/ngs/bin/GATK/4.0.0.0/gatk-package-4.0.0.0-local.jar --help
Error: Invalid or corrupt jarfile /dsk/data1/ngs/bin/GATK/4.0.0.0/gatk-package-4.0.0.0-local.jar

Since I now have the command that GATK actually uses to start, I replaced "java" with "java8" and everything works well.

So my question is whether there is an option, a config setting, or anything else where I can define the path to my java8. I can't just switch to java8 as my main java since I'm working on a cluster, and the admins don't want it.

Thanks in advance,

Anselm Hoppmann
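
Since the "Running:" line above shows the wrapper invoking a bare `java`, one generic workaround that does not touch the system default is to change PATH only for the child process. The sketch below assumes a directory that contains a Java 8 executable named `java` (a symlink to `java8` would do). `env_with_java_first` and the `/opt/java8/bin` path are hypothetical, and this does not cover any GATK-specific option for selecting a Java binary.

```python
import os
import subprocess

def env_with_java_first(java_bin_dir, base_env=None):
    """Return a copy of the environment with java_bin_dir at the front of PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = java_bin_dir + os.pathsep + env.get("PATH", "")
    return env

# Hypothetical location of a directory holding a Java 8 binary named "java".
demo_env = env_with_java_first("/opt/java8/bin", {"PATH": "/usr/bin"})
print(demo_env["PATH"])

# Usage sketch (left commented out because the path above is hypothetical):
# subprocess.run(["gatk", "--list"], env=env_with_java_first("/opt/java8/bin"), check=True)
```

Only the launched process sees the modified PATH, so the cluster-wide default Java stays unchanged.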

↧

SelectVariants produce empty files

April 3, 2019, 12:12 am

I have 8 genome sequencing samples, each from a different condition. The goal is to identify variants for each sample. I followed the GATK Best Practices for variant calling (https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145).
For variant calling I tried different combinations:

HaplotypeCaller -> GenotypeCaller -> SelectVariants
GenotypeCaller -> HaplotypeCaller -> SelectVariants
GenotypeCaller -> HaplotypeCaller -> SamSort -> SelectVariants
GenotypeCaller -> HaplotypeCaller -> SamSort -> SelectVariants(Discovery option)
GATK 3.4 and GATK 3.8
HaplotypeCaller -> GenotypeCaller -> VCFTools

There are no error messages. It looks like SelectVariants goes through the whole file but produces empty output.

When output is produced, it is limited: the data shrink from 300 GB (VCF file from HaplotypeCaller) to 2 GB (VCF file from SelectVariants). In that case, one sample gets only a limited number of SNVs, which is a problem for downstream analysis.

I am unsure whether there is some parameter that should be included for genome data.

↧

Picard/GATK MergeVcfs throws errors

February 9, 2018, 4:27 am

Dear all,
I am following your guidelines for germline SNP detection in GATK 4. However, I cannot complete the concatenation of the per-region GVCFs.
Using GATK MergeVcfs I get the following error:
/package/sequencer/java/8/bin/java -jar -XX:+UseSerialGC -verbose:GC -Xmx8g -Djava.io.tmpdir=/scratch/cluster/seqcore/temp/smith /package/sequencer/gatk/current/gatk-package-4.0.1.1-local.jar MergeVcfs --INPUT ./03_GATK/core_L11935-2_Mystique.chrEBV.gvcf --INPUT ./03_GATK/core_L11935-2_Mystique.chrUn_KI270742v1.gvcf --OUTPUT ./03_GATK/core_L11935-2_Mystique.gvcf

[Fri Feb 09 13:20:55 CET 2018] MergeVcfs --INPUT ./03_GATK/core_L11935-2_Mystique.chrEBV.gvcf --INPUT ./03_GATK/core_L11935-2_Mystique.chrUn_KI270742v1.gvcf --OUTPUT ./03_GATK/core_L11935-2_Mystique.gvcf --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 1 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX true --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Fri Feb 09 13:20:55 CET 2018] Executing as smith@bromhidrosophobie.molgen.mpg.de on Linux 4.14.17.mx64.205 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17; Deflater: Intel; Inflater: Intel; Picard version: Version:4.0.1.1

java.lang.IllegalArgumentException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
at java.net.URI.create(URI.java:852)
at htsjdk.samtools.util.IOUtil.getPath(IOUtil.java:1134)
at htsjdk.samtools.util.IOUtil.lambda$unrollPaths$2(IOUtil.java:1088)
at htsjdk.samtools.util.IOUtil$$Lambda$29/1967434886.accept(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at htsjdk.samtools.util.IOUtil.unrollPaths(IOUtil.java:1085)
at htsjdk.samtools.util.IOUtil.unrollFiles(IOUtil.java:1050)
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:164)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:24)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:277)
Caused by: java.net.URISyntaxException: Illegal character in fragment at index 1: ##fileformat=VCFv4.2
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parse(URI.java:3067)
at java.net.URI.<init>(URI.java:588)
at java.net.URI.create(URI.java:850)

Applying the picard commands I get the following:
/package/sequencer/java/8/bin/java -jar -XX:+UseSerialGC -verbose:GC -Xmx8g -Djava.io.tmpdir=/scratch/cluster/seqcore/temp/smith /package/sequencer/picard-tools/current/picard.jar MergeVcfs INPUT=./03_GATK/core_L11935-2_Mystique.chrEBV.gvcf INPUT=./03_GATK/core_L11935-2_Mystique.chrUn_KI270742v1.gvcf OUTPUT=./03_GATK/core_L11935-2_Mystique.gvcf

13:24:51.701 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/package/sequencer/picard-tools/2.12.1/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Feb 09 13:24:51 CET 2018] MergeVcfs INPUT=[./03_GATK/core_L11935-2_Mystique.chrEBV.gvcf, ./03_GATK/core_L11935-2_Mystique.chrUn_KI270742v1.gvcf] OUTPUT=./03_GATK/core_L11935-2_Mystique.gvcf VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

Exception in thread "main" htsjdk.samtools.SAMException: Cannot read non-existent file: /project/seqcore-cluster/data/superhero/chrUn_KI270742v1 186727 . C .. END=186739 GT:DP:GQ:MIN_DP:PL 0/0:9:0:4:0,0,0
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:347)
at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:334)
at htsjdk.samtools.util.IOUtil.unrollFiles(IOUtil.java:948)
at picard.vcf.MergeVcfs.doWork(MergeVcfs.java:98)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)

I appreciate any help on this issue.
Best
Stefan
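
Both stack traces pass through IOUtil.unrollFiles, and in both cases the "path" being rejected is a line of VCF content (the ##fileformat=VCFv4.2 header line in the first trace, a gVCF record in the second). That pattern is consistent with MergeVcfs reading each .gvcf input as a list of file paths rather than as a VCF, which appears to be how it treats inputs whose extension it does not recognise as a VCF extension. If that reading is right, renaming the per-region files from .gvcf to .g.vcf may be enough. A sketch under that assumption; the helper names are hypothetical.

```shell
# Map a ".gvcf" file name to the ".g.vcf" form (name only, nothing is moved).
to_g_vcf_name() {
    printf '%s\n' "${1%.gvcf}.g.vcf"
}

# Rename every *.gvcf file in the directory given as the first argument.
rename_gvcfs() {
    for f in "$1"/*.gvcf; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
        mv -- "$f" "$(to_g_vcf_name "$f")"
    done
}

# Example (not run here): rename_gvcfs ./03_GATK
```

After renaming, the inputs would be passed as ...chrEBV.g.vcf and ...chrUn_KI270742v1.g.vcf. Giving the output a .g.vcf name as well seems prudent, since the same extension check may apply to it.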

↧

Does Mutect2 have a multiple-thread setting?

April 3, 2019, 6:39 am

Hi there,
I am running Mutect2 on my computer cluster. The walltime limit on my cluster is 200 hours. The data set is relatively large, and Mutect2 runs so slowly that it exceeds even 200 hours. I was wondering whether Mutect2 can use multiple threads.
Lei
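
As far as the GATK4 documentation goes, Mutect2 has no general multi-thread switch; the --native-pair-hmm-threads option only affects the PairHMM step. A common workaround is to scatter the run over intervals with -L, submit each piece as its own cluster job, and merge the per-interval VCFs afterwards (for example with MergeVcfs). The sketch below only prints one command per interval; the reference, BAM, sample name, interval names and output naming are hypothetical placeholders.

```shell
# Placeholders (hypothetical): adjust to your data.
REF="ref.fasta"
BAM="tumor.bam"
SAMPLE="tumor_sample"

# Print one Mutect2 command per interval given as an argument.
# Each printed line can be submitted as a separate cluster job.
print_scatter_commands() {
    for interval in "$@"; do
        printf 'gatk Mutect2 -R %s -I %s -tumor %s -L %s -O mutect2.%s.vcf.gz\n' \
            "$REF" "$BAM" "$SAMPLE" "$interval" "$interval"
    done
}

print_scatter_commands chr1 chr2 chr3
```

Splitting by chromosome is the simplest choice; smaller intervals give shorter jobs at the cost of more files to merge.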

↧