Channel: Recent Discussions — GATK-Forum

Two adjacent mutations are called with combined phase info in one run but not in another by MuTect2 3.7-0-gcfedb


We sequenced one sample twice and saw the same two adjacent mutations in IGV (see the attached IGV screenshot). However, the two mutations were called with combined phase information (the PGT and PID fields) in Sample1 but not in Sample2, as shown below:

Sample1:
12 25398285 rs121913530 C A . clustered_events;homologous_mapping_event DB;ECNT=3;HCNT=25;MAX_ED=3;MIN_ED=1;TLOD=1258.13 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:527,337:0.391:0:0:.:0|1:25398285_C_A:17691,11388:0:0
12 25398286 rs397517039 A G . clustered_events;homologous_mapping_event DB;ECNT=3;HCNT=25;MAX_ED=3;MIN_ED=1;TLOD=1258.13 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:519,336:0.391:0:0:.:0|1:25398285_C_A:17063,11301:0:0

Sample2:
12 25398285 rs121913530 C A . clustered_events;homologous_mapping_event DB;ECNT=4;HCNT=36;MAX_ED=57;MIN_ED=50;TLOD=1208.99 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:520,323:0.387:0:0:.:17541,11087:0:0
12 25398286 rs397517039 A G . clustered_events;homologous_mapping_event DB;ECNT=4;HCNT=25;MAX_ED=57;MIN_ED=50;TLOD=1197.33 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/1:512,324:0.384:0:0:.:17162,10799:0:0

I am wondering why we don't have combined phase information in Sample2. Thanks a lot.
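To compare the two records field by field, a minimal Python sketch (not part of the original post) can pair each FORMAT key with its sample value and report whether the phasing fields PGT and PID are present. The helper name `phase_info` is hypothetical; the strings are copied from the first record of each sample above.

```python
def phase_info(format_keys: str, sample_values: str) -> dict:
    """Pair colon-separated FORMAT keys with sample values; return PGT/PID if present."""
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    return {key: fields[key] for key in ("PGT", "PID") if key in fields}


# FORMAT and sample columns of the 12:25398285 record in each sample.
sample1 = phase_info(
    "GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1",
    "0/1:527,337:0.391:0:0:.:0|1:25398285_C_A:17691,11388:0:0",
)
sample2 = phase_info(
    "GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1",
    "0/1:520,323:0.387:0:0:.:17541,11087:0:0",
)

print("Sample1 phasing fields:", sample1)
print("Sample2 phasing fields:", sample2)
```

This only makes the difference between the two records explicit; it does not explain why the caller phased one run and not the other.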


gatk version 4


Is there an equivalent in GATK version 4 for the --nonDeterministicRandomSeed argument of GATK SelectVariants?
Thank you!

GATK4 joint Genotyping for an exome pipeline: CombineGVCFs or GenomicsDBImport ?


Hello,

I want to use 386 exomes as a normalization group for joint genotyping in an exome diagnostic pipeline. Until now this was done with a “giant combined gvcf” split per chromosome, but I wanted to give GenomicsDBImport a try.

So I did, and I’m quite disappointed. I think I might be doing something wrong, or maybe GenomicsDBImport is not yet suited to my purpose. So I have some questions.

Building a GenomicsDB with GenomicsDBImport takes longer than a traditional per-chromosome CombineGVCFs. That wouldn’t be a problem if I could build it once and then either give the database plus the patient samples’ .gvcfs to GenotypeGVCFs, or add new samples to the database. Do you plan to add this feature?

Because you can’t add a new sample to an already built GenomicsDB, I would have to rebuild it with the new samples at every single pipeline execution. So I don’t see why I should use GenomicsDB. Or should I perhaps use the Intel library? It seems to add an unwanted extra level of complexity, and I don’t know whether it is worth it.

Am I missing something?

Thank you.

Best practice for multi-sample non-human Indel realignment


Hi,
I have 8 samples of a non-human vertebrate that I want to put through the GATK pipeline. I will have further samples to run in the future.
I'm wondering what the best practice for indel realignment is, given that I have multiple samples.
I'm considering the following three alternatives.
1. Just run GATK indel realignments on individual alignments:

java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I Sample1.bam -o Sample1Indels.intervals
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I Sample1.bam -targetIntervals Sample1Indels.intervals -o realignedSample1.bam
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I Sample2.bam -o Sample2Indels.intervals
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I Sample2.bam -targetIntervals Sample2Indels.intervals -o realignedSample2.bam

etc, etc,
2. Run GATK RealignerTargetCreator using all samples in one run and then apply intervals to each sample:

java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I Sample1.bam -I Sample2.bam -I Sample3.bam -I Sample4.bam -I Sample5.bam -I Sample6.bam -I Sample7.bam -I Sample8.bam -o allSampleIndels.intervals
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I Sample1.bam -targetIntervals allSampleIndels.intervals -o Sample1_indel_realigned.bam
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I Sample2.bam -targetIntervals allSampleIndels.intervals -o Sample2_indel_realigned.bam

etc, etc.
3. Run RealignerTargetCreator on all samples individually and then use Picard-tools IntervalListTools to create an intersection of all the intervals to identify indels present in all samples (requires some pre-processing to get into the required format for picard-tools).

java -jar picard.jar IntervalListTools INPUT=Sample1.intervals INPUT=Sample2.intervals INPUT=Sample3.intervals INPUT=Sample4.intervals INPUT=Sample5.intervals INPUT=Sample6.intervals INPUT=Sample7.intervals INPUT=Sample8.intervals OUTPUT=all_intersect.intervals ACTION=INTERSECT
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I Sample1.bam -targetIntervals all_intersect.intervals -o realignedSample1.bam
etc, etc,

Would someone be so kind as to comment on the different methods and which would be the best?
Many thanks,
Graham
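Since the post elides most of the per-sample commands with "etc, etc", here is a small Python sketch that spells out the full command set for alternative 2 for any number of samples. It only builds and prints the command lines using the file names from the post; the function name `realignment_commands` is hypothetical, and the sketch takes no position on which alternative is preferable.

```python
def realignment_commands(samples, reference="reference.fasta",
                         intervals="allSampleIndels.intervals"):
    """Build the GATK3 command lines for alternative 2 as lists of arguments."""
    gatk = ["java", "-jar", "GenomeAnalysisTK.jar"]

    # One RealignerTargetCreator run over all BAMs together.
    create = gatk + ["-T", "RealignerTargetCreator", "-R", reference]
    for sample in samples:
        create += ["-I", f"{sample}.bam"]
    create += ["-o", intervals]

    # One IndelRealigner run per sample, all sharing the same interval file.
    commands = [create]
    for sample in samples:
        commands.append(gatk + [
            "-T", "IndelRealigner", "-R", reference,
            "-I", f"{sample}.bam",
            "-targetIntervals", intervals,
            "-o", f"{sample}_indel_realigned.bam",
        ])
    return commands


samples = [f"Sample{i}" for i in range(1, 9)]
for command in realignment_commands(samples):
    print(" ".join(command))
```

Keeping each command as a list of arguments means it could be handed to `subprocess.run` unchanged once the paths are real.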

Possible bug in SelectVariants tool


Dear GATK experts,

I have done variant calling on 384 potato samples, mostly following GATK best practices, and have applied hard filters to select SNPs for further use. However, I am noticing that the '--max-nocall-fraction', '--max-nocall-number' and '--max-fraction-filtered-genotypes' arguments of 'SelectVariants' are not working properly. I have tried various cutoff settings, and every time I observe SNPs with far more 'no call' genotypes (~246 out of 384) than the set thresholds allow. I searched the forum first but couldn't find any relevant threads. I am using the latest GATK version (4.0.7.0). I am attaching three example sets of (1) log files, (2) subset vcf files and (3) vcf index files for the three main vcfs. I would appreciate it if you could provide any feedback on this issue and let me know whether this behaviour has been observed by other users as well.

Regards,
Sanjeev
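One way to check the observation independently of SelectVariants is to count no-call genotypes per record directly. The Python sketch below is an illustration, not GATK's own logic: it assumes a genotype counts as a no-call only when every allele in GT is ".", and the record is synthetic. The helper name `nocall_fraction` is hypothetical.

```python
def nocall_fraction(vcf_line: str) -> float:
    """Return the fraction of samples whose GT is entirely '.' in one VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    gt_index = columns[8].split(":").index("GT")
    samples = columns[9:]
    missing = 0
    for sample in samples:
        gt = sample.split(":")[gt_index]
        # Works for any ploidy, e.g. './.' or './././.'.
        alleles = gt.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            missing += 1
    return missing / len(samples)


# Synthetic four-sample record: two of the four genotypes are no-calls.
record = "\t".join([
    "chr01", "100", ".", "A", "G", "50", "PASS", ".", "GT:DP",
    "0/1:12", "./.:0", "./.:0", "1/1:9",
])
fraction = nocall_fraction(record)
print(fraction)
```

Running this over the output VCF and comparing each fraction with the threshold that was passed would show which records exceed it.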

Several Annotations not working in GATK Haplotype Caller


I am using Genotype Given Alleles with HaplotypeCaller.
I am trying to explicitly request all annotations that the documentation says are compatible with HaplotypeCaller (and that make sense for a single sample, e.g. no Hardy-Weinberg).

The following annotations all have "NA":
GCContent (GC), HomopolymerRun (Hrun), TandemRepeatAnnotator (STR, RU, RPA)
but they are valid requests, because I get no errors from GATK.

This is the command I ran (all on one line)

java -Xmx40g -jar /data5/bsi/bictools/alignment/gatk/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller --input_file /data2/external_data/[...]/s115343.beauty/Paired_analysis/secondary/Paired_10192014/IGV_BAM/pair_EX167687/s_EX167687_DNA_Blood.igv-sorted.bam --alleles:vcf /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/OMNI.vcf --phone_home NO_ET --gatk_key /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/3.1-1/Hossain.Asif_mayo.edu.key --reference_sequence /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa --minReadsPerAlignmentStart 1 --disableOptimizations --dontTrimActiveRegions --forceActive --out /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/EX167687.0.0375.chr22.vcf --logging_level INFO -L chr22 --downsample_to_fraction 0.0375 --downsampling_type BY_SAMPLE --genotyping_mode GENOTYPE_GIVEN_ALLELES --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 0.0 --annotateNDA --annotation BaseQualityRankSumTest --annotation ClippingRankSumTest --annotation Coverage --annotation FisherStrand --annotation GCContent --annotation HomopolymerRun --annotation LikelihoodRankSumTest --annotation MappingQualityRankSumTest --annotation NBaseCount --annotation QualByDepth --annotation RMSMappingQuality --annotation ReadPosRankSumTest --annotation StrandOddsRatio --annotation TandemRepeatAnnotator --annotation DepthPerAlleleBySample --annotation DepthPerSampleHC --annotation StrandAlleleCountsBySample --annotation StrandBiasBySample --excludeAnnotation HaplotypeScore --excludeAnnotation InbreedingCoeff

The log file is below. Notice the "weird" WARNings about "StrandBiasBySample annotation exists in input VCF header",
which make no sense because the header is empty other than the bare-bones fields.

This is the barebone VCF
head /data2/external_data/[...]_m026645/s109575.ez/Sequencing_2016/OMNI.vcf

##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 723918 rs144434834 G A 30 PASS .
chr1 729632 rs116720794 C T 30 PASS .
chr1 752566 rs3094315 G A 30 PASS .
chr1 752721 rs3131972 A G 30 PASS .
chr1 754063 rs12184312 G T 30 PASS .
chr1 757691 rs74045212 T C 30 PASS .
chr1 759036 rs114525117 G A 30 PASS .
chr1 761764 rs144708130 G A 30 PASS .

This is the output

INFO 10:03:06,080 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,082 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 10:03:06,083 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:03:06,083 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:03:06,086 HelpFormatter - Program Args: -T HaplotypeCaller --input_file /data2/external_data/[...]/s115343.beauty/Paired_analysis/secondary/Paired_10192014/IGV_BAM/pair_EX167687/s_EX167687_DNA_Blood.igv-sorted.bam --alleles:vcf /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/OMNI.vcf --phone_home NO_ET --gatk_key /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/3.1-1/Hossain.Asif_mayo.edu.key --reference_sequence /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa --minReadsPerAlignmentStart 1 --disableOptimizations --dontTrimActiveRegions --forceActive --out /data2/external_data/[...]m026645/s109575.ez/Sequencing_2016/EX167687.0.0375.chr22.vcf --logging_level INFO -L chr22 --downsample_to_fraction 0.0375 --downsampling_type BY_SAMPLE --genotyping_mode GENOTYPE_GIVEN_ALLELES --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 0.0 --annotateNDA --annotation BaseQualityRankSumTest --annotation ClippingRankSumTest --annotation Coverage --annotation FisherStrand --annotation GCContent --annotation HomopolymerRun --annotation LikelihoodRankSumTest --annotation MappingQualityRankSumTest --annotation NBaseCount --annotation QualByDepth --annotation RMSMappingQuality --annotation ReadPosRankSumTest --annotation StrandOddsRatio --annotation TandemRepeatAnnotator --annotation DepthPerAlleleBySample --annotation DepthPerSampleHC --annotation StrandAlleleCountsBySample --annotation StrandBiasBySample --excludeAnnotation HaplotypeScore --excludeAnnotation InbreedingCoeff
INFO 10:03:06,093 HelpFormatter - Executing as m037385@franklin04-213 on Linux 2.6.32-573.8.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26.
INFO 10:03:06,094 HelpFormatter - Date/Time: 2016/01/19 10:03:06
INFO 10:03:06,094 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,094 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,545 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:03:06,657 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Fraction: 0.04
INFO 10:03:06,666 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 10:03:07,012 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.35
INFO 10:03:07,031 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 10:03:07,170 IntervalUtils - Processing 51304566 bp from intervals
INFO 10:03:07,256 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 10:03:07,595 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:03:07,595 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:03:07,595 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:03:07,596 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 10:03:07,596 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
WARN 10:03:07,709 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN 10:03:07,709 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO 10:03:07,719 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 10:03:37,599 ProgressMeter - chr22:5344011 0.0 30.0 s 49.6 w 10.4% 4.8 m 4.3 m
INFO 10:04:07,600 ProgressMeter - chr22:11875880 0.0 60.0 s 99.2 w 23.1% 4.3 m 3.3 m
Using AVX accelerated implementation of PairHMM
INFO 10:04:29,924 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
INFO 10:04:29,925 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
WARN 10:04:29,938 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 10:04:29,938 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 10:04:29,939 AnnotationUtils - Annotation will not be calculated, genotype is not called
INFO 10:04:37,601 ProgressMeter - chr22:17412465 0.0 90.0 s 148.8 w 33.9% 4.4 m 2.9 m
INFO 10:05:07,602 ProgressMeter - chr22:18643131 0.0 120.0 s 198.4 w 36.3% 5.5 m 3.5 m
INFO 10:05:37,603 ProgressMeter - chr22:20133744 0.0 2.5 m 248.0 w 39.2% 6.4 m 3.9 m
INFO 10:06:07,604 ProgressMeter - chr22:22062452 0.0 3.0 m 297.6 w 43.0% 7.0 m 4.0 m
INFO 10:06:37,605 ProgressMeter - chr22:23818297 0.0 3.5 m 347.2 w 46.4% 7.5 m 4.0 m
INFO 10:07:07,606 ProgressMeter - chr22:25491290 0.0 4.0 m 396.8 w 49.7% 8.1 m 4.1 m
INFO 10:07:37,607 ProgressMeter - chr22:27044271 0.0 4.5 m 446.4 w 52.7% 8.5 m 4.0 m
INFO 10:08:07,608 ProgressMeter - chr22:28494980 0.0 5.0 m 496.1 w 55.5% 9.0 m 4.0 m
INFO 10:08:47,609 ProgressMeter - chr22:30866786 0.0 5.7 m 562.2 w 60.2% 9.4 m 3.8 m
INFO 10:09:27,610 ProgressMeter - chr22:32908950 0.0 6.3 m 628.3 w 64.1% 9.9 m 3.5 m
INFO 10:09:57,610 ProgressMeter - chr22:34451306 0.0 6.8 m 677.9 w 67.2% 10.2 m 3.3 m
INFO 10:10:27,611 ProgressMeter - chr22:36013343 0.0 7.3 m 727.5 w 70.2% 10.4 m 3.1 m
INFO 10:10:57,613 ProgressMeter - chr22:37387478 0.0 7.8 m 777.1 w 72.9% 10.7 m 2.9 m
INFO 10:11:27,614 ProgressMeter - chr22:38534891 0.0 8.3 m 826.8 w 75.1% 11.1 m 2.8 m
INFO 10:11:57,615 ProgressMeter - chr22:39910054 0.0 8.8 m 876.4 w 77.8% 11.4 m 2.5 m
INFO 10:12:27,616 ProgressMeter - chr22:41738463 0.0 9.3 m 926.0 w 81.4% 11.5 m 2.1 m
INFO 10:12:57,617 ProgressMeter - chr22:43113306 0.0 9.8 m 975.6 w 84.0% 11.7 m 112.0 s
INFO 10:13:27,618 ProgressMeter - chr22:44456937 0.0 10.3 m 1025.2 w 86.7% 11.9 m 95.0 s
INFO 10:13:57,619 ProgressMeter - chr22:45448656 0.0 10.8 m 1074.8 w 88.6% 12.2 m 83.0 s
INFO 10:14:27,620 ProgressMeter - chr22:46689073 0.0 11.3 m 1124.4 w 91.0% 12.5 m 67.0 s
INFO 10:14:57,621 ProgressMeter - chr22:48062438 0.0 11.8 m 1174.0 w 93.7% 12.6 m 47.0 s
INFO 10:15:27,622 ProgressMeter - chr22:49363910 0.0 12.3 m 1223.6 w 96.2% 12.8 m 29.0 s
INFO 10:15:57,623 ProgressMeter - chr22:50688233 0.0 12.8 m 1273.2 w 98.8% 13.0 m 9.0 s
INFO 10:16:12,379 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.061128124000000006
INFO 10:16:12,379 PairHMM - Total compute time in PairHMM computeLikelihoods() : 22.846350295
INFO 10:16:12,380 HaplotypeCaller - Ran local assembly on 25679 active regions
INFO 10:16:12,434 ProgressMeter - done 5.1304566E7 13.1 m 15.0 s 100.0% 13.1 m 0.0 s
INFO 10:16:12,435 ProgressMeter - Total runtime 784.84 secs, 13.08 min, 0.22 hours
INFO 10:16:12,435 MicroScheduler - 727347 reads were filtered out during the traversal out of approximately 4410423 total reads (16.49%)
INFO 10:16:12,435 MicroScheduler - -> 2 reads (0.00% of total) failing BadCigarFilter
INFO 10:16:12,436 MicroScheduler - -> 669763 reads (15.19% of total) failing DuplicateReadFilter
INFO 10:16:12,436 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 10:16:12,436 MicroScheduler - -> 57582 reads (1.31% of total) failing HCMappingQualityFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 10:16:12,438 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

Allele Depth (AD) / Allele Balance (AB) Filtering in GATK 4


Hi,

I am trying to filter my GATK 4.0.3 - HaplotypeCaller generated multi-sample VCF for allele depth (AD) annotation at sample genotype-level (so available in "FORMAT" fields of each sample).

I think prior to GATK 4, this annotation was available as "Allele Balance" (AB) ratios (generated by AlleleBalanceBySample), but it is no longer available in GATK 4. So I tried to filter genotypes on the AD field, which carries the same information but as an array of integers in "X,Y" format. This array format makes it difficult to filter on the depth of the alternative allele divided by the depth of all alleles at a specific site.

Can you please recommend any solution to this problem? If I could turn this array into a ratio, I could easily filter genotypes using VariantFiltration or other tools such as vcflib/vcffilter. I also tried the below code (following https://gatkforums.broadinstitute.org/gatk/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk):

gatk VariantFiltration -R $ref -V $vcf -O $output --genotype-filter-expression 'vc.getGenotype("Sample1").getAD().1 / vc.getGenotype("Sample1").getAD().0 > 0.33' --set-filtered-genotype-to-no-call --genotype-filter-name 'ABfilter'

This worked, but strangely it filters the variant for all samples if only one of the samples has allele depths that are out of balance (as defined by the filter). If it had worked only for Sample1, I was planning to write a quick loop over all the samples. I tried the same with GATK 3.8, but it still filters the whole variant for all samples if it is filtered in just one sample.
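As an illustration of turning the AD array into a ratio outside GATK, here is a minimal Python sketch that computes, per sample, the alternative depth divided by the total depth. It is a workaround sketch rather than a GATK feature; the helper names and the record are made up for the example. Note that it computes alt / (ref + alt), whereas the expression above divides the alt depth by the ref depth.

```python
def alt_fraction(ad: str):
    """Return (sum of alt depths) / (total depth) for an AD string, or None if undefined."""
    parts = ad.split(",")
    if not ad or "." in parts:
        return None
    depths = [int(part) for part in parts]
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total


def per_sample_alt_fraction(vcf_line: str):
    """Return one alt fraction (or None) per sample column of a VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    ad_index = columns[8].split(":").index("AD")
    fractions = []
    for sample in columns[9:]:
        fields = sample.split(":")
        # Trailing FORMAT fields may be dropped for uncalled genotypes.
        ad = fields[ad_index] if len(fields) > ad_index else "."
        fractions.append(alt_fraction(ad))
    return fractions


# Synthetic three-sample record.
record = "\t".join([
    "chr1", "1000", ".", "C", "A", "80", "PASS", ".", "GT:AD",
    "0/1:10,5", "0/0:20,0", "./.:0,0",
])
fractions = per_sample_alt_fraction(record)
print(fractions)
```

Because each sample is evaluated on its own, a genotype could be masked for one sample without touching the others.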

GATK4 beta Mutect2 misses --cosmic option


Could anyone explain why the --cosmic option was removed?

Also, should --germline_resource be used together with --dbsnp, or should it replace it?

Thanks!


Plans to update the GATK bundle


I was wondering when you guys plan on updating the bundle to GRCh38?

Error when using CollectWgsMetrics on Picard


Hello, I was running CollectWgsMetrics with a BAM file as input, and when it finished running it showed the following message:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 248956422
at picard.analysis.AbstractWgsMetricsCollector.isReferenceBaseN(AbstractWgsMetricsCollector.java:215)
at picard.analysis.WgsMetricsProcessorImpl.processFile(WgsMetricsProcessorImpl.java:92)
at picard.analysis.CollectWgsMetrics.doWork(CollectWgsMetrics.java:491)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Do you have any idea why?
Thank you,
Francisco

Error when using GenotypeGVCFs with GenomicsDB


Hi,

I'm using GATK 4.0.10.0 and when running GenotypeGVCFs with a GenomicsDB workspace as input, I get the following error:

11:22:14.094 INFO GenotypeGVCFs - ------------------------------------------------------------
11:22:14.094 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.10.0
11:22:14.094 INFO GenotypeGVCFs - For support and documentation go to
11:22:14.095 INFO GenotypeGVCFs - Executing as cluengo@login1 on Linux v2.6.32-696.13.2.el6.Bull.128.x86_64 amd64
11:22:14.095 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
11:22:14.095 INFO GenotypeGVCFs - Start Date/Time: October 19, 2018 11:22:13 AM CEST
11:22:14.095 INFO GenotypeGVCFs - ------------------------------------------------------------
11:22:14.095 INFO GenotypeGVCFs - ------------------------------------------------------------
11:22:14.096 INFO GenotypeGVCFs - HTSJDK Version: 2.16.1
11:22:14.096 INFO GenotypeGVCFs - Picard Version: 2.18.13
11:22:14.096 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:22:14.096 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:22:14.096 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:22:14.096 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:22:14.096 INFO GenotypeGVCFs - Deflater: IntelDeflater
11:22:14.096 INFO GenotypeGVCFs - Inflater: IntelInflater
11:22:14.096 INFO GenotypeGVCFs - GCS max retries/reopens: 20
11:22:14.096 INFO GenotypeGVCFs - Requester pays: disabled
11:22:14.096 INFO GenotypeGVCFs - Initializing engine
11:22:14.669 INFO GenotypeGVCFs - Shutting down engine
[October 19, 2018 11:22:14 AM CEST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=963117056
Exception in thread "main" java.lang.ExceptionInInitializerError
at com.intel.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQueryGivenQueryJSONFile(GenomicsDBFeatureReader.java:206)
at com.intel.genomicsdb.reader.GenomicsDBFeatureReader.generateHeadersForQuery(GenomicsDBFeatureReader.java:201)
at com.intel.genomicsdb.reader.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:76)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:401)
at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:311)
at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:267)
at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:55)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:49)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:638)
at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:43)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: com.intel.genomicsdb.exception.GenomicsDBException: Could not load genomicsdb native library
at com.intel.genomicsdb.reader.GenomicsDBQueryStream.<clinit>(GenomicsDBQueryStream.java:48)
... 16 more
Caused by: java.lang.UnsatisfiedLinkError: com.intel.genomicsdb.GenomicsDBLibLoader.jniGenomicsDBOneTimeInitialize()I
at com.intel.genomicsdb.GenomicsDBLibLoader.jniGenomicsDBOneTimeInitialize(Native Method)
at com.intel.genomicsdb.GenomicsDBLibLoader.loadLibrary(GenomicsDBLibLoader.java:53)
at com.intel.genomicsdb.reader.GenomicsDBQueryStream.<clinit>(GenomicsDBQueryStream.java:45)
... 16 more

In previous GATK versions I ran into some problems with file locking and Lustre, but in this case, it shouldn't be a problem since it doesn't even work when disabling file locking.

Thank you,

Cristina.

Off-label workflow to simply call differences in two samples


Given my years as a biochemist, if given two samples to compare, my first impulse is to want to know what are the functional differences, i.e. differences in proteins expressed between the two samples. I am interested in genomic alterations that ripple down the central dogma to transform a cell.

Please note the workflow that follows is NOT a part of the Best Practices. This is an illustrative, unsupported workflow. For the official Somatic Short Variant Calling Best Practices workflow, see Tutorial#11136.

To call every allele that is different between two samples, I have devised a two-pass workflow that takes advantage of Mutect2 features. This workflow uses Mutect2 in tumor-only mode and appropriates the --germline-resource argument to supply a single-sample VCF with allele fractions instead of population allele frequencies. The workflow assumes the two case samples being compared originate from the same parental line and the ploidy and mutation rates make it unlikely that any site accumulates more than one allele change.


First, call on each sample using Mutect2's tumor-only mode.

gatk Mutect2 \
-R ref.fa \
-I A.bam \
-tumor A \
-O A.vcf

gatk Mutect2 \
-R ref.fa \
-I B.bam \
-tumor B \
-O B.vcf

Second, for each single-sample VCF, move the sample-level AF allele-fraction annotation to the INFO field and simplify to a sites-only VCF.

This is a heuristic solution in which we substitute sample-level allele fractions for the expected population germline allele frequencies. Mutect2 is actually designed to use population germline allele frequencies in somatic likelihood calculations, so this substitution allows us to fulfill the requirement for an AF annotation with plausible fractional values. The terminal screenshots highlight the data transpositions.

Before:

image

After:

image
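The screenshots are not preserved here and the post does not show the command used for this step, so the following Python sketch is only one possible way to do the transposition. It assumes a single-sample VCF with an AF key in FORMAT, copies that value into INFO, and keeps only the first eight columns. The function name, the wording of the added header line and the example record are all made up for illustration.

```python
AF_HEADER = ('##INFO=<ID=AF,Number=A,Type=Float,'
             'Description="Allele fraction copied from the sample-level AF field">')


def to_sites_only_with_af(lines):
    """Copy the single sample's FORMAT AF into INFO and drop the FORMAT/sample columns."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Drop any existing INFO AF definition; it is re-added below.
            if not line.startswith("##INFO=<ID=AF,"):
                out.append(line)
        elif line.startswith("#CHROM"):
            out.append(AF_HEADER)
            out.append("\t".join(line.split("\t")[:8]))
        elif line:
            cols = line.split("\t")
            sample = dict(zip(cols[8].split(":"), cols[9].split(":")))
            info = [f for f in cols[7].split(";")
                    if f and f != "." and not f.startswith("AF=")]
            info.append("AF=" + sample["AF"])
            out.append("\t".join(cols[:7] + [";".join(info)]))
    return out


# Synthetic single-sample input.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA",
    "chr1\t1000\t.\tC\tA\t.\t.\tDP=30;ECNT=1\tGT:AD:AF\t0/1:20,10:0.333",
]
converted = to_sites_only_with_af(example)
print("\n".join(converted))
```

A file produced this way would presumably still need to be indexed before being passed to --germline-resource.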

Third, call on each sample in a second pass, again in tumor-only mode, with the following additions.

gatk Mutect2 \
-R ref.fa \
-I A.bam \
-tumor A \
--germline-resource Baf.vcf \
--af-of-alleles-not-in-resource 0 \
--max-population-af 0 \
-pon pon_maskAB.vcf \
-O A-B.vcf

gatk Mutect2 \
-R ref.fa \
-I B.bam \
-tumor B \
--germline-resource Aaf.vcf \
--af-of-alleles-not-in-resource 0 \
--max-population-af 0 \
-pon pon_maskAB.vcf \
-O B-A.vcf

  • Provide the matched single-sample callset for the case sample with the --germline-resource argument.
  • Avoid calling any allele in the --germline-resource by setting --max-population-af to zero.
  • Maximize the probability of calling any differing allele by setting --af-of-alleles-not-in-resource to zero.
  • Prefilter sites with artifacts and cross-sample contamination with a panel of normals (PoN) in which confident variant sites for both sample A and B have been removed, e.g. with gatk SelectVariants -V pon.vcf -XL AandB_haplotypecaller.vcf -O pon_maskAB.vcf.

Fourth, filter out unlikely calls with FilterMutectCalls.

gatk FilterMutectCalls \
-V A-B.vcf \
-O A-B-filter.vcf

gatk FilterMutectCalls \
-V B-A.vcf \
-O B-A-filter.vcf

FilterMutectCalls provides many filters, e.g. that account for low base quality, for events that are clustered, for low mapping quality and for short-tandem-repeat contractions. Of the filters, let's consider the multiallelic filter. It discounts sites with more than two variant alleles that pass the tumor LOD threshold.

  • We assume case sample variant sites will have a maximum of one allele that is different from the --germline-resource control. A single allele call will pass the multiallelic filter. However, if we emit any shared variant allele alongside the differing allele, e.g. for a heterozygous site without ref alleles, then the call becomes multiallelic and will be filtered, which is not what we want. We previously set Mutect2’s --max-population-af to zero to ensure only the differing allele is called, and so here we can rely on FilterMutectCalls to filter artifactual multiallelic sites.
  • If multiple variant alleles are expected per call, then FilterMutectCalls' multiallelic filtering will be undesirable. For example, if changes to allele fractions for shared alleles were of interest for the two samples derived from the same parental line, and Mutect2 --max-population-af was set to one in the previous step to additionally emit the shared variant alleles, then you would expect multiallelic calls. These will be indistinguishable from artifactual multiallelic sites.

This workflow produces contrastive variants. If the samples are a tumor and its matched normal, then the calls include sites where heterozygosity was lost.

We know that loss of heterozygosity (LOH) plays a role in tumorigenesis (doi:10.1186/s12920-015-0123-z). This leads us to believe the heterozygosity of proteins we express contributes to our health. If this is true, then for somatic studies, if cataloging the gain of alleles is of interest, then cataloging the loss of alleles should also be of interest. Can we assume just because variants are germline that they do not play a role in disease processes? How can we account for the combinatorial effects of the diploid nature of our genomes?

Remember regions of LOH do not necessarily represent a haploid state but can be copy-neutral or even copy-amplified. It may be that as one parental chromosome copy is lost, the other is duplicated to maintain copy number, which presumably compensates for dosage effects as is the case in uniparental isodisomy.


Sequences at index 0 don't match, but using the same reference genome


I am trying to run CollectMultipleMetrics on a CRAM file, but I get a "Sequences at index 0 don't match" error even though I am using the same reference genome. What can I do to resolve this?

Below is what I have tried so far.

I ran CollectMultipleMetrics on the CRAM file with the following command:

java -jar picard.jar CollectMultipleMetrics I=SRR1378155_SRR1378155.cram O=SRR1378155_SRR1378155.multiplemetrics R=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa

but I get the error

htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/248956422/chr1/M5=2648ae1bacce4ec4b6cf337dcae37816/UR=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa 0/248956422/chr1/M5=1f9b4bfb2d6193a45e52901b8aa4339e/UR=file:/apps/data/ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa.gz

I am not sure where the "UR=file:/apps/data/ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa.gz" comes from, as when I look in the CRAM header it shows "UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa" for the contigs.
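The two dictionaries in the error agree on name and length for chr1 but differ in M5. To see which checksum a given FASTA actually produces, the M5 value can be recomputed. The Python sketch below follows the SAM specification's definition (characters outside '!' to '~' stripped, letters upper-cased, then MD5); the function name is hypothetical and the demo runs on a tiny in-memory FASTA rather than the real reference.

```python
import hashlib


def md5s_from_fasta_lines(lines):
    """Return {contig name: MD5 hex digest} computed as for the SAM @SQ M5 tag."""
    digests = {}
    current = None
    for line in lines:
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            current = digests[name] = hashlib.md5()
        elif current is not None:
            # Strip everything outside '!'..'~' (newlines, spaces) and upper-case.
            cleaned = "".join(ch for ch in line if "!" <= ch <= "~").upper()
            current.update(cleaned.encode("ascii"))
    return {name: digest.hexdigest() for name, digest in digests.items()}


# Tiny in-memory FASTA for demonstration. For a real file, pass an open handle:
#   with open("reference.fa") as handle: md5s_from_fasta_lines(handle)
demo = [">chr1 test\n", "acgt\n", "NN\n", ">chr2\n", "GG\n"]
checksums = md5s_from_fasta_lines(demo)
for contig, checksum in checksums.items():
    print(contig, checksum)
```

Comparing the chr1 value for the .fa and for the decompressed .fa.gz against the two M5 strings in the exception would show which file each dictionary was built from.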

I tried reordering against the reference, following the last post here: http://seqanswers.com/forums/archive/index.php/t-14635.html with

java -jar picard.jar ReorderSam INPUT=SRR1378155_SRR1378155.cram OUTPUT=SRR1378155_SRR1378155.sorted.cram REFERENCE=/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa

I also tried it using the gzipped reference file, but in both cases I got the same "Sequences at index 0 don't match" error.

When I run ValidateSamFile on the CRAM file, I get "Exception in thread "main" htsjdk.samtools.cram.CRAMException: Contig chr1 not found in the reference file." Following the answer at https://gatkforums.broadinstitute.org/gatk/discussion/7944/can-validatesamfile-check-on-cram-files, I converted my CRAM to a BAM file and then ran ValidateSamFile, which gave me:

HISTOGRAM java.lang.String

Error Type Count
ERROR:MISSING_READ_GROUP 1
WARNING:RECORD_MISSING_READ_GROUP 124248788

I then added the read group to the BAM file and after that there are no errors found in the BAM file, but when I try to use CollectMultipleMetrics I again get the "Sequences at index 0 don't match" error.

The first few lines of the CRAM header are below; the @SQ line shows UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa:

@HD VN:1.5 SO:coordinate
@PG ID:STAR PN:STAR VN:STAR_2.5.1b CL:STAR --runThreadN 8 --genomeDir /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/STAR/2.5.1b-foss-2015b/ --genomeLoad NoSharedMemory --readFilesIn /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_1.fastq.gz /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_2.fastq.gz --readFilesCommand zcat --outFileNamePrefix /local/5210636/SRR1378155_SRR1378155. --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMismatchNmax 6 --quantMode GeneCounts --twopassMode Basic
@PG ID:scramble PN:scramble PP:STAR VN:1.14.6 CL:scramble -I bam -O cram -r /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa -t 2 /local/5211312/SRR1378155_SRR1378155.fixmates.bam /groups//umcg-biogen/tmp04/biogen/input/GTEx/pipelines/results///cramFiles/SRR1378155_SRR1378155.cram
@CO user command line: STAR --outFileNamePrefix /local/5210636/SRR1378155_SRR1378155. --readFilesIn /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_1.fastq.gz /groups/umcg-biogen/tmp04/biogen/input/GTEx/sra/fastq/cart_prj8709_201809140901/SRR1378155_2.fastq.gz --readFilesCommand zcat --genomeDir /apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/STAR/2.5.1b-foss-2015b/ --genomeLoad NoSharedMemory --runThreadN 8 --outFilterMultimapNmax 1 --outFilterMismatchNmax 6 --twopassMode Basic --quantMode GeneCounts --outSAMunmapped Within
@SQ SN:chr1 LN:248956422 M5:2648ae1bacce4ec4b6cf337dcae37816 UR:/apps/data//ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/GRCh38.p5.genome.fa
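To compare the header against another sequence dictionary field by field, an @SQ line can be split into its tags. This is only a sketch: `parse_sq_line` is a hypothetical helper, and it splits on any whitespace so that it works both on the tab-separated lines of a real header and on the space-separated copy pasted above.

```python
def parse_sq_line(line):
    """Split one @SQ header line into a dict of its TAG:value fields."""
    fields = line.split()
    if not fields or fields[0] != "@SQ":
        raise ValueError("not an @SQ header line")
    tags = {}
    for field in fields[1:]:
        # Split on the first colon only, since values such as UR may contain colons.
        key, sep, value = field.partition(":")
        if sep:
            tags[key] = value
    return tags
```

Applying it to the chr1 line from the CRAM header and to the chr1 line of whatever dictionary the tool loads for the reference (possibly a .dict file next to the FASTA, which is an assumption worth checking) makes it easy to see whether SN, LN, and M5 all agree.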

wrong description for AS_QualByDepth in GATK docs?



ERROR stack trace

Hi,

I ran GenotypeGVCFs (version 3.8) on 91 WES samples that were combined with CombineGVCFs. I got an error and couldn't find a solution here.

My command line includes the options: --dbsnp, --annotateNDA, --sample_ploidy 182, --useNewAFCalculator

The error log I got:

ERROR --
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 20
at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.getNumLikelihoodElements(GeneralPloidyGenotypeLikelihoods.java:440)
at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.subsetToAlleles(GeneralPloidyGenotypeLikelihoods.java:339)
at org.broadinstitute.gatk.tools.walkers.genotyper.afcalc.IndependentAllelesExactAFCalculator.subsetAlleles(IndependentAllelesExactAFCalculator.java:494)
at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:292)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:392)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:375)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:330)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:327)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:305)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions SITE GATK
ERROR
ERROR MESSAGE: 20
ERROR ------------------------------------------------------------------------------------------

Could anyone help me?

Sorry for any inconvenience, or if I asked in the wrong place.

Thank you for your time and patience
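The top frame of the trace is getNumLikelihoodElements, so it may help to see how quickly the number of possible genotypes grows with ploidy. The sketch below uses the standard combinatorial count of unordered genotypes for a given ploidy and number of alleles; the function name is my own, and I am not claiming this is exactly what GATK computes internally or that it explains the index 20 in the error. Note also that --sample_ploidy is documented as the ploidy per sample (for pools, the number of samples in each pool times the sample ploidy), so if the 91 exomes are individual diploid samples rather than one pool, 182 may not be the intended value.

```python
from math import comb


def num_genotypes(ploidy, num_alleles):
    """Count unordered genotypes: multisets of size `ploidy` drawn from `num_alleles` alleles."""
    if ploidy < 1 or num_alleles < 1:
        raise ValueError("ploidy and num_alleles must be positive")
    return comb(ploidy + num_alleles - 1, num_alleles - 1)


# Compare a diploid sample with a ploidy of 182 for a few allele counts.
for alleles in (2, 3, 4):
    print(alleles, num_genotypes(2, alleles), num_genotypes(182, alleles))
```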

VariantFiltration not filtering correctly

Hi,
I'm running the following command to hard-filter some variants:

gatk -R /Users/debortoli/Doutorado/hg19/hg19.fa \
    -T VariantFiltration -V vcf_no_indels.recode.vcf \
    --filterExpression "ReadPosRankSum < -5.0 || MQRankSum < -4.0 || MQ < 40.0 || QD < 2.0 || FS > 60.0 || SOR > 2.0" \
    --filterName "FAIL" \
    -o vcf_no_indels_hard_filtered_test.vcf

The output shows some strange things, like:

chr15 28228924 . G A 68.73 PASS AC=2;AF=3.236e-03;AN=618;DP=3360;ExcessHet=0.0106;FS=0.000;InbreedingCoeff=0.0278;MLEAC=1;MLEAF=1.618e-03;MQ=60.00;QD=22.91;SOR=2.833 GT:AD:DP:GQ:PL 0/0:22,0:22:66:0,66,768 0/0:35,0:35:99:0,105,1186 0/0:11,0:11:33:0,33,378
chr15 28419695 rs149592795 T C 339531 MQRankSum AC=133;AF=0.196;AN=680;BaseQRankSum=-8.510e-01;ClippingRankSum=0.036;DP=154411;ExcessHet=73.3363;FS=0.623;InbreedingCoeff=-0.2431;MLEAC=133;MLEAF=0.196;MQ=59.11;MQRankSum=-9.913e+00;QD=3.03;ReadPosRankSum=3.09;SOR=0.610 GT:AD:DP:GQ:PGT:PID:PL 0/0:89,0:89:99:.:.:0,120,1800 0/0:1055,0:1055:99:.:.:0,120,1800 0/0:338,0:338:99:.:.:0,120,1800

I'm wondering why they are passing the filter when they shouldn't. There are more examples throughout the VCF that also get past one of the filters when they shouldn't.
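To check which thresholds each record actually violates, the six conditions can be evaluated one at a time outside of GATK. The sketch below is an independent re-implementation of the thresholds from the command above, not GATK's JEXL evaluation; the names `THRESHOLDS`, `parse_info`, and `check_record` are made up for illustration, and it assumes single-valued numeric annotations. It also reports annotations that are absent from a record, since the first record above has no ReadPosRankSum or MQRankSum at all; whether that is why it ends up as PASS depends on how the tool handles missing annotations in a combined expression, which I have not verified for this version.

```python
# Each entry returns True when the value FAILS the hard filter.
THRESHOLDS = {
    "ReadPosRankSum": lambda v: v < -5.0,
    "MQRankSum": lambda v: v < -4.0,
    "MQ": lambda v: v < 40.0,
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "SOR": lambda v: v > 2.0,
}


def parse_info(info):
    """Turn a VCF INFO string into a dict of key -> raw string value."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if sep:
            fields[key] = value
    return fields


def check_record(info):
    """Return (failed, missing): thresholds violated and annotations not present."""
    values = parse_info(info)
    failed, missing = [], []
    for name, is_failing in THRESHOLDS.items():
        if name not in values:
            missing.append(name)
        elif is_failing(float(values[name])):
            failed.append(name)
    return failed, missing


record1 = "AC=2;AN=618;DP=3360;FS=0.000;MQ=60.00;QD=22.91;SOR=2.833"
print(check_record(record1))
```

On the first record this reports SOR as the violated threshold and ReadPosRankSum and MQRankSum as missing, which is a starting point for working out why it is not flagged.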

Java error when using ASEReadCounter

Hi, all!
I'm trying to do an allele-specific expression analysis on some of my samples and I gave ASEReadCounter a try. I'm having trouble with Java and should probably post this on a Java forum, but I thought you could help me since you know GATK better. Here is the error message I get:

 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Partitioner
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructors(Class.java:1651)
    at org.broadinstitute.hellbender.utils.ClassUtils.canMakeInstances(ClassUtils.java:30)
    at org.broadinstitute.hellbender.Main.extractCommandLineProgram(Main.java:318)
    at org.broadinstitute.hellbender.Main.setupConfigAndExtractProgram(Main.java:180)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:202)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Partitioner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 8 more

At first I tried it using Java version 9. I thought this could be the problem, so I tried running it using Java 8 but that didn't solve it. Could you help me?
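Since the error is a missing class rather than a Java version problem, one quick check is whether the jar being launched contains org.apache.spark.Partitioner at all. A jar is a zip archive, so this can be done with the Python standard library. This is a minimal sketch; `jar_has_class` is a hypothetical helper and the jar path in the usage comment is a placeholder, not a real file name.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the given fully qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in set(jar.namelist())


# Example usage (placeholder path):
# print(jar_has_class("path/to/your-gatk.jar", "org.apache.spark.Partitioner"))
```

If this returns False for the jar on the command line, the class has to come from somewhere else on the classpath, which would be consistent with the ClassNotFoundException above.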

Using GATK pipelines?

Just wondering, is everyone working mostly with open source pipelines or more with custom pipelines? I understand the pros and cons of both but want to see what others recommend. This is for both WGS and WES.
