Recent Discussions — GATK-Forum

I am running HaplotypeCaller on one BAM file


java -jar GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference/GRCh37/hs37d5.fa -I output.bam --dbsnp reference/gatkbundle/dbsnp_138.b37.vcf -o output.g.vcf -ERC GVCF

I am trying to add one more BAM file to my cohort. I ran one BAM file separately, but when applying GVCF mode it does not run; it throws the error below.

MESSAGE: Invalid command line: Argument emitRefConfidence has a bad value: Can only be used in single sample mode currently. Use the sample_name argument to run on a single sample out of a multi-sample BAM file
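
(Edit: the error suggests running one sample at a time out of the multi-sample BAM; would something like this sketch be the right approach? The sample name sampleA is a placeholder for whatever SM tag is in my header.)

# per the error message: restrict -ERC GVCF to a single sample via --sample_name
java -jar GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R reference/GRCh37/hs37d5.fa \
    -I cohort.bam \
    --sample_name sampleA \
    --dbsnp reference/gatkbundle/dbsnp_138.b37.vcf \
    -ERC GVCF \
    -o sampleA.g.vcf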


Does FilterByOrientationBias consider normal samples?


FilterByOrientationBias takes the output of CollectSequencingArtifactMetrics to do somatic variant filtering. Its manual says:

CollectSequencingArtifactMetrics should be run for both the normal sample and the tumor sample, if the matched normal is available.

But the example command-line shown in the manual is:

 gatk-launch --javaOptions "-Xmx4g" FilterByOrientationBias \
   --artifactModes 'G/T' \
   -V tumor_unfiltered.vcf.gz \
   -P tumor.pre_adapter_detail_metrics \
   --output oxog_filtered.vcf.gz

The input only involves the tumor sample. Do I really need to run CollectSequencingArtifactMetrics on the matched normal sample? If yes, how should I use it in FilterByOrientationBias?
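
For reference, the upstream metrics step on the tumor was roughly like this sketch (file names are placeholders):

 # produces tumor.pre_adapter_detail_metrics among other outputs
 gatk-launch --javaOptions "-Xmx4g" CollectSequencingArtifactMetrics \
   -I tumor.bam \
   -O tumor \
   -R reference.fasta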

Thanks.

Error in Mutect2


I'm running the following command for Mutect2 and keep getting an error:

java -jar gatk Mutect2 \
-R ../../cf3genome.fa \
-I test.bam \
--tumor-sample test \
-O a1.vcf
A USER ERROR has occurred: Bad input: Sample test is not in BAM header: [20]

But I added test to the header with Picard tools; here's my header:

samtools view -H test.bam 
@HD VN:1.5  SO:coordinate
@SQ SN:1    LN:122678785
@SQ SN:10   LN:69331447
@SQ SN:11   LN:74389097
@SQ SN:12   LN:72498081
@SQ SN:13   LN:63241923
@SQ SN:14   LN:60966679
@SQ SN:15   LN:64190966
@SQ SN:16   LN:59632846
@SQ SN:17   LN:64289059
@SQ SN:18   LN:55844845
@SQ SN:19   LN:53741614
@SQ SN:2    LN:85426708
@SQ SN:20   LN:58134056
@SQ SN:21   LN:50858623
@SQ SN:22   LN:61439934
@SQ SN:23   LN:52294480
@SQ SN:24   LN:47698779
@SQ SN:25   LN:51628933
@SQ SN:26   LN:38964690
@SQ SN:27   LN:45876710
@SQ SN:28   LN:41182112
@SQ SN:29   LN:41845238
@SQ SN:3    LN:91889043
@SQ SN:30   LN:40214260
@SQ SN:31   LN:39895921
@SQ SN:32   LN:38810281
@SQ SN:33   LN:31377067
@SQ SN:34   LN:42124431
@SQ SN:35   LN:26524999
@SQ SN:36   LN:30810995
@SQ SN:37   LN:30902991
@SQ SN:38   LN:23914537
@SQ SN:4    LN:88276631
@SQ SN:5    LN:88915250
@SQ SN:6    LN:77573801
@SQ SN:7    LN:80974532
@SQ SN:8    LN:74330416
@SQ SN:9    LN:61074082
@SQ SN:MT   LN:16727
@SQ SN:X    LN:123869142
@SQ SN:JH373233.1   LN:2660953
@SQ SN:JH373234.1   LN:1881673
@SQ SN:JH373235.1   LN:1415205
@SQ SN:JH373236.1   LN:1067467
@SQ SN:JH373238.1   LN:881102
@SQ SN:JH373237.1   LN:866315
@SQ SN:JH373239.1   LN:822601
@SQ SN:JH373241.1   LN:745551
...
#A bunch of unassembled contigs here
...
@RG ID:test LB:test PL:illumin  SM:20   PU:unit1
@PG ID:STAR PN:STAR VN:STAR_2.5.3a  CL:/projects/evcon@colostate.edu/STAR-2.5.3a/bin/Linux_x86_64/STAR   --genomeDir /scratch/summit/evcon@colostate.edu/str_idx/   --readFilesIn AESC006_1_val_1.fq.gz   AESC006_2_val_2.fq.gz      --readFilesCommand zcat      --outFileNamePrefix stralgnout/AESC006_1_val_1   --outSAMtype BAM   SortedByCoordinate      --outFilterMultimapNmax 1   --sjdbGTFfile /scratch/summit/evcon@colostate.edu/cf3gtf.gtf   --quantMode GeneCounts   
@CO user command line: /projects/evcon@colostate.edu/STAR-2.5.3a/bin/Linux_x86_64/STAR --genomeDir /scratch/summit/evcon@colostate.edu/str_idx/ --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --sjdbGTFfile /scratch/summit/evcon@colostate.edu/cf3gtf.gtf --outFilterMultimapNmax 1 --readFilesCommand zcat --outFileNamePrefix stralgnout/AESC006_1_val_1 --readFilesIn AESC006_1_val_1.fq.gz AESC006_2_val_2.fq.gz
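
(Edit: comparing the error to the header above, --tumor-sample appears to be matched against the SM tag, which here is 20, while test is only the read group ID. Would resetting the sample name with Picard, roughly as in this sketch, be the fix?)

# rewrite the read group so SM matches the --tumor-sample argument
java -jar picard.jar AddOrReplaceReadGroups \
    I=test.bam \
    O=test.fixed.bam \
    RGID=test \
    RGLB=test \
    RGPL=illumina \
    RGPU=unit1 \
    RGSM=test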

Possible to GenotypeGVCFs at all sites?


Dear,

I am using GATK 4.0.4.0, following the best practices for joint variant calling on a cohort of samples. Everything works. However, in the GenotypeGVCFs step I would like to genotype all sites, including non-variant ones, at least for a specific set of genes. I assume that this is currently impossible. Will this functionality be ported to GATK4? Is there another way to retrieve the same results, for example by converting the -ERC GVCF output of GATK's HaplotypeCaller to -ERC BP_RESOLUTION? In my final VCF I would like to see genotypes for all positions. I am asking because, based on this last VCF file, I cannot discriminate between highly reliable and less reliable reference genotypes at positions where no sample has a variant allele.
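
For example, what I have in mind is re-calling at base-pair resolution restricted to the genes of interest, roughly like this sketch (the interval file name is a placeholder):

# emit a genotype record for every covered base in the listed intervals
gatk HaplotypeCaller \
    -R reference.fasta \
    -I sample1.bam \
    -L genes_of_interest.intervals \
    -ERC BP_RESOLUTION \
    -O sample1.bp.g.vcf.gz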

Best wishes,
Wouter

gVCFs from different GATK versions


Hello,
I'm adding individuals to my project. Can I use GATK4 to create the new g.vcf files and then combine them with my old g.vcf files (produced with GATK3) for joint genotyping and filtering in GATK4?

How should I pre-process data from multiplexed sequencing and multi-library designs?


Our Best Practices pre-processing documentation assumes a simple experimental design in which you have one set of input sequence files (forward/reverse or interleaved FASTQ, or unmapped uBAM) per sample, and you run each step of the pre-processing workflow separately for each sample, resulting in one BAM file per sample at the end of this phase.

However, if you are generating multiple libraries for each sample, and/or multiplexing samples within and/or across sequencing lanes, the data must be de-multiplexed before pre-processing, typically resulting in multiple sets of FASTQ files per sample all of which should have distinct read group IDs (RGID).

At that point there are several different valid strategies for implementing the pre-processing workflow. Here at the Broad Institute, we run the initial steps of the pre-processing workflow (mapping, sorting and marking duplicates) separately on each individual read group. Then we merge the data to produce a single BAM file for each sample (aggregation); this is done by re-running Mark Duplicates, this time on all read group BAM files for a sample at the same time. Then we run Indel Realignment and Base Recalibration on the aggregated per-sample BAM files. See the worked-out example below and the presentation on Broad Production Pipelines here for more details.

Note that there are many possible ways to achieve a similar result; here we present the way we think gives the best combination of efficiency and quality. This assumes that you are dealing with one or more samples, and each of them was sequenced on one or more lanes.

Example

Let's say we have this example data (assuming interleaved FASTQs containing both forward and reverse reads) for two sample libraries, sampleA and sampleB, which were each sequenced on two lanes, lane1 and lane2:

  • sampleA_lane1.fq
  • sampleA_lane2.fq
  • sampleB_lane1.fq
  • sampleB_lane2.fq

These will each be identified as separate read groups A1, A2, B1 and B2. If we had multiple libraries per sample, we would further distinguish them (e.g. sampleA_lib1_lane1.fq leading to read group A11, sampleA_lib2_lane1.fq leading to read group A21, and so on).

1. Run initial steps per-readgroup once

Assuming that you received one FASTQ file per sample library, per lane of sequence data (which amounts to a read group), run each file through mapping and sorting. During the mapping step you assign read group information, which will be very important in the next steps so be sure to do it correctly. See the read groups dictionary entry for guidance.

The example data becomes:

  • sampleA_rgA1.bam
  • sampleA_rgA2.bam
  • sampleB_rgB1.bam
  • sampleB_rgB2.bam

At this point we mark duplicates in each read group BAM file (dedup), which allows us to estimate the complexity of the corresponding library of origin as a quality control step. This step is optional.

The example data becomes:

  • sampleA_rgA1.dedup.bam
  • sampleA_rgA2.dedup.bam
  • sampleB_rgB1.dedup.bam
  • sampleB_rgB2.dedup.bam

Technically this first run of marking duplicates is not necessary because we will run it again per-sample, and that per-sample marking would be enough to achieve the desired result. To reiterate, we only do this round of marking duplicates for QC purposes.
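
As a sketch, this optional per-read-group dedup with Picard MarkDuplicates (the jar path is a placeholder; the metrics file carries the library complexity estimate used for QC):

java -jar picard.jar MarkDuplicates \
    I=sampleA_rgA1.bam \
    O=sampleA_rgA1.dedup.bam \
    M=sampleA_rgA1.dedup_metrics.txt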

2. Merge read groups and mark duplicates per sample (aggregation + dedup)

Once you have pre-processed each read group individually, you merge read groups belonging to the same sample into a single BAM file. You could do this as a standalone step, but for the sake of efficiency we combine it with the per-sample duplicate marking step (it's simply a matter of passing the multiple inputs to MarkDuplicates in a single command).

The example data becomes:

  • sampleA.merged.dedup.bam
  • sampleB.merged.dedup.bam

To be clear, this is the round of marking duplicates that matters. It eliminates PCR duplicates (arising from library preparation) across all lanes in addition to optical duplicates (which are by definition only per-lane).
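
Concretely, aggregation + dedup amounts to handing all of a sample's read group BAMs to a single MarkDuplicates call, as in this sketch:

# merge and mark duplicates across read groups in one pass
java -jar picard.jar MarkDuplicates \
    I=sampleA_rgA1.dedup.bam \
    I=sampleA_rgA2.dedup.bam \
    O=sampleA.merged.dedup.bam \
    M=sampleA.merged.dedup_metrics.txt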

3. Remaining per-sample pre-processing

Then you run indel realignment (optional) and base recalibration (BQSR).

The example data becomes:

  • sampleA.merged.dedup.(realn).recal.bam
  • sampleB.merged.dedup.(realn).recal.bam

Realigning around indels per-sample leads to consistent alignments across all lanes within a sample. This step is only necessary if you will be using a locus-based variant caller like MuTect 1 or UnifiedGenotyper (for legacy reasons). If you will be using HaplotypeCaller or MuTect2, you do not need to perform indel realignment.

Base recalibration will be applied per-read group if you assigned appropriate read group information in your data. BaseRecalibrator distinguishes read groups by RGID, or RGPU if it is available (PU takes precedence over ID). This will identify separate read groups (distinguishing both lanes and libraries) as such even if they are in the same BAM file, and it will always process them separately -- as long as the read groups are identified correctly of course. There would be no sense in trying to recalibrate across lanes, since the purpose of this processing step is to compensate for the errors made by the machine during sequencing, and the lane is the base unit of the sequencing machine (assuming the equipment is Illumina HiSeq or similar technology).
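
A hedged sketch of this per-sample recalibration using the GATK3-era tools described here (the known-sites resource is a placeholder):

# model the error covariates (processed per read group automatically) ...
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
    -R ref.fasta -I sampleA.merged.dedup.bam \
    -knownSites dbsnp.vcf -o sampleA.recal.table
# ... then write the recalibrated BAM
java -jar GenomeAnalysisTK.jar -T PrintReads \
    -R ref.fasta -I sampleA.merged.dedup.bam \
    -BQSR sampleA.recal.table -o sampleA.merged.dedup.recal.bam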

People also often ask whether it's worth the trouble to try realigning across all samples in a cohort. The answer is almost always no, unless you have very shallow coverage. The problem is that while it would be lovely to ensure consistent alignments around indels across all samples, the computational cost gets too ridiculous too fast. That said, for contrastive calling projects -- such as cancer tumor/normal pairs -- we do recommend realigning the tumor and the normal together to avoid slight alignment differences between the two tissue types.

GRCh37/hg19: should I re-process my BAMs?


I have a human exome experiment on which I am using hg19 resources (reference, targets, dbSNP, ... the whole shebang). I want to add some 1000Genomes exomes to this experiment, but the available BAMs are from GRCh37.

Is there a tool to port the BAMs from GRCh37 to hg19, and continue from there? Maybe LiftOver?

Do you instead recommend re-processing the 1000 Genomes BAMs on hg19? Would that mean regenerating FASTQs and re-doing the whole map/MarkDup/IndelReal/BQSR pipeline?

For now, I have worked on the original BAMs but have renamed all the classical chromosomes from "1" to "chr1", and I got rid of the mitochondrial chromosome and all other contigs (I also removed these contigs from the resources to avoid GATK's complaints about missing contigs). How bad do you think that is, based on the differences you know of between GRCh37 and hg19?

Thanks a lot for your help!

ID column missing in VCF after using HaplotypeCaller


Hello,
I have been using the HaplotypeCaller from GATK to call variants in many individuals all together at once. In the output file that I obtained, I see that the "ID" column always contains the value "." on every line, so I understand that this indicates a missing value.
I think this may be a problem in my downstream analyses.
Any idea why this "ID" would be missing, and how I may fix this problem?
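(Edit: I gather the ID column is only populated when a dbSNP VCF is supplied, either at call time or by annotating afterwards. Would something along the lines of this sketch be the right fix? Paths are placeholders.)

# copy rsIDs from dbSNP into the ID column of existing calls
java -jar GenomeAnalysisTK.jar -T VariantAnnotator \
    -R ref.fasta \
    -V my_calls.vcf \
    --dbsnp dbsnp.vcf \
    -o my_calls.rsID.vcf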
Thank you very much for any help :)


HaplotypeCaller misses a 1-bp insertion, which is recovered when limiting the region around the variant


We exome sequenced a sample and also found an interesting variant by Sanger sequencing. The variant was not in the exome's HC output, but it showed up in IGV, with UG, and with HC when the region was limited to a 40 bp window around the insertion. There are 8/30 reads supporting the variant in IGV from the BAM, and 7/28 make it through HC on the restricted region. No caller is perfect, but I just can't grasp where HC is throwing out these good reads, or why this does not meet HC's active-region threshold. Any additional elaboration would be appreciated. Thanks.
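
(Edit: the restricted-region run was along the lines of this sketch; the -forceActive/-disableOptimizations debugging flags are my guess at what is relevant here, and -bamout shows the reassembled reads:)

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R ref.fasta -I sample.bam \
    -L 8:145669750-145669790 \
    -bamout hc_debug.bam \
    -forceActive -disableOptimizations \
    -o hc_debug.vcf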

IGV on HC bam output, restricted region and non-restricted:

HC(small region):
8 145669770 . T TA,<NON_REF> 172.73 . BaseQRankSum=1.220;ClippingRankSum=-1.539;DP=28;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00;MQ0=0;MQRankSum=-0.690;ReadPosRankSum=-1.857 GT:AD:DP:GQ:PL:SB 0/1:21,7,0:28:99:210,0,700,273,724,996:12,9,3,4
8 145669771 . A <NON_REF> . END=145669771 GT:DP:GQ:MIN_DP:PL 0/0:30:0:30:0,0,502
8 145669772 . C <NON_REF> . END=145669791 GT:DP:GQ:MIN_DP:PL 0/0:30:81:28:0,81,1215

HC:
8 145669770 . T <NON_REF> . END=145669771 GT:DP:GQ:MIN_DP:PL 0/0:30:0:27:0,0,454
8 145669772 . C <NON_REF> . END=145669791 GT:DP:GQ:MIN_DP:PL 0/0:30:81:28:0,81,1215

Mutation description


Hello,

I did variant calling + VQSR with GATK version 3.3 and I obtained this mutation description in one sample:

0/1:0.72:13,5:18:99:1,0:0.5,0:0|1:226046924_T_TCA:168,0,2670

Could you help me interpret the last part of the description? When I examine the BAM file with IGV I don't see anything: IGV does not show any mutation at this position.

The info column is this one:

AC=1;AF=0.0001558;AN=6420;BaseQRankSum=1.97;ClippingRankSum=0.493;DP=155640;FS=14.443;GQ_MEAN=98.61;GQ_STDDEV=25.95;InbreedingCoeff=0.002;LikelihoodRankSum=1.97;MLEAC=1;MLEAF=0.0001503;MQ=60;MQ0=0;MQRankSum=0.197;NCC=11;QD=6.26;ReadPosRankSum=-3.253;SOR=3.596;VQSLOD=7.52;culprit=ReadPosRankSum

And the format column this one:

GT:AB:AD:DP:GQ:MLPSAC:MLPSAF:PGT:PID:PL
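
For clarity, pairing the FORMAT keys with the sample values above gives:

GT      0/1                genotype (heterozygous)
AB      0.72               allele balance (fraction of reads supporting the ref allele)
AD      13,5               allelic depths (ref,alt)
DP      18                 total read depth
GQ      99                 genotype quality
MLPSAC  1,0                per-sample maximum-likelihood alternate allele count
MLPSAF  0.5,0              per-sample maximum-likelihood alternate allele frequency
PGT     0|1                physical (read-backed) phased genotype
PID     226046924_T_TCA    phasing group ID (pos_ref_alt of the group's first record)
PL      168,0,2670         phred-scaled genotype likelihoods (0/0, 0/1, 1/1)

So the tail in question is the physical-phasing annotation (PGT, PID) followed by the PL values.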

Thank you,

Laura Domènech

Funcotator error


Hi, GATK team

I ran into a problem recently with Funcotator. I initially ran Mutect2 coupled with Funcotator on FireCloud (GATK version 4.0.4.0), but it threw an error. I also tried running Funcotator locally, and it produces an almost identical error; see below:

[June 6, 2018 1:47:32 PM EDT] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 12.63 minutes.
Runtime.totalMemory()=12082741248
java.lang.StringIndexOutOfBoundsException: String index out of range: -61
    at java.lang.String.substring(String.java:1967)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createUtrFuncotation(GencodeFuncotationFactory.java:1088)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createGencodeFuncotationOnTranscript(GencodeFuncotationFactory.java:601)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsHelper(GencodeFuncotationFactory.java:529)
    at org.broadinstitute.hellbender.tools.funcotator.dataSources.gencode.GencodeFuncotationFactory.createFuncotationsOnVariant(GencodeFuncotationFactory.java:276

I know Funcotator is still in beta, but I want to ask for your help debugging this problem, and I am also wondering whether there is a more stable version of Funcotator.

Thanks,
Zhouwe

Filter for strand bias in stranded RNA-seq?


Hello,

I was wondering whether it makes sense to filter for strand bias, as stated in the Best Practices RNA-seq variant calling guide, given that most of today's RNA-seq data is strand-specific. I would actually expect high strand bias for variants, and would be suspicious about variants which do NOT show strand bias =)
...or did I get something wrong about the FisherStrand values?

Thank you

GATK4 ReadsPipelineSpark error


When I run GATK 4.0.4's ReadsPipelineSpark, it runs correctly until this error appears, and I do not know how to solve it:

18/06/07 17:13:04 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17:13:05.109 INFO BaseRecalibrationEngine - The covariates being used here:
17:13:05.109 INFO BaseRecalibrationEngine - ReadGroupCovariate
17:13:05.109 INFO BaseRecalibrationEngine - QualityScoreCovariate
17:13:05.109 INFO BaseRecalibrationEngine - ContextCovariate
17:13:05.109 INFO BaseRecalibrationEngine - CycleCovariate
18/06/07 17:13:05 ERROR Executor: Exception in task 46.0 in stage 3.0 (TID 2724)
java.lang.IllegalStateException: Duplicate key -1
at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
at java.util.HashMap.merge(HashMap.java:1254)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.lambda$mark$2142e97f$1(MarkDuplicatesSpark.java:82)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$10$1.apply(JavaRDDLike.scala:319)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$10$1.apply(JavaRDDLike.scala:319)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Thank you very much

UnifiedGenotyper error: Somehow the requested coordinate is not covered by the read.


Dear GATK Team,

I am receiving the following error while running GATK 1.6. Unfortunately, for project consistency I cannot update to a more recent version of GATK and would at least wish to understand the source of the error. I ran ValidateSamFile on the input bam files and they appear to be OK.

Any insight or advice would be greatly appreciated:

##### ERROR ------------------------------------------------------------------------------------------

ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Somehow the requested coordinate is not covered by the read. Too many deletions?
at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:425)
at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:374)
at org.broadinstitute.sting.utils.sam.ReadUtils.getReadCoordinateForReferenceCoordinate(ReadUtils.java:370)
at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinates(ReadClipper.java:445)
at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinatesRightTail(ReadClipper.java:176)
at org.broadinstitute.sting.gatk.walkers.indels.PairHMMIndelErrorModel.computeReadHaplotypeLikelihoods(PairHMMIndelErrorModel.java:196)
at org.broadinstitute.sting.gatk.walkers.genotyper.IndelGenotypeLikelihoodsCalculationModel.getLikelihoods(IndelGenotypeLikelihoodsCalculationModel.java:212)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoods(UnifiedGenotyperEngine.java:235)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyperEngine.calculateLikelihoodsAndGenotypes(UnifiedGenotyperEngine.java:164)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:302)
at org.broadinstitute.sting.gatk.walkers.genotyper.UnifiedGenotyper.map(UnifiedGenotyper.java:115)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:63)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:248)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 1.6-22-g3ec78bd):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
ERROR
ERROR MESSAGE: Somehow the requested coordinate is not covered by the read. Too many deletions?
ERROR ------------------------------------------------------------------------------------------

Abbreviated commandline used:

GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -et NO_ET \
    -R "Saccharomyces_cerevisiae/UCSC/sacCer2/Sequence/WholeGenomeFasta/genome.fa" \
    -dcov 5000 -I "someFile.bam" --output_mode EMIT_ALL_SITES -gvcf -l OFF \
    -stand_call_conf 1 -L chrIV:1-1531919

ReadBackedPhasing with several BAM files possible?


Hello,
I have run HaplotypeCaller on 12 BAM files all together at once, using "Variant-only calling on DNAseq". Now I have one VCF file containing all my variants. I would like to run ReadBackedPhasing in order to phase my SNPs. However, I see in the manual that one BAM file is required to provide physical information. Is it possible, in my case, to provide not one but all 12 of my BAM files in a single command?
(I am aware that running the GVCF workflow would have phased my SNPs already, but I realised that too late...)
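What I have in mind, as a sketch (assuming the engine accepts repeated -I arguments, and with placeholder file names):

# one -I per BAM, sample01 through sample12
java -jar GenomeAnalysisTK.jar -T ReadBackedPhasing \
    -R ref.fasta \
    -I sample01.bam -I sample02.bam -I sample12.bam \
    --variant all_samples.vcf \
    -o all_samples.phased.vcf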
Thank you in advance for your help :)


SNP calling using pooled RNA-seq data


Hello,

First of all, thank you for your detailed best practice pipeline for SNP calling from RNA-seq data.

I have pooled RNA-seq data from which I need to call SNPs. Each library consists of a pooled sample of 2-3 individuals of the same sex-tissue combination.

I was wondering whether HaplotypeCaller can handle SNP calling from pooled sequences, or whether it would be better to use FreeBayes?
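
For a pool of three diploid individuals, I imagine the ploidy argument is the relevant knob, roughly as in this sketch (paths are placeholders):

# -ploidy 6 = 2 copies x 3 pooled diploid individuals
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R ref.fasta -I pool.bam -ploidy 6 -o pool.vcf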

I understand that these results come from experimenting with the data, but it would be great if you could share your experiences with me on this.

Cheers,
Homa

java.lang.NumberFormatException when trying to perform VariantFiltration


I'm trying to get a set of robust variants to use to recalibrate quality scores. I called variants using GATK4, and then tried to perform VariantFiltration:

gatk-4.0.5.1/gatk VariantFiltration -R data/genome.fasta -V variants/6753_12-15-2015_first_pass_filtered.vcf -filter 'QD > 2 && FS > 60 && SOR < 3 && MQ > 40 && MQRankSum > -3 && ReadPosRankSum > -4' -output variants/6753_12-15-2015_second_pass_filtered.vcf -filter-name "default"

However, it complains with a java.lang.NumberFormatException:

Using GATK jar gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar VariantFiltration -R data/genome.fasta -V variants/6753_12-15-2015_first_pass_filtered.vcf -filter QD > 2 && FS > 60 && SOR < 3 && MQ > 40 && MQRankSum > -3 && ReadPosRankSum > -4 -output variants/6753_12-15-2015_second_pass_filtered.vcf -filter-name default
15:42:33.964 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
15:42:34.114 INFO VariantFiltration - ------------------------------------------------------------
15:42:34.115 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.0.5.1
15:42:34.115 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
15:42:34.115 INFO VariantFiltration - Executing as sherlock@DN52ehae.SUNet on Mac OS X v10.13.5 x86_64
15:42:34.116 INFO VariantFiltration - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_91-b14
15:42:34.116 INFO VariantFiltration - Start Date/Time: June 15, 2018 3:42:33 PM PDT
15:42:34.116 INFO VariantFiltration - ------------------------------------------------------------
15:42:34.116 INFO VariantFiltration - ------------------------------------------------------------
15:42:34.117 INFO VariantFiltration - HTSJDK Version: 2.15.1
15:42:34.118 INFO VariantFiltration - Picard Version: 2.18.2
15:42:34.118 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:42:34.118 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:42:34.118 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:42:34.118 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:42:34.118 INFO VariantFiltration - Deflater: IntelDeflater
15:42:34.119 INFO VariantFiltration - Inflater: IntelInflater
15:42:34.119 INFO VariantFiltration - GCS max retries/reopens: 20
15:42:34.119 INFO VariantFiltration - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
15:42:34.119 INFO VariantFiltration - Initializing engine
15:42:34.634 INFO FeatureManager - Using codec VCFCodec to read file file:///Users/sherlock/dev/Bhatt_lab/crassphage/variants/6753_12-15-2015_first_pass_filtered.vcf
15:42:34.663 INFO VariantFiltration - Done initializing engine
15:42:34.750 INFO ProgressMeter - Starting traversal
15:42:34.750 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
15:42:34.781 INFO VariantFiltration - Shutting down engine
[June 15, 2018 3:42:34 PM PDT] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=342884352
java.lang.NumberFormatException: For input string: "26.67"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.commons.jexl2.JexlArithmetic.toLong(JexlArithmetic.java:906)
at org.apache.commons.jexl2.JexlArithmetic.compare(JexlArithmetic.java:718)
at org.apache.commons.jexl2.JexlArithmetic.greaterThan(JexlArithmetic.java:790)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:796)
at org.apache.commons.jexl2.parser.ASTGTNode.jjtAccept(ASTGTNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:449)
at org.apache.commons.jexl2.parser.ASTAndNode.jjtAccept(ASTAndNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:449)
at org.apache.commons.jexl2.parser.ASTAndNode.jjtAccept(ASTAndNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:449)
at org.apache.commons.jexl2.parser.ASTAndNode.jjtAccept(ASTAndNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:449)
at org.apache.commons.jexl2.parser.ASTAndNode.jjtAccept(ASTAndNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:449)
at org.apache.commons.jexl2.parser.ASTAndNode.jjtAccept(ASTAndNode.java:18)
at org.apache.commons.jexl2.Interpreter.interpret(Interpreter.java:232)
at org.apache.commons.jexl2.ExpressionImpl.evaluate(ExpressionImpl.java:65)
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:186)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:95)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:15)
at htsjdk.variant.variantcontext.VariantContextUtils.match(VariantContextUtils.java:338)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.matchesFilter(VariantFiltration.java:380)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:339)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.apply(VariantFiltration.java:299)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:109)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:107)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:994)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

I don't really know how to fix it. ValidateVariants gives no errors, and I am able to perform variant selection, e.g.:

gatk-4.0.5.1/gatk SelectVariants -R data/genome.fasta -V variants/6753_12-15-2015_first_pass_raw.vcf -select 'vc.getGenotype("6753_12-15-2015").getAD().1/vc.getGenotype("6753_12-15-2015").getDP() > 0.9 ' -output variants/6753_12-15-2015_first_pass_filtered.vcf

with no problems. Any insights would be greatly appreciated.
Thanks!
Gavin

GISTIC warning message


When I ran GISTIC, I got a warning.

The command is:
./gistic2 -b $res_dir -seg $segfile -refgene $reffile -genegistic 1 -smallmem 1 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme

The warning:
Warning: Shortened 9004 segments in '/dir/Segmentation.txt' that overlap by one marker.
What does this mean? Should I be concerned about this warning?

Unable to retrieve result with VariantRecalibrator tool on my data


Hi, I am running VQSR with the VariantRecalibrator tool on my two VCF files (raw variant sets based on WGS). One of the files successfully produces a result, but the other fails. The error message is shown below:
##### ERROR stack trace
org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to retrieve result
at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
Caused by: java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:536)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:191)
at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.notifyTraversalDone(HierarchicalMicroScheduler.java:226)
at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:183)
... 5 more

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Unable to retrieve result
##### ERROR ------------------------------------------------------------------------------------------

Thanks for your reply!
xiaolian

How to filter out SNPs with a deletion allele?


Hi,

I'm wondering whether there is a way to filter out SNPs with a deletion allele (*). I'm just not confident enough to keep these SNPs, so I would rather discard them than risk keeping false SNPs.
However, in VariantFiltration or SelectVariants there seems to be no option to filter out or discard this type of SNP.
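
(A crude workaround sketch outside of GATK, assuming a plain-text VCF and that dropping whole records is acceptable:)

# keep header lines and any record whose ALT column (field 5) has no * allele
awk -F'\t' '/^#/ || $5 !~ /\*/' input.vcf > no_star.vcf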

I'm looking forward to your response.

Cheers
