Channel: Recent Discussions — GATK-Forum

GATK (v4.0.10.1) CombineGVCFs failing with 'java.lang.OutOfMemoryError'; not using memory provided


Hi,

We ran a CombineGVCFs job using the following command, where gvcfs.list contained only 31 gvcf files with 24 samples each:

$GATK --java-options "-Xmx650G" \
CombineGVCFs \
-R $referenceFasta \
-O full_cohort.ADNI.camo_genes.b37.g.vcf \
--variant gvcfs.list

We tried the extreme memory because CombineGVCFs kept failing. This node has 750G of RAM.

Despite the high memory provided, we get the stacktrace below. The total memory reported by GATK is only ~12G, though (Runtime.totalMemory()=12662603776). Am I missing something? I don't understand why GATK is only using 12G of RAM when we provided much more, and then failing with an OutOfMemoryError.
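For what it's worth, here is a quick sanity check I could run to confirm the heap limit the JVM actually resolves on this node (just a sketch, independent of GATK; assumes a standard HotSpot JVM):

# Print the resolved MaxHeapSize for a 650G request; if the node cannot
# reserve a heap that large, java will report an error here instead.
java -Xmx650G -XX:+PrintFlagsFinal -version | grep MaxHeapSize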

We are currently setting up GenomicsDBImport, but this seems worth reporting.

Really appreciate your help.

18:55:51.944 INFO ProgressMeter - 4:26649295 23.6 18617000 787894.4
18:56:01.988 INFO ProgressMeter - 4:26655758 23.8 18779000 789159.6
18:59:13.407 INFO CombineGVCFs - Shutting down engine
[October 19, 2018 6:59:13 PM CDT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 27.06 minutes.
Runtime.totalMemory()=12662603776
Exception in thread "main" java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:316)
at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
at java.io.BufferedWriter.close(BufferedWriter.java:266)
at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:226)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.closeTool(CombineGVCFs.java:461)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:970)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)


Missing sample_file -sf option in GATK4 SelectVariants


In the port over to GATK4 SelectVariants, the sample name (-sn) and sample expression (-se) options made it in, but the sample_file (-sf) option does not seem to have made it. I tried:

gatk SelectVariants \
  -V in.vcf.gz \
  -O out.vcf.gz \
  -RF SampleReadFilter \
  -sample sample_list_file.txt

as a workaround, and it didn't have any errors, but the output file is more than three times the size of the input file, and still shows all the samples.
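In case it clarifies what I'm after, this is the kind of invocation I was hoping would be valid (an untested sketch; it assumes GATK4 collection arguments like -sn can be expanded from a .args file with one value per line, which I have not verified):

# samples.args is a hypothetical file listing one sample name per line
gatk SelectVariants \
  -V in.vcf.gz \
  -O out.vcf.gz \
  -sn samples.args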

Intervals and interval lists


Interval lists define subsets of genomic regions, sometimes even just individual positions in the genome. You can provide GATK tools with intervals or lists of intervals when you want to restrict them to operating on a subset of genomic regions. There are four main reasons for doing so:

  • You want to run a quick test on a subset of data (often used in troubleshooting)
  • You want to parallelize execution of an analysis across genomic regions
  • You need to exclude regions that have bad or uninformative data where a tool is getting stuck
  • The analysis you're running should only take data from those subsets due to how the underlying algorithm works

Regarding the last case, see the Best Practices workflow recommendations and tool example commands for guidance on when to restrict analysis to intervals.


Interval-related arguments and syntax

Arguments for specifying and modifying intervals are provided by the engine and can be applied to most if not all tools. The main arguments you need to know about are the following:

  • -L / --intervals allows you to specify an interval or list of intervals to include.
  • -XL / --exclude-intervals allows you to specify an interval or list of intervals to exclude.
  • -ip / --interval-padding allows you to add padding (in bp) to the intervals you include.
  • -ixp / --interval-exclusion-padding allows you to add padding (in bp) to the intervals you exclude.

By default the engine will merge any intervals that abut (i.e. they are contiguous, they touch without overlapping) or overlap into a single interval. This behavior can be modified by specifying an alternate interval merging rule (see --interval-merging-rule in the Tool Docs).

The syntax for using -L is as follows; it applies equally to -XL:

  • -L chr20 for contig chr20.
  • -L chr20:1-100 for contig chr20, positions 1-100.
  • -L intervals.list (or intervals.interval_list, or intervals.bed) when specifying a text file containing intervals (see supported formats below).
  • -L variants.vcf when specifying a VCF file containing variant records; their genomic coordinates will be used as intervals.

If you want to provide several intervals or several interval lists, just pass them in using separate -L or -XL arguments (you can even use both of them in the same command). You can use all the different formats within the same command line. By default, the GATK engine will take the UNION of all the intervals in all the sets. This behavior can be modified by specifying an alternate interval set rule (see --interval-set-rule in the Tool Docs).
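For example, a command along these lines restricts HaplotypeCaller to two whole contigs while excluding a blacklist (illustrative only; the file names are made up):

gatk HaplotypeCaller \
  -R reference.fasta \
  -I sample.bam \
  -L chr20 \
  -L chr21 \
  -XL blacklist.bed \
  -O sample.chr20_21.g.vcf.gz \
  -ERC GVCF

Here the two -L arguments are combined (union by default) and the regions listed in blacklist.bed are then excluded.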


Supported interval list formats

GATK supports several types of interval list formats: Picard-style .interval_list, GATK-style .list, BED files with extension .bed, and VCF files. The intervals MUST be sorted by coordinate (in increasing order) within contigs, and the contigs must be sorted in the same order as in the sequence dictionary. This is required for efficiency reasons.

A. Picard-style .interval_list

Picard-style interval files have a SAM-like header that includes a sequence dictionary. The intervals are given in the form <chr> <start> <stop> + <target_name>, with fields separated by tabs, and the coordinates are 1-based (first position in the genome is position 1, not position 0).

@HD     VN:1.0  SO:coordinate
@SQ     SN:1    LN:249250621    AS:GRCh37       UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   M5:1b22b98cdeb4a9304cb5d48026a85128     SP:Homo Sapiens
@SQ     SN:2    LN:243199373    AS:GRCh37       UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   M5:a0d9851da00400dec1098a9255ac712e     SP:Homo Sapiens
1       30366   30503   +       target_1
1       69089   70010   +       target_2
1       367657  368599  +       target_3
1       621094  622036  +       target_4
1       861320  861395  +       target_5
1       865533  865718  +       target_6

This is the preferred format because the explicit sequence dictionary safeguards against accidental misuse (e.g. applying hg18 intervals to an hg19 BAM file). Note that this file is 1-based, not 0-based (the first position in the genome is position 1).

B. GATK-style .list or .intervals

This is a simpler format, where intervals are in the form <chr>:<start>-<stop>, and no sequence dictionary is necessary. This file format also uses 1-based coordinates. Note that only the <chr> part is strictly required; if you just want to specify chromosomes/contigs as opposed to specific coordinate ranges, you don't need to specify the rest. Both <chr>:<start>-<stop> and <chr> can be present in the same file. You can also specify intervals in this format directly at the command line instead of writing them in a file.
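For instance, a small GATK-style .list file mixing both forms could look like this (the coordinates are invented purely for illustration):

20:1000000-2000000
20:5000000-5500000
MT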

C. BED files with extension .bed

We also accept the widely-used BED format, where intervals are in the form <chr> <start> <stop>, with fields separated by tabs. However, you should be aware that this file format is 0-based for the start coordinates, so coordinates taken from 1-based formats (e.g. if you're cooking up a custom interval list derived from a file in a 1-based format) should be offset by 1. The GATK engine recognizes the .bed extension and interprets the coordinate system accordingly.
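For example, the first two targets from the Picard-style example above would be written like this in BED format (tab-separated; note the start coordinates are shifted down by 1):

1	30365	30503	target_1
1	69088	70010	target_2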

D. VCF files

Yeah, I bet you didn't expect that was a thing! It's very convenient. Say you want to redo a variant calling run on a set of variant calls that you were given by a colleague, but with the latest version of HaplotypeCaller. You just provide the VCF, slap on some padding on the fly using e.g. -ip 100 in the HC command, and boom, done. Each record in the VCF will be interpreted as a single-base interval, and by adding padding you ensure that the caller sees enough context to reevaluate the call appropriately.
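A sketch of what that HC command might look like (file names are placeholders):

gatk HaplotypeCaller \
  -R reference.fasta \
  -I sample.bam \
  -L colleague_calls.vcf \
  -ip 100 \
  -O recalled.vcf.gz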


Obtaining suitable interval lists

So where do those intervals come from? It depends a lot on what you're working with (everyone's least favorite answer, I know). The most important distinction is the sequencing experiment type: is it whole genome, or targeted sequencing of some sort?

Targeted sequencing (exomes, gene panels etc.)

For exomes and similarly targeted data types, the interval list should correspond to the capture targets used for the library prep, and is typically provided by the prep kit manufacturer (with versions for each ref genome build of course).

We make our exome interval lists available, but be aware that they are specific to the custom exome targeting kits used at the Broad. If you got your sequencing done somewhere else, you should seek to get the appropriate intervals list from the sequencing provider.

Whole genomes (WGS)

For whole genome sequencing, the interval lists don't depend on the prep (since in principle you captured the “whole genome”), so instead they depend on which regions of the genome you want to blacklist (e.g. centromeric regions that waste your time for nothing) and how the reference genome build enables you to cut up regions (separated by Ns) for scatter-gather parallelization.

We make our WGS interval lists available, and the good news is that, as long as you're using the same genome reference build as us, you can use them with your own data even if it comes from somewhere else -- assuming you agree with our decisions about which regions to blacklist! Which you can examine by looking at the intervals themselves. However, we don't currently have documentation on their provenance, sorry -- baby steps.



Which method for detection of rare variants in random human exome samples?


Hi, I am trying to detect rare variants that might be present in only one sample, for a rare disease project. I have many exome samples from a random population and I do not want to miss any rare variant. I am aware that multi-sample calling penalizes rare variants, and the same will happen if I use VQSR. For this specific case, should I use multi-sample or single-sample mode?
Thank you very much


Exception error on running Mutect2 "java.lang.NullPointerException"


Error Message:

java.lang.NullPointerException
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.getContigNames(SequenceDictionaryUtils.java:463)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.getCommonContigsByName(SequenceDictionaryUtils.java:457)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.compareDictionaries(SequenceDictionaryUtils.java:234)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:150)
at org.broadinstitute.hellbender.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:98)
at org.broadinstitute.hellbender.engine.GATKTool.validateSequenceDictionaries(GATKTool.java:701)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:643)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:156)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

Please help me understand what the issue is.
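One thing I plan to double-check (just a guess, based on where the NullPointerException occurs during sequence dictionary validation) is whether the reference has a valid .fai index and .dict sequence dictionary sitting next to it. A sketch of how I would regenerate them, assuming samtools and GATK4 are on the PATH:

# rebuild the FASTA index and sequence dictionary for the reference
samtools faidx reference.fasta
gatk CreateSequenceDictionary -R reference.fasta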

Bug with login via twitter


I tried to join using my Twitter account, and it redirected to a Twitter "authorize this app" window. But after clicking Approve, it just got caught in a redirect loop back to that same page. I tried this in both Chrome and Edge.

GenomicsDBImport --intervals


Hi,
I am using a non-chromosomal genome.
Since GenomicsDBImport can only take in a single genomic interval, how should I set the --intervals parameter?
For the contig whose FASTA header is >LFYR01000729.1 Zostera marina strain Finnish scaffold_1, whole genome shotgun sequence, should I set it as --intervals LFYR01000729.1 Zostera marina strain Finnish scaffold_1, whole genome shotgun sequence?
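Or should it just be the accession, i.e. the part of the header before the first space (my guess, since that is what usually ends up as the contig name in the sequence dictionary)?

--intervals LFYR01000729.1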


GenomicsDBImport terminates after Overlapping contigs found error


My original query was about batching and making intervals for GenomicsDBImport, but I have run into a new problem. I am using version 4.0.7.0. I tried the following:

gatk GenomicsDBImport \
--java-options "-Xmx250G -XX:+UseParallelGC -XX:ParallelGCThreads=24" \
-V input.list \
--genomicsdb-workspace-path 5sp_45ind_assmb_00 \
--intervals interval.00.list \
--batch-size 9 

where I have split my list of contigs into 50 lists and set the batch size to 9 (instead of reading in all 45 g.vcfs at once), for a total of 5 batches. It looked like it had started to run, but it terminated quickly with an error.

The resulting stack trace is:

00:53:23.869 INFO  GenomicsDBImport - HTSJDK Version: 2.16.0
00:53:23.869 INFO  GenomicsDBImport - Picard Version: 2.18.7
00:53:23.869 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:53:23.869 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:53:23.869 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:53:23.869 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:53:23.869 INFO  GenomicsDBImport - Deflater: IntelDeflater
00:53:23.869 INFO  GenomicsDBImport - Inflater: IntelInflater
00:53:23.869 INFO  GenomicsDBImport - GCS max retries/reopens: 20
00:53:23.869 INFO  GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
00:53:23.869 INFO  GenomicsDBImport - Initializing engine
01:26:13.490 INFO  IntervalArgumentCollection - Processing 58057410 bp from intervals
01:26:13.517 INFO  GenomicsDBImport - Done initializing engine
Created workspace /home/leq/gvcfs/5sp_45ind_assmb_00
01:26:13.655 INFO  GenomicsDBImport - Vid Map JSON file will be written to 5sp_45ind_assmb_00/vidmap.json
01:26:13.655 INFO  GenomicsDBImport - Callset Map JSON file will be written to 5sp_45ind_assmb_00/callset.json
01:26:13.655 INFO  GenomicsDBImport - Complete VCF Header will be written to 5sp_45ind_assmb_00/vcfheader.vcf
01:26:13.655 INFO  GenomicsDBImport - Importing to array - 5sp_45ind_assmb_00/genomicsdb_array
01:26:13.656 INFO  ProgressMeter - Starting traversal
01:26:13.656 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
01:33:16.970 INFO  GenomicsDBImport - Importing batch 1 with 9 samples
[libprotobuf ERROR google/protobuf/io/coded_stream.cc:207] A protocol message was rejected because it was too big (more than 67108864 bytes).  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
Contig/chromosome ctg7180018354961 begins at TileDB column 0 and intersects with contig/chromosome ctg7180018354960 that spans columns [1380207667, 1380207970] terminate called after throwing an instance of 'ProtoBufBasedVidMapperException' what():  
ProtoBufBasedVidMapperException : Overlapping contigs found

How do I overcome this issue of 'overlapping contigs found'? Is there a problem with my set of contigs? Also, is the warning about protocol messages something to worry about?

Thank you!

Difference between raw data and downsampled data


Hi, GATK team!
We built a panel in GATK of about 3 Mb and tested a sample from a cancer patient. The normal data for this sample is 2.0 GB and the tumor is 4.6 GB. I also downsampled the tumor so that the normal and tumor data volumes were 1:1, and ran the same process. The two raw VCFs were then compared, as shown in the figure.
In the figure, the middle set (1176) is the variant sites shared by the two files, the purple part (272) is the sites called only from the original data, and the red part (92) is the sites called only after downsampling. In theory, the sites called from the downsampled data should be a subset of those called from the original data, but the result differs from that expectation. Why is this? Does your team have an optimal ratio to recommend when adjusting the amount of data?
Could you please help me solve this problem? Thanks.

Variant not being called by HC GATK v3.7-0-gcfedb67


Hello,
We are calling variants on data that has been sequenced on the NextSeq platform. We have been using the same pipeline, with the same commands, for a year, and in all our runs we include a control sample to check that sequencing and variant calling have been done correctly. For this particular run, a known SNP at 7:143013285 that was called in the same sample in the previous 5 runs (over the last year) was missed by HaplotypeCaller. Looking at the BAM file, the variant appears to be present (highlighted BAM file). The two BAM tracks above it are from the same sample in previous runs, where HC was able to pick the variant up. The commands I use are as follows:

trim_galore -q 0 --paired --fastqc $R1_fastq $R2_fastq --output_dir $FASTQ

bwa mem -M -t 8 $ind.fa $FASTQ/${s_id}.R1_val_1.fq.gz $FASTQ/${s_id}.R2_val_2.fq.gz | sambamba_v0.6.6 view -t 8 -S -h -f bam -o $s_id.bam /dev/stdin
sambamba_v0.6.6 sort -t 8 -o $s_id.sorted.bam $s_id.bam
sambamba_v0.6.6 index -t 8 $s_id.sorted.bam

java -jar $picard AddOrReplaceReadGroups I=$s_id.sorted.bam O=$s_id.sorted.RG.bam SORT_ORDER=coordinate RGID=$s_id RGLB=$flowcell RGPL=illumina RGPU=U RGSM=$RUNNAME
sambamba_v0.6.6 index $s_id.sorted.RG.bam 
sambamba_v0.6.6 markdup -t 8 $s_id.sorted.RG.bam $s_id.markdup.bam 
sambamba_v0.6.6 index $s_id.markdup.bam 

java -Xmx8g -Djava.io.tmpdir=/ionng/tmp -jar $gatk -T BaseRecalibrator \
    -I $s_id.markdup.bam \
    -R $ind.fa \
        -knownSites dbsnp_138.b37.vcf \
        -knownSites Mills_and_1000G_gold_standard.indels.b37.vcf \
        -knownSites 1000G_phase1.indels.b37.vcf \
    -o $s_id.recal_data.table \
    -L $bed 

#Apply the Recalibration
java -Xmx8g -Djava.io.tmpdir=$TMPDIR -jar $gatk -T PrintReads \
    -I $s_id.markdup.bam \
    -R $ind.fa \
    -BQSR $s_id.recal_data.table \
    -o $s_id.${RUNNAME}.variant_ready.bam 

java -Xmx32g -jar $gatk -T HaplotypeCaller \
    -R $ind.fa --dbsnp $dbsnp_138.b37.vcf \
    -I $s_id.${RUNNAME}.variant_ready.bam \
    -stand_call_conf 30.0 \
    -L $bed \
    -o $s_id.${RUNNAME}.g.vcf

Things that I have tried which did not work:
1. running HC with the option -allowNonUniqueKmers
2. changing parameters: -stand_call_conf 2.0 -mmq 5
3. running with -ERC BP_RESOLUTION, which results in
7 143013285 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:9,2:11:0:0,0,152

NOTE: The variant was picked up when I ran FREEBAYES and VARSCAN using default parameters.

FREEBAYES
7 143013285 . C T 206.73 . AB=0.394737;ABP=6.66752;AC=1;AF=0.5;AN=2;AO=15;CIGAR=1X;DP=38;DPB=38;DPRA=0;EPP=20.5268;EPPR=37.093;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=47.6012;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=495;QR=811;RO=23;RPL=2;RPP=20.5268;RPPR=37.093;RPR=13;RUN=1;SAF=15;SAP=35.5824;SAR=0;SRF=23;SRP=52.9542;SRR=0;TYPE=snp;technology.illumina=1 GT:DP:AD:RO:QR:AO:QA:GL 0/1:38:23,15:23:811:15:495:-33.4265,0,-61.8717

VARSCAN
7 143013285 . C T . PASS ADP=38;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:52:38:38:23:15:39.47%:5.4463E-6:35:33:23:0:15:0

I can email the bamout file if required (though I am not allowed to upload it publicly.)
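For reference, this is roughly how I generate the bamout over just that region (a sketch; if I remember the debugging docs correctly, -forceActive and -disableOptimizations are recommended so the region is assembled even if it is not considered active):

java -Xmx8g -jar $gatk -T HaplotypeCaller \
    -R $ind.fa \
    -I $s_id.${RUNNAME}.variant_ready.bam \
    -L 7:143012285-143014285 \
    -forceActive -disableOptimizations \
    -bamout $s_id.debug_region.bam \
    -o $s_id.debug_region.vcf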

Any suggestions will be helpful. Thanks.

Thank you.

Form of .intervals


Hi!
I'm running GenomicsDBImport to combine my 100 samples and my command line is

gatk --java-options "-Xmx10240m -Djava.io.tmpdir=./" GenomicsDBImport --genomicsdb-workspace-path my_database -L 1.intervals --sample-name-map map2

The error message is
A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "Contig:scaffold1 start:0 end:416820" is not valid for this input.

So I want to know what the intervals file should look like.

My reference contains only scaffolds (thousands of them).
So the .intervals file I used is:
Contig:scaffold1 start:0 end:416820
Contig:scaffold2 start:0 end:868635
Contig:scaffold3 start:0 end:530760
Contig:scaffold4 start:0 end:723991
Contig:scaffold7 start:0 end:431581
Contig:scaffold8 start:0 end:94119
Contig:scaffold9 start:0 end:1039220
Contig:scaffold10 end:start:0 1039754
……

Is there anything wrong? How do I correct it?
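Based on the intervals documentation, my guess is that the file should contain plain <chr>:<start>-<stop> entries with 1-based coordinates and contig names exactly as they appear in the reference dictionary (assuming my contigs really are named scaffold1, scaffold2, ...), for example:

scaffold1:1-416820
scaffold2:1-868635
scaffold3:1-530760

or just the bare scaffold names, one per line, to take each scaffold whole. Is that right?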

Thanks a lot!

Niu

SVGenotyper error: Cannot get platform for read


I saw the other recent post regarding bam header issues in SVDiscovery, but thought I would post a separate question since this regards a separate script. I've run the CNVDiscovery pipeline on a set of high coverage bams and am now attempting to genotype some lower coverage individuals at these previously discovered structural variants. All of these samples have the necessary metadata and have been run separately through CNVDiscovery, so all bams will play well with at least some of Genome STRiP's functionality.

However, when I run SVGenotyper, I get the following error:

INFO  12:37:45,082 HelpFormatter - ----------------------------------------------------------------------------------------- 
INFO  12:37:45,090 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7.GS-r1748-0-g74bfe0b, Compiled 2018/04/10 10:30:23 
INFO  12:37:45,090 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  12:37:45,090 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  12:37:45,090 HelpFormatter - [Sun Jul 29 12:37:45 CDT 2018] Executing on Linux 2.6.32-696.30.1.el6.x86_64 amd64 
INFO  12:37:45,090 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01 
INFO  12:37:45,100 HelpFormatter - Program Args: -T SVGenotyperWalker -R /home/hirschc1/pmonnaha/misc-files/gstrip/W22_chr1-10.fasta -O /panfs/roc/scratch/pmonnaha/Maize/gstrip/w22_allSamps_smallWindows/P0120.genotypes.vcf.gz -disableGATKTraversal true -md /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-0 -md /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-1 -md /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-2 -configFile /home/hirschc1/pmonnaha/software/svtoolkit/conf/genstrip_parameters.txt -P input.platformMapFile:/home/hirschc1/pmonnaha/misc-files/gstrip/platform_map.txt -P depth.parityCorrectionThreshold:null -runDirectory /panfs/roc/scratch/pmonnaha/Maize/gstrip/w22_allSamps_smallWindows -genderMapFile /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-0/sample_gender.report.txt -genderMapFile /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-1/sample_gender.report.txt -genderMapFile /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-2/sample_gender.report.txt -ploidyMapFile /home/hirschc1/pmonnaha/misc-files/gstrip/W22_chr1-10.ploidymap.txt -vcf /panfs/roc/scratch/pmonnaha/Maize/gstrip/w22_BigSamps_smallWindows/results/gs_cnv.genotypes.vcf.gz -partitionName P0120 -partition records:11901-12000 -L chr1:1-1 
INFO  12:37:45,104 HelpFormatter - Executing as pmonnaha@cn0536 on Linux 2.6.32-696.30.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_144-b01. 
INFO  12:37:45,105 HelpFormatter - Date/Time: 2018/07/29 12:37:45 
INFO  12:37:45,105 HelpFormatter - ----------------------------------------------------------------------------------------- 
INFO  12:37:45,105 HelpFormatter - ----------------------------------------------------------------------------------------- 
INFO  12:37:45,133 29-Jul-2018 GenomeAnalysisEngine - Strictness is SILENT
INFO  12:37:45,405 29-Jul-2018 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  12:37:45,444 29-Jul-2018 IntervalUtils - Processing 1 bp from intervals
INFO  12:37:45,608 29-Jul-2018 GenomeAnalysisEngine - Preparing for traversal
INFO  12:37:45,609 29-Jul-2018 GenomeAnalysisEngine - Done preparing for traversal
INFO  12:37:45,610 29-Jul-2018 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  12:37:45,610 29-Jul-2018 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  12:37:45,610 29-Jul-2018 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
INFO  12:37:45,616 29-Jul-2018 SVGenotyper - Opening reference sequence ...
INFO  12:37:45,616 29-Jul-2018 SVGenotyper - Opened reference sequence.
INFO  12:37:45,618 29-Jul-2018 MetaData - Opening metadata ... 
INFO  12:37:45,618 29-Jul-2018 MetaData - Adding metadata location /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-0 ...
INFO  12:37:45,619 29-Jul-2018 MetaData - Adding metadata location /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-1 ...
INFO  12:37:45,619 29-Jul-2018 MetaData - Adding metadata location /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-2 ...
INFO  12:37:45,623 29-Jul-2018 MetaData - Opened metadata.
INFO  12:37:45,627 29-Jul-2018 SVGenotyper - Initializing input data set ...
INFO  12:37:45,834 29-Jul-2018 SVGenotyper - Initialized data set: 99 files, 162 read groups, 99 samples.
INFO  12:37:46,024 29-Jul-2018 MetaData - Loading insert size distributions ...
INFO  12:37:46,302 29-Jul-2018 MetaData - Loading insert size distributions ...
INFO  12:37:46,395 29-Jul-2018 MetaData - Loading insert size distributions ...
INFO  12:37:46,486 29-Jul-2018 ReadCountCache - Initializing read count cache with 3 files.
##### ERROR --
##### ERROR stack trace 
java.lang.RuntimeException: Cannot get platform for read HISEQ07:449:H3MCMBCXX:1:1116:14082:78574
    at org.broadinstitute.sv.util.ReadPairOrientation.getOrientation(ReadPairOrientation.java:99)
    at org.broadinstitute.sv.genotyping.ReadPairMapper.processSingleRead(ReadPairMapper.java:203)
    at org.broadinstitute.sv.genotyping.ReadPairMapper.searchWindow(ReadPairMapper.java:175)
    at org.broadinstitute.sv.genotyping.ReadPairMapper.getReadPairs(ReadPairMapper.java:157)
    at org.broadinstitute.sv.genotyping.GenotypingReadPairModule.getReadPairs(GenotypingReadPairModule.java:217)
    at org.broadinstitute.sv.genotyping.GenotypingReadPairModule.genotypeSample(GenotypingReadPairModule.java:104)
    at org.broadinstitute.sv.genotyping.GenotypingReadPairModule.genotypeCnp(GenotypingReadPairModule.java:66)
    at org.broadinstitute.sv.genotyping.GenotypingReadPairModule.genotypeCnp(GenotypingReadPairModule.java:40)
    at org.broadinstitute.sv.genotyping.GenotypingAlgorithm.genotypeCnpInternal(GenotypingAlgorithm.java:149)
    at org.broadinstitute.sv.genotyping.GenotypingAlgorithm.genotypeCnp(GenotypingAlgorithm.java:114)
    at org.broadinstitute.sv.genotyping.SVGenotyperWalker.processVCFFile(SVGenotyperWalker.java:273)
    at org.broadinstitute.sv.genotyping.SVGenotyperWalker.map(SVGenotyperWalker.java:217)
    at org.broadinstitute.sv.genotyping.SVGenotyperWalker.map(SVGenotyperWalker.java:57)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:106)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:141)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:91)
    at org.broadinstitute.sv.main.SVGenotyper.main(SVGenotyper.java:21)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7.GS-r1748-0-g74bfe0b):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Cannot get platform for read HISEQ07:449:H3MCMBCXX:1:1116:14082:78574
##### ERROR ---------------------------------------------------------------------------

I saw in a separate post (https://gatkforums.broadinstitute.org/gatk/discussion/1511/svdiscovery-walker) that this issue can be fixed for SVDiscovery Walker by specifying -P input.platformMapFile:platform_map.txt, and that this fix was planned for SVGenotyper as well.

I created the platform map file, but am still receiving the error. Was this option ever implemented in SVGenotyper, or will I need to re-header the BAMs?

My call to SVGenotyper is as follows:

classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
java -Xmx${MEM}g -cp ${classpath} \
     org.broadinstitute.gatk.queue.QCommandLine \
     -S ${SV_DIR}/qscript/SVGenotyper.q \
     -S ${SV_DIR}/qscript/SVQScript.q \
     -cp ${classpath} \
     -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
     -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
     -R /home/hirschc1/pmonnaha/misc-files/gstrip/${REF}_chr1-10.fasta \
     -vcf ${VCF} \
     -I ${BAM_LIST} \
     -O ${OUT}/${NAME} \
     -md /home/hirschc1/pmonnaha/misc-files/gstrip/${REF}_MetaData_E2-0 \
     -md /home/hirschc1/pmonnaha/misc-files/gstrip/${REF}_MetaData_E2-1 \
     -md /home/hirschc1/pmonnaha/misc-files/gstrip/${REF}_MetaData_E2-2 \
     -runDirectory ${OUT} \
     -jobLogDir ${OUT}/logs \
     -jobRunner Drmaa \
     -gatkJobRunner Drmaa \
     -P input.platformMapFile:/home/hirschc1/pmonnaha/misc-files/gstrip/platform_map.txt \
     -P depth.parityCorrectionThreshold:null \
     -retry 10 \
     -parallelRecords 100 \
     -memLimit ${MEM} \
     -jobNative '-l walltime=12:00:00 -A hirschc1' \
     -run
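In case re-headering does turn out to be necessary, this is the kind of thing I would try (a sketch only; it assumes GNU sed and that the existing @RG lines simply lack a PL tag):

# export the header, append PL:ILLUMINA to any @RG line missing a PL tag,
# then re-apply the header and re-index
samtools view -H sample.bam \
    | sed '/^@RG/{/PL:/!s/$/\tPL:ILLUMINA/}' > new_header.sam
samtools reheader new_header.sam sample.bam > sample.PL.bam
samtools index sample.PL.bam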

Thanks,
Patrick
