Channel: Recent Discussions — GATK-Forum
Viewing all 12345 articles

GenomicsDBImport bus error

Hi,

I am a new GATK4 user. Recently I tried to combine 800 g.vcf files. Running CombineGVCFs directly is quite slow, so I tried GenomicsDBImport to build a combined GenomicsDB workspace first. However, after it had run for a few days, I checked the log file and it showed '3410 Bus error'. What does this error mean, and how can I solve it?
By the way, is there a faster way to combine g.vcf files?

Thank you!

Qiang


HaplotypeCaller calling of side-by-side homozygous/heterozygous variants

Hi,

we have observed several cases where heterozygous/homozygous variants occur at two consecutive positions. Inspecting the BAM file confirms that the variants are on the same allele (the same reads).
GATK HaplotypeCaller always calls these as two separate variants, whereas freebayes, for example, calls them as a single variant.

For example, in VCF format:

GATK
10 123123123 G A ...
10 123123124 C A ...

freebayes
10 123123123 GC AA ...
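
To make the difference between the two representations concrete, here is a minimal Python sketch of the freebayes-style record: SNPs on the same contig at consecutive positions are collapsed into one multi-base record. It assumes sorted input and that the SNPs are already known to lie on the same haplotype (as checked in the BAM); it does not examine phasing and is an illustration only, not how either caller works internally. `Record` and `merge_adjacent_snps` are hypothetical names.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def merge_adjacent_snps(records: List[Record]) -> List[Record]:
    """Collapse runs of SNPs at consecutive positions into one record.

    Assumes records are sorted and that adjacent SNPs lie on the same
    haplotype; phasing is NOT checked here.
    """
    merged: List[Record] = []
    for rec in records:
        is_snp = len(rec.ref) == 1 and len(rec.alt) == 1
        if merged and is_snp:
            prev = merged[-1]
            # Only extend substitution blocks (REF and ALT of equal length).
            prev_end = prev.pos + len(prev.ref) - 1
            if (len(prev.ref) == len(prev.alt)
                    and prev.chrom == rec.chrom
                    and rec.pos == prev_end + 1):
                merged[-1] = Record(prev.chrom, prev.pos,
                                    prev.ref + rec.ref, prev.alt + rec.alt)
                continue
        merged.append(rec)
    return merged


if __name__ == "__main__":
    gatk_style = [Record("10", 123123123, "G", "A"),
                  Record("10", 123123124, "C", "A")]
    # Prints a single record with REF "GC" and ALT "AA".
    print(merge_adjacent_snps(gatk_style))
```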

Why does GATK HaplotypeCaller not combine these two variants, given that we are interested in the haplotype? Did I miss a command-line flag?

Best.

Marten

Calling variants locally and on Spark gives different PPV and sensitivity. Why?

I use the same uBAM as input and run it in two ways: one locally, the other on Spark.
Local: run BWA, MarkDuplicates, BaseRecalibrator, ApplyBQSRSpark, HaplotypeCaller.
Spark: run BWASpark, then ReadsPipelineSpark to get the final result.

However, we get different validation results. The Spark version has lower PPV for both SNPs and indels and lower SNP sensitivity than the local version (indel sensitivity is essentially unchanged). What is the reason for that? Is there any advice for improving the PPV and sensitivity of the Spark version?

        SNP PPV   INDEL PPV   SNP Sensitivity   INDEL Sensitivity
Local   99.13%    94.48%      99.65%            94.98%
Spark   98.98%    92.76%      99.32%            95.02%
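
The gaps between the two runs are easier to read as differences in percentage points. A small Python sketch using the numbers from the table; it only restates the table and says nothing about the cause. `differences` is a hypothetical helper name.

```python
# Validation metrics (percent) copied from the table above.
METRICS = ["SNP PPV", "INDEL PPV", "SNP Sensitivity", "INDEL Sensitivity"]
LOCAL = [99.13, 94.48, 99.65, 94.98]
SPARK = [98.98, 92.76, 99.32, 95.02]


def differences(local, spark):
    """Return Spark minus local for each metric, in percentage points."""
    return {name: round(s - l, 2) for name, l, s in zip(METRICS, local, spark)}


if __name__ == "__main__":
    for name, delta in differences(LOCAL, SPARK).items():
        print(f"{name:18s} {delta:+.2f} pp")
```

Every metric is lower on Spark except indel sensitivity, which is 0.04 points higher; the largest gap is indel PPV, at 1.72 points.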

HaplotypeCaller Missing a high quality SNP

I know it's a common question, but I have checked https://broadinstitute.org/gatk/guide/article?id=1235 and I'm quite confident the problem lies elsewhere.

I have a trio with a high-quality SNP passed from dad to proband.
I am running HaplotypeCaller in basepair resolution mode, one individual at a time, and then processing as usual through the "N+1" pipeline.
The variant in the proband's GVCF is called at high quality: "0/1:63,55,0:118:99:1383,0,1709,1571,1873,3445:24,39,19,36"

However, at that position the dad's GVCF has a homozygous-reference call: "0/0:52,52:104:0:0,0,168",
despite showing 52 alt reads out of a total of 104 reads. The GQ is 0. This happens with both the official 3.5 release and the latest nightly build (as of today).

With an older version of GATK (3.1), the variant in the dad is called at high quality: "0/1:52,49,0:101:99:1086,0,1397,1245,1541,2786:16,36,16,33",
with 49 alt reads out of 101 and a GQ of 99.
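
The allele balance referred to above can be computed directly from the AD field. A minimal Python sketch; the FORMAT keys are an assumption, since the post shows only the sample values (GT:AD:DP:GQ:PL is assumed for the 3.5 record, with an extra SB field for the 3.1 record), and the helper names are hypothetical.

```python
def parse_sample(format_keys: str, sample: str) -> dict:
    """Pair colon-separated FORMAT keys with one sample's values."""
    return dict(zip(format_keys.split(":"), sample.split(":")))


def alt_fraction(fields: dict) -> float:
    """Fraction of AD counts supporting the first ALT allele."""
    ad = [int(x) for x in fields["AD"].split(",")]
    return ad[1] / sum(ad)


# Assumed FORMAT keys; the post shows only the values.
dad_35 = parse_sample("GT:AD:DP:GQ:PL", "0/0:52,52:104:0:0,0,168")
dad_31 = parse_sample("GT:AD:DP:GQ:PL:SB",
                      "0/1:52,49,0:101:99:1086,0,1397,1245,1541,2786:16,36,16,33")

for label, fields in (("GATK 3.5", dad_35), ("GATK 3.1", dad_31)):
    print(label, fields["GT"], "GQ", fields["GQ"],
          f"alt fraction {alt_fraction(fields):.2f}")
```

Both versions see roughly half of the reads supporting the alternate allele (0.50 and 0.49); only the genotype and GQ differ.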

At this point, I'm not sure what to do. I'd prefer not to have to re-call my 10k+ samples with version 3.1!
I sent a reproducible snippet, with commands etc., to the FTP server.

Thanks,
Jason


Missing sample_file -sf option in GATK4 SelectVariants

In the port to GATK4 SelectVariants, the sample name (-sn) and sample expression (-se) options made it in, but the sample file (-sf) option does not seem to have. I tried:

gatk SelectVariants \
  -V in.vcf.gz \
  -O out.vcf.gz \
  -RF SampleReadFilter \
  -sample sample_list_file.txt

as a workaround. It ran without errors, but the output file is more than three times the size of the input file and still contains all the samples.
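
One quick way to confirm which samples an output VCF actually contains is to read the sample columns of its #CHROM header line. A minimal stdlib Python sketch; it handles plain or gzip/bgzip-compressed files, assumes a well-formed tab-delimited header, and the function names are hypothetical.

```python
import gzip
from typing import Iterable, List


def samples_from_lines(lines: Iterable[str]) -> List[str]:
    """Return the sample names (columns after FORMAT) from the #CHROM line."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def vcf_samples(path: str) -> List[str]:
    """Read sample names from a .vcf or .vcf.gz file (bgzip output is gzip-readable)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return samples_from_lines(handle)
```

Comparing `vcf_samples("out.vcf.gz")` with the names in sample_list_file.txt shows whether the selection took effect.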

Best practices for recalling target sites

I have a set of ~2000 samples and I want to call variants on all of them at a fixed list of sites. For each sample I have a g.vcf covering the entire genome, not just the target sites.

I've tried GenomicsDBImport with the list of sites, supplied as a VCF file via the -L option, but it doesn't work because each site is interpreted as a separate interval.
gatk --java-options "-Xmx100g -Xms100g" GenomicsDBImport \
  --batch-size 1000 \
  -L tmp_outliersites.vcf \
  --sample-name-map all_samples.sample_map \
  --reader-threads 10 \
  --genomicsdb-workspace-path tmp_database

....
java.util.concurrent.CompletionException: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:HanXRQChr01:18001917-18001917 queried with: HanXRQChr01:35358953-35358953
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.broadinstitute.hellbender.exceptions.GATKException: Cannot call query with different interval, expected:HanXRQChr01:18001917-18001917 queried with: HanXRQChr01:35358953-35358953
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport$InitializedQueryWrapper.query(GenomicsDBImport.java:766)
    at com.intel.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:165)
    at com.intel.genomicsdb.importer.GenomicsDBImporter.lambda$null$2(GenomicsDBImporter.java:604)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

It also gives me this warning:

15:26:37.956 WARN GenomicsDBImport - A large number of intervals were specified. Using more than 100 intervals in a single import is not recommended and can cause performance to suffer. It is recommended that intervals be aggregated together.

Is there a better way of doing this? I could process one site at a time, but that takes about 5-10 minutes per site, and I have about 200K sites to process.
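
The warning above recommends aggregating intervals. A minimal Python sketch of one way to do that: sorted sites on the same contig that lie within `max_gap` bases of the current interval are merged into it, and the result can be written as `contig:start-end` lines. It assumes the sites are sorted in reference order and that `-L` accepts a `.list`/`.intervals` file of such lines; `max_gap` and the function names are hypothetical. Merging trades fewer intervals for extra bases imported between the target sites.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]


def merge_sites(sites: Iterable[Tuple[str, int]], max_gap: int) -> List[Interval]:
    """Merge sorted (contig, position) sites into (contig, start, end) intervals."""
    intervals: List[Interval] = []
    for contig, pos in sites:
        if intervals:
            last_contig, start, end = intervals[-1]
            # Extend the current interval if the site is on the same contig and close enough.
            if last_contig == contig and pos - end <= max_gap:
                intervals[-1] = (contig, start, max(end, pos))
                continue
        intervals.append((contig, pos, pos))
    return intervals


def write_interval_list(intervals: List[Interval], path: str) -> None:
    """Write one 1-based, inclusive contig:start-end interval per line."""
    with open(path, "w") as out:
        for contig, start, end in intervals:
            out.write(f"{contig}:{start}-{end}\n")


if __name__ == "__main__":
    # The first and last sites come from the error message above; the middle one is made up.
    sites = [("HanXRQChr01", 18001917), ("HanXRQChr01", 18002500),
             ("HanXRQChr01", 35358953)]
    print(merge_sites(sites, max_gap=1000))
```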

JEXL Substring Matching

Hi there,

I have a record in my VCF that looks like this:

clinvar.CLNSIGCONF=Likely_benign(3)

What I would like to do is pass a -select expression to SelectVariants that keeps records containing certain substrings. For this one, I'm trying something like:

-select "Likely_benign =~ clinvar.CLNSIGCONF"

I've tried this with various quoting, added wildcards, and tried a bunch of other things, but I can't figure out how to make it behave the way I want. There are no errors, but also no variants in the output VCF. Any JEXL experts willing to lend a hand?
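
Two details of regex matching may matter here, assuming that JEXL's `=~` performs a whole-string regular-expression match when its right-hand side is a string (Java `String.matches` semantics) and that the pattern belongs on the right of `=~`. A minimal Python sketch that mimics those semantics with `re.fullmatch`; `whole_match` is a hypothetical helper, and this illustrates the matching rule only, not GATK's JEXL evaluation.

```python
import re


def whole_match(value: str, pattern: str) -> bool:
    """Mimic Java's String.matches: the regex must match the entire value."""
    return re.fullmatch(pattern, value) is not None


value = "Likely_benign(3)"  # clinvar.CLNSIGCONF from the record above

print(whole_match(value, "Likely_benign"))      # False: the pattern covers only a prefix
print(whole_match(value, ".*Likely_benign.*"))  # True: wildcards on both sides
print(whole_match("Likely_benign", value))      # False: operands reversed, "(3)" acts as a regex group
```

Under that assumption, the shape to try has the field on the left and a pattern such as `.*Likely_benign.*` on the right; whether the dotted field name `clinvar.CLNSIGCONF` is resolved as intended is a separate question worth checking.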

Thanks!

Bug with login via twitter

I tried to join using my Twitter account, and it redirected to a Twitter "authorize this app" window. But after I clicked Approve, it got caught in a redirect loop back to that same page. I tried both Chrome and Edge.

GenomicsDBImport --intervals

Hi,
I am using a genome that is assembled into scaffolds rather than chromosomes.
Since GenomicsDBImport can only take a single genomic interval, how should I set the --intervals parameter?
For the contig ">LFYR01000729.1 Zostera marina strain Finnish scaffold_1, whole genome shotgun sequence", should I set it as "--intervals LFYR01000729.1 Zostera marina strain Finnish scaffold_1, whole genome shotgun sequence"?
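
Contig names given to --intervals have to match the reference's sequence dictionary. By the usual convention (samtools faidx, Picard CreateSequenceDictionary), only the first whitespace-delimited word of a FASTA header becomes the contig name, so the name here would be expected to be LFYR01000729.1; the reliable check is the SN: field of the @SQ lines in the .dict file. A minimal Python sketch of both checks; the function names are hypothetical.

```python
from typing import Iterable, List


def contig_name(fasta_header: str) -> str:
    """First whitespace-delimited word of a FASTA header, without the '>'."""
    return fasta_header.lstrip(">").split()[0]


def dict_contigs(lines: Iterable[str]) -> List[str]:
    """Contig names (SN: fields) from the @SQ lines of a sequence dictionary."""
    names = []
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


if __name__ == "__main__":
    header = (">LFYR01000729.1 Zostera marina strain Finnish scaffold_1, "
              "whole genome shotgun sequence")
    print(contig_name(header))
```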

GenomicsDBImport terminates after Overlapping contigs found error

My original question was about batching and making intervals for GenomicsDBImport, but I have run into a new problem. I am using version 4.0.7.0 and tried the following:

gatk GenomicsDBImport \
--java-options "-Xmx250G -XX:+UseParallelGC -XX:ParallelGCThreads=24" \
-V input.list \
--genomicsdb-workspace-path 5sp_45ind_assmb_00 \
--intervals interval.00.list \
--batch-size 9 

Here I split my list of contigs into 50 lists and set the batch size to 9 (instead of reading all 45 g.vcf files at once), for a total of 5 batches. It appears to start running, but it terminates shortly afterwards with an error.
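
The splitting and batching described above can be sketched in Python as follows. It splits by number of contigs (not by total length, which might balance the work better), writes files named like interval.00.list to mirror the command above, and computes the number of import batches; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def split_into_lists(contigs: Sequence[str], n_lists: int) -> List[List[str]]:
    """Split contigs, in order, into n_lists chunks whose sizes differ by at most one."""
    base, extra = divmod(len(contigs), n_lists)
    chunks, start = [], 0
    for i in range(n_lists):
        end = start + base + (1 if i < extra else 0)
        chunks.append(list(contigs[start:end]))
        start = end
    return chunks


def write_lists(chunks: List[List[str]], prefix: str = "interval") -> None:
    """Write one contig name per line to prefix.00.list, prefix.01.list, ..."""
    for i, chunk in enumerate(chunks):
        with open(f"{prefix}.{i:02d}.list", "w") as out:
            out.writelines(f"{name}\n" for name in chunk)


def n_batches(n_samples: int, batch_size: int) -> int:
    """Number of import batches for a given --batch-size."""
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    print(n_batches(45, 9))  # 45 samples in batches of 9
```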

The resulting stack trace is:

00:53:23.869 INFO  GenomicsDBImport - HTSJDK Version: 2.16.0
00:53:23.869 INFO  GenomicsDBImport - Picard Version: 2.18.7
00:53:23.869 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:53:23.869 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:53:23.869 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:53:23.869 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:53:23.869 INFO  GenomicsDBImport - Deflater: IntelDeflater
00:53:23.869 INFO  GenomicsDBImport - Inflater: IntelInflater
00:53:23.869 INFO  GenomicsDBImport - GCS max retries/reopens: 20
00:53:23.869 INFO  GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
00:53:23.869 INFO  GenomicsDBImport - Initializing engine
01:26:13.490 INFO  IntervalArgumentCollection - Processing 58057410 bp from intervals
01:26:13.517 INFO  GenomicsDBImport - Done initializing engine
Created workspace /home/leq/gvcfs/5sp_45ind_assmb_00
01:26:13.655 INFO  GenomicsDBImport - Vid Map JSON file will be written to 5sp_45ind_assmb_00/vidmap.json
01:26:13.655 INFO  GenomicsDBImport - Callset Map JSON file will be written to 5sp_45ind_assmb_00/callset.json
01:26:13.655 INFO  GenomicsDBImport - Complete VCF Header will be written to 5sp_45ind_assmb_00/vcfheader.vcf
01:26:13.655 INFO  GenomicsDBImport - Importing to array - 5sp_45ind_assmb_00/genomicsdb_array
01:26:13.656 INFO  ProgressMeter - Starting traversal
01:26:13.656 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
01:33:16.970 INFO  GenomicsDBImport - Importing batch 1 with 9 samples
[libprotobuf ERROR google/protobuf/io/coded_stream.cc:207] A protocol message was rejected because it was too big (more than 67108864 bytes).  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
Contig/chromosome ctg7180018354961 begins at TileDB column 0 and intersects with contig/chromosome ctg7180018354960 that spans columns [1380207667, 1380207970]
terminate called after throwing an instance of 'ProtoBufBasedVidMapperException'
  what():  ProtoBufBasedVidMapperException : Overlapping contigs found

How do I get past this 'Overlapping contigs found' error? Is there a problem with my set of contigs? Also, is the warning about the protocol message being too big something to worry about?

Thank you!


Difference between raw data and downsampled data

Hi, GATK team!
We are analyzing a panel of about 3M with GATK. I tested a sample from a cancer patient: the normal data is 2.0G and the tumor data is 4.6G. I also downsampled the tumor so that the normal and tumor data volumes are 1:1, and ran the same pipeline on it. The two resulting raw.vcf files are compared as shown in the figure.
In this figure, the middle region (1176) contains the mutation sites shared by the two files, the purple region (272) the sites found only with the original data, and the red region (92) the sites found only after downsampling. In theory, the sites found after downsampling should be a subset of those found with the original data, but the result differs from this expectation. Why is this? Does your team have a recommended ratio for adjusting the amount of data?
Could you please help me solve this problem? Thanks.
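
The expectation above can be stated in terms of sets of variant keys: if the downsampled calls were a subset of the original calls, the "only downsampled" count would be 0, whereas the figure reports 92 (about 7% of the 1268 downsampled calls). A minimal Python sketch of that comparison, keying records by CHROM, POS, REF and ALT (an assumption; multi-allelic or differently normalized records would need more care); the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Key = Tuple[str, int, str, str]


def sites_from_vcf_lines(lines: Iterable[str]) -> List[Key]:
    """Key each non-header VCF record by (CHROM, POS, REF, ALT)."""
    keys = []
    for line in lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        keys.append((chrom, int(pos), ref, alt))
    return keys


def compare_callsets(original: Iterable[Key], downsampled: Iterable[Key]) -> dict:
    """Counts of shared sites and of sites unique to each callset."""
    orig, down = set(original), set(downsampled)
    return {
        "shared": len(orig & down),
        "only_original": len(orig - down),
        "only_downsampled": len(down - orig),  # 0 if downsampled is a subset
    }
```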

Variant not being called by HC GATK v3.7-0-gcfedb67

Hello,
We are calling variants on data sequenced on the NextSeq platform. We have been using the same pipeline, with the same commands, for a year, and every run includes a control sample to check that sequencing and variant calling worked correctly. In this particular run, a known SNP at 7:143013285, which was called in the same sample in the previous 5 runs (over the last year), was missed by HaplotypeCaller. Looking at the BAM file, the variant seems to be present (highlighted BAM file). The two BAM files above are from the same sample in previous runs, where HaplotypeCaller was able to pick it up. The commands I use are as follows:

trim_galore -q 0 --paired --fastqc $R1_fastq $R2_fastq --output_dir $FASTQ

bwa mem -M -t 8 $ind.fa $FASTQ/${s_id}.R1_val_1.fq.gz $FASTQ/${s_id}.R2_val_2.fq.gz | sambamba_v0.6.6 view -t 8 -S -h -f bam -o $s_id.bam /dev/stdin
sambamba_v0.6.6 sort -t 8 -o $s_id.sorted.bam $s_id.bam
sambamba_v0.6.6 index -t 8 $s_id.sorted.bam

java -jar $picard AddOrReplaceReadGroups I=$s_id.sorted.bam O=$s_id.sorted.RG.bam SORT_ORDER=coordinate RGID=$s_id RGLB=$flowcell RGPL=illumina RGPU=U RGSM=$RUNNAME
sambamba_v0.6.6 index $s_id.sorted.RG.bam 
sambamba_v0.6.6 markdup -t 8 $s_id.sorted.RG.bam $s_id.markdup.bam 
sambamba_v0.6.6 index $s_id.markdup.bam 

java -Xmx8g -Djava.io.tmpdir=/ionng/tmp -jar $gatk -T BaseRecalibrator \
    -I $s_id.markdup.bam \
    -R $ind.fa \
    -knownSites dbsnp_138.b37.vcf \
    -knownSites Mills_and_1000G_gold_standard.indels.b37.vcf \
    -knownSites 1000G_phase1.indels.b37.vcf \
    -o $s_id.recal_data.table \
    -L $bed

#Apply the Recalibration
java -Xmx8g -Djava.io.tmpdir=$TMPDIR -jar $gatk -T PrintReads \
    -I $s_id.markdup.bam \
    -R $ind.fa \
    -BQSR $s_id.recal_data.table \
    -o $s_id.${RUNNAME}.variant_ready.bam 

java -Xmx32g -jar $gatk -T HaplotypeCaller \
    -R $ind.fa --dbsnp dbsnp_138.b37.vcf \
    -I $s_id.${RUNNAME}.variant_ready.bam \
    -stand_call_conf 30.0 \
    -L $bed \
    -o $s_id.${RUNNAME}.g.vcf

Things I have tried that did not work:
1. Running HC with the -allowNonUniqueKmers option
2. Changing the parameters: -stand_call_conf 2.0 -mmq 5
3. Running with -ERC BP_RESOLUTION, which results in:
7 143013285 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:9,2:11:0:0,0,152

NOTE: The variant was picked up when I ran FreeBayes and VarScan with default parameters.

FREEBAYES
7 143013285 . C T 206.73 . AB=0.394737;ABP=6.66752;AC=1;AF=0.5;AN=2;AO=15;CIGAR=1X;DP=38;DPB=38;DPRA=0;EPP=20.5268;EPPR=37.093;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=47.6012;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=495;QR=811;RO=23;RPL=2;RPP=20.5268;RPPR=37.093;RPR=13;RUN=1;SAF=15;SAP=35.5824;SAR=0;SRF=23;SRP=52.9542;SRR=0;TYPE=snp;technology.illumina=1 GT:DP:AD:RO:QR:AO:QA:GL 0/1:38:23,15:23:811:15:495:-33.4265,0,-61.8717

VARSCAN
7 143013285 . C T . PASS ADP=38;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:52:38:38:23:15:39.47%:5.4463E-6:35:33:23:0:15:0
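
The three outputs report read support in different fields. A small stdlib Python sketch that splits the freebayes INFO string into a dictionary makes them easier to compare; the INFO string below is an excerpt of the FREEBAYES record above, and `parse_info` is a hypothetical helper name.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag entries map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Excerpt of the FREEBAYES INFO field above.
fb = parse_info("AB=0.394737;AO=15;DP=38;RO=23")
ao, ro = int(fb["AO"]), int(fb["RO"])
print("freebayes ref/alt:", ro, ao, "alt fraction:", round(ao / (ao + ro), 4))

# AD from the HaplotypeCaller BP_RESOLUTION record above (ref, non-ref).
hc_ad = [int(x) for x in "9,2".split(",")]
print("HaplotypeCaller ref/alt:", hc_ad[0], hc_ad[1], "depth:", sum(hc_ad))
```

For this site, freebayes and VarScan both count 38 reads (23 reference, 15 alternate), while the HaplotypeCaller record reports only 11 (AD 9,2).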

I can email the bamout file if required (I am not allowed to upload it publicly).

Any suggestions will be helpful. Thanks.

Thank you

Mutect2 AF in normal sample

Hi,
I'm trying to use Mutect2 (GATK version 4.0.2.0) for somatic mutation calling, following the steps in this tutorial (https://gatkforums.broadinstitute.org/gatk/discussion/11136/how-to-call-somatic-mutations-using-gatk4-mutect2). I noticed a discrepancy between the AD and AF FORMAT fields for the normal sample.

This is an example:

CHROM  POS     ID  REF  ALT  QUAL  FILTER
chr1   855969  .   G    A    .     .
INFO:   DP=56;ECNT=1;NLOD=3.82;N_ART_LOD=-1.290e+00;POP_AF=2.500e-06;P_GERMLINE=-3.123e+00;TLOD=8.01
FORMAT: GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB
TUMOR:  0/1:21,4:0.197:15,4:6,0:29:0,0:45:9:0.162,0.00,0.160:0.013,0.036,0.952
NORMAL: 0/0:13,0:0.283:9,0:4,0:0:0,0:0:0

Why does the normal have an AF of 0.283 when the allelic depth is 13 for the reference allele and 0 for the alternate allele (so the AF should be 0)?
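
The expectation in the question can be written out by computing the alternate fraction from AD and placing it next to the reported AF. A minimal Python sketch using the FORMAT keys and sample values from the record above; the helper names are hypothetical. It only shows that the reported AF is not the plain AD ratio for either sample (tumor 0.16 vs 0.197, normal 0.0 vs 0.283); it does not model how Mutect2 computes AF.

```python
FORMAT = "GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB"
TUMOR = "0/1:21,4:0.197:15,4:6,0:29:0,0:45:9:0.162,0.00,0.160:0.013,0.036,0.952"
NORMAL = "0/0:13,0:0.283:9,0:4,0:0:0,0:0:0"


def parse_sample(format_keys: str, sample: str) -> dict:
    """Pair FORMAT keys with sample values; trailing missing fields are dropped."""
    return dict(zip(format_keys.split(":"), sample.split(":")))


def ad_fraction(fields: dict) -> float:
    """Alternate-allele fraction from AD: alt / (ref + alt), 0.0 if no reads."""
    ad = [int(x) for x in fields["AD"].split(",")]
    total = sum(ad)
    return sum(ad[1:]) / total if total else 0.0


for name, sample in (("TUMOR", TUMOR), ("NORMAL", NORMAL)):
    fields = parse_sample(FORMAT, sample)
    print(name, "AD ratio:", round(ad_fraction(fields), 3), "reported AF:", fields["AF"])
```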

This is my command line:

gatk Mutect2 \
-R reference.fasta \
-I tumor.bam \
-I normal.bam \
-tumor TUMOR \
-normal NORMAL \
--germline-resource af-only-gnomad_b37.vcf.gz \
--af-of-alleles-not-in-resource 0.0000025 \
--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
-L region_file.bed \
-O somatic_mutation.vcf

Thanks for your attention.

Andrea

GenomicsDBImport "RunConfigException" Error when scale is high

Hi,
I am having trouble running GenomicsDBImport successfully with ~10K samples. Most of the time I get the error below, while in a very few cases (genomic intervals) it succeeded.

terminate called after throwing an instance of 'RunConfigException'
  what():  RunConfigException : ifs.is_open()

Could you help me understand this error message, please?

Note: I have a fairly high ulimit (~65K), so a large number of file descriptors can be open at the same time.

The command I run (GATK version 4.0.0.0):

java -d64 -Xmx64g -Xms64g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar gatk.4.0.0.0.jar GenomicsDBImport \
    --genomicsdb-workspace-path /some/path/genomicsdb-workspace \
    --interval-padding 500 \
    --batch-size 50 \
    --intervals chr1:257667-297968 \
    --reader-threads 5 \
    --TMP_DIR /some/path/tmp \
    --variant /some/path/sample1.vcf.gz \
    --variant /some/path/sample2.vcf.gz \
    .......
    --variant /some/path/sample10000.vcf.gz
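
Instead of passing each of the ~10K files with its own --variant argument, GenomicsDBImport can also take a tab-separated sample map via --sample-name-map (the option used in the ~2000-sample command earlier on this page), assuming the installed 4.0.0.0 build supports it. A minimal Python sketch that builds the map lines and counts batches; deriving each sample name from its file name is a hypothetical convention, and the name should match the sample in that file's header. This is not a known fix for the RunConfigException.

```python
import math
import os
from typing import Iterable, List


def sample_map_lines(vcf_paths: Iterable[str]) -> List[str]:
    """Build 'sample<TAB>path' lines, deriving each sample name from its file name.

    Hypothetical convention: files are named <sample>.vcf.gz or <sample>.g.vcf.gz.
    """
    lines = []
    for path in vcf_paths:
        name = os.path.basename(path)
        for suffix in (".g.vcf.gz", ".vcf.gz"):
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        lines.append(f"{name}\t{path}")
    return lines


def n_batches(n_samples: int, batch_size: int) -> int:
    """Number of import batches for a given --batch-size."""
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    paths = [f"/some/path/sample{i}.vcf.gz" for i in range(1, 10001)]
    print(sample_map_lines(paths[:2]))
    print(n_batches(len(paths), 50))  # batches at --batch-size 50
```

Writing all the returned lines to a file, one per line, gives the sample map.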

Thank you,

Kousik

ASEReadCounter- A USER ERROR has occurred: Invalid argument

Hi! I wrote an earlier post about trouble with Java and GATK. I fixed that, but now I'm getting another error message, and this time it seems to be a GATK problem. I'm trying to run an allele-specific expression analysis using ASEReadCounter. My command line is the following:

gatk ASEReadCounter \ 
-R /home/nalu/ref_macaco/ref_final/Mfas_ref.fa \ 
-I /home/nalu/macaco/clean_merge/E13_1/sample_1_fixmate_sort.bam \ 
-V /home/nalu/macaco/input_varscan/variantes/vcf_E13_1 \ 
-O /home/nalu/teste_gatk.table

I'm getting an error message containing the list of required and optional arguments. At the end, the message says:

***********************************************************************

A USER ERROR has occurred: Invalid argument ' -R'.

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

I removed the reference option since it's not required, so that the first argument after ASEReadCounter was -I. It then returned the same message, but this time the invalid argument was -I.

I used the command it suggested to print the stack trace and got the following:

***********************************************************************

A USER ERROR has occurred: Invalid argument ' -R'.

***********************************************************************
org.broadinstitute.barclay.argparser.CommandLineException: Invalid argument ' -R'.
    at org.broadinstitute.barclay.argparser.CommandLineArgumentParser.setPositionalArgument(CommandLineArgumentParser.java:600)
    at org.broadinstitute.barclay.argparser.CommandLineArgumentParser.parseArguments(CommandLineArgumentParser.java:432)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.parseArgs(CommandLineProgram.java:233)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:207)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

Can someone help me? Is there something wrong with my command line?

Thanks!
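
The leading space inside the reported argument (' -R', later ' -I') is a clue. Under POSIX shell quoting rules, a backslash outside quotes escapes the next character; if that character is a space rather than a newline, the space becomes part of the following word. This can happen if the line breaks after the backslashes are lost when the command is copied. Whether that is what happened here is an assumption. Python's standard `shlex` module applies the same escaping rule to a single-line string and reproduces the token; `suspicious_tokens` is a hypothetical helper.

```python
import shlex
from typing import List


def suspicious_tokens(command: str) -> List[str]:
    """Return tokens that begin with whitespace, as produced by a stray backslash-space."""
    return [tok for tok in shlex.split(command) if tok[:1].isspace()]


# A single-line version of a command whose line breaks were lost.
joined = "gatk ASEReadCounter \\ -R ref.fa \\ -I in.bam"
print(shlex.split(joined))       # ' -R' and ' -I' appear as single words
print(suspicious_tokens(joined))
```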
