Channel: Recent Discussions — GATK-Forum

Error in INDEL mode during VQSR

Hello,
I'm trying to run VQSR on exome data from 50,000 samples. Because the dataset is so large, I used GenomicsDBImport to merge the samples. Importing the whole exome at once was also slower than expected, so I did this per chromosome.
VQSR in SNP mode ran smoothly, but INDEL mode has several problems.
At first I used the commands below:

```
time $gatk VariantRecalibrator \
-R $reference \
-V $outdir/population/${outname}.HC.snps.VQSR.vcf.gz \
-resource:mills,known=true,training=true,truth=true,prior=12.0 $GATK_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
-an DP -an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum \
-mode INDEL \
--max-gaussians 6 \
--rscript-file $outdir/population/${outname}.HC.indels.plots.R \
--tranches-file $outdir/population/${outname}.HC.indels.tranches \
-O $outdir/population/${outname}.HC.snps.indels.recal && \
time $gatk ApplyVQSR \
-R $reference \
-V $outdir/population/${outname}.HC.snps.VQSR.vcf.gz \
--truth-sensitivity-filter-level 99.0 \
--tranches-file $outdir/population/${outname}.HC.snps.indels.tranches \
--recal-file $outdir/population/${outname}.HC.snps.indels.recal \
-mode INDEL \
-O $outdir/population/${outname}.HC.VQSR.vcf.gz && echo "** SNPs and Indels VQSR (${sample}.HC.VQSR.vcf.gz finish) done **"
```
There are two warnings:
WARN VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
WARN VariantRecalibratorEngine - Evaluate datum returned a NaN
The program then stopped with the error "No data found".

I searched this forum for a solution and removed "-an MQRankSum". This time the recalibration step completed, although the first warning was still there:
WARN VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.

However, the ApplyVQSR step then stopped with:

```
13:55:06.069 INFO ApplyVQSR - Deflater: IntelDeflater
13:55:06.069 INFO ApplyVQSR - Inflater: IntelInflater
13:55:06.070 INFO ApplyVQSR - GCS max retries/reopens: 20
13:55:06.070 INFO ApplyVQSR - Requester pays: disabled
13:55:06.070 INFO ApplyVQSR - Initializing engine
13:55:06.517 INFO FeatureManager - Using codec VCFCodec to read file file:///home/pang/data/public_data/UKBB/exome_population/population/ukb_efe_chr4.HC.snps.indels.recal
13:55:06.593 INFO FeatureManager - Using codec VCFCodec to read file file:///home/pang/data/public_data/UKBB/exome_population/population/ukb_efe_chr4.HC.snps.VQSR.vcf.gz
13:55:06.776 INFO ApplyVQSR - Done initializing engine
13:55:06.778 INFO ApplyVQSR - Shutting down engine
[December 11, 2019 1:55:06 PM CET] org.broadinstitute.hellbender.tools.walkers.vqsr.ApplyVQSR done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=1933574144
***********************************************************************

A USER ERROR has occurred: Couldn't read file /home/pang/data/public_data/UKBB/exome_population/population/ukb_efe_chr4.HC.snps.indels.tranches. Error was: /home/pang/data/public_data/UKBB/exome_population/population/ukb_efe_chr4.HC.snps.indels.tranches with exception: /home/pang/data/public_data/UKBB/exome_population/population/ukb_efe_chr4.HC.snps.indels.tranches

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
```

I do not understand what this "exception" is. Could you suggest how to solve it?
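
One possible reading of the error, going only by the commands above: VariantRecalibrator is told to write `--tranches-file` to `${outname}.HC.indels.tranches`, while ApplyVQSR is told to read `${outname}.HC.snps.indels.tranches`, so the file ApplyVQSR asks for may never have been created. Below is a minimal sketch of one way to guard against this. It assumes a hypothetical helper `require_file` (not part of GATK) and path variables that are defined once and reused in both steps:

```shell
# Hypothetical helper (not a GATK command): fail early if a file that
# ApplyVQSR needs is missing or empty.
require_file() {
    if [ ! -s "$1" ]; then
        echo "Missing or empty file: $1" >&2
        return 1
    fi
}

# Intended use, with "..." standing for the remaining arguments from the post:
#
#   prefix="$outdir/population/${outname}.HC.indels"
#   tranches="${prefix}.tranches"
#   recal="${prefix}.recal"
#
#   $gatk VariantRecalibrator ... --tranches-file "$tranches" -O "$recal" &&
#   require_file "$tranches" && require_file "$recal" &&
#   $gatk ApplyVQSR ... --tranches-file "$tranches" --recal-file "$recal" ...
```

Because each path is written only once, the two tools cannot drift apart, and a missing file is reported by name before ApplyVQSR starts.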
Separately, I know that VQSR requires more than 30 exomes, and that having too few is in some cases the reason for the "No data found" error. Although I ran VQSR on each chromosome separately, I think 50,000 samples should be enough.
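
The "Training with very few variant sites" warning is worded in terms of variant sites rather than samples, so it may be worth checking how many indel sites a single chromosome actually contributes. The sketch below is a rough check, not a GATK tool: `count_indel_sites` is a hypothetical helper that counts records in which at least one ALT allele differs in length from REF. The result is only an upper bound, since the sites used for training are those that also appear in the Mills resource.

```shell
# Count indel records in a VCF read from stdin (rough check, not a GATK tool).
# A record is counted once if any ALT allele differs in length from REF.
count_indel_sites() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            n = split($5, alts, ",")         # ALT may list several alleles
            for (i = 1; i <= n; i++) {
                # ignore missing, spanning-deletion and symbolic alleles
                if (alts[i] != "." && alts[i] != "*" && alts[i] !~ /^</ &&
                    length(alts[i]) != length($4)) {
                    count++
                    break
                }
            }
        }
        END { print count + 0 }
    '
}

# Example (file name taken from the commands in the post):
#   gzip -dc "$outdir/population/${outname}.HC.snps.VQSR.vcf.gz" | count_indel_sites
```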
Is there a way to merge the per-chromosome results and run VQSR afterwards?
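
One approach that might work, sketched under assumptions: the Picard tool GatherVcfs, which is bundled with GATK4, concatenates VCFs that contain the same samples and are supplied in reference order. The helper `gather_inputs` and the file-name pattern used in the example are hypothetical and would need adapting to the real per-chromosome file names.

```shell
# Print one "-I <file>" line per chromosome, in reference order (chr1..chr22,
# chrX, chrY), for use as GatherVcfs inputs. The pattern
# <prefix>chr<N><suffix> is a placeholder, not the real naming scheme.
gather_inputs() {
    prefix=$1
    suffix=$2
    i=1
    while [ "$i" -le 22 ]; do
        printf '%s %s\n' "-I" "${prefix}chr${i}${suffix}"
        i=$((i + 1))
    done
    for chrom in X Y; do
        printf '%s %s\n' "-I" "${prefix}chr${chrom}${suffix}"
    done
}

# Example (assumes paths without spaces, since the output is word-split):
#   $gatk GatherVcfs \
#       $(gather_inputs "$outdir/population/ukb_efe_" ".HC.vcf.gz") \
#       -O "$outdir/population/ukb_efe_all.HC.vcf.gz"
```

The gathered file would likely need an index before it can be passed to VariantRecalibrator with `-V`, and with this many samples it will be large, so it is worth testing on a few chromosomes first.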

Thanks!
Shichao
