Channel: Recent Discussions — GATK-Forum

How do I resolve the log4j errors while running BaseRecalibratorSpark using GATK4?


These are the errors:
log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by
log4j:ERROR [sun.misc.Launcher$AppClassLoader@7e6cbb7a] whereas object of type
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [org.apache.spark.util.ChildFirstURLClassLoader@5ef60048].
log4j:ERROR Could not instantiate appender named "console".
log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by
log4j:ERROR [sun.misc.Launcher$AppClassLoader@7e6cbb7a] whereas object of type
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [org.apache.spark.util.ChildFirstURLClassLoader@5ef60048].
log4j:ERROR Could not instantiate appender named "console".

These errors appear while running this command:

./gatk-launch BaseRecalibratorSpark -I gs://mybucket/example.bam -R gs://mybucket/example.fasta -knownSites sample.vcf.gz -O test.spark.table -apiKey $APIKEY -- --sparkRunner GCS --cluster samplecluster --num-executors 5 --executor-cores 2 --executor-memory 4g


Why "NO_READS" from DiagnoseTargets despite lots of reads in interval?


I am running DiagnoseTargets on a list of intervals corresponding to targeted exons. In the result file, each interval is assigned a filter status (e.g. PASS, LOW_COVERAGE, COVERAGE_GAPS or NO_READS) for each sample as well as for the sample set as a whole. The per-sample FORMAT fields give TF (filter) and IDP (average sample depth across the interval). Why do I often see a combination like this?

TF:IDP NO_READS:96.09

Filtered as NO_READS, yet the average depth across the interval is 96.09? Of course, PART of the interval may be without reads, even more than the threshold set by --coverage_status_threshold, but as I understand it, that is what COVERAGE_GAPS is meant to capture. I also wondered whether all the reads might have been filtered out by quality thresholds, so I set --minimum_base_quality and --minimum_mapping_quality to 0, but NO_READS is still flagged for a number of intervals, even though IDP is far from 0. Is this a bug, or have I misunderstood something?

L. Pihlstrom

Understanding and adapting the generic hard-filtering recommendations


This document aims to provide insight into the logic of the generic hard-filtering recommendations that we provide as a substitute for VQSR. Hopefully it will also serve as a guide for adapting these recommendations or developing new filters that are appropriate for datasets that diverge significantly from what we usually work with.


Introduction

Hard-filtering consists of choosing specific thresholds for one or more annotations and throwing out any variants that have annotation values above or below the set thresholds. By annotations, we mean properties or statistics that describe for each variant e.g. what the sequence context is like around the variant site, how many reads covered it, how many reads covered each allele, what proportion of reads were in forward vs reverse orientation, and so on.
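As a minimal sketch of what "choosing thresholds and throwing out variants" means in practice, the Python below checks one variant's annotation values against fixed cutoffs and reports which ones it fails. The cutoffs are the generic SNP recommendations discussed in this article (QD < 2, FS > 60, SOR > 3, MQ < 40, MQRankSum < -12.5, ReadPosRankSum < -8). The dictionary layout, the function name `failed_filters`, and the choice to skip annotations that are missing at a site are illustrative assumptions, not GATK behavior.

```python
from typing import Dict, List, Optional

# (annotation, direction, cutoff): "lt" fails values below the cutoff,
# "gt" fails values above it. Generic SNP recommendations from this article.
GENERIC_SNP_FILTERS = [
    ("QD", "lt", 2.0),
    ("FS", "gt", 60.0),
    ("SOR", "gt", 3.0),
    ("MQ", "lt", 40.0),
    ("MQRankSum", "lt", -12.5),
    ("ReadPosRankSum", "lt", -8.0),
]

def failed_filters(annotations: Dict[str, Optional[float]]) -> List[str]:
    """Return the annotations whose cutoff this variant fails (empty list = PASS)."""
    failed = []
    for name, direction, cutoff in GENERIC_SNP_FILTERS:
        value = annotations.get(name)
        if value is None:
            continue  # assumption: an annotation missing at this site is not judged
        if (direction == "lt" and value < cutoff) or (direction == "gt" and value > cutoff):
            failed.append(name)
    return failed

# Hypothetical annotation values for two variants
good = {"QD": 25.1, "FS": 1.2, "SOR": 0.7, "MQ": 60.0, "MQRankSum": 0.1, "ReadPosRankSum": 0.4}
bad = {"QD": 1.5, "FS": 75.0, "SOR": 4.2, "MQ": 60.0, "MQRankSum": None, "ReadPosRankSum": -0.3}
print(failed_filters(good))  # []
print(failed_filters(bad))   # ['QD', 'FS', 'SOR']
```

Because each annotation is tested on its own, a variant that fails a single cutoff is discarded no matter how good its other annotations look.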

The problem with this approach is that it is very limiting: it forces you to look at each annotation dimension individually, so you end up either throwing out good variants just because one of their annotations looks bad, or keeping bad variants in order to keep those good variants.

In contrast, VQSR is more powerful because it uses machine-learning algorithms to learn from the data the annotation profiles of good variants (true positives) and of bad variants (false positives) in a particular dataset. This lets you separate variants based on how they cluster together along several dimensions at once, and frees you to a large extent from the linear tyranny of single-dimension thresholds.

Unfortunately, VQSR requires a large number of variants and well-curated known variant resources. For those of you working with small gene panels or with non-model organisms, this is a deal-breaker, and you have to fall back on hard-filtering.


Outline

In this article, we illustrate how the generic hard-filtering recommendations we provide relate to the distribution of annotation values we typically see in callsets produced by our variant calling tools, and how this in turn relates to the underlying physical properties of the sequence data.

We also use results from VQSR filtering (which we take as ground truth in this context) to highlight the limitations of hard-filtering.

We do this in turn for each of six annotations that are highly informative among the recommended annotations: QD, FS, SOR, MQ, MQRankSum and ReadPosRankSum. The same principles can be applied to most other annotations produced by GATK tools.


Overview of data and methods

Origin of the dataset

We called variants on a whole-genome trio (samples NA12878, NA12891 and NA12892, previously pre-processed) using HaplotypeCaller in GVCF mode, yielding a gVCF file for each sample. We then joint-genotyped the gVCFs using GenotypeGVCFs, yielding an unfiltered VCF callset for the trio. Finally, we ran VQSR on the trio VCF, yielding the filtered callset. We will be looking at the SNPs only.

Plotting methods and interpretation notes

All plots shown below are density plots generated using the ggplot2 library in R. On the x-axis are the annotation values, and on the y-axis are the density values. The area under a density curve between two values gives you the probability of observing an annotation value in that range, so the total area under each curve is equal to 1. For example, if you would like to know the probability of observing an annotation value between 0 and 1, you have to take the area under the curve between 0 and 1.
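To make the "area equals probability" idea concrete, here is a small Python sketch that builds a Gaussian kernel density estimate from hypothetical annotation values and integrates it with the trapezoid rule. ggplot2's density plots are also kernel density estimates, but ggplot2 picks its own bandwidth; the bandwidth of 1.5, the uniform toy data and the helper names here are assumptions made for illustration only.

```python
import math
import random
from typing import Callable, List

def gaussian_kde(samples: List[float], bandwidth: float) -> Callable[[float], float]:
    """Return a function estimating the probability density of `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

def area(f: Callable[[float], float], lo: float, hi: float, steps: int = 1000) -> float:
    """Approximate the integral of f over [lo, hi] with the trapezoid rule."""
    h = (hi - lo) / steps
    inner = sum(f(lo + i * h) for i in range(1, steps))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))

random.seed(1)
values = [random.uniform(0, 40) for _ in range(200)]  # hypothetical annotation values
density = gaussian_kde(values, bandwidth=1.5)

total = area(density, -20, 60)  # whole curve: should be very close to 1
p_0_1 = area(density, 0, 1)     # probability of a value between 0 and 1
print(round(total, 3), round(p_0_1, 3))
```

The integration range is wide enough that almost no kernel mass falls outside it, so `total` comes out essentially equal to 1, while `p_0_1` is the share of the curve lying between 0 and 1.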

In plain English, this means that each plot shows you, for a given set of variants, the distribution of their annotation values. The caveat is that when we're comparing two or more sets of variants on the same plot, we have to keep in mind that they may contain very different numbers of variants, so the number of variants in a given part of the distribution is not directly comparable; only their proportions are comparable.


QualByDepth (QD)

This is the variant confidence (from the QUAL field) divided by the unfiltered depth of non-hom-ref samples. This annotation is intended to normalize the variant quality in order to avoid the inflation that occurs at sites with deep coverage. For filtering purposes it is better to use QD than either QUAL or DP directly.
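A minimal sketch of the QD calculation as described above, assuming per-sample genotype strings and depths are already available: it divides QUAL by the summed depth of samples that are not hom-ref. Skipping no-call samples is our own assumption, and GATK's implementation may handle depth and edge cases differently, so treat this as illustrative only.

```python
from typing import List, Optional

def qual_by_depth(qual: float, genotypes: List[str], depths: List[int]) -> Optional[float]:
    """QUAL divided by the summed depth of samples that are not hom-ref (QD-style)."""
    depth = 0
    for gt, dp in zip(genotypes, depths):
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or all(a == "0" for a in alleles):
            continue  # skip no-calls and hom-ref samples
        depth += dp
    if depth == 0:
        return None  # undefined when no variant-carrying sample has depth
    return qual / depth

# Hypothetical site: one het, one hom-ref and one hom-var sample
print(qual_by_depth(900.0, ["0/1", "0/0", "1/1"], [30, 40, 20]))  # 18.0
```

The hom-ref sample's 40 reads do not count toward the denominator, so QD = 900 / (30 + 20) = 18.0.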

The generic filtering recommendation for QD is to filter out variants with QD below 2. Why is that?

First, let’s look at the distribution of QD values for unfiltered variants. Notice the values can be anywhere from 0 to 40. There are two peaks where the majority of variants are (around QD = 12 and QD = 32). These two peaks correspond to variants that are mostly observed in heterozygous (het) versus mostly homozygous-variant (hom-var) states, respectively, in the called samples. This is because hom-var samples contribute twice as many reads supporting the variant as het samples do. We also see, to the left of the distribution, a "shoulder" of variants with QD hovering between 0 and 5.

[Figure: density plot of QD values for unfiltered SNPs]

We expect to see a similar distribution profile in callsets generated from most types of high-throughput sequencing data, although values where the peaks form may vary.

Now, let’s look at the plot of QD values for variants that passed VQSR and those that failed VQSR. Red indicates the variants that failed VQSR, and blue the variants that passed VQSR.

[Figure: density plot of QD values for SNPs that passed (blue) and failed (red) VQSR]

We see that the majority of variants filtered out correspond to that low-QD "shoulder" (remember that since this is a density plot, the y-axis indicates proportion, not number of variants); that is what we would filter out with the generic recommended QD threshold of 2.

Notice however that VQSR has failed some variants that have a QD greater than 30! All those variants would have passed the hard filter threshold, but VQSR tells us that these variants looked artifactual in one or more other annotation dimensions. Conversely, although it is not obvious in the figure, we know that VQSR has passed some variants that have a QD less than 2, which hard filters would have eliminated from our callset.


FisherStrand (FS)

This is the Phred-scaled probability that there is strand bias at the site. Strand bias tells us whether the alternate allele was seen more or less often on the forward or reverse strand than the reference allele. When there is little to no strand bias at the site, the FS value will be close to 0.

Note: SB, SOR and FS are related but not the same! They all measure strand bias (a type of sequencing bias in which one DNA strand is favored over the other, which can result in incorrect evaluation of the amount of evidence observed for one allele vs. the other) in different ways. SB gives the raw counts of reads supporting each allele on the forward and reverse strand. FS is the result of using those counts in a Fisher's Exact Test. SOR is a related annotation that applies a different statistical test (using the SB counts) that is better for high coverage data.
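The FS idea can be sketched in a few lines of Python: build the 2x2 table of reference/alternate reads on the forward/reverse strands (the SB counts), compute a two-sided Fisher's exact p-value, and Phred-scale it as -10 * log10(p). This is a from-scratch illustration using `math.comb`, not GATK's code; GATK may differ in details, for example in how it treats very high-depth tables.

```python
import math

def table_prob(a: int, b: int, c: int, d: int) -> float:
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] with fixed margins."""
    return math.comb(a + b, a) * math.comb(c + d, c) / math.comb(a + b + c + d, a + c)

def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum the probabilities of all same-margin tables no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = table_prob(a, b, c, d)
    p = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p_x = table_prob(x, row1 - x, col1 - x, n - row1 - col1 + x)
        if p_x <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += p_x
    return min(p, 1.0)

def fisher_strand(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled two-sided Fisher's exact p-value for the strand table."""
    p = fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10 * math.log10(max(p, 1e-300)))

print(round(fisher_strand(10, 10, 10, 10), 3))  # balanced strands
print(round(fisher_strand(20, 20, 20, 0), 1))   # alt reads only on the forward strand
```

The balanced table gives an FS of about 0, while the table in which every alternate read is on the forward strand scores in the 40s, reflecting strong strand bias.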

Let’s look at the FS values for the unfiltered variants. The FS values have a very wide range; we made the x-axis log-scaled so the distribution is easier to see. Notice most variants have an FS value less than 10, and almost all variants have an FS value less than 100. However, there are indeed some variants with a value close to 400.

[Figure: density plot of FS values for unfiltered SNPs, log-scaled x-axis]

The plot below shows FS values for variants that passed VQSR and failed VQSR.

[Figure: density plot of FS values for SNPs that passed and failed VQSR]

Notice most of the variants that fail have an FS value greater than 55. Our hard filtering recommendations tell us to fail variants with an FS value greater than 60. Notice that although we are able to remove many false positives by removing variants with FS greater than 60, we still keep many false positive variants. If we move the threshold to a lower value, we risk losing true positive variants.


StrandOddsRatio (SOR)

This is another way to estimate strand bias, using a test similar to the symmetric odds ratio test. SOR was created because FS tends to penalize variants that occur at the ends of exons. Sites at the ends of exons tend to be covered by reads in only one direction, and FS gives those variants a bad score. SOR takes into account the ratios of reads that cover both alleles.
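Below is a minimal Python sketch of an SOR-style score, following our reading of GATK's StrandOddsRatio documentation: a pseudocount of 1 is added to each cell of the strand table, the symmetric ratio R + 1/R is computed from the odds ratio R, and the result is adjusted by how balanced the reference and alternate reads are across strands. Treat the exact formula as an assumption and check it against the documentation for the GATK version you run.

```python
import math

def strand_odds_ratio(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """SOR-style score: symmetric odds ratio adjusted by per-allele strand ratios."""
    # Pseudocount of 1 per cell (assumed) so that empty cells do not divide by zero
    rf, rr, af, ar = (x + 1.0 for x in (ref_fwd, ref_rev, alt_fwd, alt_rev))
    r = (rf * ar) / (rr * af)              # odds ratio of the strand table
    symmetrical_ratio = r + 1.0 / r        # unchanged if the strand labels are swapped
    ref_ratio = min(rf, rr) / max(rf, rr)  # strand balance of the reference reads
    alt_ratio = min(af, ar) / max(af, ar)  # strand balance of the alternate reads
    return math.log(symmetrical_ratio) + math.log(ref_ratio) - math.log(alt_ratio)

print(round(strand_odds_ratio(10, 10, 10, 10), 3))  # 0.693 (= ln 2)
print(round(strand_odds_ratio(20, 20, 40, 0), 3))   # alt reads all on one strand
```

A perfectly balanced table bottoms out at ln 2, while the one-sided alternate reads push the score well above the threshold of 3.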

Let’s look at the SOR values for the unfiltered variants. The SOR values range from 0 to greater than 9. Notice most variants have an SOR value less than 3, and almost all variants have an SOR value less than 9. However, there is a long tail of variants with a value greater than 9.

[Figure: density plot of SOR values for unfiltered SNPs]

The plot below shows SOR values for variants that passed VQSR and failed VQSR.

[Figure: density plot of SOR values for SNPs that passed and failed VQSR]

Notice most of the variants that have an SOR value greater than 3 fail the VQSR filter. Although there is a non-negligible population of variants with an SOR value less than 3 that failed VQSR, our hard filtering recommendation of failing variants with an SOR value greater than 3 will at least remove the long tail of variants that show fairly clear bias according to the SOR test.


RMSMappingQuality (MQ)

This is the root mean square mapping quality over all the reads at the site. Instead of the average mapping quality of the site, this annotation gives the square root of the average of the squares of the mapping qualities at the site. Unlike a plain average, it also reflects the spread (standard deviation) of the mapping qualities: the squared RMS equals the squared mean plus the variance, so the more the mapping qualities vary, the further the RMS rises above the mean. A low standard deviation means the values are all close to the mean, whereas a high standard deviation means the values are spread out over a wider range. When the mapping qualities are good at a site, the MQ will be around 60.
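A short Python sketch of the relationship described above, using hypothetical per-read mapping qualities: it computes the plain mean and the RMS of the same values and checks that RMS squared equals the squared mean plus the population variance. This is a generic illustration of root mean square, not GATK's MQ code.

```python
import math
from typing import List

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

def rms(values: List[float]) -> float:
    """Square root of the average of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def population_variance(values: List[float]) -> float:
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

uniform_mq = [60] * 20            # every read maps well
mixed_mq = [60] * 10 + [20] * 10  # half the reads map poorly

for name, mqs in (("uniform", uniform_mq), ("mixed", mixed_mq)):
    print(f"{name}: mean={mean(mqs):.2f} rms={rms(mqs):.2f}")
    # RMS squared equals the squared mean plus the population variance
    assert math.isclose(rms(mqs) ** 2, mean(mqs) ** 2 + population_variance(mqs))
```

With identical mapping qualities the RMS equals the mean (60.00); once the values spread out, the RMS rises above the mean (44.72 vs 40.00 in the mixed example).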

Now let’s check out the graph of MQ values for the unfiltered variants. Notice the very large peak around MQ = 60. Our recommendation is to fail any variant with an MQ value less than 40.0. You may argue that hard filtering any variant with an MQ value less than 50 is fine as well. This brings up an excellent point that our hard filtering recommendations are meant to be very lenient. We prefer to keep all potentially decent variants rather than get rid of a few bad variants.

[Figure: density plot of MQ values for unfiltered SNPs]

Let’s look at the VQSR pass vs fail variants. At first glance, it seems like VQSR has passed the variants in the high peak and failed any variants not in the peak.

[Figure: density plot of MQ values for SNPs that passed and failed VQSR]

It is hard to tell which variants passed and failed, so let’s zoom in and see what exactly is happening.

[Figure: the same MQ plot zoomed in to MQ values between 59 and 61]

The plot above shows the x-axis from 59 to 61. Notice the variants in blue (the ones that passed) all have MQ around 60. However, some variants in red (the ones that failed) also have an MQ around 60.


MappingQualityRankSumTest (MQRankSum)

This is the u-based z-approximation from the Rank Sum Test for mapping qualities. It compares the mapping qualities of the reads supporting the reference allele and the alternate allele. A positive value means the mapping qualities of the reads supporting the alternate allele are higher than those supporting the reference allele; a negative value indicates the mapping qualities of the reads supporting the reference allele are higher than those supporting the alternate allele. A value close to zero is best and indicates little difference between the mapping qualities.
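The "u-based z-approximation" can be sketched as a Mann-Whitney U statistic converted to a z-score. The Python below uses average ranks for ties and the normal approximation without a tie correction, and it is oriented so that a positive value means the alternate-allele reads tend to have higher values, matching the sign convention described above. GATK's implementation may differ in details (tie handling, small-sample behavior, which reads are included), so treat this as an illustration under those assumptions.

```python
import math
from typing import List, Optional

def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    first, last = {}, {}
    for rank, v in enumerate(sorted(values), start=1):
        first.setdefault(v, rank)
        last[v] = rank
    return [(first[v] + last[v]) / 2 for v in values]

def rank_sum_z(ref_values: List[float], alt_values: List[float]) -> Optional[float]:
    """Mann-Whitney U z-approximation; positive when alt values tend to be higher."""
    n_ref, n_alt = len(ref_values), len(alt_values)
    if n_ref == 0 or n_alt == 0:
        return None  # the test needs reads supporting both alleles
    ranks = average_ranks(list(ref_values) + list(alt_values))
    u_alt = sum(ranks[n_ref:]) - n_alt * (n_alt + 1) / 2
    mu = n_ref * n_alt / 2
    sigma = math.sqrt(n_ref * n_alt * (n_ref + n_alt + 1) / 12)
    return (u_alt - mu) / sigma

# Hypothetical mapping qualities: half of the alt-supporting reads map poorly
ref_mq = [60] * 8
alt_mq = [60, 60, 20, 20]
print(round(rank_sum_z(ref_mq, alt_mq), 2))  # -1.36: alt reads map worse
```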

Next, let’s look at the distribution of values for MQRankSum in the unfiltered variants. Notice the values range from approximately -10.5 to 6.5. Our hard filter threshold is -12.5. There are no variants in this dataset that have MQRankSum less than -10.5! In this case, hard filtering would not fail any variants based on MQRankSum. Remember, our hard filtering recommendations are meant to be very lenient. If you plot the annotation values for your samples and find that none of your variants have MQRankSum less than -12.5, you may want to refine your hard filters. Our recommendations are just that, recommendations, which you as the scientist will want to refine yourself.

[Figure: density plot of MQRankSum values for unfiltered SNPs]

Looking at the plot of pass VQSR vs fail VQSR variants, we see the variants with an MQRankSum value less than -2.5 fail VQSR. However, the region between -2.5 and 2.5 contains both pass and fail variants. Are you noticing a trend here? It is very difficult to pick a threshold for hard filtering. If we pick -2.5 as our hard filtering threshold, we still have many variants that fail VQSR in our dataset. If we try to get rid of those variants, we will lose some good variants as well. It is up to you to decide how many false positives you would like to remove from your dataset vs how many true positives you would like to keep, and to adjust your threshold accordingly.

[Figure: density plot of MQRankSum values for SNPs that passed and failed VQSR]
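One way to make the decision described above explicit is to count, for a few candidate cutoffs, how many VQSR-failed variants a threshold would remove and how many VQSR-passed variants it would lose. The Python sketch below does this for a "fail if value < cutoff" filter. The ten MQRankSum values and their VQSR labels are made up purely for illustration; in practice you would extract both from your own callsets.

```python
from typing import List, Tuple

def threshold_tradeoff(values: List[float], vqsr_pass: List[bool],
                       cutoffs: List[float]) -> List[Tuple[float, int, int]]:
    """For each cutoff, return (cutoff, VQSR-failed removed, VQSR-passed lost)
    for a filter that fails every variant with value < cutoff."""
    rows = []
    for cutoff in cutoffs:
        removed_bad = sum(1 for v, ok in zip(values, vqsr_pass) if v < cutoff and not ok)
        lost_good = sum(1 for v, ok in zip(values, vqsr_pass) if v < cutoff and ok)
        rows.append((cutoff, removed_bad, lost_good))
    return rows

# Made-up MQRankSum values and VQSR outcomes (True = passed VQSR)
mqranksum = [-4.1, -3.0, -2.7, -1.9, -1.2, -0.5, 0.0, 0.3, 1.1, 2.0]
passed = [False, False, False, True, False, True, True, False, True, True]

for cutoff, removed_bad, lost_good in threshold_tradeoff(mqranksum, passed, [-12.5, -2.5, -1.0]):
    print(f"MQRankSum < {cutoff}: removes {removed_bad} VQSR-failed, loses {lost_good} VQSR-passed")
```

On this toy data, the generic -12.5 cutoff removes nothing, -2.5 removes three VQSR-failed variants at no cost, and -1.0 removes one more failed variant but also discards one that VQSR passed.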


ReadPosRankSumTest (ReadPosRankSum)

This is the u-based z-approximation from the Rank Sum Test for site position within reads. It compares whether the positions of the reference and alternate alleles are different within the reads. Seeing an allele only near the ends of reads is indicative of error, because that is where sequencers tend to make the most errors. A negative value indicates that the alternate allele is found at the ends of reads more often than the reference allele; a positive value indicates that the reference allele is found at the ends of reads more often than the alternate allele. A value close to zero is best because it indicates there is little difference between the positions of the reference and alternate alleles in the reads.
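A sketch of the same rank-sum idea applied to read positions: each read's offset of the variant site is converted to its distance from the nearer read end, and the alternate-allele distances are compared with the reference-allele distances. Measuring "position within the read" as distance to the nearest end is our assumption (GATK may also account for details such as clipped bases, which this sketch ignores). With this choice, alternate alleles clustered near read ends give a negative value, matching the convention above. The offsets and read length are hypothetical.

```python
import math
from typing import List, Optional

def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    first, last = {}, {}
    for rank, v in enumerate(sorted(values), start=1):
        first.setdefault(v, rank)
        last[v] = rank
    return [(first[v] + last[v]) / 2 for v in values]

def distance_from_read_end(offset: int, read_length: int) -> int:
    """Distance (in bases) from a 0-based offset to the closer end of the read."""
    return min(offset, read_length - 1 - offset)

def read_pos_rank_sum(ref_offsets: List[int], alt_offsets: List[int],
                      read_length: int) -> Optional[float]:
    ref = [distance_from_read_end(o, read_length) for o in ref_offsets]
    alt = [distance_from_read_end(o, read_length) for o in alt_offsets]
    n_ref, n_alt = len(ref), len(alt)
    if n_ref == 0 or n_alt == 0:
        return None
    ranks = average_ranks(ref + alt)
    u_alt = sum(ranks[n_ref:]) - n_alt * (n_alt + 1) / 2  # Mann-Whitney U for alt reads
    mu = n_ref * n_alt / 2
    sigma = math.sqrt(n_ref * n_alt * (n_ref + n_alt + 1) / 12)
    return (u_alt - mu) / sigma

# Hypothetical 100 bp reads: ref alleles spread across reads, alt alleles near the ends
z = read_pos_rank_sum([20, 35, 50, 65, 80, 45], [2, 97, 5, 95], read_length=100)
print(round(z, 2))  # -2.56: alt alleles sit near read ends
```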

The last annotation we will look at is ReadPosRankSum. Notice the values fall mostly between -4 and 4. Our hard filtering threshold removes any variant with a ReadPosRankSum value less than -8.0. Again, there are no variants in this dataset that have a ReadPosRankSum value less than -8.0, but some datasets might. If you plot your variant annotations and find that none of your variants fall beyond one of our recommended cutoffs, you will have to refine that cutoff yourself based on your annotation plots.

[Figure: density plot of ReadPosRankSum values for unfiltered SNPs]

Looking at the VQSR pass vs fail variants, we can see VQSR has failed variants with ReadPosRankSum values less than -1.0 and greater than 3.5. However, notice that VQSR has also failed some variants with values between -1.0 and 3.5, the range where most variants pass.

[Figure: density plot of ReadPosRankSum values for SNPs that passed and failed VQSR]

What's the difference between b37 and hg19 resources?


Hi, all. I have questions on resource bundles.
Are the 'hg19' bundle files just the 'b37' bundles lifted over to UCSC style? If so, why are there some variants in only one version and not the other? For example, the variant 'rs34872315' (on chr1) is in the b37 version of dbsnp137.excluding_sites_after_129.vcf, but not in the hg19 version. At first, I thought it was because of differences between the reference genomes (the VCF files in the bundle are fit to the accompanying reference sequences). But reference chromosome 1 is the same in both bundles. Can you help me understand the difference between the b37 and hg19 resource bundles?

How do I produce a file with genotyping data admixture can use?


I have had my whole genome sequenced by Veritas, and they have provided VCF.GZ and BAM files with the data. I want to run it together with the 1000 Genomes data in admixture.

So I downloaded the snps.genotypes VCF.GZ and VCF.GZ.TBI files from here:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/hd_genotype_chip/

In vcftools I converted them to PED files. Then I took my VCF.GZ file and did the same thing, converted it to PED file. I then merged them in plink and stored the result as BED files.

I then ran them in admixture, which returns an error, saying the following:

Error: detected that all genotypes are missing for an individual.

Please apply quality-control filters to remove such individuals.

So I ran it through plink again, using the "--mind" flag, and it removed 1 individual. The problem is, the individual it removed is ME! So there was no genotyping data for me!

It seems this is because the VCF file doesn't have genotyping data, so I want to rectify that. Unfortunately, as I work on this, I keep running up against terms like "SNP calling", which seem extremely vague and open-ended, and it is unclear how they help me meet my goal. I just want to get my genotyping data into the BED file with all the other participants and run it in admixture. My understanding is that GATK can help me, so what can I do?

BQSR with MuTect2: use it or not ?


Hello,

I've been reading some threads on the forum about BQSR with MuTect2. I know it is part of the Best Practices workflow. However, there were a lot of mixed comments, and I can't find a clear conclusion on whether to use BQSR with MuTect2, since MuTect2 takes the base quality score into consideration, and base quality is what BQSR adjusts. I am working on 18 human samples with matched normal and tumor. Those samples have been exome-sequenced. I am using MuTect2 from the GATK 3.7 stable release. I generated results using the proposed pipeline here, with the following inputs:

  • Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
  • dbsnp_138.hg19.vcf
  • hg19_ref_genome.fa

Following this thread here for example, I am worried that potential true variants could be altered due to recalibration.

I also have another question about BQSR: I just want to make sure that BQSR does NOT change the base of the variant itself, but only assigns it a lower base quality score if it gets recalibrated.

I have analyzed commands run by The Cancer Genome Atlas, and they do use BQSR in their workflow. So, finally, I would like to know: is it safe to use BQSR with MuTect2? And is it better to use multiple dbSNP files to avoid mismatches at potential variants (for example, I have downloaded all known human SNPs from NCBI, a ~57GB VCF file)?

Thank you in advance !

Problem with joint genotyping


Dear Team,

I am running joint calling (according to the Best Practices). I have ~800 samples from 3 different centres. First, I ran CombineGVCFs on cases from batch1 (~130 samples), controls from batch1 (~130 samples), cases from batch2 (~130 samples), etc. Next, I ran GenotypeGVCFs on these 6 files.

According to my final file, I have 19 HOM and 32 HET at chr1:47571951.

However, when I looked at the individual g.vcfs (generated by HaplotypeCaller, --emitConfidence GVCF), the specific variant is recorded in only 1/800 g.vcfs (from batch1):

1 47571951 . T TTA,<NON_REF> 0 . DP=25;MLEAC=0,0;MLEAF=0.00,0.00;MQ=58.61;MQ0=0 GT:AD:DP:GQ:PL:SB 0/0:21,0,0:21:63:0,63,735,63,735,735:8,13,0,0

but when I looked at the BAM file, there was nothing there.

After running CombineGVCFs with the other files from the batch1 controls, I get this:

1 47571951 . T TTA, . . DP=8537;MQ=58.61;MQ0=0 GT:AD:DP:MIN_DP:PL:SB./.:.:80:63:0,86,1514,86,1514,1514 ./.:.:60:46:0,110,1343,110,1343,1343 ./.:.:59:50:0,95,1202,95,1202,1202 ./.:.:91:65:0,115,1541,115,1541,1541./.:.:90:74:0,120,1800,120,1800,1800 ./.:.:49:38:0,70,921,70,921,921 ./.:.:52:40:0,88,1080,88,1080,1080 ./.:.:54:46:0,61,1216,61,1216,1216 ./.:.:63:45:0,106,1119,106,1119,1119 ./.:.:46:36:0,78,873,78,873,873 ./.:.:48:35:0,67,771,67,771,771 ./.:.:77:70:0,85,1741,85,1741,1741 ./.:.:64:43:0,81,948,81,948,948 ./.:.:65:51:0,120,1800,120,1800,1800 ./.:.:90:70:0,110,1701,110,1701,1701 ./.:.:66:44:0,114,1003,114,1003,1003 ./.:.:63:57:0,116,1376,116,1376,1376 ./.:.:57:52:0,113,1547,113,1547,1547 ./.:.:74:51:0,93,1122,93,1122,1122 ./.:.:81:53:0,73,1273,73,1273,1273 ./.:.:33:25:0,69,710,69,710,710 ./.:.:70:63:0,120,1800,120,1800,1800 ./.:.:45:34:0,69,830,69,830,830 ./.:.:64:47:0,75,1155,75,1155,1155 ./.:.:59:28:0,72,780,72,780,780 ./.:.:61:45:0,93,1516,93,1516,1516 ./.:.:52:45:0,66,1240,66,1240,1240 ./.:.:77:62:0,120,1800,120,1800,1800 ./.:.:55:48:0,86,1358,86,1358,1358 ./.:.:86:61:0,64,1553,64,1553,1553 ./.:.:106:86:0,120,1800,120,1800,1800 ./.:.:86:84:0,120,1800,120,1800,1800 ./.:.:65:51:0,65,1133,65,1133,1133 ./.:.:71:53:0,104,1392,104,1392,1392 ./.:.:56:46:0,110,1197,110,1197,1197 ./.:.:70:51:0,88,1222,88,1222,1222 ./.:.:50:40:0,84,1260,84,1260,1260 ./.:.:58:41:0,95,1017,95,1017,1017 ./.:.:73:62:0,120,1800,120,1800,1800 ./.:.:48:41:0,87,1049,87,1049,1049 ./.:.:14:12:0,21,361,21,361,361 ./.:.:83:55:0,120,1800,120,1800,1800 ./.:.:58:50:0,90,1350,90,1350,1350 ./.:.:63:36:0,107,875,107,875,875 ./.:.:69:68:0,120,1800,120,1800,1800 ./.:.:69:55:0,120,1800,120,1800,1800 ./.:.:51:42:0,75,1348,75,1348,1348 ./.:.:77:41:0,79,1120,79,1120,1120 ./.:.:57:45:0,109,1051,109,1051,1051 ./.:.:50:36:0,69,984,69,984,984 ./.:.:68:58:0,120,1800,120,1800,1800 ./.:.:42:31:0,62,741,62,741,741 ./.:.:49:35:0,80,878,80,878,878 ./.:.:62:50:0,117,1755,117,1755,1755 
./.:.:60:55:0,89,1485,89,1485,1485 ./.:.:53:39:0,68,990,68,990,990 ./.:.:62:55:0,120,1800,120,1800,1800 ./.:.:58:51:0,64,1167,64,1167,1167 ./.:.:40:23:0,66,990,66,990,990 ./.:.:75:63:0,94,1800,94,1800,1800 ./.:.:73:60:0,119,1800,119,1800,1800 ./.:.:65:41:0,72,983,72,983,983 ./.:.:74:66:0,84,1669,84,1669,1669 ./.:.:82:67:0,120,1800,120,1800,1800 ./.:.:50:39:0,110,1071,110,1071,1071 ./.:.:64:48:0,116,1495,116,1495,1495 ./.:.:65:40:0,104,1095,104,1095,1095 ./.:.:33:32:0,71,851,71,851,851 ./.:.:62:50:0,75,1383,75,1383,1383 ./.:.:71:54:0,119,1505,119,1505,1505./.:.:43:28:0,62,681,62,681,681 ./.:.:71:66:0,93,1645,93,1645,1645 ./.:.:38:33:0,61,859,61,859,859 ./.:.:76:47:0,101,1121,101,1121,1121 ./.:.:65:47:0,71,1078,71,1078,1078 ./.:.:58:42:0,65,1144,65,1144,1144 ./.:.:70:59:0,111,1785,111,1785,1785 ./.:.:33:32:0,72,1080,72,1080,1080 ./.:.:43:37:0,84,1260,84,1260,1260 ./.:.:99:55:0,109,1361,109,1361,1361 ./.:.:53:42:0,106,990,106,990,990 ./.:.:55:32:0,80,769,80,769,769 ./.:.:75:41:0,90,1065,90,1065,1065 ./.:.:60:50:0,114,1710,114,1710,1710 ./.:.:98:78:0,120,1800,120,1800,1800 ./.:.:80:48:0,94,1089,94,1089,1089 ./.:.:73:43:0,80,1061,80,1061,1061 ./.:.:38:31:0,66,697,66,697,697 ./.:.:52:33:0,98,890,98,890,890 ./.:.:45:43:0,101,1257,101,1257,1257 ./.:.:43:34:0,65,680,65,680,680 ./.:.:86:82:0,120,1800,120,1800,1800 ./.:.:44:38:0,62,960,62,960,960 ./.:.:61:54:0,89,1393,89,1393,1393 ./.:.:43:37:0,81,989,81,989,989 ./.:.:87:67:0,120,1800,120,1800,1800 ./.:.:54:35:0,80,892,80,892,892 ./.:.:65:53:0,93,1181,93,1181,1181 ./.:.:66:58:0,120,1800,120,1800,1800 ./.:.:57:44:0,91,971,91,971,971 ./.:.:53:45:0,80,1246,80,1246,1246 ./.:.:62:49:0,95,1159,95,1159,1159 ./.:.:53:40:0,95,969,95,969,969 ./.:.:86:72:0,120,1800,120,1800,1800./.:21,0,0:21:.:0,63,735,63,735,735:8,13,0,0 ./.:.:54:45:0,120,1800,120,1800,1800 ./.:.:74:63:0,120,1800,120,1800,1800 ./.:.:74:52:0,117,1755,117,1755,1755 ./.:.:61:44:0,79,1273,79,1273,1273 ./.:.:61:54:0,104,1617,104,1617,1617 
./.:.:89:59:0,100,1721,100,1721,1721 ./.:.:92:65:0,67,1504,67,1504,1504 ./.:.:67:52:0,93,1450,93,1450,1450 ./.:.:86:73:0,120,1800,120,1800,1800 ./.:.:62:51:0,81,1437,81,1437,1437 ./.:.:59:47:0,72,1161,72,1161,1161 ./.:.:67:50:0,74,1800,74,1800,1800 ./.:.:35:32:0,71,849,71,849,849 ./.:.:77:71:0,108,1664,108,1664,1664 ./.:.:35:28:0,63,809,63,809,809 ./.:.:62:45:0,88,1208,88,1208,1208 ./.:.:76:57:0,70,1187,70,1187,1187 ./.:.:51:35:0,63,783,63,783,783 ./.:.:63:53:0,120,1800,120,1800,1800 ./.:.:39:36:0,86,954,86,954,954 ./.:.:56:40:0,68,1010,68,1010,1010 ./.:.:62:42:0,84,1159,84,1159,1159 ./.:.:69:43:0,99,1264,99,1264,1264 ./.:.:69:37:0,66,870,66,870,870 ./.:.:138:90:0,88,1800,88,1800,1800 ./.:.:52:47:0,66,1334,66,1334,1334 ./.:.:66:51:0,95,1155,95,1155,1155 ./.:.:61:35:0,103,953,103,953,953 ./.:.:74:73:0,120,1800,120,1800,1800 ./.:.:40:36:0,62,954,62,954,954 ./.:.:95:79:0,89,1800,89,1800,1800 ./.:.:74:53:0,63,1149,63,1149,1149 ./.:.:66:63:0,112,1560,112,1560,1560 ./.:.:70:67:0,120,1800,120,1800,1800 ./.:.:123:106:0,120,1800,120,1800,1800 ./.:.:32:32:0,81,1215,81,1215,1215 ./.:.:66:40:0,97,1051,97,1051,1051 ./.:.:57:36:0,76,980,76,980,980 ./.:.:69:49:0,120,1800,120,1800,1800 ./.:.:58:43:0,93,1014,93,1014,1014 ./.:.:51:30:0,69,734,69,734,734 ./.:.:80:65:0,120,1800,120,1800,1800 ./.:.:83:71:0,120,1800,120,1800,1800 ./.:.:54:44:0,73,1209,73,1209,1209 ./.:.:46:38:0,80,961,80,961,961 ./.:.:63:56:0,97,1383,97,1383,1383 ./.:.:57:41:0,88,1337,88,1337,1337 ./.:.:70:44:0,71,1112,71,1112,1112 ./.:.:67:65:0,120,1800,120,1800,1800 ./.:.:84:74:0,120,1800,120,1800,1800 ./.:.:74:56:0,120,1800,120,1800,1800 ./.:.:46:34:0,90,918,90,918,918 ./.:.:44:34:0,67,906,67,906,906 ./.:.:29:24:0,60,793,60,793,793 ./.:.:43:27:0,76,711,76,711,711 ./.:.:91:68:0,113,1525,113,1525,1525 ./.:.:65:63:0,118,1641,118,1641,1641 ./.:.:59:41:0,102,978,102,978,978 ./.:.:101:82:0,97,1800,97,1800,1800 ./.:.:59:44:0,67,1081,67,1081,1081 ./.:.:48:37:0,62,842,62,842,842 ./.:.:50:38:0,98,959,98,959,959 
./.:.:87:72:0,120,1800,120,1800,1800 ./.:.:64:58:0,75,1535,75,1535,1535 ./.:.:33:31:0,67,910,67,910,910 ./.:.:65:39:0,61,865,61,865,865 ./.:.:84:62:0,73,1540,73,1540,1540 ./.:.:75:66:0,120,1800,120,1800,1800

And in the results for the controls from batch3 (CombineGVCFs), I have this (a different variant, but the same position):

1 47571951 . T . . . GT:DP:GQ:MIN_DP:PL ./.:78:99:69:0,120,1800 ./.:62:99:52:0,73,1184 ./.:56:99:47:0,94,1143 ./.:55:99:54:0,120,1800 ./.:71:99:58:0,120,1800 ./.:41:99:39:0,61,923 ./.:37:99:36:0,80,921 ./.:30:84:30:0,65,754 ./.:28:72:25:0,69,1035 ./.:48:99:42:0,74,1047 ./.:47:99:40:0,85,950 ./.:140:99:115:0,120,1800 ./.:103:99:87:0,120,1800 ./.:173:99:126:0,120,1800 ./.:148:99:122:0,81,1800 ./.:185:99:148:0,120,1800 ./.:163:99:112:0,120,1800 ./.:174:99:136:0,120,1800 ./.:161:99:143:0,120,1800 ./.:214:99:161:0,104,1800 ./.:146:99:118:0,120,1800 ./.:181:99:136:0,120,1800 ./.:101:99:86:0,120,1800 ./.:55:99:43:0,93,1121 ./.:108:99:87:0,120,1800 ./.:174:99:121:0,85,1800 ./.:149:99:113:0,120,1800 ./.:127:99:96:0,98,1800 ./.:194:99:134:0,120,1800 ./.:149:99:128:0,120,1800 ./.:153:99:141:0,120,1800 ./.:137:99:136:0,120,1800 ./.:186:99:172:0,120,1800 ./.:42:99:33:0,61,928 ./.:69:99:48:0,112,1791 ./.:39:99:35:0,64,842 ./.:26:60:25:0,60,900 ./.:60:99:51:0,71,1230 ./.:61:99:51:0,104,1199 ./.:67:99:63:0,91,1631 ./.:63:99:57:0,120,1800 ./.:66:99:58:0,75,1619 ./.:58:99:47:0,90,1111 ./.:119:99:88:0,120,1800 ./.:103:99:80:0,120,1800 ./.:108:99:77:0,120,1800 ./.:180:99:137:0,120,1800 ./.:128:99:99:0,72,1800 ./.:61:99:56:0,62,1333 ./.:59:99:52:0,120,1800 ./.:117:99:103:0,120,1800 ./.:64:99:55:0,106,1318 ./.:62:99:41:0,66,978./.:51:99:39:0,115,967 ./.:153:99:126:0,85,1800 ./.:199:99:153:0,120,1800 ./.:144:99:121:0,120,1800 ./.:137:99:111:0,120,1800 ./.:64:99:47:0,72,1375 ./.:65:99:49:0,73,1214 ./.:45:99:44:0,110,1455 ./.:143:99:101:0,96,1800 ./.:168:99:157:0,120,1800 ./.:122:99:104:0,120,1800 ./.:156:99:123:0,120,1800 ./.:243:99:179:0,90,1800 ./.:117:99:111:0,120,1800 ./.:184:99:161:0,120,1800 ./.:149:99:145:0,120,1800 ./.:168:99:137:0,82,1800 ./.:164:99:126:0,120,1800 ./.:176:99:149:0,120,1800 ./.:129:99:97:0,120,1800 ./.:182:99:161:0,120,1800 ./.:52:99:49:0,120,1800 ./.:215:99:171:0,120,1800 ./.:217:99:182:0,120,1800 ./.:128:99:94:0,120,1800 ./.:132:99:125:0,116,1800 
./.:131:99:106:0,120,1800 ./.:152:99:133:0,120,1800 ./.:138:99:118:0,120,1800 ./.:139:99:112:0,120,1800 ./.:207:99:190:0,120,1800 ./.:163:99:133:0,120,1800 ./.:192:99:157:0,120,1800 ./.:152:99:121:0,108,1800 ./.:160:99:134:0,120,1800 ./.:204:99:168:0,120,1800 ./.:229:99:178:0,120,1800 ./.:190:99:147:0,120,1800 ./.:193:99:180:0,87,1800 ./.:256:99:246:0,120,1800 ./.:226:99:155:0,120,1800 ./.:206:99:146:0,120,1800 ./.:213:99:155:0,120,1800 ./.:214:99:154:0,120,1800 ./.:126:99:114:0,115,1800 ./.:140:99:133:0,120,1800 ./.:106:99:105:0,120,1800 ./.:156:99:124:0,91,1800 ./.:103:99:100:0,120,1800 ./. ./.:100:99:91:0,71,1800 ./.:141:99:135:0,120,1800 ./.:120:99:116:0,120,1800 ./.:102:99:87:0,79,1800 ./.:87:99:80:0,64,1800 ./.:145:99:111:0,94,1800 ./.:107:99:101:0,120,1800 ./.:104:99:84:0,120,1800 ./.:109:99:93:0,120,1800 ./.:117:99:111:0,120,1800 ./.:103:99:83:0,120,1800 ./.:92:99:62:0,80,1532 ./.:161:99:126:0,120,1800 ./.:160:99:114:0,120,1800 ./.:221:99:156:0,105,1800 ./.:104:99:83:0,120,1800 ./.:126:99:124:0,120,1800 ./.:116:99:99:0,120,1800 ./.:103:99:83:0,120,1800 ./.:136:99:108:0,120,1800 ./.:117:99:108:0,120,1800 ./.:134:99:115:0,120,1800 ./.:146:99:139:0,120,1800 ./.:137:99:103:0,120,1800 ./.:127:99:90:0,90,1800 ./.:112:99:96:0,120,1800 ./.:241:99:182:0,120,1800 ./.:144:99:126:0,120,1800 ./.:136:99:123:0,120,1800 ./.:201:99:176:0,120,1800 ./.:121:99:102:0,120,1800 ./.:31:90:26:0,62,669 ./.:87:99:61:0,120,1800 ./.:30:77:24:0,69,693 ./.:80:99:74:0,120,1800 ./.:72:99:46:0,61,996 ./.:170:99:160:0,120,1800 ./.:74:99:69:0,73,1594 ./.:93:99:71:0,115,1733 ./.:104:99:92:0,120,1800 ./.:133:99:108:0,71,1800 ./.:120:99:100:0,72,1800 ./.:145:99:134:0,120,1800 ./.:164:99:149:0,120,1800 ./.:80:99:69:0,111,1465 ./.:96:99:66:0,120,1800 ./.:146:99:129:0,120,1800 ./.:83:99:52:0,102,1127 ./.:170:99:143:0,120,1800 ./.:133:99:103:0,120,1800 ./.:129:99:112:0,120,1800 ./.:88:99:81:0,120,1800 ./.:109:99:86:0,120,1800 ./.:161:99:149:0,120,1800 ./.:211:99:162:0,95,1800 
./.:114:99:91:0,120,1800 ./.:137:99:108:0,120,1800 ./.:121:99:93:0,106,1800 ./.:118:99:99:0,120,1800 ./.:177:99:133:0,120,1800

The variant is not present in the CombineGVCFs files for cases from batches 1, 2 and 3, or for controls from batch 2.

When I run GenotypeGVCFs on the 6 files (3 case and 3 control files), I get the following. According to this, the variant is present in 19 HOM and 32 HET samples, all from batch 3:

1 47571951 . T TTA 39185.39 . AC=70;AF=0.045;AN=1554;DP=59833;FS=6.566;GQ_MEAN=107.66;GQ_STDDEV=49.77;InbreedingCoeff=0.4981;MLEAC=68;MLEAF=0.044;MQ=58.61;MQ0=0;NCC=2;QD=32.97 GT:AD:DP:GQ:PL 1/1:0,0:56:99:917,162,0 0/1:9,0:31:99:241,0,135 0/1:9,0:31:70:70,0,157 0/0:0,0:17:0:0,0,0 0/0:13,0:29:39:0,39,265 0/1:4,0:23:54:54,0,62 0/0:69,0:69:99:0,120,1800 0/0:52,0:52:73:0,73,1184 0/0:47,0:47:94:0,94,1143 0/0:54,0:54:99:0,120,1800 0/0:58,0:58:99:0,120,1800 0/1:2,0:14:19:60,0,19 0/0:3,0:8:9:0,9,53 1/1:0,0:16:47:283,47,0 0/1:3,0:14:26:57,0,260/0:39,0:39:61:0,61,923 0/0:36,0:36:80:0,80,921 0/0:30,0:30:65:0,65,754 0/0:25,0:25:69:0,69,1035 0/0:42,0:42:74:0,74,1047 0/0:40,0:40:85:0,85,950 0/1:12,0:96:50:998,0,50 0/0:35,0:81:72:0,72,842 0/1:24,0:81:99:717,0,371 0/1:32,0:110:99:726,0,512 0/1:26,0:97:99:308,0,420 0/1:15,0:69:99:652,0,179 0/0:115,0:115:99:0,120,1800 0/0:87,0:87:99:0,120,1800 0/0:126,0:126:99:0,120,1800 0/0:122,0:122:81:0,81,1800 0/0:148,0:148:99:0,120,1800 0/0:112,0:112:99:0,120,1800 0/0:46,0:98:99:0,101,1070 0/1:37,0:147:99:1242,0,569 0/0:47,0:97:56:0,56,1011 0/1:16,0:87:99:934,0,150 0/0:34,0:88:79:0,79,762 0/1:19,0:93:99:786,0,242 0/0:136,0:136:99:0,120,1800 0/0:143,0:143:99:0,120,1800 0/0:161,0:161:99:0,104,1800 0/0:118,0:118:99:0,120,1800 0/0:136,0:136:99:0,120,1800 0/1:13,0:73:99:745,0,130 0/0:25,0:49:20:0,20,574 1/1:0,0:100:99:1925,295,0 0/1:22,0:97:99:772,0,292 1/1:0,0:101:99:1906,291,0 1/1:0,0:85:99:1632,250,0 0/0:86,0:86:99:0,120,1800 0/0:43,0:43:93:0,93,1121 0/0:87,0:87:99:0,120,1800 0/0:121,0:121:85:0,85,1800 0/0:113,0:113:99:0,120,1800 0/1:10,0:63:85:664,0,85 0/0:8,0:21:19:0,19,202 0/0:17,0:39:48:0,48,323 0/0:14,0:34:14:0,14,304 0/0:28,0:57:63:0,63,711 0/0:20,0:37:59:0,59,504 0/0:96,0:96:98:0,98,1800 0/0:134,0:134:99:0,120,1800 0/0:128,0:128:99:0,120,1800 0/0:141,0:141:99:0,120,1800 0/0:136,0:136:99:0,120,1800 0/0:172,0:172:99:0,120,1800 0/0:6,0:19:5:0,5,126 0/1:3,0:12:28:28,0,52 0/0:9,0:17:3:0,3,174 ./.:4,0:4 0/0:1,0:24:3:0,3,40 
1/1:0,0:60:99:1101,177,0 0/0:33,0:33:61:0,61,928 0/0:48,0:48:99:0,112,1791 0/0:35,0:35:64:0,64,842 0/0:25,0:25:60:0,60,900 0/0:14,0:20:19:0,19,293 0/0:7,0:19:21:0,21,157 0/0:8,0:14:23:0,23,187 0/0:10,0:21:30:0,30,412 0/1:5,0:27:47:265,0,47 0/0:51,0:51:71:0,71,1230 0/0:51,0:51:99:0,104,1199 0/0:63,0:63:91:0,91,1631 0/0:57,0:57:99:0,120,1800 0/0:58,0:58:75:0,75,1619 0/0:47,0:47:90:0,90,1111 0/0:45,0:79:99:0,105,1054 0/0:20,0:76:58:0,58,485 0/0:23,0:59:69:0,69,540 0/0:33,0:69:69:0,69,777 0/1:20,0:75:99:746,0,269 0/0:88,0:88:99:0,120,1800 0/0:80,0:80:99:0,120,1800 0/0:77,0:77:99:0,120,1800 0/0:137,0:137:99:0,120,1800 0/0:99,0:99:72:0,72,1800 0/1:3,0:28:20:225,0,20 0/1:8,0:33:98:292,0,98 0/0:12,0:22:35:0,35,264 0/0:7,0:12:21:0,21,167 1/1:0,0:26:51:319,51,0 0/1:3,0:26:31:245,0,31 0/0:56,0:56:62:0,62,1333 0/0:52,0:52:99:0,120,1800 0/0:103,0:103:99:0,120,1800 0/0:55,0:55:99:0,106,1318 0/0:41,0:41:66:0,66,978 0/0:39,0:39:99:0,115,967 0/1:14,0:74:99:587,0,161 0/0:16,0:50:14:0,14,340 1/1:0,0:73:99:1434,216,0 1/1:0,0:82:99:1515,239,0 0/1:20,0:47:99:131,0,380 0/1:6,0:49:12:591,0,12 0/0:126,0:126:85:0,85,1800 0/0:153,0:153:99:0,120,1800 0/0:121,0:121:99:0,120,1800 0/0:111,0:111:99:0,120,1800 0/0:47,0:47:72:0,72,1375 0/0:49,0:49:73:0,73,1214 0/0:44,0:44:99:0,110,1455 1/1:0,0:54:99:1011,159,0 0/1:12,0:46:99:317,0,191 0/0:17,0:28:51:0,51,413 0/0:21,0:35:63:0,63,506 0/0:29,0:48:51:0,51,696 0/1:15,0:45:99:345,0,292 0/1:6,0:36:44:429,0,44 0/0:11,0:25:18:0,18,243 0/0:8,0:31:1:0,1,163 0/0:101,0:101:96:0,96,1800 0/0:157,0:157:99:0,120,1800 0/0:104,0:104:99:0,120,1800 0/0:123,0:123:99:0,120,1800 0/0:179,0:179:90:0,90,1800 0/0:111,0:111:99:0,120,1800 0/0:161,0:161:99:0,120,1800 0/0:145,0:145:99:0,120,1800 0/0:137,0:137:82:0,82,1800 0/0:126,0:126:99:0,120,1800 0/0:149,0:149:99:0,120,1800 0/0:97,0:97:99:0,120,1800 0/0:161,0:161:99:0,120,1800 0/0:49,0:49:99:0,120,1800 0/0:171,0:171:99:0,120,1800 0/0:182,0:182:99:0,120,1800 0/0:94,0:94:99:0,120,1800 0/0:18,0:29:20:0,20,366 
0/1:7,0:59:14:668,0,14 1/1:0,0:66:99:1230,192,0 0/1:16,0:57:99:547,0,551 1/1:4,0:32:23:477,23,0 0/0:20,0:32:35:0,35,405 0/0:125,0:125:99:0,116,1800 0/0:106,0:106:99:0,120,1800 0/0:133,0:133:99:0,120,1800 0/0:118,0:118:99:0,120,1800 0/0:112,0:112:99:0,120,1800 0/0:190,0:190:99:0,120,1800 0/0:133,0:133:99:0,120,1800 0/0:157,0:157:99:0,120,1800 0/0:121,0:121:99:0,108,1800 0/0:134,0:134:99:0,120,1800 0/0:168,0:168:99:0,120,1800 0/0:178,0:178:99:0,120,1800 0/0:147,0:147:99:0,120,1800 0/0:180,0:180:87:0,87,1800 0/0:246,0:246:99:0,120,1800 0/0:155,0:155:99:0,120,1800 0/0:146,0:146:99:0,120,1800 0/0:155,0:155:99:0,120,1800 0/0:154,0:154:99:0,120,1800 0/0:114,0:114:99:0,115,1800 0/0:133,0:133:99:0,120,1800 0/0:105,0:105:99:0,120,1800 0/0:124,0:124:91:0,91,1800 0/0:100,0:100:99:0,120,1800 ./.:0,0 0/0:91,0:91:71:0,71,1800 0/0:135,0:135:99:0,120,1800 0/0:116,0:116:99:0,120,1800 0/0:87,0:87:79:0,79,1800 0/0:80,0:80:64:0,64,1800 0/0:111,0:111:94:0,94,1800 0/0:101,0:101:99:0,120,1800 0/0:84,0:84:99:0,120,1800 0/0:93,0:93:99:0,120,1800 0/0:111,0:111:99:0,120,1800 0/0:83,0:83:99:0,120,1800 0/0:62,0:62:80:0,80,1532 1/1:0,0:137:99:2707,405,0 0/1:28,0:79:82:82,0,565 1/1:0,0:167:99:3260,494,0 0/0:126,0:126:99:0,120,1800 0/0:114,0:114:99:0,120,1800 0/0:156,0:156:99:0,105,1800 1/1:4,0:35:4:451,4,0 1/1:0,0:148:99:2594,429,0 1/1:0,0:63:99:1130,182,0 1/1:0,0:69:99:1340,204,0 0/0:43,0:112:36:0,36,891 0/0:81,0:135:99:0,194,1884 0/0:63,0:63:79:0,79,1800 0/0:82,0:82:99:0,120,1800 0/0:87,0:87:99:0,120,1800 0/0:127,0:127:99:0,120,1800 0/0:117,0:117:99:0,100,1800 0/0:64,0:64:99:0,115,1800 0/0:84,0:84:99:0,120,1800 0/0:75,0:75:99:0,120,1800 0/0:69,0:69:67:0,67,1800 0/0:81,0:81:99:0,120,1800 0/0:85,0:85:99:0,120,1800 0/0:83,0:83:99:0,120,1800 0/0:80,0:80:99:0,120,1800 0/0:104,0:104:99:0,120,1800 0/0:72,0:72:99:0,120,1800 0/0:72,0:72:99:0,120,1800 0/0:73,0:73:99:0,120,1800 0/0:91,0:91:99:0,120,1800 0/0:65,0:65:99:0,120,1800 0/0:99,0:99:99:0,120,1800 0/0:59,0:59:99:0,106,1800 0/0:97,0:97:99:0,120,1800 
0/0:58,0:58:99:0,120,1800 0/0:123,0:123:99:0,120,1800 0/0:97,0:97:99:0,120,1800 0/0:86,0:86:99:0,120,1800 0/0:106,0:106:99:0,120,1800 0/0:50,0:50:99:0,100,1800 0/0:100,0:100:99:0,101,1800 0/0:104,0:104:99:0,120,1800 0/0:95,0:95:99:0,120,1800 0/0:67,0:67:99:0,120,1800 0/0:69,0:69:99:0,120,1800 0/0:75,0:75:99:0,120,1800 0/0:95,0:95:99:0,120,1800 0/0:83,0:83:99:0,120,1800 0/0:53,0:53:89:0,89,1705 0/0:70,0:70:99:0,110,1800 0/0:68,0:68:99:0,120,1800 0/0:63,0:63:99:0,120,1800 0/0:83,0:83:99:0,120,1800 0/0:124,0:124:99:0,120,1800 0/0:99,0:99:99:0,120,1800 0/0:83,0:83:99:0,120,1800 0/0:108,0:108:99:0,120,1800 0/0:108,0:108:99:0,120,1800 0/0:115,0:115:99:0,120,1800 0/1:15,0:52:99:448,0,250 0/0:139,0:139:99:0,120,1800 0/0:103,0:103:99:0,120,1800 0/0:90,0:90:90:0,90,1800 0/0:96,0:96:99:0,120,1800 0/0:182,0:182:99:0,120,1800 0/0:126,0:126:99:0,120,1800 0/0:123,0:123:99:0,120,1800 0/0:176,0:176:99:0,120,1800 0/0:102,0:102:99:0,120,1800 0/0:26,0:26:62:0,62,669 0/0:61,0:61:99:0,120,1800 0/0:24,0:24:69:0,69,693 0/0:3,0:8:9:0,9,104 0/0:74,0:74:99:0,120,1800 0/0:46,0:46:61:0,61,996 0/0:160,0:160:99:0,120,1800 0/0:69,0:69:73:0,73,1594 0/0:71,0:71:99:0,115,1733 0/0:92,0:92:99:0,120,1800 0/0:108,0:108:71:0,71,1800 0/0:100,0:100:72:0,72,1800 0/0:134,0:134:99:0,120,1800 0/0:149,0:149:99:0,120,1800 0/0:69,0:69:99:0,111,1465 0/0:66,0:66:99:0,120,1800 0/0:129,0:129:99:0,120,1800 0/0:52,0:52:99:0,102,1127 0/0:143,0:143:99:0,120,1800 0/0:103,0:103:99:0,120,1800 0/0:112,0:112:99:0,120,1800 0/0:81,0:81:99:0,120,1800 0/0:86,0:86:99:0,120,1800 1/1:2,0:19:3:104,3,0 0/0:149,0:149:99:0,120,1800 0/0:162,0:162:95:0,95,1800 0/0:91,0:91:99:0,120,1800 0/0:108,0:108:99:0,120,1800 0/0:93,0:93:99:0,106,1800 0/0:99,0:99:99:0,120,1800 0/0:63,0:63:86:0,86,1514 0/0:46,0:46:99:0,110,1343 0/0:50,0:50:95:0,95,1202 0/0:65,0:65:99:0,115,1541 0/0:74,0:74:99:0,120,1800 0/0:38,0:38:70:0,70,921 0/0:40,0:40:88:0,88,1080 0/0:46,0:46:61:0,61,1216 0/0:45,0:45:99:0,106,1119 0/0:36,0:36:78:0,78,873 0/0:35,0:35:67:0,67,771 
0/0:70,0:70:85:0,85,1741 0/0:43,0:43:81:0,81,948 0/0:51,0:51:99:0,120,1800 0/0:70,0:70:99:0,110,1701 0/0:44,0:44:99:0,114,1003 0/0:57,0:57:99:0,116,1376 0/0:52,0:52:99:0,113,1547 0/0:51,0:51:93:0,93,1122 0/0:53,0:53:73:0,73,1273 0/0:25,0:25:69:0,69,710 0/0:63,0:63:99:0,120,1800 0/0:34,0:34:69:0,69,830 0/0:47,0:47:75:0,75,1155 0/0:28,0:28:72:0,72,780 0/0:45,0:45:93:0,93,1516 0/0:45,0:45:66:0,66,1240 0/0:62,0:62:99:0,120,1800 0/0:48,0:48:86:0,86,1358 0/0:61,0:61:64:0,64,1553 0/0:86,0:86:99:0,120,1800 0/0:84,0:84:99:0,120,1800 0/0:51,0:51:65:0,65,1133 0/0:53,0:53:99:0,104,1392 0/0:46,0:46:99:0,110,1197 0/0:51,0:51:88:0,88,1222 0/0:40,0:40:84:0,84,1260 0/0:41,0:41:95:0,95,1017 0/0:62,0:62:99:0,120,1800 0/0:41,0:41:87:0,87,1049 0/0:12,0:12:21:0,21,361 0/0:55,0:55:99:0,120,1800 0/0:50,0:50:90:0,90,1350 0/0:36,0:36:99:0,107,875 0/0:68,0:68:99:0,120,1800 0/0:55,0:55:99:0,120,1800 0/0:42,0:42:75:0,75,1348 0/0:41,0:41:79:0,79,1120 0/0:45,0:45:99:0,109,1051 0/0:36,0:36:69:0,69,984 0/0:58,0:58:99:0,120,1800 0/0:31,0:31:62:0,62,741 0/0:35,0:35:80:0,80,878 0/0:50,0:50:99:0,117,1755 0/0:55,0:55:89:0,89,1485 0/0:39,0:39:68:0,68,990 0/0:55,0:55:99:0,120,1800 0/0:51,0:51:64:0,64,1167 0/0:23,0:23:66:0,66,990 0/0:63,0:63:94:0,94,1800 0/0:60,0:60:99:0,119,1800 0/0:41,0:41:72:0,72,983 0/0:66,0:66:84:0,84,1669 0/0:67,0:67:99:0,120,1800 0/0:39,0:39:99:0,110,1071 0/0:48,0:48:99:0,116,1495 0/0:40,0:40:99:0,104,1095 0/0:32,0:32:71:0,71,851 0/0:50,0:50:75:0,75,1383 0/0:54,0:54:99:0,119,1505 0/0:28,0:28:62:0,62,681 0/0:66,0:66:93:0,93,1645 0/0:33,0:33:61:0,61,859 0/0:47,0:47:99:0,101,1121 0/0:47,0:47:71:0,71,1078 0/0:42,0:42:65:0,65,1144 0/0:59,0:59:99:0,111,1785 0/0:32,0:32:72:0,72,1080 0/0:37,0:37:84:0,84,1260 0/0:55,0:55:99:0,109,1361 0/0:42,0:42:99:0,106,990 0/0:32,0:32:80:0,80,769 0/0:41,0:41:90:0,90,1065 0/0:50,0:50:99:0,114,1710 0/0:78,0:78:99:0,120,1800 0/0:48,0:48:94:0,94,1089 0/0:43,0:43:80:0,80,1061 0/0:31,0:31:66:0,66,697 0/0:33,0:33:98:0,98,890 0/0:43,0:43:99:0,101,1257 
0/0:34,0:34:65:0,65,680 0/0:82,0:82:99:0,120,1800 0/0:38,0:38:62:0,62,960 0/0:54,0:54:89:0,89,1393 0/0:37,0:37:81:0,81,989 0/0:67,0:67:99:0,120,1800 0/0:35,0:35:80:0,80,892 0/0:53,0:53:93:0,93,1181 0/0:58,0:58:99:0,120,1800 0/0:44,0:44:91:0,91,971 0/0:45,0:45:80:0,80,1246 0/0:49,0:49:95:0,95,1159 0/0:40,0:40:95:0,95,969 0/0:72,0:72:99:0,120,1800 0/0:21,0:21:63:0,63,735 0/0:45,0:45:99:0,120,1800 0/0:63,0:63:99:0,120,1800 0/0:52,0:52:99:0,117,1755 0/0:44,0:44:79:0,79,1273 0/0:54,0:54:99:0,104,1617 0/0:59,0:59:99:0,100,1721 0/0:65,0:65:67:0,67,1504 0/0:52,0:52:93:0,93,1450 0/0:73,0:73:99:0,120,1800 0/0:51,0:51:81:0,81,1437 0/0:47,0:47:72:0,72,1161 0/0:50,0:50:74:0,74,1800 0/0:32,0:32:71:0,71,849 0/0:71,0:71:99:0,108,1664 0/0:28,0:28:63:0,63,809 0/0:45,0:45:88:0,88,1208 0/0:57,0:57:70:0,70,1187 0/0:35,0:35:63:0,63,783 0/0:53,0:53:99:0,120,1800 0/0:36,0:36:86:0,86,954 0/0:40,0:40:68:0,68,1010 0/0:42,0:42:84:0,84,1159 0/0:43,0:43:99:0,99,1264 0/0:37,0:37:66:0,66,870 0/0:90,0:90:88:0,88,1800 0/0:47,0:47:66:0,66,1334 0/0:51,0:51:95:0,95,1155 0/0:35,0:35:99:0,103,953 0/0:73,0:73:99:0,120,1800 0/0:36,0:36:62:0,62,954 0/0:79,0:79:89:0,89,1800 0/0:53,0:53:63:0,63,1149 0/0:63,0:63:99:0,112,1560 0/0:67,0:67:99:0,120,1800 0/0:106,0:106:99:0,120,1800 0/0:32,0:32:81:0,81,1215 0/0:40,0:40:97:0,97,1051 0/0:36,0:36:76:0,76,980 0/0:49,0:49:99:0,120,1800 0/0:43,0:43:93:0,93,1014 0/0:30,0:30:69:0,69,734 0/0:65,0:65:99:0,120,1800 0/0:71,0:71:99:0,120,1800 0/0:44,0:44:73:0,73,1209 0/0:38,0:38:80:0,80,961 0/0:56,0:56:97:0,97,1383 0/0:41,0:41:88:0,88,1337 0/0:44,0:44:71:0,71,1112 0/0:65,0:65:99:0,120,1800 0/0:74,0:74:99:0,120,1800 0/0:56,0:56:99:0,120,1800 0/0:34,0:34:90:0,90,918 0/0:34,0:34:67:0,67,906 0/0:24,0:24:60:0,60,793 0/0:27,0:27:76:0,76,711 0/0:68,0:68:99:0,113,1525 0/0:63,0:63:99:0,118,1641 0/0:41,0:41:99:0,102,978 0/0:82,0:82:97:0,97,1800 0/0:44,0:44:67:0,67,1081 0/0:37,0:37:62:0,62,842 0/0:38,0:38:98:0,98,959 0/0:72,0:72:99:0,120,1800 0/0:58,0:58:75:0,75,1535 
0/0:31,0:31:67:0,67,910 0/0:39,0:39:61:0,61,865 0/0:62,0:62:73:0,73,1540 0/0:66,0:66:99:0,120,1800 0/0:133,0:133:99:0,120,1800 0/0:64,0:64:99:0,120,1800 0/0:46,0:46:96:0,96,1440 0/0:43,0:43:99:0,105,1575 0/0:50,0:50:99:0,115,1673 0/0:129,0:129:71:0,71,1800 0/0:84,0:84:99:0,120,1800 0/0:91,0:91:93:0,93,1800 0/0:91,0:91:99:0,120,1800 0/0:81,0:81:99:0,120,1800 0/0:96,0:96:99:0,120,1800 0/0:84,0:84:99:0,120,1800 0/0:70,0:70:99:0,120,1800 0/0:80,0:80:99:0,120,1800 0/0:65,0:65:99:0,120,1800 0/0:124,0:124:99:0,120,1800 0/0:69,0:69:96:0,96,1800 0/0:63,0:63:99:0,120,1800 0/0:88,0:88:99:0,120,1800 0/0:79,0:79:99:0,120,1800 0/0:77,0:77:99:0,120,1800 0/0:71,0:71:76:0,76,1800 0/0:64,0:64:99:0,120,1800 0/0:76,0:76:99:0,120,1800 0/0:83,0:83:99:0,120,1800 0/0:83,0:83:99:0,120,1800 0/0:77,0:77:99:0,120,1800 0/0:143,0:143:99:0,120,1800 0/0:80,0:80:99:0,120,1800 0/0:88,0:88:99:0,120,1800 0/0:73,0:73:78:0,78,1800 0/0:110,0:110:99:0,120,1800 0/0:78,0:78:99:0,120,1800 0/0:99,0:99:99:0,120,1800 0/0:90,0:90:99:0,120,1800 0/0:53,0:53:99:0,120,1800 0/0:40,0:40:81:0,81,1215 0/0:119,0:119:99:0,120,1800 0/0:100,0:100:99:0,120,1800 0/0:92,0:92:99:0,120,1800 0/0:105,0:105:99:0,120,1800 0/0:129,0:129:99:0,120,1800 0/0:99,0:99:94:0,94,1800 0/0:140,0:140:99:0,120,1800 0/0:128,0:128:99:0,120,1800 0/0:96,0:96:99:0,120,1800 0/0:99,0:99:99:0,120,1800 0/0:86,0:86:99:0,120,1800 0/0:100,0:100:99:0,120,1800 0/0:110,0:110:99:0,120,1800 0/0:125,0:125:99:0,120,1800 0/0:149,0:149:99:0,120,1800 0/0:107,0:107:99:0,120,1800 0/0:81,0:81:99:0,120,1800 0/0:105,0:105:99:0,120,1800 0/0:89,0:89:99:0,120,1800 0/0:125,0:125:99:0,120,1800 0/0:120,0:120:99:0,120,1800 0/0:52,0:52:99:0,120,1800 0/0:91,0:91:99:0,120,1800 0/0:114,0:114:99:0,120,1800 0/0:94,0:94:99:0,120,1800 0/0:74,0:74:99:0,120,1800 0/0:80,0:80:99:0,120,1800 0/0:105,0:105:99:0,120,1800 0/0:83,0:83:99:0,111,1800 0/0:86,0:86:99:0,120,1800
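The record above is wrapped across several lines in the post. Once it is rejoined into a single data line, the genotype classes can be tallied directly from the sample columns, which is a quick way to check counts such as 19 HOM and 32 HET. A minimal sketch, assuming a standard VCF data line (eight fixed columns, FORMAT, then one column per sample); the category names are illustrative, and it only counts the calls, without explaining why GenotypeGVCFs emitted them.

```python
from collections import Counter


def tally_genotypes(vcf_line: str) -> Counter:
    """Count GT classes across the sample columns of one VCF data line.

    Columns may be separated by tabs or spaces (as in the forum paste).
    Layout assumed: 8 fixed columns, FORMAT, then one column per sample.
    """
    fields = vcf_line.split()
    gt_index = fields[8].split(":").index("GT")
    counts = Counter()
    for sample in fields[9:]:
        values = sample.split(":")
        gt = values[gt_index] if gt_index < len(values) else "."
        # Treat phased ("|") and unphased ("/") separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1
        elif len(set(alleles)) > 1:
            counts["HET"] += 1      # e.g. 0/1, or 1/2 at multiallelic sites
        elif alleles[0] == "0":
            counts["HOM_REF"] += 1  # 0/0
        else:
            counts["HOM_ALT"] += 1  # e.g. 1/1
    return counts
```

Calling tally_genotypes on the rejoined record returns a Counter keyed by HOM_REF, HET, HOM_ALT and missing, which can then be compared against the per-batch sample order.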

I am not sure why a variant that is not recorded in the majority of the individual gVCFs and BAM files suddenly appears in the final file.
Thank you very much for your help.

an_kate

Cannot find HaplotypeCaller

I downloaded GATK and ran the script. It gave this bizarre error where it couldn't find HaplotypeCaller. So I downloaded GATK again. I am still having the same error.

Here is my input:

java -jar GenomeAnalysisTK.jar \
-R hg38.fa \
-T HaplotypeCaller \
-I AE2CH6SK4DG-EXT.chr1.bam \
-I AE2CH6SK4DG-EXT.chr2.bam \
-I AE2CH6SK4DG-EXT.chr3.bam \
-I AE2CH6SK4DG-EXT.chr4.bam \
-I AE2CH6SK4DG-EXT.chr5.bam \
-I AE2CH6SK4DG-EXT.chr6.bam \
-I AE2CH6SK4DG-EXT.chr7.bam \
-I AE2CH6SK4DG-EXT.chr8.bam \
-I AE2CH6SK4DG-EXT.chr9.bam \
-I AE2CH6SK4DG-EXT.chr10.bam \
-I AE2CH6SK4DG-EXT.chr11.bam \
-I AE2CH6SK4DG-EXT.chr12.bam \
-I AE2CH6SK4DG-EXT.chr13.bam \
-I AE2CH6SK4DG-EXT.chr14.bam \
-I AE2CH6SK4DG-EXT.chr15.bam \
-I AE2CH6SK4DG-EXT.chr16.bam \
-I AE2CH6SK4DG-EXT.chr17.bam \
-I AE2CH6SK4DG-EXT.chr18.bam \
-I AE2CH6SK4DG-EXT.chr19.bam \
-I AE2CH6SK4DG-EXT.chr20.bam \
-I AE2CH6SK4DG-EXT.chr21.bam \
-I AE2CH6SK4DG-EXT.chr22.bam \
-I AE2CH6SK4DG-EXT.chrM.bam \
-I AE2CH6SK4DG-EXT.chrX.bam \
-I AE2CH6SK4DG-EXT.chrY.bam \
--emitRefConfidence GVCF \
--dbsnp me.genotypes.vcf \
-o me.raw.vcf

This is the error message:


ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: HaplotypeCaller
ERROR ------------------------------------------------------------------------------------------

Did something happen to the last build?
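One thing worth ruling out when a walker name cannot be resolved is whether the jar being launched actually contains that walker's class, for example if a different or incomplete file ended up under the name GenomeAnalysisTK.jar. A minimal sketch, assuming Python 3 and that the walker is packaged as a class file named HaplotypeCaller.class somewhere inside the jar; it only inspects the archive and does not run GATK.

```python
import zipfile


def walker_entries(names, walker="HaplotypeCaller"):
    """Return archive entries whose file name is exactly <walker>.class."""
    target = walker + ".class"
    return [name for name in names if name.rsplit("/", 1)[-1] == target]


def find_walker_classes(jar_path, walker="HaplotypeCaller"):
    """List matching class entries inside a jar (a jar is a zip archive).

    Raises zipfile.BadZipFile if the file is not a valid zip archive,
    which would point to a corrupt or incomplete download.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return walker_entries(jar.namelist(), walker)
```

For example, find_walker_classes("GenomeAnalysisTK.jar") returns the matching entry paths; an empty list means no such class was found in that file. The JDK's jar tf command gives a similar listing.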


Error: Argument with name 'drf' isn't defined.

Hello,

I'm currently trying to call SNPs on several samples (8 BAMs, 1 pseudo-reference), and 75.15% of my reads failed the DuplicateReadFilter. I tried to disable this filter with "-drf DuplicateRead" on the command line, but that error came up (drf not defined).

Any idea?

Regards,

Quentin Jehanne
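DuplicateReadFilter acts on reads whose duplicate flag is set, so one way to see where a figure like 75.15% comes from is to count records with FLAG bit 0x400 (PCR or optical duplicate, as defined in the SAM specification), for example in the text output of samtools view. A minimal sketch, assuming SAM text lines as input; secondary and supplementary alignments are counted like any other record here.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def duplicate_fraction(sam_lines) -> float:
    """Fraction of alignment records whose FLAG has the duplicate bit set.

    Header lines (starting with '@') and blank lines are skipped.
    """
    total = dup = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        total += 1
        if flag & DUPLICATE_FLAG:
            dup += 1
    return dup / total if total else 0.0
```

The function accepts any iterable of lines, such as an open SAM file; samtools flagstat reports a comparable duplicate count directly from a BAM.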

CollectHSmetrics

Hi everyone,

Thank you in advance for reading this. Has anyone else had the following problem when running CollectHsMetrics?

[Tue Apr 12 12:17:52 EDT 2016] picard.analysis.directed.CollectHsMetrics done. Elapsed time: 14.54 minutes.
Runtime.totalMemory()=920649728
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.calculateTargetCoverageMetrics(TargetMetricsCollector.java:637)
at picard.analysis.directed.TargetMetricsCollector$PerUnitTargetMetricCollector.finish(TargetMetricsCollector.java:577)
at picard.metrics.MultiLevelCollector$AllReadsDistributor.finish(MultiLevelCollector.java:208)
at picard.metrics.MultiLevelCollector.finish(MultiLevelCollector.java:324)
at picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:152)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

The bait and target interval files I used were just the header of my BAM file, extracted with samtools view -H.

I saw something in the FAQ saying that, "The BAM index format imposes a limit of 512MB for a single reference sequence. If this limit is exceeded, various errors may occur depending on what steps have been taken."

Any explanation on how to address this? Thank you again!
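A header-only file defines no bait or target intervals. A Picard interval list is the SAM-style header followed by one tab-separated line per interval: contig, 1-based start, 1-based inclusive end, strand, and name. Picard's BedToIntervalList tool performs this conversion from a BED file; the minimal sketch below only illustrates the format, assuming a standard BED input (0-based, half-open coordinates) and a header such as the output of samtools view -H. The interval names it generates are hypothetical placeholders.

```python
def bed_to_interval_list(sam_header: str, bed_lines) -> str:
    """Build Picard-style interval list text from a SAM header and BED records.

    BED is 0-based and half-open; interval lists are 1-based and closed,
    so the start gains 1 while the end stays the same.
    """
    out = [line for line in sam_header.splitlines() if line.startswith("@")]
    n = 0
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        n += 1
        cols = line.rstrip("\n").split("\t")
        chrom, start, end = cols[0], int(cols[1]), int(cols[2])
        # BED column 4 is the name, column 6 the strand (both optional).
        name = cols[3] if len(cols) > 3 and cols[3] else "interval_%d" % n
        strand = cols[5] if len(cols) > 5 and cols[5] in ("+", "-") else "+"
        out.append("\t".join([chrom, str(start + 1), str(end), strand, name]))
    return "\n".join(out) + "\n"
```

The returned text can be written to a .interval_list file and passed as both BAIT_INTERVALS and TARGET_INTERVALS if the baits and targets are the same regions.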

Error running haplotypecaller (v3.3)

I am running HaplotypeCaller on multiple samples. All but one work correctly with the following command; the remaining one produces a binary file instead of the gVCF. Any suggestions on how to fix this?

HaplotypeCaller -i map/30.rmdup.bam -o gvcf/30.gvcf -r reference_F.fasta -p 10 -a gvcf/30.gvcf -g -e hc.log
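To narrow down what the "binary" output actually is, its first bytes can be inspected: gzip files, including BGZF-compressed VCFs, begin with the bytes 0x1f 0x8b, while a plain-text VCF begins with ##fileformat=VCF. A minimal sketch, assuming the output path gvcf/30.gvcf from the command above; it only classifies the file and does not say why this one sample differs from the others.

```python
GZIP_MAGIC = b"\x1f\x8b"  # gzip and BGZF (gzip-compatible) files start with these bytes


def classify_header(head: bytes) -> str:
    """Roughly classify a file from its leading bytes."""
    if head.startswith(GZIP_MAGIC):
        return "gzip/BGZF-compressed"
    if head.startswith(b"##fileformat=VCF"):
        return "plain-text VCF"
    if b"\x00" in head:
        return "other binary"
    return "other text"


def describe_output(path: str, nbytes: int = 512) -> str:
    """Read the first nbytes of a file and classify them."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(nbytes))
```

For example, describe_output("gvcf/30.gvcf") can be compared with the result for one of the samples that worked.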

Mutect2 gets different results when I change the downsample level

I use MuTect2 from GATK 3.6 and GATK 3.7 to call variants. I know MuTect2 performs downsampling, which has an important influence on the result, so I changed the downsampling level. The default values are:

 maxReadsInRegionPerSample = 1000;
 minReadsPerAlignmentStart = 5;

I changed these parameters to larger values:

 maxReadsInRegionPerSample = 2000;
 minReadsPerAlignmentStart = 10;

Then I compiled the code, ran it, and got a result named downsample_2x.vcf. Compared with the default result (original.vcf), however, the result is very strange:

There are more variants in downsample_2x.vcf, which is easy to understand because far more reads are sampled. However, some variants are also lost: around 200 out of about 900 variants in original.vcf do not appear in downsample_2x.vcf. Since the sample got bigger, why are there fewer variants? It's difficult for me to understand. If the result with more reads is more accurate, what about these 200 missing variants? Any reply will be much appreciated!
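To see exactly which calls differ between the two runs, the variant keys of both files can be compared as sets. A minimal sketch, assuming tab-separated VCFs and keying each call on CHROM, POS, REF and ALT. Because a call can also be present but carry a non-PASS FILTER value, the optional pass_only flag restricts the comparison to PASS (or unfiltered ".") records.

```python
def variant_keys(vcf_lines, pass_only=False) -> set:
    """Collect (CHROM, POS, REF, ALT) keys from VCF data lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if pass_only and cols[6] not in ("PASS", "."):
            continue  # FILTER column (7th) holds a filter name
        keys.add((cols[0], int(cols[1]), cols[3], cols[4]))
    return keys


def compare_calls(original_lines, other_lines, pass_only=False) -> dict:
    """Split calls into shared, original-only and other-only sets."""
    a = variant_keys(original_lines, pass_only)
    b = variant_keys(other_lines, pass_only)
    return {"shared": a & b, "only_original": a - b, "only_other": b - a}
```

Passing the two open files, original.vcf first, gives the roughly 200 missing calls as only_original; running it with and without pass_only shows whether any of them are still present but filtered.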

VariantAnnotator error with Freebayes merge: Must initialize the cache of allele anyploid indices...

Main question (for the community):
How can I solve or avoid the error "Must initialize the cache of allele anyploid indices for ploidy 1" with VariantAnnotator (3.7-0) and freebayes 1.1.0?

The best solution might be "Do not use third-party tools!", as the moderators will say, so I don't necessarily need a perfect solution; experiences with merging variants from different sources are welcome. My workflow producing the results: BWA -> AddOrReplaceReadGroups -> MergeSamFiles -> MarkDuplicates -> indel realignment (known indels) -> (freebayes + HC) -> CombineVariants -> snpEff + VariantAnnotator, optionally with VariantsToAllelicPrimitives after freebayes.

The usual stuff

Stack trace, an affected record, and my solution:

java -Xmx8g -Djava.io.tmpdir=/scratch/ -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=1 -jar /data/GATK/3.7-0-foss-2016a-Java-1.8.0_121/GenomeAnalysisTK.jar -T VariantAnnotator -R /data/ftp.broadinstitute.org/bundle/2.8/b37//human_g1k_v37_decoy.fasta --dbsnp /data/ftp.broadinstitute.org/bundle/2.8/b37//dbsnp_138.b37.vcf -I 1.bam -I 2.bam -I 3.bam -I 4.bam -I 5.bam -I 6.bam --excludeAnnotation MVLikelihoodRatio --excludeAnnotation TechnologyComposition --excludeAnnotation DepthPerSampleHC --excludeAnnotation PercentNBaseSolid --excludeAnnotation PossibleDeNovo --excludeAnnotation AS_RMSMappingQuality --excludeAnnotation ClusteredReadPosition --filter_bases_not_stored --useAllAnnotations --snpEffFile /scratch/snpEffGatk.intermediate.vcf --resource:cosmic,VCF /data/sftp-cancer.sanger.ac.uk/files/grch37/cosmic//v72/VCF/CosmicCodingMuts.bundle_2.8_b37.vcf.gz -E cosmic.ID --resource:1000gPhase1Snps,vcf /data//ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz -E 1000gPhase1Snps.AF -E 1000gPhase1Snps.AFR_AF -E 1000gPhase1Snps.AMR_AF -E 1000gPhase1Snps.ASN_AF -E 1000gPhase1Snps.EUR_AF --resource:1000gPhase1Indels,vcf /data/ftp.broadinstitute.org/bundle/2.8/b37//1000G_phase1.indels.b37.vcf -E 1000gPhase1Indels.AF -E 1000gPhase1Indels.AFR_AF -E 1000gPhase1Indels.AMR_AF -E 1000gPhase1Indels.ASN_AF -E 1000gPhase1Indels.EUR_AF --resource:dbSnp,vcf /data/ftp.broadinstitute.org/bundle/2.8/b37//dbsnp_138.b37.vcf -E dbSnp.CAF -E dbSnp.COMMON -E dbSnp.dbSNPBuildID --resource:exac,vcf /data//ExAC/release0.3/ExAC.r0.3.sites.vep.vcf.gz -E exac.AC -E exac.AN -E exac.AC_Adj -E exac.AC_Hom -E exac.AC_Het -E exac.AC_Hemi -E exac.AN_AFR -E exac.AC_AFR -E exac.AC_AMR -E exac.AN_AMR -E exac.AC_EAS -E exac.AN_EAS -E exac.AC_FIN -E exac.AN_FIN -E exac.AC_OTH -E exac.AN_OTH -E exac.AN_SAS -E exac.AC_SAS -V:input,VCF /scratch/3_178927848_178927848.vcf --out /scratch/3_178927848_178927848.vcf -L /scratch/test.vcf 
&>/dev/stdout  | tail -n 150
INFO  10:10:06,670 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  10:10:06,672 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18 
INFO  10:10:06,672 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
INFO  10:10:06,672 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
INFO  10:10:06,672 HelpFormatter - [Thu Jun 08 10:10:06 CEST 2017] Executing on Linux 2.6.32-642.6.2.el6.x86_64 amd64 
INFO  10:10:06,672 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13 
INFO  10:10:06,675 HelpFormatter - Program Args: -T VariantAnnotator -R /data/ftp.broadinstitute.org/bundle/2.8/b37//human_g1k_v37_decoy.fasta --dbsnp /data//ftp.broadinstitute.org/bundle/2.8/b37//dbsnp_138.b37.vcf -I /1.bam -I 2.bam -I 3.bam -I 4.bam -I 5.bam -I 6.bam --excludeAnnotation MVLikelihoodRatio --excludeAnnotation TechnologyComposition --excludeAnnotation DepthPerSampleHC --excludeAnnotation PercentNBaseSolid --excludeAnnotation PossibleDeNovo --excludeAnnotation AS_RMSMappingQuality --excludeAnnotation ClusteredReadPosition --filter_bases_not_stored --useAllAnnotations --snpEffFile /scratch/npEffGatk.intermediate.vcf --resource:cosmic,VCF /data/sftp-cancer.sanger.ac.uk/files/grch37/cosmic//v72/VCF/CosmicCodingMuts.bundle_2.8_b37.vcf.gz -E cosmic.ID --resource:1000gPhase1Snps,vcf /data/ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz -E 1000gPhase1Snps.AF -E 1000gPhase1Snps.AFR_AF -E 1000gPhase1Snps.AMR_AF -E 1000gPhase1Snps.ASN_AF -E 1000gPhase1Snps.EUR_AF --resource:1000gPhase1Indels,vcf /data/ftp.broadinstitute.org/bundle/2.8/b37//1000G_phase1.indels.b37.vcf -E 1000gPhase1Indels.AF -E 1000gPhase1Indels.AFR_AF -E 1000gPhase1Indels.AMR_AF -E 1000gPhase1Indels.ASN_AF -E 1000gPhase1Indels.EUR_AF --resource:dbSnp,vcf /data/ftp.broadinstitute.org/bundle/2.8/b37/dbsnp_138.b37.vcf -E dbSnp.CAF -E dbSnp.COMMON -E dbSnp.dbSNPBuildID --resource:exac,vcf /data/ExAC/release0.3/ExAC.r0.3.sites.vep.vcf.gz -E exac.AC -E exac.AN -E exac.AC_Adj -E exac.AC_Hom -E exac.AC_Het -E exac.AC_Hemi -E exac.AN_AFR -E exac.AC_AFR -E exac.AC_AMR -E exac.AN_AMR -E exac.AC_EAS -E exac.AN_EAS -E exac.AC_FIN -E exac.AN_FIN -E exac.AC_OTH -E exac.AN_OTH -E exac.AN_SAS -E exac.AC_SAS -V:input,VCF /scratch/3_178927848_178927848.vcf --out /scratch/3_178927848_178927848.vcf -L /scratch/3_178927848_178927848.vcf
INFO  10:10:06,680 HelpFormatter - Executing as umcg-mterpstra@pg-interactive on Linux 2.6.32-642.6.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13. 
INFO  10:10:06,680 HelpFormatter - Date/Time: 2017/06/08 10:10:06 
INFO  10:10:06,680 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  10:10:06,680 HelpFormatter - ---------------------------------------------------------------------------------- 
INFO  10:10:06,711 GenomeAnalysisEngine - Strictness is SILENT 
INFO  10:10:06,790 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
INFO  10:10:06,796 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  10:10:07,894 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1.10 
INFO  10:10:10,340 IntervalUtils - Processing 6042 bp from intervals 
WARN  10:10:10,345 IndexDictionaryUtils - Track cosmic doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:10:10,345 IndexDictionaryUtils - Track 1000gPhase1Snps doesn't have a sequence dictionary built in, skipping dictionary validation 
WARN  10:10:10,346 IndexDictionaryUtils - Track exac doesn't have a sequence dictionary built in, skipping dictionary validation 
INFO  10:10:10,483 GenomeAnalysisEngine - Preparing for traversal over 6 BAM files 
INFO  10:10:10,613 GenomeAnalysisEngine - Done preparing for traversal 
INFO  10:10:10,613 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  10:10:10,614 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  10:10:10,614 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
WARN  10:10:11,001 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,002 AS_StrandBiasTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,002 AS_InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
WARN  10:10:11,003 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,003 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,003 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,003 AS_RankSumTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,003 AS_StrandBiasTest - Allele-specific annotations can only be used with HaplotypeCaller, CombineGVCFs and GenotypeGVCFs -- no data will be output 
WARN  10:10:11,004 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
WARN  10:10:11,004 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples. 
WARN  10:10:11,004 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
WARN  10:10:28,397 BaseQualitySumPerAlleleBySample - Annotation will not be calculated, can only be called from MuTect2, not org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator 
WARN  10:10:28,397 OxoGReadCounts - Annotation will not be calculated, can only be called from MuTect2, not org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator 
WARN  10:10:28,398 AnnotationUtils - SAC annotation will not be calculated, must be called from HaplotypeCaller or MuTect2, not VariantAnnotator 
WARN  10:10:28,398 AnnotationUtils - SB annotation will not be calculated, must be called from HaplotypeCaller or MuTect2, not VariantAnnotator 
WARN  10:10:28,429 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator 
WARN  10:10:28,429 HardyWeinberg - Too few genotypes 
WARN  10:10:28,435 SpanningDeletions - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator 
WARN  10:10:28,437 TransmissionDisequilibriumTest - Transmission disequilibrium test annotation requires a valid ped file be passed in. 
##### ERROR --
##### ERROR stack trace 
java.lang.IllegalStateException: Must initialize the cache of allele anyploid indices for ploidy 1
    at htsjdk.variant.variantcontext.GenotypeLikelihoods.getAlleles(GenotypeLikelihoods.java:532)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.getLikelihoodIndexes(GATKVariantContextUtils.java:681)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.determineLikelihoodIndexesToUse(GATKVariantContextUtils.java:639)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.subsetAlleles(GATKVariantContextUtils.java:610)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.splitVariantContextToBiallelics(GATKVariantContextUtils.java:1071)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.splitVariantContextToBiallelics(GATKVariantContextUtils.java:1019)
    at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.splitVariantContextToBiallelics(GATKVariantContextUtils.java:1000)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.getMinRepresentationBiallelics(VariantAnnotatorEngine.java:535)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateExpressions(VariantAnnotatorEngine.java:452)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:226)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:212)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:355)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:112)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Must initialize the cache of allele anyploid indices for ploidy 1
##### ERROR ------------------------------------------------------------------------------------------


Offending record (several such records are present in the VCF):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1   2   3   4   5   6
3   178927848   rs67871207  AT  A   61.21   .   AC=4;AF=1.00;AN=4;DB;DP=5;ExcessHet=3.0103;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=60.00;QD=20.40;SOR=3.258;set=GATK GT:AD:DP:GQ:PL  ./. ./. 1/1:0,1:1:3:28,3,0  ./. ./. 1/1:0,3:3:9:66,9,0
3   178927848   .   ATTTTTTTTTTTTA  ATTTTTTTTTTTA,ATTTTTTTTTTA,ATTTTTTTTCTTTA   117.54  .   AB=0.666667,0.333333,0;ABP=3.73412,3.73412,0;AC=3,1,2;AF=0.500,0.167,0.333;AN=6;AO=3,1,1;CIGAR=1M1D12M,1M2D11M,9M1X4M;DP=5;DPB=4.64286;DPRA=2,3,1;EPP=3.73412,5.18177,5.18177;EPPR=0;GTI=1;LEN=1,2,1;MEANALT=1.5,2,1;MQM=60,60,60;MQMR=0;NS=3;NUMALT=3;ODDS=0.405465;PAIRED=0,0,0;PAIREDR=0;PAO=0,0,0;PQA=0,0,0;PQR=0;PRO=0;QA=86,36,36;QR=0;RO=0;RPL=1,0,0;RPP=3.73412,5.18177,5.18177;RPPR=0;RPR=2,1,1;RUN=1,1,1;SAF=3,1,1;SAP=9.52472,5.18177,5.18177;SAR=0,0,0;SRF=0;SRP=0;SRR=0;TYPE=del,del,snp;set=freebayes;technology.illumina=1,1,1   GT:AD:AO:DP:PL:QA:QR:RO .   .   1/1:0,1,0,0:1,0,0:1:36,3,0,36,3,36,36,3,36,36:36,0,0:0:0    .   3/3:0,0,0,1:0,0,1:1:36,36,36,36,36,36,3,3,3,0:0,0,36:0:0    1/2:0,2,1,0:2,1,0:3:71,33,27,41,0,38,71,33,41,71:50,36,0:0:0

My solution

# keep only simple (non-multiallelic) variants
perl -i.bak -wpe 'if(not((m/^#/) ||  (m/TYPE=complex;/||m/TYPE=snp;/||m/TYPE=del;/||m/TYPE=ins;/||m/TYPE=mnp;/))){$_="";}' probematic.vcf
# CombineVariants with -genotypeMergeOptions PRIORITIZE -priority GATK,freebayes --filteredrecordsmergetype KEEP_UNCONDITIONAL
# select the GATK record when two records share the same CHROM/POS
#VariantAnnotator....
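For readers who prefer not to edit the VCF in place with Perl, the first step can be mirrored in Python. This is a rough sketch, not the poster's code. It assumes, as the Perl does, that FreeBayes records carry a `TYPE=` INFO key. It keeps header lines, plus records whose `TYPE` is exactly one of snp, mnp, ins, del or complex. Unlike the Perl regex, it parses the INFO column instead of matching the literal text `TYPE=...;`, so it also accepts `TYPE` when it is the last INFO key. The function names `info_type` and `keep_simple_records` are hypothetical.

```python
from typing import Iterable, Iterator, Optional

# FreeBayes TYPE values that the Perl filter keeps when they appear alone.
SIMPLE_TYPES = {"snp", "mnp", "ins", "del", "complex"}


def info_type(line: str) -> Optional[str]:
    """Return the raw TYPE value from a VCF record's INFO column, or None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        if key == "TYPE":
            return value
    return None


def keep_simple_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus records whose TYPE is exactly one simple type.

    Records with a multi-valued TYPE (e.g. "del,del,snp") or with no TYPE
    key at all are dropped, as they are by the Perl one-liner.
    """
    for line in lines:
        if line.startswith("#") or info_type(line) in SIMPLE_TYPES:
            yield line


# Example (file names are placeholders):
# with open("probematic.vcf") as src, open("simple.vcf", "w") as dst:
#     dst.writelines(keep_simple_records(src))
```

Like the Perl version, this sketch also drops records that have no `TYPE` key, such as GATK calls. It should therefore only be run on the FreeBayes file, before CombineVariants.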

Need GenePattern modules for the latest Picard tools

(repost of part of an email to the Picard team)

Does anyone out there have new GP modules for as many Picard.jar subcommands as possible? (I love them all.)

I am preparing a variant-analysis training for the fall on our institute's in-development GenePattern server, and I would love to teach people to use the new Picard as much as possible (and, once it is available, GATK v4 as well).

The problem with the Picard tools is that the Java command has changed from 'command.jar' to 'picard.jar subcommand', and some arguments may have been added or modified. That makes recycling the old GP modules useless.

The good news is that the new core argument syntax has been made uniform. It may therefore be better to start from scratch and create a generic GP module that fits them (almost) all.

If someone could provide such a basic module (or more :-), it would be greatly appreciated.

Best Regards,
Stephane

(VIB: www.vib.be)


ERROR stack trace java.lang.NumberFormatException: For input string: "44.00"

Hi there!

I'm getting the following error message (see below) when using VariantFiltration (GATK version 3.7.0).
I have used this same code before without any trouble.
Any help is welcome! I'm stuck. I've attached the VCF file I'm trying to annotate, as the error seems to be in there...

Cheers,

Adrien

command

java -Xms4G -Xmx4G -jar /gs7k1/home/arieux/Softwares/GATK/GenomeAnalysisTK.jar \
-T VariantFiltration \
-R /gs7k1/home/arieux/Data/Xcc/ref/306/ref_306.fasta \
--variant test.vcf \
-o test.annotated.vcf \
--filterExpression "RA>0.1" --filterName "Heteroplastic" \
--filterExpression "prof<15" --filterName "Low_depth" \
--filterExpression "QUAL <= 0 || MQ <= 1" --filterName "Bad_mapping"

error message

INFO 12:43:00,472 HelpFormatter - -----------------------------------------------------------------------------------
INFO 12:43:00,476 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 12:43:00,476 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:43:00,476 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 12:43:00,477 HelpFormatter - [Thu Jun 08 12:43:00 CEST 2017] Executing on Linux 2.6.32-504.16.2.el6.x86_64 amd64
INFO 12:43:00,477 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13
INFO 12:43:00,483 HelpFormatter - Program Args: -T VariantFiltration -R /gs7k1/home/arieux/Data/Xcc/ref/306/ref_306.fasta --variant 160930_SNK268_B_L008_JBO-8b.vcf -o 160930_SNK268_B_L008_JBO-8b.annotated.vcf --filterExpression RA>0.1 --filterName Heteroplastic --filterExpression prof<15 --filterName Low_depth --filterExpression QUAL <= 0 || MQ <= 1 --filterName Bad_mapping
INFO 12:43:00,490 HelpFormatter - Executing as arieux@cc2-bigmem2 on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13.
INFO 12:43:00,491 HelpFormatter - Date/Time: 2017/06/08 12:43:00
INFO 12:43:00,494 HelpFormatter - -----------------------------------------------------------------------------------
INFO 12:43:00,494 HelpFormatter - -----------------------------------------------------------------------------------
INFO 12:43:00,527 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:43:00,644 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
WARN 12:43:00,705 RMDTrackBuilder - Index file /work/arieux/X.citrii.Fasteris2016/test_herbarium/data_merged/new_pipeline/test_problem_python/160930_SNK268_B_L008_JBO-8b.vcf.idx is out of date (index older than input file), deleting and updating the index file
INFO 12:43:00,912 RMDTrackBuilder - Writing Tribble index to disk for file /work/arieux/X.citrii.Fasteris2016/test_herbarium/data_merged/new_pipeline/test_problem_python/160930_SNK268_B_L008_JBO-8b.vcf.idx
INFO 12:43:01,070 GenomeAnalysisEngine - Preparing for traversal
INFO 12:43:01,071 GenomeAnalysisEngine - Done preparing for traversal
INFO 12:43:01,072 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 12:43:01,072 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 12:43:01,073 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime

ERROR --
ERROR stack trace

java.lang.NumberFormatException: For input string: "44.00"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.commons.jexl2.JexlArithmetic.toLong(JexlArithmetic.java:906)
at org.apache.commons.jexl2.JexlArithmetic.compare(JexlArithmetic.java:718)
at org.apache.commons.jexl2.JexlArithmetic.lessThanOrEqual(JexlArithmetic.java:807)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:956)
at org.apache.commons.jexl2.parser.ASTLENode.jjtAccept(ASTLENode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:1283)
at org.apache.commons.jexl2.parser.ASTOrNode.jjtAccept(ASTOrNode.java:18)
at org.apache.commons.jexl2.Interpreter.interpret(Interpreter.java:232)
at org.apache.commons.jexl2.ExpressionImpl.evaluate(ExpressionImpl.java:65)
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:178)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:94)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:15)
at htsjdk.variant.variantcontext.VariantContextUtils.match(VariantContextUtils.java:341)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.matchesFilter(VariantFiltration.java:483)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.buildVCfilters(VariantFiltration.java:474)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:379)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.map(VariantFiltration.java:318)
at org.broadinstitute.gatk.tools.walkers.filters.VariantFiltration.map(VariantFiltration.java:99)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: For input string: "44.00"
ERROR ------------------------------------------------------------------------------------------

GATK ver. 2.7

I'm trying to run an older bacterial phylogenetic pipeline that was validated with GATK ver. 2.7. I have found documentation for this version, but no download link. Are older versions of GATK still available and, if so, where?

(howto) Apply hard filters to a call set

Objective

Apply hard filters to a variant callset that is too small for VQSR or for which truth/training sets are not available.

Caveat

This document is intended to illustrate how to compose and run the commands involved in applying the hard filtering method. The annotations and values used may not reflect the most recent recommendations. Be sure to read the documentation about why you would use hard filters and how to understand and improve upon the generic hard filtering recommendations that we provide.

Steps

  1. Extract the SNPs from the call set
  2. Determine parameters for filtering SNPs
  3. Apply the filter to the SNP call set
  4. Extract the Indels from the call set
  5. Determine parameters for filtering indels
  6. Apply the filter to the Indel call set

1. Extract the SNPs from the call set

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R reference.fa \
    -V raw_variants.vcf \
    -selectType SNP \
    -o raw_snps.vcf

Expected Result

This creates a VCF file called raw_snps.vcf, containing just the SNPs from the original file of raw variants.
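Conceptually, `-selectType SNP` and `-selectType INDEL` sort records by the shape of their alleles. The sketch below is a simplified, hypothetical classifier (`classify_alleles` is not a GATK function). It calls a site a SNP when REF and every ALT are single bases, and an INDEL when every ALT differs in length from REF. Sites that mix allele types come out as MIXED. It ignores symbolic alleles and other cases that the real SelectVariants typing handles.

```python
from typing import Sequence


def classify_alleles(ref: str, alts: Sequence[str]) -> str:
    """Roughly classify a VCF site from its REF and ALT alleles.

    Returns "SNP", "MNP", "INDEL", "MIXED" or "NO_VARIATION". Symbolic
    alleles such as <DEL> and the spanning-deletion '*' are not handled.
    """
    real_alts = [alt for alt in alts if alt != "."]
    if not real_alts:
        return "NO_VARIATION"
    kinds = set()
    for alt in real_alts:
        if len(alt) != len(ref):
            kinds.add("INDEL")  # length change: insertion or deletion
        elif len(ref) == 1:
            kinds.add("SNP")    # single base replaced by a single base
        else:
            kinds.add("MNP")    # several bases replaced, same length
    return kinds.pop() if len(kinds) == 1 else "MIXED"
```

For example, `classify_alleles("AT", ["A"])` gives "INDEL". The three-allele FreeBayes record quoted earlier in this feed has two deletions plus a same-length substitution, so this sketch would label it MIXED.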


2. Determine parameters for filtering SNPs

SNPs matching any of these conditions will be considered bad and filtered out, i.e. flagged in the FILTER column of the output VCF file rather than removed from it. Because all of the conditions are combined into a single filter expression in the command below, the output records only the filter name, not which condition a given SNP failed. SNPs that do not match any of these conditions will be considered good and marked PASS in the output VCF file.

  • QualByDepth (QD) < 2.0

This is the variant confidence (from the QUAL field) divided by the unfiltered depth of non-reference samples.

  • FisherStrand (FS) > 60.0

Phred-scaled p-value using Fisher’s Exact Test to detect strand bias (the variation being seen on only the forward or only the reverse strand) in the reads. More bias is indicative of false positive calls.

  • RMSMappingQuality (MQ) < 40.0

This is the Root Mean Square of the mapping quality of the reads across all samples.

  • MappingQualityRankSumTest (MQRankSum) < -12.5

This is the u-based z-approximation from the Mann-Whitney Rank Sum Test for mapping qualities (reads with ref bases vs. those with the alternate allele). Note that the mapping quality rank sum test cannot be calculated for sites without a mixture of reads showing both the reference and alternate alleles, i.e. this will only be applied to heterozygous calls.

  • ReadPosRankSumTest (ReadPosRankSum) < -8.0

This is the u-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele. If the alternate allele is only seen near the ends of reads, this is indicative of error. Note that the read position rank sum test cannot be calculated for sites without a mixture of reads showing both the reference and alternate alleles, i.e. this will only be applied to heterozygous calls.

  • StrandOddsRatio (SOR) > 3.0

The StrandOddsRatio annotation is one of several methods that aims to evaluate whether there is strand bias in the data. Higher values indicate more strand bias.
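The SNP thresholds above can be expressed as a small rule table. The sketch below is illustrative only and is not how VariantFiltration is implemented. It checks a dict of annotation values against each cutoff and reports which rules fail, which a single combined filter expression does not tell you. The names `SNP_RULES` and `failed_conditions` are hypothetical. Skipping missing annotations instead of counting them as failures is an assumption of this sketch.

```python
from typing import Dict, List, Mapping, Optional, Tuple

# Cutoffs from the SNP list above: a SNP fails a rule when its value lies on
# the "bad" side of the cutoff.
SNP_RULES: Dict[str, Tuple[str, float]] = {
    "QD": ("<", 2.0),
    "FS": (">", 60.0),
    "MQ": ("<", 40.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
    "SOR": (">", 3.0),
}


def failed_conditions(
    annotations: Mapping[str, Optional[float]],
    rules: Mapping[str, Tuple[str, float]] = SNP_RULES,
) -> List[str]:
    """Return the annotation names whose value fails its cutoff.

    Absent or None annotations are skipped rather than counted as failures
    (an assumption of this sketch; e.g. the rank-sum tests are not computed
    at sites without both REF and ALT reads).
    """
    failed = []
    for name, (op, cutoff) in rules.items():
        value = annotations.get(name)
        if value is None:
            continue
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(name)
    return failed
```

For example, `failed_conditions({"QD": 1.5, "FS": 75.0, "MQ": 59.8})` returns `["QD", "FS"]`: the SNP fails on QualByDepth and FisherStrand.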


3. Apply the filter to the SNP call set

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R reference.fa \
    -V raw_snps.vcf \
    --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 3.0" \
    --filterName "my_snp_filter" \
    -o filtered_snps.vcf

Expected Result

This creates a VCF file called filtered_snps.vcf, containing all the original SNPs from the raw_snps.vcf file. In it, each SNP's FILTER column reads either PASS or, if the SNP failed, the name of the filter (here my_snp_filter).

That way, if you apply several different filters (simultaneously or sequentially), you can keep track of which filter(s) each SNP failed, and later you can retrieve specific subsets of your calls using the SelectVariants tool. To learn more about composing different types of filtering expressions and retrieving subsets of variants using SelectVariants, please see the online GATK documentation.
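Once the FILTER column is populated, passing and failing records can be told apart by reading that column alone. SelectVariants is the supported way to extract subsets (for example, GATK3's `--excludeFiltered` option drops filtered sites). For a quick look at an uncompressed VCF, though, a minimal sketch might tally filter names. `split_by_filter` is a hypothetical helper, not a GATK function.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_by_filter(lines: Iterable[str]) -> Tuple[List[str], Counter]:
    """Separate PASS records from filtered ones in an uncompressed VCF.

    Returns the PASS record lines and a Counter of failing filter names.
    A record that failed several filters lists them separated by ';' in
    the FILTER column (column 7), and each name is counted once. Records
    whose FILTER is '.' (no filters applied) are counted under '.'.
    """
    passing: List[str] = []
    failed: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        filter_field = line.rstrip("\n").split("\t")[6]
        if filter_field == "PASS":
            passing.append(line)
        else:
            failed.update(filter_field.split(";"))
    return passing, failed
```

Counting names per filter is most useful when several filters were applied under different `--filterName` values. With a single combined filter, as in this example, every failing SNP carries the same name.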


4. Extract the Indels from the call set

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R reference.fa \
    -V raw_variants.vcf \
    -selectType INDEL \
    -o raw_indels.vcf

Expected Result

This creates a VCF file called raw_indels.vcf, containing just the Indels from the original file of raw variants.


5. Determine parameters for filtering Indels

Indels matching any of these conditions will be considered bad and filtered out, i.e. flagged in the FILTER column of the output VCF file rather than removed from it. Because all of the conditions are combined into a single filter expression in the command below, the output records only the filter name, not which condition a given indel failed. Indels that do not match any of these conditions will be considered good and marked PASS in the output VCF file.

  • QualByDepth (QD) < 2.0

This is the variant confidence (from the QUAL field) divided by the unfiltered depth of non-reference samples.

  • FisherStrand (FS) > 200.0

Phred-scaled p-value using Fisher’s Exact Test to detect strand bias (the variation being seen on only the forward or only the reverse strand) in the reads. More bias is indicative of false positive calls.

  • ReadPosRankSumTest (ReadPosRankSum) < -20.0

This is the u-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele. If the alternate allele is only seen near the ends of reads, this is indicative of error. Note that the read position rank sum test cannot be calculated for sites without a mixture of reads showing both the reference and alternate alleles, i.e. this will only be applied to heterozygous calls.

  • StrandOddsRatio (SOR) > 10.0

The StrandOddsRatio annotation is one of several methods that aims to evaluate whether there is strand bias in the data. Higher values indicate more strand bias.
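The indel cutoffs can be checked the same way, working straight from a record's INFO string. The sketch below is hypothetical (`INDEL_RULES`, `parse_info` and `indel_fails` are illustrative names). It uses the negative ReadPosRankSum cutoff from the indel VariantFiltration command. It assumes the checked annotations are single-valued numbers, and it skips any that are absent.

```python
from typing import Dict, List, Tuple

# Indel cutoffs from the list above; the ReadPosRankSum cutoff is negative,
# matching the indel VariantFiltration command.
INDEL_RULES: Dict[str, Tuple[str, float]] = {
    "QD": ("<", 2.0),
    "FS": (">", 200.0),
    "ReadPosRankSum": ("<", -20.0),
    "SOR": (">", 10.0),
}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into key -> raw value (flags map to "")."""
    parsed: Dict[str, str] = {}
    for entry in info.split(";"):
        if entry and entry != ".":
            key, _, value = entry.partition("=")
            parsed[key] = value
    return parsed


def indel_fails(info: str) -> List[str]:
    """Return the INDEL_RULES annotations that this INFO string fails.

    Assumes the checked annotations are single-valued numbers, as these
    site-level annotations are; absent values are skipped.
    """
    values = parse_info(info)
    failed = []
    for name, (op, cutoff) in INDEL_RULES.items():
        raw = values.get(name)
        if raw is None or raw in ("", "."):
            continue
        value = float(raw)
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(name)
    return failed
```

Take the GATK deletion record quoted earlier in this feed (QD=20.40, FS=0.000, SOR=3.258, and no ReadPosRankSum, presumably because every read carries the alternate allele). For it, `indel_fails` returns an empty list.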


6. Apply the filter to the Indel call set

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \
    -T VariantFiltration \
    -R reference.fa \
    -V raw_indels.vcf \
    --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0 || SOR > 10.0" \
    --filterName "my_indel_filter" \
    -o filtered_indels.vcf

Expected Result

This creates a VCF file called filtered_indels.vcf, containing all the original Indels from the raw_indels.vcf file. In it, each Indel's FILTER column reads either PASS or, if the Indel failed, the name of the filter (here my_indel_filter).

That way, if you apply several different filters (simultaneously or sequentially), you can keep track of which filter(s) each Indel failed, and later you can retrieve specific subsets of your calls using the SelectVariants tool. To learn more about composing different types of filtering expressions and retrieving subsets of variants using SelectVariants, please see the online GATK documentation.

BAM problem prevents a fix_misencoded_quality_scores step

I ran the following code:
java -Xmx100g -jar /work/reecygroup/GATK/GenomeAnalysisTK.jar \
-T BaseRecalibrator --unsafe -nct 16 \
-R /work/reecygroup/index/bos_taurus/bos_taurus_all_dna.fa \
-I 2005-5105.realigned.bam \
-knownSites /work/reecygroup/christine/Bos_taurus.dbSNP.vcf \
-o 2005-5105.realigned.grp \
--fix_misencoded_quality_scores

and got the error:

Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool

The "index" and "knownSites" files have been used successfully in other runs for quite some time, and the input BAM file (2005-5105.realigned.bam) was generated by a previous GATK IndelRealigner step. What could be wrong, and what do I need to do to fix it?
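`--fix_misencoded_quality_scores` is meant for files whose base qualities are Phred+64 encoded. The error message says the BAM also contains reads that look correctly (Phred+33) encoded. One rough way to check a sample of reads is to classify their QUAL strings (field 11 of `samtools view` output) by ASCII range. This is a hypothetical heuristic, not the check GATK itself performs. A character below `@` (ASCII 64) would be a negative quality under Phred+64, so it points to Phred+33. Characters above `K` (ASCII 75) are rare in Phred+33 data and suggest Phred+64. Old Solexa-scaled data, which can go below `@`, would be misread by this sketch.

```python
from collections import Counter
from typing import Iterable


def read_encoding(quals: str) -> str:
    """Guess the quality encoding of one read's QUAL string (heuristic).

    Returns "phred33", "phred64" or "unknown".
    """
    if not quals or quals == "*":  # '*' means qualities are absent in SAM
        return "unknown"
    codes = [ord(ch) for ch in quals]
    if min(codes) < 64:  # below '@': would be a negative Phred+64 quality
        return "phred33"
    if max(codes) > 75:  # above 'K': Q > 42 is unusual for Phred+33
        return "phred64"
    return "unknown"     # e.g. uniformly high Phred+33 qualities


def summarize(qual_strings: Iterable[str]) -> Counter:
    """Count reads per guessed encoding."""
    return Counter(read_encoding(q) for q in qual_strings)


def looks_mixed(counts: Counter) -> bool:
    """True when both encodings appear, the situation the error describes."""
    return counts["phred33"] > 0 and counts["phred64"] > 0
```

If `looks_mixed` is true for a sample of reads, the reads that are already Phred+33 would need to be separated out (for example, by read group or source file) before the remaining Phred+64 reads are converted.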

GATK4: execution stops with display of a cartoon

Apologies if this has been asked and answered. I don't even know how to frame a search of the forum for this.

I'm running the GenomeAnalysisTk-4_1.jar. After the run of any GATK4 tool (even just --help) a cartoon pops up, and the job will not exit until I manually close the image window. Quite a pain when running a big scatter-gather under cromwell. A Google Images search came up with "Best guess for this image: fashion accessory" which was not terribly helpful.

java -jar GenomeAnalysisTk-4_1.jar --help