Calling variants in RNAseq
Overview: This document describes the details of the GATK Best Practices workflow for SNP and indel calling on RNAseq data. Please note that any command lines are only given as examples of how the tools...
(How to) Map and clean up short read sequence data efficiently
If you are interested in emulating the methods used by the Broad Genomics Platform to pre-process your short read sequencing data, you have landed on the right page. The parsimonious operating...
GenomeLocParser error in SplitNCigarReads
I don't understand the error message I'm getting and googling it or trying different reference and input files haven't helped. ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are...
Errors about input files having missing or incompatible contigs
These errors occur when the names or sizes of contigs don't match between input files. This is a classic problem that typically happens when you get some files from collaborators, you try to use them...
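As a rough illustration of the kind of mismatch described here, the sketch below compares contig names and lengths taken from the `@SQ` lines of two SAM-style headers. The helper names (`parse_sq_lines`, `compare_contigs`) and the toy headers are hypothetical and are not part of GATK or Picard. In practice the header text would come from your own BAM and sequence dictionary files.

```python
def parse_sq_lines(header_text):
    """Map contig name -> length from the @SQ lines of a SAM-style header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE (e.g. SN:chr1).
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_contigs(a, b):
    """Return (names only in a, names only in b, shared names with different lengths)."""
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    size_mismatch = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_in_a, only_in_b, size_mismatch


# Toy headers (made up for illustration): one uses "chr1", the other "1",
# and chr2 has a different length in each.
bam_header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
ref_dict = "@HD\tVN:1.6\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:chr2\tLN:600\n"

only_bam, only_ref, mismatched = compare_contigs(
    parse_sq_lines(bam_header), parse_sq_lines(ref_dict)
)
print("only in BAM header:", only_bam)
print("only in reference dict:", only_ref)
print("length mismatch:", mismatched)
```

A report like this only shows where two files disagree. It does not decide which file is the one to fix.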
2017 Feb workshop presentation slides and tutorial materials
These are the materials that we are presenting at the February 2017 GATK workshop in Leuven, Belgium. Materials Link DAY 1: GATK Best Practices talks Slide decks presented on Day 1 Google Drive Folder...
Exclusion of Beagle from Best Practice
Dear all, Until 2012 or so, the Best Practice workflow seemed to contain a step using Beagle. However, now it does not. I searched related topics in this forum, but I could not find clear reasons....
Error in ValidateSamFile when multiplexing MarkDuplicates
I'm using GATK 3.7 and Picard v2.9.2 and when passing multiple input BAMs to MarkDuplicates (my data is multiplexed), I get an error when trying to validate the resulting BAM file using...
MarkDuplicates: “did not start with a parseable number”
I was running RNA-seq data through MarkDuplicates in the Picard package for SNP calling and got the message: WARNING 2016-08-31 16:48:12 AbstractOpticalDuplicateFinderCommandLineProgram A field field...
No plots generated by AnalyzeCovariates in GATK4-Alpha
I'm using GATK4-Alpha AnalyzeCovariates java -Xmx80G -jar $GATK AnalyzeCovariates -before recal.table -after after_recal.table -plots recal_plots.pdf The tool appears to finish however I then get an...
Dynamically pass multiple input to Picard's MarkDuplicates (multiplexed data)
To pass multiple BAM files to MarkDuplicates we use the following syntax: java -jar picard.jar MarkDuplicates \ INPUT=lane1.bam \ INPUT=lane2.bam \ OUTPUT=dedup.bam \ METRICS_FILE=dedub_metrics.txt...
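One possible way to build that argument list dynamically is sketched below. It assumes only the `INPUT=`, `OUTPUT=` and `METRICS_FILE=` arguments shown in the excerpt. The function name `build_markduplicates_cmd` is hypothetical, and the file names are copied from the excerpt as placeholders. The sketch prints the command rather than running it.

```python
import shlex


def build_markduplicates_cmd(bams, output, metrics, jar="picard.jar"):
    """Return the MarkDuplicates argument list with one INPUT= entry per BAM."""
    if not bams:
        raise ValueError("at least one input BAM is required")
    cmd = ["java", "-jar", jar, "MarkDuplicates"]
    # One INPUT= argument per lane-level BAM, in the order given.
    cmd += [f"INPUT={bam}" for bam in bams]
    cmd += [f"OUTPUT={output}", f"METRICS_FILE={metrics}"]
    return cmd


# Placeholder file names taken from the excerpt above; in a real run the list
# could come from a glob over the lane-level BAMs of one sample.
lane_bams = ["lane1.bam", "lane2.bam"]
cmd = build_markduplicates_cmd(lane_bams, "dedup.bam", "dedub_metrics.txt")

# Print a shell-quoted version for inspection instead of executing it.
print(shlex.join(cmd))
```

Keeping the command as a list means it can later be handed to `subprocess.run(cmd, check=True)` without shell quoting issues, should you decide to execute it.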
Applying GATK to microbiome data
Hi, I am interested in knowing how to apply GATK tools to microbiome data. Specifically, I would like to override the assumption of ploidy in the HaplotypeCaller and make it flexible, in that one...
Is it possible to build a reference with artificial sequences in GATK? And how?
Just wondering if it's possible to build a reference with artificial sequences in GATK? And how? Thank you!
Indel realignment in RNAseq reads
I am trying to find SNPs in the transcriptome. I have finished the Split'N'Trim and reassign mapping qualities step. After this, can I do indel realignment using these commands: But sorted bam file is...
What is max-alternate-alleles?
Hi, first, I set max-alternate-alleles to 6. I guess max-alternate-alleles means that at a site where the ref allele is A, the number of other allele types (C, G) found in the reads is no more than 6. but...
What is the difference between genotypemergeoption "UNIQUIFY" "UNSORTED" on...
Hi. I used gatk software followed Best practices for Germline SNP & Indel Discovery in Whole Genome. I get SNPs and INDEL and want to merge SNP and INDEL each. I use Combine variants of gatk...
Phasing a whole genome
Hello GATK team, I am trying to phase data from WGS (Whole Genome Sequencing). The VCF contains 4.5 million variants. I am running GATK with the following command: java -Xmx200g -jar GenomeAnalysisTK.jar...
ArrayIndexOutOfBoundsException error in BaseRecalibratorSpark
Hi, Thank you for your time. I ran BaseRecalibratorSpark with GATK4 and GATK4-protected on Amazon instance. Both of them gave me error java.lang.ArrayIndexOutOfBoundsException: 1073741865. When running...
How does VariantEval's CountVariants option determine MNPs?
Hi, I am trying to understand what CountVariants is calling as an MNP in my dataset. When I run CountVariants, I get ~3K MNPs called across my samples. However, when I run SelectVariants with...
SplitNCigarReads trimming into exons of RNA-seq reads
I would like to use GATK to call variants in RNA-seq including RNA editing sites. I have followed the guide article #3891 "Calling variants in RNAseq", except for starting with already aligned...
dbSNP VCF for BQSR
We are planning to work on GRCh38, and BQSR requires dbSNP VCF as input. The latest one is v146 in GRCh38 here ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_GRCh38p2/VCF/ There are two files...