Channel: Recent Discussions — GATK-Forum
Browsing all 12345 articles

Are there any Broad-specific instructions for using GATK?

In general you should use FireCloud, which has all the major GATK workflows preloaded, is more scalable and makes it easier to share any work you do with external collaborators, since the portal is...


Docker - container - image - registry

A container is something quite similar to a virtual machine, which can be used to contain and execute all the software required to run a particular program or set of programs. The container includes an...



(How to) Run GATK in a Docker container

This document explains how to install and use Docker to run GATK on a local machine. For a primer on what Docker containers are for and related terminology, see this Dictionary entry. Contents: Install...


GenomicsDB

GenomicsDB is a datastore format developed by our collaborators at Intel to store variant call data (where "datastore" = something that we mere mortals can think of as a database, even though IT...


Google Dataproc - Spark cluster service

Dataproc is Google's Spark cluster service, which you can use to run GATK tools that are Spark-enabled very quickly and efficiently. To use it, you need a Google login and billing account, as well as...


(How to) Create a Spark cluster on Google Dataproc

As noted in our brief primer on Dataproc, there are two ways to create and control a Spark cluster on Dataproc: through a form in Google's web-based console, or directly through gcloud, a.k.a. Google...


Errors about input files having missing or incompatible contigs

These errors occur when the names or sizes of contigs don't match between input files. This is a classic problem that typically happens when you get some files from collaborators, you try to use them...
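A minimal sketch of the kind of check this implies, assuming you have already dumped the headers of two files as text and that they carry SAM-style `@SQ` lines with `SN` (name) and `LN` (length) tags. The function names `parse_sq_lines` and `compare_contigs` are hypothetical helpers for illustration, not part of GATK or Picard.

```python
def parse_sq_lines(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM-style header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_contigs(a, b):
    """List contigs missing from either side or present with different lengths."""
    problems = []
    for name in sorted(set(a) | set(b)):
        if name not in a:
            problems.append(f"{name}: missing from first file")
        elif name not in b:
            problems.append(f"{name}: missing from second file")
        elif a[name] != b[name]:
            problems.append(f"{name}: length {a[name]} vs {b[name]}")
    return problems


# Hypothetical headers: the second file names its contig "2" instead of "chr2".
header_a = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
header_b = "@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:2\tLN:500\n"
for problem in compare_contigs(parse_sq_lines(header_a), parse_sq_lines(header_b)):
    print(problem)
```

An empty result means the two contig dictionaries agree on both names and sizes; it does not say anything about contig order.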


Errors in SAM/BAM files can be diagnosed with ValidateSamFile

The problem: You're trying to run a GATK or Picard tool that operates on a SAM or BAM file, and getting some cryptic error that doesn't clearly tell you what's wrong. Bits of the stack trace (the pile...



Allele Depth (AD) is lower than expected

The problem: You're trying to evaluate the support for a particular call, but the numbers in the DP (total depth) and AD (allele depth) fields aren't making any sense. For example, the sum of all the...
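To see the discrepancy for a given call, it can help to put the two numbers side by side. The sketch below assumes a VCF sample column whose FORMAT string contains `AD` and `DP`; `depth_summary` is a hypothetical helper, and the record values are made up for illustration.

```python
def depth_summary(format_keys, sample_values):
    """Parse one sample column of a VCF record and compare sum(AD) with DP."""
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    # AD is a comma-separated list with one count per allele (ref first).
    allele_depths = [int(x) for x in fields["AD"].split(",") if x != "."]
    total_depth = int(fields["DP"])
    ad_sum = sum(allele_depths)
    return {"AD": allele_depths, "AD_sum": ad_sum, "DP": total_depth,
            "gap": total_depth - ad_sum}


# Made-up genotype column for a heterozygous call.
print(depth_summary("GT:AD:DP:GQ", "0/1:12,9:25:99"))
```

A nonzero `gap` only shows that the two fields count reads differently for this record; the article itself explains why that happens.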



Can't use VQSR on non-model organism or small dataset

The problem: Our preferred method for filtering variants after the calling step is to use VQSR, a.k.a. recalibration. However, it requires well-curated training/truth resources, which are typically not...


Errors about contigs in BAM or VCF files not being properly ordered or sorted

This is not as common as the "wrong reference build" problem, but it still pops up every now and then: a collaborator gives you a BAM or VCF file that's derived from the correct reference, but for...
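A rough sketch of how one might detect this, assuming you have the contig names from the reference dictionary and from the file in question as two lists. `first_out_of_order` is a hypothetical helper, and the contig names are illustrative.

```python
def first_out_of_order(reference_order, file_order):
    """Return the first contig in file_order that breaks the reference ordering, or None."""
    rank = {name: i for i, name in enumerate(reference_order)}
    last = -1
    for name in file_order:
        if name not in rank:
            continue  # unknown names are a different problem (wrong build or naming)
        if rank[name] < last:
            return name
        last = rank[name]
    return None


reference = ["chr1", "chr2", "chr10", "chrX"]
# A lexicographically sorted file puts chr10 before chr2.
lexicographic = ["chr1", "chr10", "chr2", "chrX"]
print(first_out_of_order(reference, lexicographic))
```

The check only compares relative order, so a file that contains a subset of the reference contigs in the right order passes.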


Missing annotations in the output callset VCF

The problem: You specified -A <some annotation> in a command line invoking one of the annotation-capable tools (HaplotypeCaller, MuTect2, GenotypeGVCFs and VariantAnnotator), but that annotation...


Expected variant at a specific site was not called

This can happen when you expect a call to be made based on the output of other variant calling tools, or based on examination of the data in a genome browser like IGV. There are several possibilities,...


Need to run programs that require different versions of Java

The problem: We sometimes need to be able to use multiple versions of Java on the same computer to run command-line tools that have different version requirements. For example, at one point, GATK...


Errors about misencoded quality scores

The problem: You get an error like this: "SAM/BAM/CRAM file <filename> appears to be using the wrong encoding for quality scores". Why this happens: The standard format for quality score encodings is...
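As a rough illustration of how an encoding mismatch can be spotted, the sketch below guesses the ASCII offset from the range of characters in a few quality strings. It assumes the two widely used conventions, offset 33 and offset 64, which is an assumption on my part and not taken from the excerpt above; `guess_quality_offset` is a hypothetical helper, and the thresholds are a heuristic that can be fooled by unusually high quality values.

```python
def guess_quality_offset(quality_strings):
    """Heuristically guess the ASCII offset (33 or 64) of quality strings.

    Returns None when the observed range is compatible with both offsets.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return None
    lo, hi = min(codes), max(codes)
    if lo < 59:
        # Characters this low cannot occur with an offset of 64.
        return 33
    if lo >= 64 and hi > 74:
        # Offset 33 with qualities up to 41 tops out at 'J' (ASCII 74).
        return 64
    return None


print(guess_quality_offset(["IIII#5", "FFFF:,"]))
print(guess_quality_offset(["hhhhdb", "ggggfe"]))
```

Sampling more reads narrows the ambiguous case, since low-quality bases eventually show up in real data.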



Errors about read group (RG) information

See the Dictionary entry on read groups for more information about what they represent and why they're very important. Note that the command line examples in this article have not yet been updated for...


Java version issues

As documented here, GATK requires a particular major version of Java. If you try to run it with any other version, you'll get an error that will include this line: "Unsupported major.minor version". To...
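The numbers in that error come from the class file header: the first four bytes are the magic number 0xCAFEBABE, followed by a two-byte minor version and a two-byte major version. A small sketch of reading them, where `class_file_version` is a hypothetical helper and the version table covers only a few releases:

```python
import struct

# Class file major version -> Java release (partial table).
MAJOR_TO_JAVA = {50: "6", 51: "7", 52: "8", 53: "9", 54: "10", 55: "11"}


def class_file_version(data):
    """Return (major, minor) from the first 8 bytes of a compiled .class file."""
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


# Header bytes of a class compiled for major version 52.
header = bytes.fromhex("cafebabe00000034")
major, minor = class_file_version(header)
print(major, minor, "-> Java", MAJOR_TO_JAVA.get(major, "unknown"))
```

To inspect a real class, pass in the first bytes of a `.class` file extracted from the jar; which major version a given GATK release needs is stated in its own documentation.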


Pipelining recommendations

We use Cromwell + WDL for all batch execution purposes. WDL is a community-driven, user-friendly scripting language managed by the OpenWDL organization. Cromwell is an open-source workflow execution...


GATK on Amazon Web Services

We are soon adding support for running Cromwell on AWS Batch, integrating with AWS products. This will allow you to log in with your AWS credentials, access your files in S3, and run your WDL files...


GATK on Google Cloud

At this time we are able to offer two services for running WDL workflows on Google Cloud using the Cromwell execution engine and the Google Pipelines API. Note that while access to both of these...
