This is becoming a bit of a yearly tradition; next week we're heading over to Bio-IT World Expo in Boston (so a short hop across the Charles River) to announce the majorly rebooted version of GATK which we've affectionately dubbed GATK4. Because it will be version 4.
Look, if you've ever seen the names we give our tools, you know that naming things isn't exactly where we put our creativity to work. It's a precious resource, and anyway we rather like things to be self-explanatory.
Yes, technically we already announced GATK4 at Bio-IT last year, but no, this is not a re-run. Last year was a heads-up that we were working on this significant new reimplementation of the toolkit. We were mostly there to talk about the core features of the new framework, which famously excited the Spark-savvy in the crowd (because it supports Apache Spark). But it was definitely still under heavy development; while we had the CNV tools just about ready for testing, as I recall there wasn't even a glimmer of the HaplotypeCaller in there yet.
This year is very different. We have a toolkit that is in the final stages of polishing up for public consumption. We have multiple Best Practices workflows, because we're not just about the germline SNPs and indels anymore. And we also have numbers. Dates for the beta and full releases, performance estimates...
All of which we'll present during a luncheon event we're holding with our wonderful partners at Intel Life Sciences, who have contributed some of GATK4's key new features. The luncheon will take place Wednesday the 24th at 12:40 PM, at a location TBD (because I can't figure it out from the Bio-IT program, which is not self-explanatory). We'll be in Track 1: Data and Storage Management, which may sound super boring (no offense to the other speakers in this track), but come on and join us if you can; I predict you'll be pleasantly surprised.
As a coda, we'll be holding Q&A sessions in the Intel Hospitality Suite, aka the Dartmouth room in the WTC, at the following times: Wednesday the 24th from 1:30 PM to 3:15 PM, and Thursday the 25th from 10:30 to 11:30 AM. Swing on by if you have any burning questions about GATK4.
We look forward to seeing you there! And if you can't make it because of trivial considerations like geographical incompatibility (oceans, shmoceans), check out this blog or follow @gatk_dev on Twitter. We'll post a summary of the announcements shortly after the luncheon presentation.
Here's the program abstract:
12:40 Luncheon Presentation I: Broad Institute & Intel GATK 4.0 Optimization Overview
Eric Banks, Senior Director, Data Science and Data Engineering Group, Broad Institute
Geraldine Van der Auwera, Associate Director, Outreach and Communications, GATK, Broad Institute
Mark Bagley, Director, Center for Genomic Data Engineering, Intel
Paolo Narvaez, Senior Director, Engineering, Intel
Genomics research leader the Broad Institute of MIT and Harvard joins Intel to describe their collaboration to enhance the GATK environment and scale researchers’ ability to analyze massive amounts of genomic data from diverse sources worldwide. Topics include performance best practices and the latest on Genomics DB and FireCloud.
Note that we're not actually going to talk about FireCloud at the luncheon event (what can I say, abstracts are immutable descriptors of mutable structures), but we will be doing demos of FireCloud throughout Bio-IT at the Google booth. A more detailed announcement to that effect will be posted shortly on the FireCloud blog.
And look, our GATK4 luncheon made it into the official Bio-IT preview!
Eric Banks and Geraldine Van der Auwera of the Broad Institute along with Mark Bagley and Paolo Narvaez of Intel will co-host a luncheon session to describe their collaboration to enhance the GATK environment and scale researchers’ ability to analyze massive amounts of genomic data from diverse sources worldwide. Wednesday, May 24, 12:40 pm