So I've run both the newest version of Picard CollectHsMetrics and an older version, CalculateHsMetrics, and with both versions I'm getting statistics for everything except HS_LIBRARY_SIZE, which is an empty column (red), while the HS_PENALTY values are all 0 (yellow). Any idea what could cause this? The intervals were created with picard.jar BedToIntervalList and there were no errors in any of the log files. All metrics were populated fine except HS_LIBRARY_SIZE and HS_PENALTY_XX.
HS_LIBRARY_SIZE = "" (that's right, nothing, empty); HS_PENALTY_XX are all 0.
Picard SortVcf changing VCF file version
I am using Picard SortVcf to reorder a VCF so that its contig order matches my reference genome and BAM files. It works great; however, it seems to change the VCF format version from 4.0 to 4.2, and this is incompatible with the downstream steps I need it for. Is there any workaround for this?
Thanks!
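For reference, a minimal sketch of the SortVcf invocation being described (file names are placeholders):
java -jar picard.jar SortVcf \
    I=input.vcf \
    O=sorted.vcf \
    SEQUENCE_DICTIONARY=reference.dict   # contig order taken from the reference dictionary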
Installing GATK 4.beta.5
1. Download gatk-4.beta.5.tar.gz
2. tar zxvf gatk-4.beta.5.tar.gz
3. cd gatk-4.beta.5
4. ./gradlew bundle
Bug:
FAILURE: Build failed with an exception.
* Where: Build file "/mnt/local-disk2/bliu/soft/gatk-4.beta.5/build.gradle" line: 244
* What went wrong: A problem occurred evaluating root project "gatk". Cannot find ".git" directory
How can I deal with this?
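The Gradle build derives version information from the repository metadata, so building from a plain release tarball (which has no .git directory) can fail this way. A sketch of the usual workaround, building from a git clone instead; the tag name 4.beta.5 is an assumption:
git clone https://github.com/broadinstitute/gatk.git
cd gatk
git checkout 4.beta.5   # assumed release tag
./gradlew bundle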
Test sample data for the GATK Best Practices
Hi GATK team,
I would like to know about the test data set for the GATK Best Practices.
My aim is to check that GATK4 runs correctly in my environment,
especially from "FASTQ and reference" through to a variant-called VCF file
(both germline and somatic).
I want to run the GATK4 Best Practices pipeline in my environment,
and once I have results, I want to compare them against the GATK team's "right answer" results.
So, if you don't mind, would you show me the data set for testing GATK4?
(If possible, a data set that is small and runs fast would be even nicer for me...)
Regards,
Should I remove unmapped reads from BAM files before variant calling analysis?
I heard GATK uses unmapped reads during realignment. So is it necessary to remove unmapped reads from the BAM file?
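For reference, stripping unmapped reads (should it turn out to be necessary) is typically done with samtools; a minimal sketch with placeholder file names:
# -F 4 excludes reads with the 'read unmapped' flag set; -b emits BAM
samtools view -b -F 4 input.bam > mapped_only.bam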
Re-starting GenotypeGVCFs from certain position?
Hi,
I was joint genotyping about 400 samples with GenotypeGVCFs when I had a node failure. The run was at about 83%, which took two weeks to reach. Is there a way for me to continue from this position rather than restart the entire joint genotyping process? Thanks.
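As far as I know there is no built-in resume, but one pattern, sketched below under the assumption that the partial output is intact through a known position, is to genotype only the remaining intervals with -L and concatenate the pieces afterwards with CatVariants (interval and file names are placeholders):
# Genotype only the region the failed run never reached
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
    -R reference.fasta \
    -V combined.g.vcf \
    -L remaining.intervals \
    -o part2.vcf
# Concatenate the partial outputs into one VCF
java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants \
    -R reference.fasta -V part1.vcf -V part2.vcf -out joint.vcf -assumeSorted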
Mutect2 --artifact_detection_mode
I have a question about how to create a Panel of Normals (PON) using MuTect2: do we still need to add the dbSNP and COSMIC parameters when running the following command for each normal BAM in artifact_detection_mode? Will running with or without dbSNP and COSMIC affect the PON VCF calls?
"java -jar GenomeAnalysisTK.jar \
-T MuTect2 \
-R reference.fasta \
-I:tumor normal1.bam \
[--dbsnp dbSNP.vcf] \
[--cosmic COSMIC.vcf] \
--artifact_detection_mode \
[-L targets.interval_list] \
-o output.normal1.vcf
Thanks,
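For context, the per-normal VCFs produced this way are typically combined into the PON afterwards; a sketch using GATK3 CombineVariants, where the minimum-callset count of 2 and all file names are assumptions:
java -jar GenomeAnalysisTK.jar \
    -T CombineVariants \
    -R reference.fasta \
    -V output.normal1.vcf \
    -V output.normal2.vcf \
    -minN 2 \
    -o pon.vcf
# -minN 2 keeps only sites seen in at least two normals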
ERROR MESSAGE: Unable to read index file, for input source:
Hi, GATK team,
I have set up a bwa + GATK Best Practices pipeline for variant calling on panel data (700 samples).
I use HaplotypeCaller in GVCF mode. The BAM-to-GVCF command is:
java -Xmx20g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37_decoy.fasta -I TR_713.final.bam -nct 4 -ERC GVCF -o TR_713.GATK.var.g.vcf
When I run the command, there are no error messages; it finishes successfully.
But when I joint-genotype the GVCFs into a VCF with the following command:
java -Xmx60g -jar GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R human_g1k_v37_decoy.fasta \
-nt 6 \
-V *g..vcf \
......
-o All.TR.raw.vcf
After running this command, I get the following error message:
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version nightly-2016-09-23-gfade77f):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Unable to read index file, for input source: TR_713.GATK.var.g.vcf.idx
ERROR ------------------------------------------------------------------------------------------
Every time I run this command, I get the error message, but with a different sample each time.
Appreciate your help very much.
Best
May
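For what it's worth, this error often points at a stale or truncated Tribble index (.idx) sitting next to the GVCF; a sketch of a common workaround, which deletes the sidecar indexes so GATK regenerates them on the next read:
# Remove the sidecar index named in the error; GATK rebuilds it automatically
rm TR_713.GATK.var.g.vcf.idx     # or: rm *.g.vcf.idx to cover all samples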
I need help: GATK HaplotypeCaller says my dict is empty, but it isn't?
The error message I get is the following:
INFO 12:25:14,659 HelpFormatter - ----------------------------------------------------------------------------------
INFO 12:25:14,662 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
INFO 12:25:14,662 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:25:14,663 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
INFO 12:25:14,663 HelpFormatter - [Thu Jan 12 12:25:14 EST 2017] Executing on Linux 2.6.32-573.26.1.el6.x86_64 amd64
INFO 12:25:14,663 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 JdkDeflater
INFO 12:25:14,667 HelpFormatter - Program Args: -R /ufrc/ewang/carl.shotwell/maps/mm10/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/mm10/hg19.fa -T HaplotypeCaller -I Control1_fixed.bam -stand_call_conf 10 -stand_emit_conf 10 -o Control1.raw.snps.indels.vcf
INFO 12:25:14,671 HelpFormatter - Executing as carl.shotwell@c23b-s1.ufhpc on Linux 2.6.32-573.26.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13.
INFO 12:25:14,671 HelpFormatter - Date/Time: 2017/01/12 12:25:14
INFO 12:25:14,671 HelpFormatter - ----------------------------------------------------------------------------------
INFO 12:25:14,671 HelpFormatter - ----------------------------------------------------------------------------------
INFO 12:25:14,705 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:25:14,853 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 12:25:14,860 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 12:25:14,998 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.14
INFO 12:25:15,061 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 12:25:15,064 GenomeAnalysisEngine - Reads file is unmapped. Skipping validation against reference.
INFO 12:25:15,166 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
ERROR --
ERROR stack trace
java.lang.IllegalArgumentException: Dictionary cannot have size zero
at org.broadinstitute.gatk.utils.MRUCachingSAMSequenceDictionary.<init>(MRUCachingSAMSequenceDictionary.java:62)
at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:78)
at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:75)
at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
at java.lang.ThreadLocal.get(ThreadLocal.java:170)
at org.broadinstitute.gatk.utils.GenomeLocParser.getContigInfo(GenomeLocParser.java:91)
at org.broadinstitute.gatk.utils.GenomeLocParser.getContigs(GenomeLocParser.java:204)
at org.broadinstitute.gatk.utils.GenomeLocParser.<init>(GenomeLocParser.java:135)
at org.broadinstitute.gatk.utils.GenomeLocParser.<init>(GenomeLocParser.java:108)
at org.broadinstitute.gatk.utils.GenomeLocSortedSet.createSetFromSequenceDictionary(GenomeLocSortedSet.java:421)
at org.broadinstitute.gatk.engine.datasources.reads.BAMScheduler.createOverMappedReads(BAMScheduler.java:66)
at org.broadinstitute.gatk.engine.datasources.reads.IntervalSharder.shardOverMappedReads(IntervalSharder.java:55)
at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource.createShardIteratorOverMappedReads(SAMDataSource.java:1217)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getShardStrategy(GenomeAnalysisEngine.java:657)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:307)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Dictionary cannot have size zero
ERROR ------------------------------------------------------------------------------------------
However, I have checked both the dict and the reference index, built them again manually, and they do not have a size of 0.
Here is my command:
java -jar GenomeAnalysisTK.jar -R /ufrc/ewang/carl.shotwell/maps/mm10/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/mm10/hg19.fa -T HaplotypeCaller -I Control1_fixed.bam -stand_call_conf 10 -stand_emit_conf 10 -o Control1.raw.snps.indels.vcf
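For reference, the standard way to (re)build the two companion files, also shown in the tutorial preparation notes later in this collection; reference.fa is a placeholder, and the .dict must sit next to the FASTA with the same basename:
# Rebuild the FASTA index (reference.fa.fai)
samtools faidx reference.fa
# Rebuild the sequence dictionary (reference.dict alongside reference.fa)
java -jar picard.jar CreateSequenceDictionary R=reference.fa O=reference.dict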
GATK 3.8-0 PrintReads fatal error
Hello,
Could you please help me figure out this fatal error when running PrintReads?
After I updated GATK to version 3.8-0, I kept getting this fatal error when running PrintReads. I can skip this step and run HaplotypeCaller with the -BQSR option instead.
parsing sample: SRR098333
INFO 17:12:12,287 HelpFormatter - ----------------------------------------------------------------------------------
INFO 17:12:12,289 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 17:12:12,289 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 17:12:12,289 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 17:12:12,289 HelpFormatter - [Sat Sep 09 17:12:12 EDT 2017] Executing on Linux 2.6.32-358.23.2.el6.x86_64 amd64
INFO 17:12:12,289 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17
INFO 17:12:12,293 HelpFormatter - Program Args: -T PrintReads -nct 8 -R ./refs/GATK_Resource_Bundle/b37/human_g1k_v37.fasta -BQSR SRR098333.recal_data.table -I SRR098333.bwa_mem.sorted_dups_removed_indelrealigner.bam -o SRR098333.bwa_mem.sorted_dups_removed_indelrealigner_BQSR.bam
...
INFO 17:43:53,363 ProgressMeter - 3:83180289 4.6776207E7 31.7 m 40.0 s 18.6% 2.8 h 2.3 h
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe6c2da3f8b, pid=1932, tid=140629048932096
#
# JRE version: Java(TM) SE Runtime Environment (8.0_65-b17) (build 1.8.0_65-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.65-b01 mixed mode linux-amd64 )
# Problematic frame:
# V  [libjvm.so+0x64bf8b]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
#
# Core dump written. Default location: core or core.1932
#
# An error report file with more information is saved as:
# hs_err_pid1932.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
/var/spool/slurmd/job465052/slurm_script: line 18: 1932 Aborted (core dumped) java -Xms16g -Xmx200g -jar /home/apps/GATK/GenomeAnalysisTK-3.8.0/GenomeAnalysisTK.jar -T PrintReads -nct 8 -R ./refs/GATK_Resource_Bundle/b37/human_g1k_v37.fasta -BQSR SRR09833$SLURM_ARRAY_TASK_ID.recal_data.table -I SRR09833$SLURM_ARRAY_TASK_ID.bwa_mem.sorted_dups_removed_indelrealigner.bam -o SRR09833$SLURM_ARRAY_TASK_ID.bwa_mem.sorted_dups_removed_indelrealigner_BQSR.bam
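JVM-level crashes in PrintReads are sometimes tied to multithreading; a sketch of a first thing to try, rerunning the same command single-threaded (i.e., dropping -nct 8), with paths taken from the log above:
java -Xms16g -Xmx200g -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R ./refs/GATK_Resource_Bundle/b37/human_g1k_v37.fasta \
    -BQSR SRR098333.recal_data.table \
    -I SRR098333.bwa_mem.sorted_dups_removed_indelrealigner.bam \
    -o SRR098333.bwa_mem.sorted_dups_removed_indelrealigner_BQSR.bam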
How do I submit a detailed bug report?
Note: only do this if you have been explicitly asked to do so.
Scenario:
You posted a question about a problem you had with GATK tools, we answered that we think it's a bug, and we asked you to submit a detailed bug report.
Here's what you need to provide:
- The exact command line that you used when you had the problem (in a text file)
- The full log output (program output in the console) from the start of the run to the end or error message (in a text file)
- A snippet of the BAM file if applicable and the index (.bai) file associated with it
- If a non-standard reference (i.e. not available in our resource bundle) was used, we need the .fasta, .fai, and .dict files for the reference
- Any other relevant files such as recalibration plots
A snippet file is a slice of the original BAM file which contains the problematic region and is sufficient to reproduce the error. We need it in order to reproduce the problem on our end, which is the first necessary step to finding and fixing the bug. We ask you to provide this as a snippet rather than the full file so that you don't have to upload (and we don't have to process) huge giga-scale files.
Here's how you create a snippet file:
- Look at the error message and see if it cites a specific position where the error occurred
- If not, identify which region caused the problem by running with the -L argument and progressively narrowing down the interval
- Once you have the region, use PrintReads with -L to write the problematic region (with 500 bp padding on either side) to a new file -- this is your snippet file (see the sketch after this list)
- Test your command line on this snippet file to make sure you can still reproduce the error on it.
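For example, a minimal sketch of the snippet-extraction step, assuming the error occurred near 20:1,500,000 (coordinates and file names are placeholders):
# Write the problematic region, padded by 500 bp on each side, to a new BAM
java -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R reference.fasta \
    -I original.bam \
    -L 20:1499500-1500500 \
    -o snippet.bam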
And finally, here's how you send us the files:
- Put all those files into a .zip or .tar.gz archive
- Upload them onto our FTP server with the following credentials:
  location: ftp.broadinstitute.org
  username: gsapubftp
  password: 5WvQWSfi
- Post in the original discussion thread that you have done this
- Be sure to tell us the name of your archive file!
We will get back to you --hopefully with a bug fix!-- as soon as we can.
GATK4 run on Spark cluster: Unable to find _SUCCESS file
Hello:
I tried GATK4 on a Spark cluster, but the output BAM did not appear at my Spark output path (HDFS).
A USER ERROR has occurred: Couldn't write file /user/zhusitao/output/Mbam because writing failed with exception /user/zhusitao/output/Mbam.parts/_SUCCESS: Unable to find _SUCCESS file.
Could you give me some advice on running GATK4 successfully on a Spark cluster? I will be waiting for your reply!
Best wishes!
Sitao Zhu
Is there any way to generate interval list from available exome data?
Hi all,
I am following the "best practice" suggested by broad institute to call variants from whole exome sequencing data. Currently, I am using Mutect2 to call variants from tumor sample and normal sample based on latest reference genome GRCh38. But, I don't have interval list to use -L option. Is there any way to generate interval list from exome sample which I have? or Is there any default interval list for exome data?
Thank You
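For reference, if the target BED file for the capture kit is available, Picard BedToIntervalList (the tool mentioned in the first post of this collection) converts it into an interval list; a sketch with placeholder file names:
java -jar picard.jar BedToIntervalList \
    I=targets.bed \
    O=targets.interval_list \
    SD=GRCh38.dict   # sequence dictionary of the reference in use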
Change format of AD field to Number=R?
Hi
In GATK version 3.5 I see the following in VCF headers:
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
However, the number of values in the AD field should always be the number of alleles (including the reference), right? The VCF 4.2 spec has a value R to represent this. Therefore, could the header line be changed to the following in future GATK releases?
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
Why is this important? Well, one reason is that bcftools norm uses this when splitting multiallelic variants into multiple biallelics. With the '.' this isn't done correctly, but with the 'R' it is. See the comment from freeseek at https://github.com/samtools/bcftools/issues/40 for further details.
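For context, a minimal sketch of the splitting step in question (file names are placeholders):
# Split multiallelic records into biallelics; correct AD splitting depends on Number=R
bcftools norm -m-any input.vcf.gz -Oz -o split.vcf.gz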
(howto) Recalibrate base quality scores = run BQSR
Objective
Recalibrate base quality scores in order to correct sequencing errors and other experimental artifacts.
Prerequisites
- TBD
Steps
- Analyze patterns of covariation in the sequence dataset
- Do a second pass to analyze covariation remaining after recalibration
- Generate before/after plots
- Apply the recalibration to your sequence data
1. Analyze patterns of covariation in the sequence dataset
Action
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R reference.fa \
-I input_reads.bam \
-L 20 \
-knownSites dbsnp.vcf \
-knownSites gold_indels.vcf \
-o recal_data.table
Expected Result
This creates a GATKReport file called recal_data.table containing several tables. These tables contain the covariation data that will be used in a later step to recalibrate the base qualities of your sequence data.
It is imperative that you provide the program with a set of known sites, otherwise it will refuse to run. The known sites are used to build the covariation model and estimate empirical base qualities. For details on what to do if there are no known sites available for your organism of study, please see the online GATK documentation.
Note that -L 20 is used here and in the next steps to restrict analysis to only chromosome 20 in the b37 human genome reference build. To run against a different reference, you may need to change the name of the contig according to the nomenclature used in your reference.
2. Do a second pass to analyze covariation remaining after recalibration
Action
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R reference.fa \
-I input_reads.bam \
-L 20 \
-knownSites dbsnp.vcf \
-knownSites gold_indels.vcf \
-BQSR recal_data.table \
-o post_recal_data.table
Expected Result
This creates another GATKReport file, which we will use in the next step to generate plots. Note the use of the -BQSR flag, which tells the GATK engine to perform on-the-fly recalibration based on the first recalibration data table.
3. Generate before/after plots
Action
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
-T AnalyzeCovariates \
-R reference.fa \
-L 20 \
-before recal_data.table \
-after post_recal_data.table \
-plots recalibration_plots.pdf
Expected Result
This generates a document called recalibration_plots.pdf containing plots that show how the reported base qualities match up to the empirical qualities calculated by the BaseRecalibrator. Comparing the before and after plots allows you to check the effect of the base recalibration process before you actually apply the recalibration to your sequence data. For details on how to interpret the base recalibration plots, please see the online GATK documentation.
4. Apply the recalibration to your sequence data
Action
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
-T PrintReads \
-R reference.fa \
-I input_reads.bam \
-L 20 \
-BQSR recal_data.table \
-o recal_reads.bam
Expected Result
This creates a file called recal_reads.bam containing all the original reads, but now with exquisitely accurate base substitution, insertion and deletion quality scores. By default, the original quality scores are discarded in order to keep the file size down. However, you have the option to retain them by adding the flag --emit_original_quals to the PrintReads command, in which case the original qualities will also be written in the file, tagged OQ.
Notice how this step uses a very simple tool, PrintReads, to apply the recalibration. What's happening here is that we are loading in the original sequence data, having the GATK engine recalibrate the base qualities on-the-fly thanks to the -BQSR flag (as explained earlier), and just using PrintReads to write out the resulting data to the new file.
Tutorial data not accessible anymore?
Hi,
I am trying to follow the tutorial (https://software.broadinstitute.org/gatk/documentation/topic?name=tutorials), but unfortunately the data are not accessible anymore. I tried to get some via https://drive.google.com/drive/folders/1dS9wr_h6nh3BhPp1KKTGyXJS3upTw4j0, but once loaded in IGV it seems that chromosome 20 is devoid of data.
Java: java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
IGV version 2.3.97 (157)
Linux Ubuntu 16.04
Thank you very much,
Picard IlluminaBasecallsToSam - clocs file issue - more elements than expected
I am attempting to demultiplex a lane of Illumina HiSeq 2500 data using Picard IlluminaBasecallsToSam (v2.5, Java(TM) SE Runtime Environment (build 1.8.0_20-b26)). The tool fails to complete and states that
picard.PicardException: Read the number of expected bins( 65600) but still had more elements in file( /Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/L001/s_1_1107.clocs)
I've also run Picard CheckIlluminaDirectory and everything turns out to be fine. Also, I can successfully demultiplex lane 2 from this run, which has the same read structure. I suspected that a file was corrupted during network transfer, so I re-ran RTA, but Picard IlluminaBasecallsToSam still gives the same result.
I'd appreciate any ideas about what could be causing this error.
Complete error below:
[Tue Aug 15 11:01:59 EDT 2017] picard.illumina.IlluminaBasecallsToSam BASECALLS_DIR=/Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/BaseCalls BARCODES_DIR=/Project/Capture LANE=1 RUN_BARCODE=HN3VWBCXY170810 READ_GROUP_ID=HN3VWBCXY170810 SEQUENCING_CENTER=ABC READ_STRUCTURE=98T8B6M8B98T LIBRARY_PARAMS=/Project/DemultCapture.txt NUM_PROCESSORS=6 IGNORE_UNEXPECTED_BARCODES=true TMP_DIR=[/Project/TMP] PLATFORM=illumina ADAPTERS_TO_CHECK=[INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM] FORCE_GC=true APPLY_EAMSS_FILTER=true MAX_READS_IN_RAM_PER_TILE=1200000 MINIMUM_QUALITY=2 INCLUDE_NON_PF_READS=true MOLECULAR_INDEX_TAG=RX MOLECULAR_INDEX_BASE_QUALITY_TAG=QX VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue Aug 15 11:01:59 EDT 2017] Executing as rb@rcsgc22 on Linux 2.6.32-358.2.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Picard version: 2.5.0(2c370988aefe41f579920c8a6a678a201c5261c1_1466708365)
INFO 2017-08-15 11:04:47 IlluminaBasecallsToSam DONE_READING STRUCTURE IS 98T8B6M8B98T
INFO 2017-08-15 11:05:36 IlluminaBasecallsConverter Read 1,000,000 records. Elapsed time: 00:03:36s. Time for last 1,000,000: 35s. Last read position: */*
INFO 2017-08-15 11:06:14 IlluminaBasecallsConverter Read 2,000,000 records. Elapsed time: 00:04:14s. Time for last 1,000,000: 37s. Last read position: */*
INFO 2017-08-15 11:06:32 IlluminaBasecallsConverter Read 3,000,000 records. Elapsed time: 00:04:32s. Time for last 1,000,000: 18s. Last read position: */*
INFO 2017-08-15 11:07:00 IlluminaBasecallsConverter Read 4,000,000 records. Elapsed time: 00:04:59s. Time for last 1,000,000: 27s. Last read position: */*
INFO 2017-08-15 11:07:00 IlluminaBasecallsConverter Before explicit GC, Runtime.totalMemory()=6693060608
INFO 2017-08-15 11:07:01 IlluminaBasecallsConverter After explicit GC, Runtime.totalMemory()=6512705536
INFO 2017-08-15 11:07:22 IlluminaBasecallsConverter Read 5,000,000 records. Elapsed time: 00:05:22s. Time for last 1,000,000: 22s. Last read position: */*
INFO 2017-08-15 11:07:33 IlluminaBasecallsConverter Read 6,000,000 records. Elapsed time: 00:05:33s. Time for last 1,000,000: 11s. Last read position: */*
INFO 2017-08-15 11:07:41 IlluminaBasecallsConverter Read 7,000,000 records. Elapsed time: 00:05:41s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2017-08-15 11:07:56 IlluminaBasecallsConverter Read 8,000,000 records. Elapsed time: 00:05:56s. Time for last 1,000,000: 14s. Last read position: */*
INFO 2017-08-15 11:08:09 IlluminaBasecallsConverter Read 9,000,000 records. Elapsed time: 00:06:09s. Time for last 1,000,000: 12s. Last read position: */*
INFO 2017-08-15 11:08:24 IlluminaBasecallsConverter Read 10,000,000 records. Elapsed time: 00:06:24s. Time for last 1,000,000: 15s. Last read position: */*
INFO 2017-08-15 11:08:35 IlluminaBasecallsConverter Read 11,000,000 records. Elapsed time: 00:06:35s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:08:48 IlluminaBasecallsConverter Read 12,000,000 records. Elapsed time: 00:06:48s. Time for last 1,000,000: 12s. Last read position: */*
INFO 2017-08-15 11:08:58 IlluminaBasecallsConverter Read 13,000,000 records. Elapsed time: 00:06:58s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:09:06 IlluminaBasecallsConverter Read 14,000,000 records. Elapsed time: 00:07:06s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2017-08-15 11:09:16 IlluminaBasecallsConverter Read 15,000,000 records. Elapsed time: 00:07:16s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:09:25 IlluminaBasecallsConverter Read 16,000,000 records. Elapsed time: 00:07:24s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2017-08-15 11:09:38 IlluminaBasecallsConverter Read 17,000,000 records. Elapsed time: 00:07:38s. Time for last 1,000,000: 13s. Last read position: */*
INFO 2017-08-15 11:09:57 IlluminaBasecallsConverter Write 1,000,000 records. Elapsed time: 00:07:57s. Time for last 1,000,000: 13s. Last read position: */*
INFO 2017-08-15 11:10:03 IlluminaBasecallsConverter Read 18,000,000 records. Elapsed time: 00:08:03s. Time for last 1,000,000: 25s. Last read position: */*
INFO 2017-08-15 11:10:08 IlluminaBasecallsConverter Write 2,000,000 records. Elapsed time: 00:08:07s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:10:17 IlluminaBasecallsConverter Write 3,000,000 records. Elapsed time: 00:08:17s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:10:27 IlluminaBasecallsConverter Read 19,000,000 records. Elapsed time: 00:08:26s. Time for last 1,000,000: 23s. Last read position: */*
INFO 2017-08-15 11:10:29 IlluminaBasecallsConverter Write 4,000,000 records. Elapsed time: 00:08:29s. Time for last 1,000,000: 12s. Last read position: */*
INFO 2017-08-15 11:10:39 IlluminaBasecallsConverter Write 5,000,000 records. Elapsed time: 00:08:38s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:10:48 IlluminaBasecallsConverter Write 6,000,000 records. Elapsed time: 00:08:48s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:10:48 IlluminaBasecallsConverter Read 20,000,000 records. Elapsed time: 00:08:48s. Time for last 1,000,000: 21s. Last read position: */*
INFO 2017-08-15 11:10:59 IlluminaBasecallsConverter Write 7,000,000 records. Elapsed time: 00:08:59s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:11:08 IlluminaBasecallsConverter Write 8,000,000 records. Elapsed time: 00:09:08s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:14 IlluminaBasecallsConverter Read 21,000,000 records. Elapsed time: 00:09:14s. Time for last 1,000,000: 25s. Last read position: */*
INFO 2017-08-15 11:11:18 IlluminaBasecallsConverter Write 9,000,000 records. Elapsed time: 00:09:17s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:27 IlluminaBasecallsConverter Write 10,000,000 records. Elapsed time: 00:09:27s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:36 IlluminaBasecallsConverter Read 22,000,000 records. Elapsed time: 00:09:36s. Time for last 1,000,000: 22s. Last read position: */*
INFO 2017-08-15 11:11:37 IlluminaBasecallsConverter Write 11,000,000 records. Elapsed time: 00:09:36s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:46 IlluminaBasecallsConverter Write 12,000,000 records. Elapsed time: 00:09:45s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2017-08-15 11:11:55 IlluminaBasecallsConverter Write 13,000,000 records. Elapsed time: 00:09:55s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:56 IlluminaBasecallsConverter Read 23,000,000 records. Elapsed time: 00:09:56s. Time for last 1,000,000: 20s. Last read position: */*
INFO 2017-08-15 11:12:00 IlluminaBasecallsConverter Before explicit GC, Runtime.totalMemory()=7281311744
INFO 2017-08-15 11:12:00 IlluminaBasecallsConverter After explicit GC, Runtime.totalMemory()=7281311744
Exception in thread "pool-1-thread-6" ERROR 2017-08-15 11:12:04 IlluminaBasecallsConverter Failure encountered in worker thread; attempting to shut down remaining worker threads and terminate ...
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator.awaitWorkComplete(IlluminaBasecallsConverter.java:709)
at picard.illumina.IlluminaBasecallsConverter.doTileProcessing(IlluminaBasecallsConverter.java:318)
at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:230)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
picard.PicardException: Read the number of expected bins( 65600) but still had more elements in file( /Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/L001/s_1_1107.clocs)
at picard.illumina.parser.readers.ClocsFileReader.hasNext(ClocsFileReader.java:150)
at picard.illumina.parser.PosParser$1.hasNext(PosParser.java:98)
at picard.illumina.parser.PerTileParser.hasNext(PerTileParser.java:120)
at picard.illumina.parser.PerTileParser.maybeAdvance(PerTileParser.java:99)
at picard.illumina.parser.PerTileParser.next(PerTileParser.java:109)
at picard.illumina.parser.IlluminaDataProvider.next(IlluminaDataProvider.java:133)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:555)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$2.run(IlluminaBasecallsConverter.java:657)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-1-thread-1" java.lang.ArrayIndexOutOfBoundsException
at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:357)
at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:250)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:206)
at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:174)
at htsjdk.samtools.util.BinaryCodec.writeInt(BinaryCodec.java:220)
at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:132)
at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:134)
at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:190)
at picard.illumina.IlluminaBasecallsToSam$SAMFileWriterWrapper.write(IlluminaBasecallsToSam.java:483)
at picard.illumina.IlluminaBasecallsToSam$SAMFileWriterWrapper.write(IlluminaBasecallsToSam.java:472)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$3.run(IlluminaBasecallsConverter.java:831)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-1-thread-5" java.lang.ArrayIndexOutOfBoundsException
at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:357)
at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:250)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:206)
at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:174)
at htsjdk.samtools.util.BinaryCodec.writeInt(BinaryCodec.java:220)
at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:131)
at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:134)
at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:190)
at picard.illumina.IlluminaBasecallsToSam$SAMFileWriterWrapper.write(IlluminaBasecallsToSam.java:483)
at picard.illumina.IlluminaBasecallsToSam$SAMFileWriterWrapper.write(IlluminaBasecallsToSam.java:472)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$3.run(IlluminaBasecallsConverter.java:831)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[Tue Aug 15 11:12:06 EDT 2017] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 10.12 minutes.
Runtime.totalMemory()=7301234688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Failure encountered in worker thread; see log for details.
at picard.illumina.IlluminaBasecallsConverter.doTileProcessing(IlluminaBasecallsConverter.java:321)
at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:230)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Exception in thread "pool-1-thread-3" picard.PicardException: IOException opening cluster binary file /Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/BaseCalls/L001/s_1_1109.filter
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:119)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getByteIterator(MMapBackedIteratorFactory.java:66)
at picard.illumina.parser.readers.FilterFileReader.<init>(FilterFileReader.java:68)
at picard.illumina.parser.FilterParser$1.<init>(FilterParser.java:55)
at picard.illumina.parser.FilterParser.makeTileIterator(FilterParser.java:54)
at picard.illumina.parser.PerTileParser.advanceTile(PerTileParser.java:80)
at picard.illumina.parser.PerTileParser.hasNext(PerTileParser.java:121)
at picard.illumina.parser.IlluminaDataProvider.hasNext(IlluminaDataProvider.java:104)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:554)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$2.run(IlluminaBasecallsConverter.java:657)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:314)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:113)
... 12 more
Exception in thread "pool-1-thread-2" picard.PicardException: IOException opening cluster binary file /Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/L001/s_1_1110.clocs
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:119)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getByteIterator(MMapBackedIteratorFactory.java:66)
at picard.illumina.parser.readers.ClocsFileReader.<init>(ClocsFileReader.java:85)
at picard.illumina.parser.PosParser.makeTileIterator(PosParser.java:83)
at picard.illumina.parser.PerTileParser.advanceTile(PerTileParser.java:80)
at picard.illumina.parser.PerTileParser.hasNext(PerTileParser.java:121)
at picard.illumina.parser.PerTileParser.maybeAdvance(PerTileParser.java:99)
at picard.illumina.parser.PerTileParser.next(PerTileParser.java:109)
at picard.illumina.parser.IlluminaDataProvider.next(IlluminaDataProvider.java:133)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:555)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$2.run(IlluminaBasecallsConverter.java:657)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:314)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:113)
... 13 more
Exception in thread "pool-1-thread-7" picard.PicardException: Error reading from file /Project/CorrectCapture/s_1_1111_barcode.txt
at picard.util.BasicInputParser.readNextLine(BasicInputParser.java:120)
at picard.util.AbstractInputParser.advance(AbstractInputParser.java:85)
at picard.util.AbstractInputParser.advance(AbstractInputParser.java:44)
at htsjdk.samtools.util.AbstractIterator.hasNext(AbstractIterator.java:44)
at picard.illumina.parser.readers.BarcodeFileReader.hasNext(BarcodeFileReader.java:42)
at picard.illumina.parser.BarcodeParser$BarcodeDataIterator.hasNext(BarcodeParser.java:69)
at picard.illumina.parser.PerTileParser.hasNext(PerTileParser.java:120)
at picard.illumina.parser.IlluminaDataProvider.hasNext(IlluminaDataProvider.java:104)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:554)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$2.run(IlluminaBasecallsConverter.java:657)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: htsjdk.samtools.util.RuntimeIOException: java.nio.channels.ClosedByInterruptException
at htsjdk.samtools.util.BufferedLineReader.readLine(BufferedLineReader.java:74)
at picard.util.BasicInputParser.readNextLine(BasicInputParser.java:103)
... 12 more
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:163)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at htsjdk.samtools.util.BufferedLineReader.readLine(BufferedLineReader.java:70)
... 13 more
Should CheckIlluminaDirectory be able to handle a non-standard read structure?
I have a flowcell (from a 10x library) in which the 'natural' read structure is 178T8B14B5T. However, I want to interpret the flowcell as 178T8B14T5S, so that is what I passed to CheckIlluminaDirectory. I get the exception below. It looks like the code is trying to check all the cycles, including the skips; however, CbclReader.outputCycles is initialized with only enough elements to hold the non-skip cycles. Is this a bug? Or is it wrong to pass a read structure with skips in it?
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 200
at picard.illumina.parser.readers.CbclReader.readSurfaceTile(CbclReader.java:119)
at picard.illumina.parser.readers.CbclReader.<init>(CbclReader.java:102)
at picard.illumina.CheckIlluminaDirectory.doWork(CheckIlluminaDirectory.java:170)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
Tutorial files provenance: ASHG15
This document is intended to be a record of how the tutorial files were prepared for the ASHG 2015 hands-on workshop.
Reference genome
Extracting just chromosome 20 produces a 64 Mb file (uncompressed), which is small enough for our purposes, so we don't need to truncate it further, simplifying future data file preparations.
# Extract just chromosome 20
samtools faidx /humgen/gsa-hpprojects/GATK/bundle/current/b37/human_g1k_v37.fasta 20 > human_g1k_b37_20.fasta
# Create the reference index
samtools faidx human_g1k_b37_20.fasta
# Create sequence dictionary
java -jar $PICARD CreateSequenceDictionary R=human_g1k_b37_20.fasta O=human_g1k_b37_20.dict
# Recap files
-rw-rw-r-- 1 vdauwera wga 164 Oct 1 14:56 human_g1k_b37_20.dict
-rw-rw-r-- 1 vdauwera wga 64075950 Oct 1 14:41 human_g1k_b37_20.fasta
-rw-rw-r-- 1 vdauwera wga 20 Oct 1 14:46 human_g1k_b37_20.fasta.fai
Sequence data
We are using the 2nd-generation CEU Trio of NA12878 with her husband and child, in a WGS dataset produced at Broad, with files named after the library preps: Solexa-xxxxxx.bam.
1. Extract just chromosome 20:10M-20M bp and filter out chimeric pairs with -rf BadMate
java -jar $GATK -T PrintReads -R /path/to/bundle/current/b37/human_g1k_v37_decoy.fasta -I /path/to/Solexa-272221.bam -o NA12877_wgs_20_10M20M.bam -L 20:10000000-20000000 -rf BadMate
java -jar $GATK -T PrintReads -R /path/to/bundle/current/b37/human_g1k_v37_decoy.fasta -I /path/to/Solexa-272222.bam -o NA12878_wgs_20_10M20M.bam -L 20:10000000-20000000 -rf BadMate
java -jar $GATK -T PrintReads -R /path/to/bundle/current/b37/human_g1k_v37_decoy.fasta -I /path/to/Solexa-272228.bam -o NA12882_wgs_20_10M20M.bam -L 20:10000000-20000000 -rf BadMate
# Recap files
-rw-rw-r-- 1 vdauwera wga 36240 Oct 2 11:55 NA12877_wgs_20_10M20M.bai
-rw-rw-r-- 1 vdauwera wga 512866085 Oct 2 11:55 NA12877_wgs_20_10M20M.bam
-rw-rw-r-- 1 vdauwera wga 36176 Oct 2 11:53 NA12878_wgs_20_10M20M.bai
-rw-rw-r-- 1 vdauwera wga 502282846 Oct 2 11:53 NA12878_wgs_20_10M20M.bam
-rw-rw-r-- 1 vdauwera wga 36464 Oct 2 12:00 NA12882_wgs_20_10M20M.bai
-rw-rw-r-- 1 vdauwera wga 505001668 Oct 2 12:00 NA12882_wgs_20_10M20M.bam
2. Extract headers and edit manually to remove all contigs except 20 and sanitize internal filepaths
samtools view -H NA12877_wgs_20_10M20M.bam > NA12877_header.txt
samtools view -H NA12878_wgs_20_10M20M.bam > NA12878_header.txt
samtools view -H NA12882_wgs_20_10M20M.bam > NA12882_header.txt
Manual editing is not represented here; basically just delete unwanted contig SQ lines and remove identifying info from internal filepaths.
3. Flip BAM to SAM
java -jar $PICARD SamFormatConverter I=NA12877_wgs_20_10M20M.bam O=NA12877_wgs_20_10M20M.sam
java -jar $PICARD SamFormatConverter I=NA12878_wgs_20_10M20M.bam O=NA12878_wgs_20_10M20M.sam
java -jar $PICARD SamFormatConverter I=NA12882_wgs_20_10M20M.bam O=NA12882_wgs_20_10M20M.sam
#Recap files
-rw-rw-r-- 1 vdauwera wga 1694169101 Oct 2 12:28 NA12877_wgs_20_10M20M.sam
-rw-rw-r-- 1 vdauwera wga 1661483309 Oct 2 12:30 NA12878_wgs_20_10M20M.sam
-rw-rw-r-- 1 vdauwera wga 1696553456 Oct 2 12:31 NA12882_wgs_20_10M20M.sam
4. Re-header the SAMs
java -jar $PICARD ReplaceSamHeader I=NA12877_wgs_20_10M20M.sam O=NA12877_wgs_20_10M20M_RH.sam HEADER=NA12877_header.txt
java -jar $PICARD ReplaceSamHeader I=NA12878_wgs_20_10M20M.sam O=NA12878_wgs_20_10M20M_RH.sam HEADER=NA12878_header.txt
java -jar $PICARD ReplaceSamHeader I=NA12882_wgs_20_10M20M.sam O=NA12882_wgs_20_10M20M_RH.sam HEADER=NA12882_header.txt
# Recap files
-rw-rw-r-- 1 vdauwera wga 1694153715 Oct 2 12:35 NA12877_wgs_20_10M20M_RH.sam
-rw-rw-r-- 1 vdauwera wga 1661467923 Oct 2 12:37 NA12878_wgs_20_10M20M_RH.sam
-rw-rw-r-- 1 vdauwera wga 1696538104 Oct 2 12:38 NA12882_wgs_20_10M20M_RH.sam
5. Sanitize the SAMs to get rid of MATE_NOT_FOUND errors
java -jar $PICARD RevertSam I=NA12877_wgs_20_10M20M_RH.sam O=NA12877_wgs_20_10M20M_RS.sam SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=false REMOVE_DUPLICATE_INFORMATION=false REMOVE_ALIGNMENT_INFORMATION=false ATTRIBUTE_TO_CLEAR=null SANITIZE=true MAX_DISCARD_FRACTION=0.001
java -jar $PICARD RevertSam I=NA12878_wgs_20_10M20M_RH.sam O=NA12878_wgs_20_10M20M_RS.sam SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=false REMOVE_DUPLICATE_INFORMATION=false REMOVE_ALIGNMENT_INFORMATION=false ATTRIBUTE_TO_CLEAR=null SANITIZE=true MAX_DISCARD_FRACTION=0.001
java -jar $PICARD RevertSam I=NA12882_wgs_20_10M20M_RH.sam O=NA12882_wgs_20_10M20M_RS.sam SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=false REMOVE_DUPLICATE_INFORMATION=false REMOVE_ALIGNMENT_INFORMATION=false ATTRIBUTE_TO_CLEAR=null SANITIZE=true MAX_DISCARD_FRACTION=0.001
# Recap files
-rw-rw-r-- 1 vdauwera wga 1683827201 Oct 2 12:45 NA12877_wgs_20_10M20M_RS.sam
-rw-rw-r-- 1 vdauwera wga 1652093793 Oct 2 12:49 NA12878_wgs_20_10M20M_RS.sam
-rw-rw-r-- 1 vdauwera wga 1688143091 Oct 2 12:54 NA12882_wgs_20_10M20M_RS.sam
6. Sort the SAMs, convert back to BAM and create index
java -jar $PICARD SortSam I=NA12877_wgs_20_10M20M_RS.sam O=NA12877_wgs_20_10M20M_V.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE
java -jar $PICARD SortSam I=NA12878_wgs_20_10M20M_RS.sam O=NA12878_wgs_20_10M20M_V.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE
java -jar $PICARD SortSam I=NA12882_wgs_20_10M20M_RS.sam O=NA12882_wgs_20_10M20M_V.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE
#recap files
-rw-rw-r-- 1 vdauwera wga 35616 Oct 2 13:08 NA12877_wgs_20_10M20M_V.bai
-rw-rw-r-- 1 vdauwera wga 508022682 Oct 2 13:08 NA12877_wgs_20_10M20M_V.bam
-rw-rw-r-- 1 vdauwera wga 35200 Oct 2 13:06 NA12878_wgs_20_10M20M_V.bai
-rw-rw-r-- 1 vdauwera wga 497742417 Oct 2 13:06 NA12878_wgs_20_10M20M_V.bam
-rw-rw-r-- 1 vdauwera wga 35632 Oct 2 13:04 NA12882_wgs_20_10M20M_V.bai
-rw-rw-r-- 1 vdauwera wga 500446729 Oct 2 13:04 NA12882_wgs_20_10M20M_V.bam
7. Validate BAMs; should all output "No errors found"
java -jar $PICARD ValidateSamFile I=NA12877_wgs_20_10M20M_V.bam M=SUMMARY
java -jar $PICARD ValidateSamFile I=NA12878_wgs_20_10M20M_V.bam M=SUMMARY
java -jar $PICARD ValidateSamFile I=NA12882_wgs_20_10M20M_V.bam M=SUMMARY
Not enough memory for DepthOfCoverage as single thread
Hi,
If I run DepthOfCoverage without -nt, it crashes somewhere in chromosome 11 with:
ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument.
This can't be the real reason, since it only uses around 15% of our 512 GB.
If I try with -nt 48, it finishes, even though it loops over the last interval for about 90 minutes before it really stops. And I can't produce the interval files, so it is not an option for us.
I tried with GATK 3.7 and 3.8 on a local server. The NextSeq run is composed of 25 panel diagnostics and 3 whole-exome samples.
Is there something we can try, or is there any information on whether this happens when using too many samples or too-large BAM files? Does DepthOfCoverage limit itself? Why is there a difference between running with -nt and without?
Thanks in advance,
Daniel
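For reference, the error message's suggestion amounts to raising the JVM heap before the -jar argument; a sketch assuming 64 GB is to be allocated (tool arguments are placeholders):
java -Xmx64g -jar GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    -R reference.fasta \
    -I input.bam \
    -o coverage_output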