So I've run both the newest version of Picard CollectHsMetrics and an older version, CalculateHsMetrics, and with both versions I'm getting statistics for everything except HS_LIBRARY_SIZE, which is an empty column (red), while the HS_PENALTY values are all 0 (yellow). Any idea what could cause this? The intervals were created with picard.jar BedToIntervalList and there were no errors in any of the log files. All metrics were populated fine except HS_LIBRARY_SIZE and HS_PENALTY_XX.
HS_LIBRARY_SIZE = "" (that's right, nothing, empty); HS_PENALTY_XX are all 0.
Picard SortVcf changing VCF file version
I am using Picard SortVcf to reorder a VCF so that its contig order matches my reference genome and BAM files. It works great; however, it seems to change the VCF format version from 4.0 to 4.2, and this is incompatible with the downstream steps I need it for. Is there any workaround for this?
Thanks!
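For reference, a minimal sketch of the SortVcf invocation being described (file names are placeholders):
java -jar picard.jar SortVcf \
    I=input.vcf \
    O=sorted.vcf \
    SEQUENCE_DICTIONARY=reference.dict   # contig order taken from the reference dictionary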
Installing GATK 4.beta.5
1. Download gatk-4.beta.5.tar.gz
2. tar zxvf gatk-4.beta.5.tar.gz
3. cd gatk-4.beta.5
4. ./gradlew bundle
Bug:
FAILURE: Build failed with an exception.
* Where: Build file "/mnt/local-disk2/bliu/soft/gatk-4.beta.5/build.gradle" line: 244
* What went wrong: A problem occurred evaluating root project "gatk". Cannot find ".git" directory
How can I deal with this?
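The Gradle build derives version information from the repository metadata, so building from a plain release tarball (which has no .git directory) can fail this way. A sketch of the usual workaround, building from a git clone instead; the tag name 4.beta.5 is an assumption:
git clone https://github.com/broadinstitute/gatk.git
cd gatk
git checkout 4.beta.5   # assumed release tag
./gradlew bundle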
Test sample data for the GATK Best Practices
Hi GATK team,
I would like to know about the test data set for the GATK Best Practices.
My aim is to check that GATK4 runs correctly in my environment,
especially from "FASTQ and reference" through to a variant-called VCF file
(both germline and somatic).
I want to run the GATK4 Best Practices pipeline in my environment,
and once I have results, I want to compare them against the GATK team's "right answer" results.
So, if you don't mind, would you show me the data set for testing GATK4?
(If possible, a data set that is small and runs fast would be even nicer for me...)
Regards,
Should I remove unmapped reads from BAM files before variant calling analysis?
I heard GATK uses unmapped reads during realignment. So is it necessary to remove unmapped reads from the BAM file?
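For reference, stripping unmapped reads (should it turn out to be necessary) is typically done with samtools; a minimal sketch with placeholder file names:
# -F 4 excludes reads with the 'read unmapped' flag set; -b emits BAM
samtools view -b -F 4 input.bam > mapped_only.bam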
Re-starting GenotypeGVCFs from certain position?
Hi,
I was joint genotyping about 400 samples with GenotypeGVCFs when I had a node failure. The run was at about 83%, which took two weeks to reach. Is there a way for me to continue from this position rather than restart the entire joint genotyping process? Thanks.
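As far as I know there is no built-in resume, but one pattern, sketched below under the assumption that the partial output is intact through a known position, is to genotype only the remaining intervals with -L and concatenate the pieces afterwards with CatVariants (interval and file names are placeholders):
# Genotype only the region the failed run never reached
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
    -R reference.fasta \
    -V combined.g.vcf \
    -L remaining.intervals \
    -o part2.vcf
# Concatenate the partial outputs into one VCF
java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants \
    -R reference.fasta -V part1.vcf -V part2.vcf -out joint.vcf -assumeSorted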
Mutect2 --artifact_detection_mode
I have a question about how to create a Panel of Normals (PON) using MuTect2: do we still need to add the dbSNP and COSMIC parameters when running the following command for each normal BAM in artifact_detection_mode? Will running with or without dbSNP and COSMIC affect the PON VCF calls?
"java -jar GenomeAnalysisTK.jar \
-T MuTect2 \
-R reference.fasta \
-I:tumor normal1.bam \
[--dbsnp dbSNP.vcf] \
[--cosmic COSMIC.vcf] \
--artifact_detection_mode \
[-L targets.interval_list] \
-o output.normal1.vcf
Thanks,
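For context, the per-normal VCFs produced this way are typically combined into the PON afterwards; a sketch using GATK3 CombineVariants, where the minimum-callset count of 2 and all file names are assumptions:
java -jar GenomeAnalysisTK.jar \
    -T CombineVariants \
    -R reference.fasta \
    -V output.normal1.vcf \
    -V output.normal2.vcf \
    -minN 2 \
    -o pon.vcf
# -minN 2 keeps only sites seen in at least two normals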
ERROR MESSAGE: Unable to read index file, for input source:
Hi, GATK team,
I have set up a bwa + GATK Best Practices pipeline for variant calling on panel data (700 samples).
I use HaplotypeCaller in GVCF mode. The BAM-to-GVCF command is:
java -Xmx20g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37_decoy.fasta -I TR_713.final.bam -nct 4 -ERC GVCF -o TR_713.GATK.var.g.vcf
When I run the command, there are no error messages; it finishes successfully.
But when I joint-genotype the GVCFs into a VCF with the following command:
java -Xmx60g -jar GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R human_g1k_v37_decoy.fasta \
-nt 6 \
-V *g..vcf \
......
-o All.TR.raw.vcf
After running this command, I get the following error message:
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version nightly-2016-09-23-gfade77f):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Unable to read index file, for input source: TR_713.GATK.var.g.vcf.idx
ERROR ------------------------------------------------------------------------------------------
Every time I run this command, I get the error message, but with a different sample each time.
Appreciate your help very much.
Best
May
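For what it's worth, this error often points at a stale or truncated Tribble index (.idx) sitting next to the GVCF; a sketch of a common workaround, which deletes the sidecar indexes so GATK regenerates them on the next read:
# Remove the sidecar index named in the error; GATK rebuilds it automatically
rm TR_713.GATK.var.g.vcf.idx     # or: rm *.g.vcf.idx to cover all samples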
I need help: GATK HaplotypeCaller says my dict is empty, but it isn't?
The error message I get is the following:
INFO 12:25:14,659 HelpFormatter - ----------------------------------------------------------------------------------
INFO 12:25:14,662 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
INFO 12:25:14,662 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:25:14,663 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
INFO 12:25:14,663 HelpFormatter - [Thu Jan 12 12:25:14 EST 2017] Executing on Linux 2.6.32-573.26.1.el6.x86_64 amd64
INFO 12:25:14,663 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 JdkDeflater
INFO 12:25:14,667 HelpFormatter - Program Args: -R /ufrc/ewang/carl.shotwell/maps/mm10/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/mm10/hg19.fa -T HaplotypeCaller -I Control1_fixed.bam -stand_call_conf 10 -stand_emit_conf 10 -o Control1.raw.snps.indels.vcf
INFO 12:25:14,671 HelpFormatter - Executing as carl.shotwell@c23b-s1.ufhpc on Linux 2.6.32-573.26.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13.
INFO 12:25:14,671 HelpFormatter - Date/Time: 2017/01/12 12:25:14
INFO 12:25:14,671 HelpFormatter - ----------------------------------------------------------------------------------
INFO 12:25:14,671 HelpFormatter - ----------------------------------------------------------------------------------
INFO 12:25:14,705 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:25:14,853 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 12:25:14,860 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 12:25:14,998 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.14
INFO 12:25:15,061 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 12:25:15,064 GenomeAnalysisEngine - Reads file is unmapped. Skipping validation against reference.
INFO 12:25:15,166 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
ERROR --
ERROR stack trace
java.lang.IllegalArgumentException: Dictionary cannot have size zero
at org.broadinstitute.gatk.utils.MRUCachingSAMSequenceDictionary.<init>(MRUCachingSAMSequenceDictionary.java:62)
at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:78)
at org.broadinstitute.gatk.utils.GenomeLocParser$1.initialValue(GenomeLocParser.java:75)
at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
at java.lang.ThreadLocal.get(ThreadLocal.java:170)
at org.broadinstitute.gatk.utils.GenomeLocParser.getContigInfo(GenomeLocParser.java:91)
at org.broadinstitute.gatk.utils.GenomeLocParser.getContigs(GenomeLocParser.java:204)
at org.broadinstitute.gatk.utils.GenomeLocParser.<init>(GenomeLocParser.java:135)
at org.broadinstitute.gatk.utils.GenomeLocParser.<init>(GenomeLocParser.java:108)
at org.broadinstitute.gatk.utils.GenomeLocSortedSet.createSetFromSequenceDictionary(GenomeLocSortedSet.java:421)
at org.broadinstitute.gatk.engine.datasources.reads.BAMScheduler.createOverMappedReads(BAMScheduler.java:66)
at org.broadinstitute.gatk.engine.datasources.reads.IntervalSharder.shardOverMappedReads(IntervalSharder.java:55)
at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource.createShardIteratorOverMappedReads(SAMDataSource.java:1217)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getShardStrategy(GenomeAnalysisEngine.java:657)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:307)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Dictionary cannot have size zero
ERROR ------------------------------------------------------------------------------------------
However, I have checked both the dict and the reference index, built them again manually, and they do not have a size of 0.
Here is my command:
java -jar GenomeAnalysisTK.jar -R /ufrc/ewang/carl.shotwell/maps/mm10/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/mm10/hg19.fa -T HaplotypeCaller -I Control1_fixed.bam -stand_call_conf 10 -stand_emit_conf 10 -o Control1.raw.snps.indels.vcf
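For reference, the standard way to (re)build the two companion files, also shown in the tutorial preparation notes later in this collection; reference.fa is a placeholder, and the .dict must sit next to the FASTA with the same basename:
# Rebuild the FASTA index (reference.fa.fai)
samtools faidx reference.fa
# Rebuild the sequence dictionary (reference.dict alongside reference.fa)
java -jar picard.jar CreateSequenceDictionary R=reference.fa O=reference.dict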
GATK 3.8-0 PrintReads fatal error
Hello,
Could you please help me figure out this fatal error when running PrintReads?
After I updated GATK to version 3.8-0, I kept getting this fatal error when running PrintReads. I can skip this step and run HaplotypeCaller with the -BQSR option instead.
parsing sample: SRR098333
INFO 17:12:12,287 HelpFormatter - ----------------------------------------------------------------------------------
INFO 17:12:12,289 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 17:12:12,289 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 17:12:12,289 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 17:12:12,289 HelpFormatter - [Sat Sep 09 17:12:12 EDT 2017] Executing on Linux 2.6.32-358.23.2.el6.x86_64 amd64
INFO 17:12:12,289 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17
INFO 17:12:12,293 HelpFormatter - Program Args: -T PrintReads -nct 8 -R ./refs/GATK_Resource_Bundle/b37/human_g1k_v37.fasta -BQSR SRR098333.recal_data.table -I SRR098333.bwa_mem.sorted_dups_removed_indelrealigner.bam -o SRR098333.bwa_mem.sorted_dups_removed_indelrealigner_BQSR.bam
...
INFO 17:43:53,363 ProgressMeter - 3:83180289 4.6776207E7 31.7 m 40.0 s 18.6% 2.8 h 2.3 h
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe6c2da3f8b, pid=1932, tid=140629048932096
#
# JRE version: Java(TM) SE Runtime Environment (8.0_65-b17) (build 1.8.0_65-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.65-b01 mixed mode linux-amd64 )
# Problematic frame:
# V  [libjvm.so+0x64bf8b]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
#
# Core dump written. Default location: core or core.1932
#
# An error report file with more information is saved as:
# hs_err_pid1932.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
/var/spool/slurmd/job465052/slurm_script: line 18: 1932 Aborted (core dumped) java -Xms16g -Xmx200g -jar /home/apps/GATK/GenomeAnalysisTK-3.8.0/GenomeAnalysisTK.jar -T PrintReads -nct 8 -R ./refs/GATK_Resource_Bundle/b37/human_g1k_v37.fasta -BQSR SRR09833$SLURM_ARRAY_TASK_ID.recal_data.table -I SRR09833$SLURM_ARRAY_TASK_ID.bwa_mem.sorted_dups_removed_indelrealigner.bam -o SRR09833$SLURM_ARRAY_TASK_ID.bwa_mem.sorted_dups_removed_indelrealigner_BQSR.bam
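JVM-level crashes in PrintReads are sometimes tied to multithreading; a sketch of a first thing to try, rerunning the same command single-threaded (i.e., dropping -nct 8), with paths taken from the log above:
java -Xms16g -Xmx200g -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R ./refs/GATK_Resource_Bundle/b37/human_g1k_v37.fasta \
    -BQSR SRR098333.recal_data.table \
    -I SRR098333.bwa_mem.sorted_dups_removed_indelrealigner.bam \
    -o SRR098333.bwa_mem.sorted_dups_removed_indelrealigner_BQSR.bam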
How do I submit a detailed bug report?
Note: only do this if you have been explicitly asked to do so.
Scenario:
You posted a question about a problem you had with GATK tools, we answered that we think it's a bug, and we asked you to submit a detailed bug report.
Here's what you need to provide:
- The exact command line that you used when you had the problem (in a text file)
- The full log output (program output in the console) from the start of the run to the end or error message (in a text file)
- A snippet of the BAM file if applicable and the index (.bai) file associated with it
- If a non-standard reference (i.e. not available in our resource bundle) was used, we need the .fasta, .fai, and .dict files for the reference
- Any other relevant files such as recalibration plots
A snippet file is a slice of the original BAM file which contains the problematic region and is sufficient to reproduce the error. We need it in order to reproduce the problem on our end, which is the first necessary step to finding and fixing the bug. We ask you to provide this as a snippet rather than the full file so that you don't have to upload (and we don't have to process) huge giga-scale files.
Here's how you create a snippet file:
- Look at the error message and see if it cites a specific position where the error occurred
- If not, identify which region caused the problem by running with the -L argument and progressively narrowing down the interval
- Once you have the region, use PrintReads with -L to write the problematic region (with 500 bp padding on either side) to a new file -- this is your snippet file (see the sketch after this list)
- Test your command line on this snippet file to make sure you can still reproduce the error on it.
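For example, a minimal sketch of the snippet-extraction step, assuming the error occurred near 20:1,500,000 (coordinates and file names are placeholders):
# Write the problematic region, padded by 500 bp on each side, to a new BAM
java -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R reference.fasta \
    -I original.bam \
    -L 20:1499500-1500500 \
    -o snippet.bam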
And finally, here's how you send us the files:
- Put all those files into a .zip or .tar.gz archive
- Upload them onto our FTP server with the following credentials:
  location: ftp.broadinstitute.org
  username: gsapubftp
  password: 5WvQWSfi
- Post in the original discussion thread that you have done this
- Be sure to tell us the name of your archive file!
We will get back to you --hopefully with a bug fix!-- as soon as we can.
GATK4 run on Spark cluster: Unable to find _SUCCESS file
Hello:
I tried GATK4 on a Spark cluster, but the output BAM did not appear at my Spark output path (HDFS).
A USER ERROR has occurred: Couldn't write file /user/zhusitao/output/Mbam because writing failed with exception /user/zhusitao/output/Mbam.parts/_SUCCESS: Unable to find _SUCCESS file.
Could you give me some advice on running GATK4 successfully on a Spark cluster? I will be waiting for your reply!
Best wishes!
Sitao Zhu
Is there any way to generate interval list from available exome data?
Hi all,
I am following the "best practice" suggested by broad institute to call variants from whole exome sequencing data. Currently, I am using Mutect2 to call variants from tumor sample and normal sample based on latest reference genome GRCh38. But, I don't have interval list to use -L option. Is there any way to generate interval list from exome sample which I have? or Is there any default interval list for exome data?
Thank You
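For reference, if the target BED file for the capture kit is available, Picard BedToIntervalList (the tool mentioned in the first post of this collection) converts it into an interval list; a sketch with placeholder file names:
java -jar picard.jar BedToIntervalList \
    I=targets.bed \
    O=targets.interval_list \
    SD=GRCh38.dict   # sequence dictionary of the reference in use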
Change format of AD field to Number=R?
Hi
In GATK version 3.5 I see the following in VCF headers:
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
However, the number of values in the AD field should always be the number of alleles (including the reference), right? The VCF 4.2 spec has a value R to represent this. Therefore, could the header line be changed to the following in future GATK releases?
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
Why is this important? Well, one reason is that bcftools norm uses this when splitting multiallelic variants into multiple biallelics. With the '.' this isn't done correctly, but with the 'R' it is. See the comment from freeseek at https://github.com/samtools/bcftools/issues/40 for further details.
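For context, a minimal sketch of the splitting step in question (file names are placeholders):
# Split multiallelic records into biallelics; correct AD splitting depends on Number=R
bcftools norm -m-any input.vcf.gz -Oz -o split.vcf.gz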
(howto) Recalibrate base quality scores = run BQSR
Objective
Recalibrate base quality scores in order to correct sequencing errors and other experimental artifacts.
Prerequisites
- TBD
Steps
- Analyze patterns of covariation in the sequence dataset
- Do a second pass to analyze covariation remaining after recalibration
- Generate before/after plots
- Apply the recalibration to your sequence data
1. Analyze patterns of covariation in the sequence dataset
Action
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R reference.fa \
-I input_reads.bam \
-L 20 \
-knownSites dbsnp.vcf \
-knownSites gold_indels.vcf \
-o recal_data.table
Expected Result
This creates a GATKReport file called recal_data.table containing several tables. These tables contain the covariation data that will be used in a later step to recalibrate the base qualities of your sequence data.
It is imperative that you provide the program with a set of known sites, otherwise it will refuse to run. The known sites are used to build the covariation model and estimate empirical base qualities. For details on what to do if there are no known sites available for your organism of study, please see the online GATK documentation.
Note that -L 20 is used here and in the next steps to restrict analysis to only chromosome 20 in the b37 human genome reference build. To run against a different reference, you may need to change the name of the contig according to the nomenclature used in your reference.
2. Do a second pass to analyze covariation remaining after recalibration
Action
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R reference.fa \
-I input_reads.bam \
-L 20 \
-knownSites dbsnp.vcf \
-knownSites gold_indels.vcf \
-BQSR recal_data.table \
-o post_recal_data.table
Expected Result
This creates another GATKReport file, which we will use in the next step to generate plots. Note the use of the -BQSR flag, which tells the GATK engine to perform on-the-fly recalibration based on the first recalibration data table.
3. Generate before/after plots
Action
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
-T AnalyzeCovariates \
-R reference.fa \
-L 20 \
-before recal_data.table \
-after post_recal_data.table \
-plots recalibration_plots.pdf
Expected Result
This generates a document called recalibration_plots.pdf containing plots that show how the reported base qualities match up to the empirical qualities calculated by the BaseRecalibrator. Comparing the before and after plots allows you to check the effect of the base recalibration process before you actually apply the recalibration to your sequence data. For details on how to interpret the base recalibration plots, please see the online GATK documentation.
4. Apply the recalibration to your sequence data
Action
Run the following GATK command:
java -jar GenomeAnalysisTK.jar \
-T PrintReads \
-R reference.fa \
-I input_reads.bam \
-L 20 \
-BQSR recal_data.table \
-o recal_reads.bam
Expected Result
This creates a file called recal_reads.bam containing all the original reads, but now with exquisitely accurate base substitution, insertion and deletion quality scores. By default, the original quality scores are discarded in order to keep the file size down. However, you have the option to retain them by adding the flag --emit_original_quals to the PrintReads command, in which case the original qualities will also be written in the file, tagged OQ.
Notice how this step uses a very simple tool, PrintReads, to apply the recalibration. What's happening here is that we are loading in the original sequence data, having the GATK engine recalibrate the base qualities on-the-fly thanks to the -BQSR flag (as explained earlier), and just using PrintReads to write out the resulting data to the new file.
Tutorial data not accessible anymore?
Hi,
I am trying to follow the tutorial (https://software.broadinstitute.org/gatk/documentation/topic?name=tutorials), but unfortunately the data are not accessible anymore. I tried to get some via https://drive.google.com/drive/folders/1dS9wr_h6nh3BhPp1KKTGyXJS3upTw4j0, but once loaded in IGV it seems that chromosome 20 is devoid of data.
Java: java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
IGV version 2.3.97 (157)
Linux Ubuntu 16.04
Thank you very much,
Picard IlluminaBasecallsToSam - clocs file issue - more elements than expected
I am attempting to demultiplex a lane of Illumina HiSeq 2500 data using Picard IlluminaBasecallsToSam (v2.5, Java(TM) SE Runtime Environment (build 1.8.0_20-b26)). The tool fails to complete and states that
picard.PicardException: Read the number of expected bins( 65600) but still had more elements in file( /Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/L001/s_1_1107.clocs)
I've also run Picard CheckIlluminaDirectory and everything turns out to be fine. Also, I can successfully demultiplex lane 2 from this run, which has the same read structure. I suspected that a file was corrupted during network transfer, so I re-ran RTA, but Picard IlluminaBasecallsToSam still gives the same result.
I'd appreciate any ideas about what could be causing this error.
Complete error below:
[Tue Aug 15 11:01:59 EDT 2017] picard.illumina.IlluminaBasecallsToSam BASECALLS_DIR=/Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/BaseCalls BARCODES_DIR=/Project/Capture LANE=1 RUN_BARCODE=HN3VWBCXY170810 READ_GROUP_ID=HN3VWBCXY170810 SEQUENCING_CENTER=ABC READ_STRUCTURE=98T8B6M8B98T LIBRARY_PARAMS=/Project/DemultCapture.txt NUM_PROCESSORS=6 IGNORE_UNEXPECTED_BARCODES=true TMP_DIR=[/Project/TMP] PLATFORM=illumina ADAPTERS_TO_CHECK=[INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM] FORCE_GC=true APPLY_EAMSS_FILTER=true MAX_READS_IN_RAM_PER_TILE=1200000 MINIMUM_QUALITY=2 INCLUDE_NON_PF_READS=true MOLECULAR_INDEX_TAG=RX MOLECULAR_INDEX_BASE_QUALITY_TAG=QX VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue Aug 15 11:01:59 EDT 2017] Executing as rb@rcsgc22 on Linux 2.6.32-358.2.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Picard version: 2.5.0(2c370988aefe41f579920c8a6a678a201c5261c1_1466708365)
INFO 2017-08-15 11:04:47 IlluminaBasecallsToSam DONE_READING STRUCTURE IS 98T8B6M8B98T
INFO 2017-08-15 11:05:36 IlluminaBasecallsConverter Read 1,000,000 records. Elapsed time: 00:03:36s. Time for last 1,000,000: 35s. Last read position: */*
INFO 2017-08-15 11:06:14 IlluminaBasecallsConverter Read 2,000,000 records. Elapsed time: 00:04:14s. Time for last 1,000,000: 37s. Last read position: */*
INFO 2017-08-15 11:06:32 IlluminaBasecallsConverter Read 3,000,000 records. Elapsed time: 00:04:32s. Time for last 1,000,000: 18s. Last read position: */*
INFO 2017-08-15 11:07:00 IlluminaBasecallsConverter Read 4,000,000 records. Elapsed time: 00:04:59s. Time for last 1,000,000: 27s. Last read position: */*
INFO 2017-08-15 11:07:00 IlluminaBasecallsConverter Before explicit GC, Runtime.totalMemory()=6693060608
INFO 2017-08-15 11:07:01 IlluminaBasecallsConverter After explicit GC, Runtime.totalMemory()=6512705536
INFO 2017-08-15 11:07:22 IlluminaBasecallsConverter Read 5,000,000 records. Elapsed time: 00:05:22s. Time for last 1,000,000: 22s. Last read position: */*
INFO 2017-08-15 11:07:33 IlluminaBasecallsConverter Read 6,000,000 records. Elapsed time: 00:05:33s. Time for last 1,000,000: 11s. Last read position: */*
INFO 2017-08-15 11:07:41 IlluminaBasecallsConverter Read 7,000,000 records. Elapsed time: 00:05:41s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2017-08-15 11:07:56 IlluminaBasecallsConverter Read 8,000,000 records. Elapsed time: 00:05:56s. Time for last 1,000,000: 14s. Last read position: */*
INFO 2017-08-15 11:08:09 IlluminaBasecallsConverter Read 9,000,000 records. Elapsed time: 00:06:09s. Time for last 1,000,000: 12s. Last read position: */*
INFO 2017-08-15 11:08:24 IlluminaBasecallsConverter Read 10,000,000 records. Elapsed time: 00:06:24s. Time for last 1,000,000: 15s. Last read position: */*
INFO 2017-08-15 11:08:35 IlluminaBasecallsConverter Read 11,000,000 records. Elapsed time: 00:06:35s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:08:48 IlluminaBasecallsConverter Read 12,000,000 records. Elapsed time: 00:06:48s. Time for last 1,000,000: 12s. Last read position: */*
INFO 2017-08-15 11:08:58 IlluminaBasecallsConverter Read 13,000,000 records. Elapsed time: 00:06:58s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:09:06 IlluminaBasecallsConverter Read 14,000,000 records. Elapsed time: 00:07:06s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2017-08-15 11:09:16 IlluminaBasecallsConverter Read 15,000,000 records. Elapsed time: 00:07:16s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:09:25 IlluminaBasecallsConverter Read 16,000,000 records. Elapsed time: 00:07:24s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2017-08-15 11:09:38 IlluminaBasecallsConverter Read 17,000,000 records. Elapsed time: 00:07:38s. Time for last 1,000,000: 13s. Last read position: */*
INFO 2017-08-15 11:09:57 IlluminaBasecallsConverter Write 1,000,000 records. Elapsed time: 00:07:57s. Time for last 1,000,000: 13s. Last read position: */*
INFO 2017-08-15 11:10:03 IlluminaBasecallsConverter Read 18,000,000 records. Elapsed time: 00:08:03s. Time for last 1,000,000: 25s. Last read position: */*
INFO 2017-08-15 11:10:08 IlluminaBasecallsConverter Write 2,000,000 records. Elapsed time: 00:08:07s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:10:17 IlluminaBasecallsConverter Write 3,000,000 records. Elapsed time: 00:08:17s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:10:27 IlluminaBasecallsConverter Read 19,000,000 records. Elapsed time: 00:08:26s. Time for last 1,000,000: 23s. Last read position: */*
INFO 2017-08-15 11:10:29 IlluminaBasecallsConverter Write 4,000,000 records. Elapsed time: 00:08:29s. Time for last 1,000,000: 12s. Last read position: */*
INFO 2017-08-15 11:10:39 IlluminaBasecallsConverter Write 5,000,000 records. Elapsed time: 00:08:38s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:10:48 IlluminaBasecallsConverter Write 6,000,000 records. Elapsed time: 00:08:48s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:10:48 IlluminaBasecallsConverter Read 20,000,000 records. Elapsed time: 00:08:48s. Time for last 1,000,000: 21s. Last read position: */*
INFO 2017-08-15 11:10:59 IlluminaBasecallsConverter Write 7,000,000 records. Elapsed time: 00:08:59s. Time for last 1,000,000: 10s. Last read position: */*
INFO 2017-08-15 11:11:08 IlluminaBasecallsConverter Write 8,000,000 records. Elapsed time: 00:09:08s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:14 IlluminaBasecallsConverter Read 21,000,000 records. Elapsed time: 00:09:14s. Time for last 1,000,000: 25s. Last read position: */*
INFO 2017-08-15 11:11:18 IlluminaBasecallsConverter Write 9,000,000 records. Elapsed time: 00:09:17s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:27 IlluminaBasecallsConverter Write 10,000,000 records. Elapsed time: 00:09:27s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:36 IlluminaBasecallsConverter Read 22,000,000 records. Elapsed time: 00:09:36s. Time for last 1,000,000: 22s. Last read position: */*
INFO 2017-08-15 11:11:37 IlluminaBasecallsConverter Write 11,000,000 records. Elapsed time: 00:09:36s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:46 IlluminaBasecallsConverter Write 12,000,000 records. Elapsed time: 00:09:45s. Time for last 1,000,000: 8s. Last read position: */*
INFO 2017-08-15 11:11:55 IlluminaBasecallsConverter Write 13,000,000 records. Elapsed time: 00:09:55s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2017-08-15 11:11:56 IlluminaBasecallsConverter Read 23,000,000 records. Elapsed time: 00:09:56s. Time for last 1,000,000: 20s. Last read position: */*
INFO 2017-08-15 11:12:00 IlluminaBasecallsConverter Before explicit GC, Runtime.totalMemory()=7281311744
INFO 2017-08-15 11:12:00 IlluminaBasecallsConverter After explicit GC, Runtime.totalMemory()=7281311744
Exception in thread "pool-1-thread-6" ERROR 2017-08-15 11:12:04 IlluminaBasecallsConverter Failure encountered in worker thread; attempting to shut down remaining worker threads and terminate ...
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator.awaitWorkComplete(IlluminaBasecallsConverter.java:709)
at picard.illumina.IlluminaBasecallsConverter.doTileProcessing(IlluminaBasecallsConverter.java:318)
at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:230)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
picard.PicardException: Read the number of expected bins( 65600) but still had more elements in file( /Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/L001/s_1_1107.clocs)
at picard.illumina.parser.readers.ClocsFileReader.hasNext(ClocsFileReader.java:150)
at picard.illumina.parser.PosParser$1.hasNext(PosParser.java:98)
at picard.illumina.parser.PerTileParser.hasNext(PerTileParser.java:120)
at picard.illumina.parser.PerTileParser.maybeAdvance(PerTileParser.java:99)
at picard.illumina.parser.PerTileParser.next(PerTileParser.java:109)
at picard.illumina.parser.IlluminaDataProvider.next(IlluminaDataProvider.java:133)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:555)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$2.run(IlluminaBasecallsConverter.java:657)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-1-thread-1" java.lang.ArrayIndexOutOfBoundsException
at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:357)
at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:250)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:206)
at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:174)
at htsjdk.samtools.util.BinaryCodec.writeInt(BinaryCodec.java:220)
at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:132)
at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:134)
at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:190)
at picard.illumina.IlluminaBasecallsToSam$SAMFileWriterWrapper.write(IlluminaBasecallsToSam.java:483)
at picard.illumina.IlluminaBasecallsToSam$SAMFileWriterWrapper.write(IlluminaBasecallsToSam.java:472)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$3.run(IlluminaBasecallsConverter.java:831)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "pool-1-thread-5" java.lang.ArrayIndexOutOfBoundsException
at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:357)
at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:250)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:206)
at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:174)
at htsjdk.samtools.util.BinaryCodec.writeInt(BinaryCodec.java:220)
at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:131)
at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:134)
at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:190)
at picard.illumina.IlluminaBasecallsToSam$SAMFileWriterWrapper.write(IlluminaBasecallsToSam.java:483)
at picard.illumina.IlluminaBasecallsToSam$SAMFileWriterWrapper.write(IlluminaBasecallsToSam.java:472)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$3.run(IlluminaBasecallsConverter.java:831)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[Tue Aug 15 11:12:06 EDT 2017] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 10.12 minutes.
Runtime.totalMemory()=7301234688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Failure encountered in worker thread; see log for details.
at picard.illumina.IlluminaBasecallsConverter.doTileProcessing(IlluminaBasecallsConverter.java:321)
at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:230)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Exception in thread "pool-1-thread-3" picard.PicardException: IOException opening cluster binary file /Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/BaseCalls/L001/s_1_1109.filter
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:119)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getByteIterator(MMapBackedIteratorFactory.java:66)
at picard.illumina.parser.readers.FilterFileReader.<init>(FilterFileReader.java:68)
at picard.illumina.parser.FilterParser$1.<init>(FilterParser.java:55)
at picard.illumina.parser.FilterParser.makeTileIterator(FilterParser.java:54)
at picard.illumina.parser.PerTileParser.advanceTile(PerTileParser.java:80)
at picard.illumina.parser.PerTileParser.hasNext(PerTileParser.java:121)
at picard.illumina.parser.IlluminaDataProvider.hasNext(IlluminaDataProvider.java:104)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:554)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$2.run(IlluminaBasecallsConverter.java:657)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:314)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:113)
... 12 more
Exception in thread "pool-1-thread-2" picard.PicardException: IOException opening cluster binary file /Illumina/Basecalls/170808_SN218_0895_AHN3VWBCXY/Data/Intensities/L001/s_1_1110.clocs
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:119)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getByteIterator(MMapBackedIteratorFactory.java:66)
at picard.illumina.parser.readers.ClocsFileReader.<init>(ClocsFileReader.java:85)
at picard.illumina.parser.PosParser.makeTileIterator(PosParser.java:83)
at picard.illumina.parser.PerTileParser.advanceTile(PerTileParser.java:80)
at picard.illumina.parser.PerTileParser.hasNext(PerTileParser.java:121)
at picard.illumina.parser.PerTileParser.maybeAdvance(PerTileParser.java:99)
at picard.illumina.parser.PerTileParser.next(PerTileParser.java:109)
at picard.illumina.parser.IlluminaDataProvider.next(IlluminaDataProvider.java:133)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:555)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$2.run(IlluminaBasecallsConverter.java:657)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:314)
at picard.illumina.parser.readers.MMapBackedIteratorFactory.getBuffer(MMapBackedIteratorFactory.java:113)
... 13 more
Exception in thread "pool-1-thread-7" picard.PicardException: Error reading from file /Project/CorrectCapture/s_1_1111_barcode.txt
at picard.util.BasicInputParser.readNextLine(BasicInputParser.java:120)
at picard.util.AbstractInputParser.advance(AbstractInputParser.java:85)
at picard.util.AbstractInputParser.advance(AbstractInputParser.java:44)
at htsjdk.samtools.util.AbstractIterator.hasNext(AbstractIterator.java:44)
at picard.illumina.parser.readers.BarcodeFileReader.hasNext(BarcodeFileReader.java:42)
at picard.illumina.parser.BarcodeParser$BarcodeDataIterator.hasNext(BarcodeParser.java:69)
at picard.illumina.parser.PerTileParser.hasNext(PerTileParser.java:120)
at picard.illumina.parser.IlluminaDataProvider.hasNext(IlluminaDataProvider.java:104)
at picard.illumina.IlluminaBasecallsConverter$TileReader.process(IlluminaBasecallsConverter.java:554)
at picard.illumina.IlluminaBasecallsConverter$TileReadAggregator$2.run(IlluminaBasecallsConverter.java:657)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: htsjdk.samtools.util.RuntimeIOException: java.nio.channels.ClosedByInterruptException
at htsjdk.samtools.util.BufferedLineReader.readLine(BufferedLineReader.java:74)
at picard.util.BasicInputParser.readNextLine(BasicInputParser.java:103)
... 12 more
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:163)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at htsjdk.samtools.util.BufferedLineReader.readLine(BufferedLineReader.java:70)
... 13 more
Should CheckIlluminaDirectory be able to handle a non-standard read structure?
I have a flowcell (from a 10x library) in which the 'natural' read structure is 178T8B14B5T. However, I want to interpret the flowcell as 178T8B14T5S, so that is what I passed to CheckIlluminaDirectory. I get the exception below. It looks like the code is trying to check all the cycles, including the skips; however, CbclReader.outputCycles is initialized with only enough elements to hold the non-skip cycles. Is this a bug? Or is it wrong to pass a read structure with skips in it?
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 200
at picard.illumina.parser.readers.CbclReader.readSurfaceTile(CbclReader.java:119)
at picard.illumina.parser.readers.CbclReader.<init>(CbclReader.java:102)
at picard.illumina.CheckIlluminaDirectory.doWork(CheckIlluminaDirectory.java:170)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
Tutorial files provenance: ASHG15
This document is intended to be a record of how the tutorial files were prepared for the ASHG 2015 hands-on workshop.
Reference genome
Extracting just chromosome 20 produces a 64 Mb file (uncompressed), which is small enough for our purposes, so we don't need to truncate it further, simplifying future data file preparations.
# Extract just chromosome 20
samtools faidx /humgen/gsa-hpprojects/GATK/bundle/current/b37/human_g1k_v37.fasta 20 > human_g1k_b37_20.fasta
# Create the reference index
samtools faidx human_g1k_b37_20.fasta
# Create sequence dictionary
java -jar $PICARD CreateSequenceDictionary R=human_g1k_b37_20.fasta O=human_g1k_b37_20.dict
# Recap files
-rw-rw-r-- 1 vdauwera wga 164 Oct 1 14:56 human_g1k_b37_20.dict
-rw-rw-r-- 1 vdauwera wga 64075950 Oct 1 14:41 human_g1k_b37_20.fasta
-rw-rw-r-- 1 vdauwera wga 20 Oct 1 14:46 human_g1k_b37_20.fasta.fai
Sequence data
We are using the 2nd-generation CEU Trio of NA12878 with her husband and child, in a WGS dataset produced at Broad, with files named after the library preps: Solexa-xxxxxx.bam.
1. Extract just chromosome 20:10M-20M bp and filter out chimeric pairs with -rf BadMate
java -jar $GATK -T PrintReads -R /path/to/bundle/current/b37/human_g1k_v37_decoy.fasta -I /path/to/Solexa-272221.bam -o NA12877_wgs_20_10M20M.bam -L 20:10000000-20000000 -rf BadMate
java -jar $GATK -T PrintReads -R /path/to/bundle/current/b37/human_g1k_v37_decoy.fasta -I /path/to/Solexa-272222.bam -o NA12878_wgs_20_10M20M.bam -L 20:10000000-20000000 -rf BadMate
java -jar $GATK -T PrintReads -R /path/to/bundle/current/b37/human_g1k_v37_decoy.fasta -I /path/to/Solexa-272228.bam -o NA12882_wgs_20_10M20M.bam -L 20:10000000-20000000 -rf BadMate
# Recap files
-rw-rw-r-- 1 vdauwera wga 36240 Oct 2 11:55 NA12877_wgs_20_10M20M.bai
-rw-rw-r-- 1 vdauwera wga 512866085 Oct 2 11:55 NA12877_wgs_20_10M20M.bam
-rw-rw-r-- 1 vdauwera wga 36176 Oct 2 11:53 NA12878_wgs_20_10M20M.bai
-rw-rw-r-- 1 vdauwera wga 502282846 Oct 2 11:53 NA12878_wgs_20_10M20M.bam
-rw-rw-r-- 1 vdauwera wga 36464 Oct 2 12:00 NA12882_wgs_20_10M20M.bai
-rw-rw-r-- 1 vdauwera wga 505001668 Oct 2 12:00 NA12882_wgs_20_10M20M.bam
2. Extract headers and edit manually to remove all contigs except 20 and sanitize internal filepaths
samtools view -H NA12877_wgs_20_10M20M.bam > NA12877_header.txt
samtools view -H NA12878_wgs_20_10M20M.bam > NA12878_header.txt
samtools view -H NA12882_wgs_20_10M20M.bam > NA12882_header.txt
Manual editing is not represented here; basically just delete unwanted contig SQ lines and remove identifying info from internal filepaths.
3. Flip BAM to SAM
java -jar $PICARD SamFormatConverter I=NA12877_wgs_20_10M20M.bam O=NA12877_wgs_20_10M20M.sam
java -jar $PICARD SamFormatConverter I=NA12878_wgs_20_10M20M.bam O=NA12878_wgs_20_10M20M.sam
java -jar $PICARD SamFormatConverter I=NA12882_wgs_20_10M20M.bam O=NA12882_wgs_20_10M20M.sam
#Recap files
-rw-rw-r-- 1 vdauwera wga 1694169101 Oct 2 12:28 NA12877_wgs_20_10M20M.sam
-rw-rw-r-- 1 vdauwera wga 1661483309 Oct 2 12:30 NA12878_wgs_20_10M20M.sam
-rw-rw-r-- 1 vdauwera wga 1696553456 Oct 2 12:31 NA12882_wgs_20_10M20M.sam
4. Re-header the SAMs
java -jar $PICARD ReplaceSamHeader I=NA12877_wgs_20_10M20M.sam O=NA12877_wgs_20_10M20M_RH.sam HEADER=NA12877_header.txt
java -jar $PICARD ReplaceSamHeader I=NA12878_wgs_20_10M20M.sam O=NA12878_wgs_20_10M20M_RH.sam HEADER=NA12878_header.txt
java -jar $PICARD ReplaceSamHeader I=NA12882_wgs_20_10M20M.sam O=NA12882_wgs_20_10M20M_RH.sam HEADER=NA12882_header.txt
# Recap files
-rw-rw-r-- 1 vdauwera wga 1694153715 Oct 2 12:35 NA12877_wgs_20_10M20M_RH.sam
-rw-rw-r-- 1 vdauwera wga 1661467923 Oct 2 12:37 NA12878_wgs_20_10M20M_RH.sam
-rw-rw-r-- 1 vdauwera wga 1696538104 Oct 2 12:38 NA12882_wgs_20_10M20M_RH.sam
5. Sanitize the SAMs to get rid of MATE_NOT_FOUND errors
java -jar $PICARD RevertSam I=NA12877_wgs_20_10M20M_RH.sam O=NA12877_wgs_20_10M20M_RS.sam SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=false REMOVE_DUPLICATE_INFORMATION=false REMOVE_ALIGNMENT_INFORMATION=false ATTRIBUTE_TO_CLEAR=null SANITIZE=true MAX_DISCARD_FRACTION=0.001
java -jar $PICARD RevertSam I=NA12878_wgs_20_10M20M_RH.sam O=NA12878_wgs_20_10M20M_RS.sam SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=false REMOVE_DUPLICATE_INFORMATION=false REMOVE_ALIGNMENT_INFORMATION=false ATTRIBUTE_TO_CLEAR=null SANITIZE=true MAX_DISCARD_FRACTION=0.001
java -jar $PICARD RevertSam I=NA12882_wgs_20_10M20M_RH.sam O=NA12882_wgs_20_10M20M_RS.sam SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=false REMOVE_DUPLICATE_INFORMATION=false REMOVE_ALIGNMENT_INFORMATION=false ATTRIBUTE_TO_CLEAR=null SANITIZE=true MAX_DISCARD_FRACTION=0.001
# Recap files
-rw-rw-r-- 1 vdauwera wga 1683827201 Oct 2 12:45 NA12877_wgs_20_10M20M_RS.sam
-rw-rw-r-- 1 vdauwera wga 1652093793 Oct 2 12:49 NA12878_wgs_20_10M20M_RS.sam
-rw-rw-r-- 1 vdauwera wga 1688143091 Oct 2 12:54 NA12882_wgs_20_10M20M_RS.sam
6. Sort the SAMs, convert back to BAM and create index
java -jar $PICARD SortSam I=NA12877_wgs_20_10M20M_RS.sam O=NA12877_wgs_20_10M20M_V.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE
java -jar $PICARD SortSam I=NA12878_wgs_20_10M20M_RS.sam O=NA12878_wgs_20_10M20M_V.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE
java -jar $PICARD SortSam I=NA12882_wgs_20_10M20M_RS.sam O=NA12882_wgs_20_10M20M_V.bam SORT_ORDER=coordinate CREATE_INDEX=TRUE
#recap files
-rw-rw-r-- 1 vdauwera wga 35616 Oct 2 13:08 NA12877_wgs_20_10M20M_V.bai
-rw-rw-r-- 1 vdauwera wga 508022682 Oct 2 13:08 NA12877_wgs_20_10M20M_V.bam
-rw-rw-r-- 1 vdauwera wga 35200 Oct 2 13:06 NA12878_wgs_20_10M20M_V.bai
-rw-rw-r-- 1 vdauwera wga 497742417 Oct 2 13:06 NA12878_wgs_20_10M20M_V.bam
-rw-rw-r-- 1 vdauwera wga 35632 Oct 2 13:04 NA12882_wgs_20_10M20M_V.bai
-rw-rw-r-- 1 vdauwera wga 500446729 Oct 2 13:04 NA12882_wgs_20_10M20M_V.bam
7. Validate BAMs; should all output "No errors found"
java -jar $PICARD ValidateSamFile I=NA12877_wgs_20_10M20M_V.bam M=SUMMARY
java -jar $PICARD ValidateSamFile I=NA12878_wgs_20_10M20M_V.bam M=SUMMARY
java -jar $PICARD ValidateSamFile I=NA12882_wgs_20_10M20M_V.bam M=SUMMARY
Not enough memory for DepthOfCoverage as single thread
Hi,
If I run DepthOfCoverage without -nt, it crashes somewhere in chromosome 11 with:
ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument.
This can't be the real reason, since it only uses around 15% of our 512 GB.
If I try with -nt 48, it finishes, even though it loops over the last interval for about 90 minutes before it really stops. And I can't produce the interval files, so it is not an option for us.
I tried with GATK 3.7 and 3.8 on a local server. The NextSeq run is composed of 25 panel diagnostics and 3 whole-exome samples.
Is there something we can try, or is there any information on whether this happens when using too many samples or too-large BAM files? Does DepthOfCoverage limit itself? Why is there a difference between running with -nt and without?
Thanks in advance,
Daniel
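For reference, the error message's suggestion amounts to raising the JVM heap before the -jar argument; a sketch assuming 64 GB is to be allocated (tool arguments are placeholders):
java -Xmx64g -jar GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    -R reference.fasta \
    -I input.bam \
    -o coverage_output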