Channel: Recent Discussions — GATK-Forum

Data Problem in GATK4


I was trying to run the GATK4 Spark tool ReadsPipelineSpark. With WES data (about 1 GB) there is no error, but when I run it on WGS data (about 60 GB) I always get the notice "We're supposed to be aligning paired reads, but there are an odd number of them". Is there any way to check which part of my uBAM file contains these unpaired reads? The same data runs successfully without Spark.
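One quick way to locate unpaired reads is to count primary-record read names and report any name that appears an odd number of times. This is a sketch under the assumption that you can stream read names out of the uBAM (e.g. `samtools view -F 0x900 in.bam | cut -f1`); `unpaired_names` is a hypothetical helper, not a GATK tool:

```python
from collections import Counter

def unpaired_names(read_names):
    """Return read names that occur an odd number of times.

    `read_names` is an iterable of query names taken from primary,
    non-supplementary records only (secondary/supplementary alignments
    would legitimately repeat a name).
    """
    counts = Counter(read_names)
    return sorted(name for name, n in counts.items() if n % 2 == 1)

# Toy example: read "r2" has lost its mate.
names = ["r1", "r1", "r2", "r3", "r3"]
print(unpaired_names(names))  # ['r2']
```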


Following the Best Practices, I have ERROR:MATE_NOT_FOUND in all my files


I am running Mutect2 on about 100 tumor-normal pairs of whole-exome sequencing samples, using GATK 4.0.9.0. I followed the recommended Best Practices pipeline:
1. FastqToSam,
2. MarkIlluminaAdapters,
3. SamToFastq, bwa (mem -M -t4),
4. MergeBamAlignment (CREATE_INDEX=true, ADD_MATE_CIGAR=true, CLIP_ADAPTERS=false, CLIP_OVERLAPPING_READS=true, INCLUDE_SECONDARY_ALIGNMENTS=true, MAX_INSERTIONS_OR_DELETIONS=-1, PRIMARY_ALIGNMENT_STRATEGY=MostDistant, ATTRIBUTES_TO_RETAIN=XS )
5. MarkDuplicates (VALIDATION_STRINGENCY SILENT, OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500, CREATE_INDEX=true)
6. BaseRecalibrator (using 100bp-padded interval bed file of the exome design)
7. ApplyBQSR
8. generate the PON VCF successfully using all normal samples.
9. When I tried to run Mutect2 to call somatic short variants, all pairs went through the analysis except one pair, which repeatedly failed in the same region. This is part of the log after one failed trial:
...
19:12:18.761 INFO ProgressMeter - chr1:41708411 5.8 5020 858.5
19:12:28.648 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 262.425723624
19:12:28.649 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 13.26 sec
19:12:29.035 INFO Mutect2 - Shutting down engine
[October 10, 2018 7:12:29 PM EDT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 6.09 minutes.
Runtime.totalMemory()=2715287552
java.lang.IllegalArgumentException: readMaxLength must be > 0 but got 0
...

When I tested the BAM files using ValidateSamFile, I got these results for the tumor and normal samples:
Tumor:
Error Type	Count
ERROR:MATE_NOT_FOUND	3622058

Normal:
Error Type	Count
ERROR:MATE_NOT_FOUND	2773161

I used the FixMateInformation tool to fix the files, re-indexed, and re-ran Mutect2 successfully. But I tested all the BAM files and found they carry the same error! Why did this happen? Is it a bug, or did I do something wrong? Why did one pair fail the variant calling step while all the other pairs did not? Does this affect the quality of the variant calls?
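For context, ValidateSamFile reports MATE_NOT_FOUND when a record is flagged as paired in sequencing but no record for its mate exists in the file. A minimal sketch of that check, operating on (qname, flag) tuples for primary records rather than on a real BAM (`mates_missing` is a hypothetical helper):

```python
def mates_missing(records):
    """Return qnames of paired primary records whose mate record is absent.

    `records` is an iterable of (qname, flag) pairs for primary records.
    A read pair (FLAG bit 0x1 set) should contribute exactly two primary
    records per qname: one first-in-pair (0x40), one second-in-pair (0x80).
    """
    seen = {}
    for qname, flag in records:
        if flag & 0x1:  # paired in sequencing
            seen[qname] = seen.get(qname, 0) + 1
    return sorted(q for q, n in seen.items() if n != 2)

# Toy example: pair "a" is complete, read "b" is missing its mate.
recs = [("a", 0x1 | 0x40), ("a", 0x1 | 0x80), ("b", 0x1 | 0x40)]
print(mates_missing(recs))  # ['b']
```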

Thank you

SAMException "Query asks for data past end of contig" occurred in Mutect2


Dear team

Currently, I have downloaded the GATK4 (version 4.0.7.0) Docker images and the "gatk4-data-processing-master" WDL, coupled with the "gatk4-somatic-snvs-indels-master" WDL, for somatic mutation detection. The following commands are used for the whole workflow with the human genome reference (hg38):

(1) The commands "java -jar cromwell-34.jar run processing-for-variant-discovery-gatk4.wdl --inputs normal.json"
and "java -jar cromwell-34.jar run processing-for-variant-discovery-gatk4.wdl --inputs tissue.json" are called to generate BAM and BAI files for both the normal and tissue cancer samples.
This step finished successfully and the BAM and BAI files were produced.

(2) The command "java -jar cromwell-34.jar run mutect2.wdl --inputs mutect2.json" is called for somatic mutation detection, with the normal and tissue BAM and BAI files as input.
Unfortunately, the error "htsjdk.samtools.SAMException: Query asks for data past end of contig" occurs for many contigs (for example, Query contig chrX start:224664443 stop:224664487 contigLength:156040895).

Can someone help me fix these errors? Thanks a lot.
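For what it's worth, the reported query start (224664443) is larger than the hg38 chrX length (156040895), which usually suggests the query intervals come from a different reference build than the BAM's. A minimal sanity check, assuming contig lengths have been read from the reference .dict into a dictionary (`out_of_range` is a hypothetical helper):

```python
def out_of_range(intervals, contig_lengths):
    """Flag query intervals whose 1-based, inclusive coordinates fall
    outside the contig lengths declared by the reference dictionary."""
    bad = []
    for contig, start, stop in intervals:
        length = contig_lengths.get(contig)
        if length is None or start < 1 or stop > length:
            bad.append((contig, start, stop))
    return bad

# hg38 chrX is 156,040,895 bp, so the second query cannot be satisfied.
lengths = {"chrX": 156040895}
queries = [("chrX", 1000, 2000), ("chrX", 224664443, 224664487)]
print(out_of_range(queries, lengths))  # [('chrX', 224664443, 224664487)]
```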

Here are those json files used in the issue

(a) the normal json file
{
"##_COMMENT1": "SAMPLE NAME AND UNMAPPED BAMS",
"PreProcessingForVariantDiscovery_GATK4.sample_name": "mytestN",
"PreProcessingForVariantDiscovery_GATK4.ref_name": "hg38",
"PreProcessingForVariantDiscovery_GATK4.flowcell_unmapped_bams_list": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/normal_u
bam_list.txt",
"PreProcessingForVariantDiscovery_GATK4.unmapped_bam_suffix": ".bam",

"##COMMENT2": "REFERENCE FILES",
"PreProcessingForVariantDiscovery_GATK4.ref_dict": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens

assembly38.dict",
"PreProcessingForVariantDiscovery_GATK4.ref_fasta": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens
_assembly38.fasta",
"PreProcessingForVariantDiscovery_GATK4.ref_fasta_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_s
apiens_assembly38.fasta.fai",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_alt": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.alt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_sa": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/
hg38/Homo_sapiens_assembly38.fasta.64.sa",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_amb": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.amb",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_bwt": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.bwt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_ann": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.ann",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_pac": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.pac",

"##_COMMENT3": "KNOWN SITES RESOURCES",
"PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens
_assembly38.dbsnp138.sort.vcf",
"PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_s
apiens_assembly38.dbsnp138.sort.vcf.idx",
"PreProcessingForVariantDiscovery_GATK4.known_indels_sites_VCFs": [
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz"
],
"PreProcessingForVariantDiscovery_GATK4.known_indels_sites_indices": [
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
],

"##_COMMENT4": "MISC PARAMETERS",
"PreProcessingForVariantDiscovery_GATK4.bwa_commandline": "bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta",
"PreProcessingForVariantDiscovery_GATK4.compression_level": 5,
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.num_cpu": "16",

"##_COMMENT5": "DOCKERS",
"PreProcessingForVariantDiscovery_GATK4.gotc_docker": "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786",
"PreProcessingForVariantDiscovery_GATK4.gatk_docker": "broadinstitute/gatk:4.0.7.0",
"PreProcessingForVariantDiscovery_GATK4.python_docker": "python:2.7",

"##_COMMENT6": "PATHS",
"PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/",
"PreProcessingForVariantDiscovery_GATK4.gatk_path": "/gatk/gatk",

"##_COMMENT7": "JAVA OPTIONS",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.java_opt": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_sort": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_fix": "-Xms500m",
"PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.java_opt": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.java_opt": "-Xms2000m",

"##_COMMENT8": "MEMORY ALLOCATION",
"PreProcessingForVariantDiscovery_GATK4.GetBwaVersion.mem_size": "1 GB",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.mem_size": "14 GB",
"PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.mem_size": "7 GB",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.mem_size": "5000 MB",
"PreProcessingForVariantDiscovery_GATK4.CreateSequenceGroupingTSV.mem_size": "2 GB",
"PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.mem_size": "6 GB",
"PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.mem_size": "3 GB",

"##_COMMENT9": "DISK SIZE ALLOCATION",
"PreProcessingForVariantDiscovery_GATK4.agg_small_disk": 200,
"PreProcessingForVariantDiscovery_GATK4.agg_medium_disk": 300,
"PreProcessingForVariantDiscovery_GATK4.agg_large_disk": 400,
"PreProcessingForVariantDiscovery_GATK4.flowcell_small_disk": 100,
"PreProcessingForVariantDiscovery_GATK4.flowcell_medium_disk": 200,

"##_COMMENT10": "PREEMPTIBLES",
"PreProcessingForVariantDiscovery_GATK4.preemptible_tries": 3,
"PreProcessingForVariantDiscovery_GATK4.agg_preemptible_tries": 3
}

(b) the tissue json file
{
"##_COMMENT1": "SAMPLE NAME AND UNMAPPED BAMS",
"PreProcessingForVariantDiscovery_GATK4.sample_name": "mytestT",
"PreProcessingForVariantDiscovery_GATK4.ref_name": "hg38",
"PreProcessingForVariantDiscovery_GATK4.flowcell_unmapped_bams_list": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/tissue_u
bam_list.txt",
"PreProcessingForVariantDiscovery_GATK4.unmapped_bam_suffix": ".bam",

"##COMMENT2": "REFERENCE FILES",
"PreProcessingForVariantDiscovery_GATK4.ref_dict": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens

assembly38.dict",
"PreProcessingForVariantDiscovery_GATK4.ref_fasta": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens
_assembly38.fasta",
"PreProcessingForVariantDiscovery_GATK4.ref_fasta_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_s
apiens_assembly38.fasta.fai",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_alt": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.alt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_sa": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/
hg38/Homo_sapiens_assembly38.fasta.64.sa",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_amb": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.amb",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_bwt": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.bwt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_ann": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.ann",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_pac": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data
/hg38/Homo_sapiens_assembly38.fasta.64.pac",

"##_COMMENT3": "KNOWN SITES RESOURCES",
"PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens
_assembly38.dbsnp138.sort.vcf",
"PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_s
apiens_assembly38.dbsnp138.sort.vcf.idx",
"PreProcessingForVariantDiscovery_GATK4.known_indels_sites_VCFs": [
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz"
],
"PreProcessingForVariantDiscovery_GATK4.known_indels_sites_indices": [
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
],

"##_COMMENT4": "MISC PARAMETERS",
"PreProcessingForVariantDiscovery_GATK4.bwa_commandline": "bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta",
"PreProcessingForVariantDiscovery_GATK4.compression_level": 5,
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.num_cpu": "16",

"##_COMMENT5": "DOCKERS",
"PreProcessingForVariantDiscovery_GATK4.gotc_docker": "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786",
"PreProcessingForVariantDiscovery_GATK4.gatk_docker": "broadinstitute/gatk:4.0.7.0",
"PreProcessingForVariantDiscovery_GATK4.python_docker": "python:2.7",

"##_COMMENT6": "PATHS",
"PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/",
"PreProcessingForVariantDiscovery_GATK4.gatk_path": "/gatk/gatk",

"##_COMMENT7": "JAVA OPTIONS",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.java_opt": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_sort": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_fix": "-Xms500m",
"PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.java_opt": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.java_opt": "-Xms2000m",

"##_COMMENT8": "MEMORY ALLOCATION",
"PreProcessingForVariantDiscovery_GATK4.GetBwaVersion.mem_size": "1 GB",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.mem_size": "14 GB",
"PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.mem_size": "7 GB",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.mem_size": "5000 MB",
"PreProcessingForVariantDiscovery_GATK4.CreateSequenceGroupingTSV.mem_size": "2 GB",
"PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.mem_size": "6 GB",
"PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.mem_size": "3 GB",

"##_COMMENT9": "DISK SIZE ALLOCATION",
"PreProcessingForVariantDiscovery_GATK4.agg_small_disk": 200,
"PreProcessingForVariantDiscovery_GATK4.agg_medium_disk": 300,
"PreProcessingForVariantDiscovery_GATK4.agg_large_disk": 400,
"PreProcessingForVariantDiscovery_GATK4.flowcell_small_disk": 100,
"PreProcessingForVariantDiscovery_GATK4.flowcell_medium_disk": 200,

"##_COMMENT10": "PREEMPTIBLES",
"PreProcessingForVariantDiscovery_GATK4.preemptible_tries": 3,
"PreProcessingForVariantDiscovery_GATK4.agg_preemptible_tries": 3
}

(c) the mutect2 json file
{
"##_COMMENT1": "Runtime",
"##Mutect2.oncotator_docker": "(optional) String?",
"Mutect2.gatk_docker": "broadinstitute/gatk:4.0.7.0",

"##_COMMENT2": "Workflow options",
"##_Mutect2.intervals": "gs://gatk-best-practices/somatic-b37/whole_exome_agilent_1.1_refseq_plus_3_boosters.Homo_sapiens_assembly19.baits.
interval_list",
"Mutect2.scatter_count": 50,
"Mutect2.artifact_modes": ["G/T", "C/T"],
"##_Mutect2.m2_extra_args": "(optional) String?",
"##_Mutect2.m2_extra_filtering_args": "(optional) String?",
"Mutect2.run_orientation_bias_filter": "False",
"Mutect2.run_oncotator": "False",

"##_COMMENT3": "Primary inputs",
"Mutect2.ref_fasta": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta",
"Mutect2.ref_dict": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.dict",
"Mutect2.ref_fai": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.fai",
"Mutect2.normal_bam": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/mytestN.hg38.bam",
"Mutect2.normal_bai": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/mytestN.hg38.bai",
"Mutect2.tumor_bam": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/mytestT.hg38.bam",
"Mutect2.tumor_bai": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/mytestT.hg38.bai",

"##COMMENT4": "Primary resources",
"##_Mutect2.pon": "(optional) File?",
"##_Mutect2.pon_index": "(optional) File?",
"Mutect2.gnomad": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/af-only-gnomad.hg38.vcf.gz",
"Mutect2.gnomad_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/af-only-gnomad.hg38.vcf.gz.tbi",
"Mutect2.variants_for_contamination": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/small_exac_common

3.hg38.vcf.gz",
"Mutect2.variants_for_contamination_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/small_exac_c
ommon_3.hg38.vcf.gz.tbi",
"##Mutect2.realignment_index_bundle": "File? (optional)",

"##_COMMENT5": "Secondary resources",
"Mutect2.onco_ds_tar_gz": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/oncotator_v1_ds_April052016.ta
r.gz",
"Mutect2.default_config_file": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/onco_config.txt",
"##_Mutect2.sequencing_center": "(optional) String?",
"##_Mutect2.sequence_source": "(optional) String?",

"##_COMMENT6": "Secondary resources",
"##_Mutect2.MergeBamOuts.mem": "(optional) Int?",
"##_Mutect2.SplitIntervals.mem": "(optional) Int?",
"##_Mutect2.M2.mem": "(optional) Int?",
"##_Mutect2.MergeVCFs.mem": "(optional) Int?",
"##_Mutect2.oncotate_m2.mem": "(optional) Int?",

"##_COMMENT7": "Secondary resources",
"##_Mutect2.onco_ds_local_db_dir": "(optional) String?",
"##_Mutect2.sequencing_center": "(optional) String?",
"##_Mutect2.oncotate_m2.oncotator_exe": "(optional) String?",
"##_Mutect2.gatk4_override": "(optional) File?",
"##_Mutect2.CollectSequencingArtifactMetrics.mem": "(optional) Int?",

"##_COMMENT8": "Disk space",
"##_Mutect2.MergeVCFs.disk_space_gb": "(optional) Int?",
"##_Mutect2.Filter.disk_space_gb": "(optional) Int?",
"##_Mutect2.M2.disk_space_gb": "(optional) Int?",
"##_Mutect2.M2.disk_space_gb": 100,
"##_Mutect2.oncotate_m2.disk_space_gb": "(optional) Int?",
"##_Mutect2.SplitIntervals.disk_space_gb": "(optional) Int?",
"##_Mutect2.MergeBamOuts.disk_space_gb": "(optional) Int?",
"##_Mutect2.CollectSequencingArtifactMetrics.disk_space_gb": "(optional) Int?",
"##_Mutect2.emergency_extra_disk": "(optional) Int?",

"##_COMMENT9": "Preemptibles",
"##_Mutect2.MergeBamOuts.preemptible_attempts": "(optional) Int?",
"Mutect2.preemptible_attempts": 3
}


Set up pipeline for WES and WGS


Hi,

Working with WES and WGS data (GATK 4.0.10.0), I would like to know the setup differences between the pipelines for germline and somatic variant discovery.

I saw that I have to use two different interval lists (-L, for BQSR and HaplotypeCaller), and also that in the VQSR step I have to exclude the -an DP option in the WES pipeline.

Moreover, the site reports a WES pipeline with hg19 and a WGS pipeline with hg38; the latter is on the Best Practices pages.

Could you provide a link about the WES setup?

Many thanks


GenotypeGVCFs warning message


When I use GenotypeGVCFs, I get many warning messages.
In the forum I saw many messages like these, but no one asked what "No valid combination operation found" means.
Can you explain why these warnings are emitted?

My command:
gatk GenotypeGVCFs \
-R ~/seqlib/melonomics/genome/CM3.6.1_pseudomol.fa \
-V gendb://my_database \
-O merge.vcf

version:4.0.9.0

Part of the log:
10:55:38.624 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:55:38.625 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:55:38.625 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:55:38.625 INFO GenotypeGVCFs - Deflater: IntelDeflater
10:55:38.625 INFO GenotypeGVCFs - Inflater: IntelInflater
10:55:38.625 INFO GenotypeGVCFs - GCS max retries/reopens: 20
10:55:38.625 INFO GenotypeGVCFs - Requester pays: disabled
10:55:38.625 INFO GenotypeGVCFs - Initializing engine
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records

WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
10:55:43.604 INFO GenotypeGVCFs - Done initializing engine
10:55:43.652 INFO ProgressMeter - Starting traversal
10:55:43.652 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
10:56:09.611 INFO ProgressMeter - chr00:1000 0.4 1000 2311.3
10:56:20.989 INFO ProgressMeter - chr00:21002 0.6 21000 33747.6
10:56:23.303 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
10:56:23.712 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
10:56:28.339 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
10:56:28.343 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
10:56:28.344 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
10:56:36.158 INFO ProgressMeter - chr00:96792 0.9 39000 44566.3
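To illustrate what I understand the warning to mean (a hypothetical sketch, not GATK's actual code): when records are combined across samples, each INFO field needs a registered per-field combination operation, and fields without one (here DS, InbreedingCoeff, MLEAC and MLEAF) are dropped with exactly this warning. The `COMBINERS` table and `combine_info` helper below are made up for illustration:

```python
# Hypothetical per-field combination operations; any field absent from
# this table has "no valid combination operation".
COMBINERS = {
    "DP": sum,  # depths add across samples
}

def combine_info(records):
    """Merge a list of per-sample INFO dicts for one site, dropping
    fields that have no registered combination operation."""
    merged = {}
    keys = {k for r in records for k in r}
    for key in sorted(keys):
        op = COMBINERS.get(key)
        if op is None:
            print(f"WARNING: No valid combination operation found for INFO field {key}")
            continue
        merged[key] = op(r[key] for r in records if key in r)
    return merged

print(combine_info([{"DP": 10, "MLEAC": 1}, {"DP": 7}]))  # {'DP': 17}
```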



ActiveRegion determination (HaplotypeCaller & Mutect2)

$
0
0

This document details the procedure used by HaplotypeCaller to define ActiveRegions on which to operate as a prelude to variant calling. For more context information on how this fits into the overall HaplotypeCaller method, please see the more general HaplotypeCaller documentation.

This procedure is also applied by Mutect2 for somatic short variant discovery. See this article for a direct comparison between HaplotypeCaller and Mutect2.

Note that some of the command line argument names in this article may not be up to date. If you encounter any problems, please let us know in the comments so we can fix them.


Contents

  1. Overview
  2. Calculating the raw activity profile
  3. Smoothing the activity profile
  4. Setting the ActiveRegion thresholds and intervals

1. Overview

To define active regions, the HaplotypeCaller operates in three phases. First, it computes an activity score for each individual genome position, yielding the raw activity profile, which is a wave function of activity per position. Then, it applies a smoothing algorithm to the raw profile, which is essentially a sort of averaging process, to yield the actual activity profile. Finally, it identifies local maxima where the activity profile curve rises above the preset activity threshold, and defines appropriate intervals to encompass the active profile within the preset size constraints.


2. Calculating the raw activity profile

Active regions are determined by calculating a profile function that characterizes “interesting” regions likely to contain variants. The raw profile is first calculated locus by locus.

In the normal case (no special mode is enabled) the per-position score is the probability that the position contains a variant as calculated using the reference-confidence model applied to the original alignment.

If using the mode for genotyping given alleles (GGA) or the advanced-level flag -useAlleleTrigger, and the site is overlapped by an allele in the VCF file provided through the -alleles argument, the score is set to 1. If the position is not covered by a provided allele, the score is set to 0.

This operation gives us a single raw value for each position on the genome (or within the analysis intervals requested using the -L argument).


3. Smoothing the activity profile

The final profile is calculated by smoothing this initial raw profile following three steps. The first two steps consist in spreading individual position raw profile values to contiguous bases. As a result each position will have more than one raw profile value that are added up in the third and last step to obtain a final unique and smoothed value per position.

  1. Unless one of the special modes is enabled (GGA or allele triggering), the position's profile value is copied over to adjacent regions if enough high-quality soft-clipped bases immediately precede or follow that position in the original alignment. At the time of writing, high-quality soft-clipped bases are those with a quality score of Q29 or more. We consider there to be enough such soft-clips when the average number of high-quality bases per soft-clip is 7 or more. In this case, the site's profile value is copied to all bases within a radius of that position as large as the average soft-clip length, without exceeding a maximum of 50 bp.

  2. Each profile value is then divided and spread out using a Gaussian kernel covering up to a 50 bp radius, centered at its current position, with a standard deviation (sigma) set using the -bandPassSigma argument (current default 17 bp). The larger the sigma, the broader the spread.

  3. For each position, the final smoothed value is calculated as the sum of all its profile values after steps 1 and 2.

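The spreading-and-summing steps above can be sketched as follows. This is a simplified illustration of steps 2 and 3 only (the Gaussian spread and the per-position sum), not the actual GATK implementation; `smooth_profile` is a hypothetical helper:

```python
import math

def smooth_profile(raw, sigma=17.0, max_radius=50):
    """Spread each raw activity value with a truncated Gaussian kernel
    (step 2) and sum the contributions per position (step 3)."""
    # Kernel weights for distances 0..max_radius, normalized so that a
    # fully interior value keeps its total mass after spreading.
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in range(max_radius + 1)]
    norm = weights[0] + 2 * sum(weights[1:])
    smoothed = [0.0] * len(raw)
    for pos, value in enumerate(raw):
        if value == 0.0:
            continue  # nothing to spread
        for d in range(-max_radius, max_radius + 1):
            j = pos + d
            if 0 <= j < len(raw):
                smoothed[j] += value * weights[abs(d)] / norm
    return smoothed

# A single spike of activity becomes a smooth bump centered on the spike.
profile = [0.0] * 200
profile[100] = 1.0
smoothed = smooth_profile(profile)
```

With the default sigma of 17 bp and the 50 bp cutoff, a single spike's mass is conserved (up to edge truncation) and its peak stays at the original position.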

4. Setting the ActiveRegion thresholds and intervals

The resulting profile line is cut into regions where it crosses the non-active-to-active threshold (currently set to 0.002). Then we adjust these boundaries so that the regions to be considered active, i.e. those with a profile running over the threshold, fall within the minimum (fixed at 50 bp) and maximum region size (customizable using -activeRegionMaxSize).

  • If the region size falls within the limits we leave it untouched (it's good to go).

  • If the region size is shorter than the minimum, it is greedily extended forward, ignoring that cut point, and we go back to step 1. Only if this is not possible because we hit a hard limit (the end of the chromosome or of the requested analysis interval) do we accept the small region as it is.

  • If it is too long, we find the lowest local minimum between the minimum and maximum region size. A local minimum is a profile value preceded by a larger value immediately upstream (-1 bp) and followed by an equal or larger value immediately downstream (+1 bp). In case of a tie, the one further downstream takes precedence. If there is no local minimum, we simply force the cut so that the region has the maximum active region size.

Of the resulting regions, those whose profile runs over the threshold are considered active regions and progress to variant discovery and/or calling, whereas regions whose profile runs under the threshold are considered inactive and are discarded, except when running HaplotypeCaller in reference confidence mode.
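The thresholding and size adjustment can be sketched as follows. This is a deliberate simplification: the short-region extension and local-minimum splitting described above are replaced by a greedy cut at the maximum size, and `active_regions` is a hypothetical helper, not the real implementation:

```python
def active_regions(profile, threshold=0.002, max_size=300):
    """Cut a smoothed activity profile into candidate active regions
    (half-open runs where the profile exceeds the threshold), then split
    any region longer than max_size by cutting greedily."""
    regions = []
    start = None
    for pos, value in enumerate(profile):
        if value > threshold and start is None:
            start = pos                      # region opens
        elif value <= threshold and start is not None:
            regions.append((start, pos - 1))  # region closes
            start = None
    if start is not None:
        regions.append((start, len(profile) - 1))
    # Enforce the maximum region size.
    sized = []
    for s, e in regions:
        while e - s + 1 > max_size:
            sized.append((s, s + max_size - 1))
            s += max_size
        sized.append((s, e))
    return sized

# A 500 bp run of activity gets split into two regions of at most 300 bp.
prof = [0.0] * 10 + [0.01] * 500 + [0.0] * 10
print(active_regions(prof))  # [(10, 309), (310, 509)]
```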

There is a final post-processing step to clean up and trim the ActiveRegion:

  • Remove bases at each end of the read (hard-clipping) until there is a base with a call quality equal to or greater than the minimum base quality score (customizable with the -mbq parameter, 10 by default).

  • Include or exclude the remaining soft-clipped ends. Soft-clipped ends will be used for assembly and calling unless the user has requested their exclusion (using -dontUseSoftClippedBases), provided the read and its mate map to the same chromosome and are in the correct standard orientation (i.e. LR and RL).

  • Clip off adaptor sequences of the read if present.

  • Discard all reads that no longer overlap with the ActiveRegion after the trimming operations described above.

  • Downsample the remaining reads to a maximum of 1000 reads per sample, while respecting a minimum of 5 reads starting per position. This is performed after any downsampling by the traversal itself (-dt, -dfrac, -dcov etc.) and cannot be overridden from the command line.
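The first of the trimming steps above (hard-clipping low-quality ends) can be sketched like this. It is a hedged illustration, not GATK's implementation; the function name is invented for the example.

```python
def hard_clip_low_quality_ends(bases, quals, mbq=10):
    """Trim from both ends of a read until the terminal base has a
    call quality of at least mbq (cf. the -mbq parameter)."""
    start = 0
    while start < len(quals) and quals[start] < mbq:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < mbq:
        end -= 1
    return bases[start:end], quals[start:end]
```

For example, `hard_clip_low_quality_ends("ACGTAC", [2, 15, 30, 30, 15, 2])` returns `("CGTA", [15, 30, 30, 15])`.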


bamout option - mutect2: invalid argument value


Hi!
Kindly help me in fixing following error while trying to use bamout option with mutect2:

java -jar GenomeAnalysisTK.jar -T MuTect2 -R hg19.fa -I:tumor tum.bam -I:normal norm.bam -o output.vcf --disableOptimizations --dontTrimActiveRegions --forceActive -L 17:61914938-61914948 -bamout chr17output.bam

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Invalid argument value '17:61914938-61914948' at position 14.
ERROR ------------------------------------------------------------------------------------------

If I use MuTect2 without the options "--dontTrimActiveRegions --forceActive -L 17:61914938-61914948 -bamout", it works.
Why does this happen?

thank you in advance
best

Java is using too many resources (threads, memory or CPU)


Most resource allocation problems you run into will be associated with either Spark multithreading or Java. We detail the most common issues, along with recommended solutions, below. These solutions typically involve adding Spark or Java arguments to your GATK command line; see the GATK command-line documentation for instructions on adding them, as they must be provided differently from regular GATK arguments.


Too many threads?

GATK will not use more threads than you allow. If you're running one of the tools that can use Spark multithreading, you can control the number of threads it uses with the Spark-specific arguments --num-executors and --executor-cores.

In addition to the threads used by GATK itself, Java may run threads of its own for garbage collection. If that causes you problems, you can limit the maximum number of garbage collection threads used by Java using the Java argument -XX:ConcGCThreads=1 (shown here with the max limit set to a single thread).


Too much memory?

You can set an upper limit for how much memory Java can use to run your command using the Java argument -Xmx.


Too much CPU?

This is usually related to garbage collection, just like the threads issue mentioned above. The solution is the same; limit the maximum number of garbage collection threads used by Java using the Java argument -XX:ConcGCThreads.


How to get best performance in Merging 6K samples using CombineGVCFs and GenotypeGVCFs (GATK 4.0.9)


I am looking for the right practices for merging 6K samples using CombineGVCFs and GenotypeGVCFs to get the best performance. I don't see Spark-suffixed versions of these tools; does that mean they can't make use of Spark?

I also came across some old threads about plans to bring TileDB into version 4.0, which could improve performance. How can I make use of TileDB?

PS: I generated the gvcfs using HaplotypeCallerSpark in gatk4


HaplotypeCaller --dbsnp


The doc says "dbSNP is not used in any way for the calculations themselves. --dbsnp binds reference ordered data". Does this mean that the determination of whether a locus is a variant is not influenced by whether that variant is present in dbSNP? What does "--dbsnp binds reference ordered data" mean?

Also why isn't there a --indel option?

Are there any plans to add multi-interval support to GenomicsDBImport?


The reason I ask is that it's rather annoying when you're chunking your input data and one of your chunks crosses a chromosome boundary. According to the GitHub docs, it seems that GenomicsDB supports this with vcf2tiledb, but I'm not sure whether it will then work with GenotypeGVCFs.

Can I use GenomicsDBImport instead of combineGVCFs in GATK v3.8?


I was running GATK 3.8 and CombineGVCFs on 719 individuals. However, my job was killed because the process created a gigantic number of temp files on the cluster. So the cluster manager is forcing me to change to a different approach (GenomicsDBImport) and, honestly, I don't know whether it will work. I am a beginner in this analysis and this change worries me a little bit.
Thanks a lot,
Carlos

Query regarding picard liftoverVcf


I have been running LiftoverVcf between genomes of the honey bee; it runs successfully, but none of the SNPs are getting lifted over.
Command :
java -jar picard.jar LiftoverVcf I=/data2/OUTPUT/snp/5074_snp_R3_final.vcf.gz O=5074_lifted_over.vcf CHAIN=amel4.5toHAV3.1.over.chain.gz REJECT=5074_rejected_variants.vcf R=HAv3.1_genome.fa WARN_ON_MISSING_CONTIG=true
Output
java -jar picard.jar LiftoverVcf I=1007_snp_R3_final.vcf O=1024_lifted_over.vcf CHAIN=amel4.5toHAV3.1.over.chain.gz REJECT=1024_rejected_variants.vcf R=HAv3.1_genome.fa
INFO 2018-10-12 11:37:01 LiftoverVcf

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** LiftoverVcf -I 1007_snp_R3_final.vcf -O 1024_lifted_over.vcf -CHAIN amel4.5toHAV3.1.over.chain.gz -REJECT 1024_rejected_variants.vcf -R HAv3.1_genome.fa


11:37:01.545 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data2/side_analysis/picard_liftover/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Oct 12 11:37:01 EDT 2018] LiftoverVcf INPUT=1007_snp_R3_final.vcf OUTPUT=1024_lifted_over.vcf CHAIN=amel4.5toHAV3.1.over.chain.gz REJECT=1024_rejected_variants.vcf REFERENCE_SEQUENCE=HAv3.1_genome.fa WARN_ON_MISSING_CONTIG=false LOG_FAILED_INTERVALS=true WRITE_ORIGINAL_POSITION=false WRITE_ORIGINAL_ALLELES=false LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false RECOVER_SWAPPED_REF_ALT=false TAGS_TO_REVERSE=[AF] TAGS_TO_DROP=[MAX_AF] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Oct 12 11:37:01 EDT 2018] Executing as tanu@koschei on Linux 4.4.0-124-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.14-SNAPSHOT
INFO 2018-10-12 11:37:02 LiftoverVcf Loading up the target reference genome.
INFO 2018-10-12 11:37:03 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
INFO 2018-10-12 11:37:18 LiftoverVcf Processed 1654903 variants.
INFO 2018-10-12 11:37:18 LiftoverVcf 1654903 variants failed to liftover.
INFO 2018-10-12 11:37:18 LiftoverVcf 0 variants lifted over but had mismatching reference alleles after lift over.
INFO 2018-10-12 11:37:18 LiftoverVcf 100.0000% of variants were not successfully lifted over and written to the output.
INFO 2018-10-12 11:37:18 LiftoverVcf liftover success by source contig:
INFO 2018-10-12 11:37:18 LiftoverVcf 1.1: 0 / 11968 (0.0000%)
INFO 2018-10-12 11:37:18 LiftoverVcf 1.10: 0 / 11140 (0.0000%)
INFO 2018-10-12 11:37:18 LiftoverVcf 1.11: 0 / 5 (0.0000%)
INFO 2018-10-12 11:37:18 LiftoverVcf 1.12: 0 / 5 (0.0000%)
INFO 2018-10-12 11:37:18 LiftoverVcf 1.13: 0 / 6 (0.0000%)
INFO 2018-10-12 11:37:18 LiftoverVcf 1.14: 0 / 2781 (0.0000%)
INFO 2018-10-12 11:37:18 LiftoverVcf 1.15: 0 / 10992 (0.0000%)
INFO 2018-10-12 12:14:10 LiftoverVcf lifted variants by target contig:
INFO 2018-10-12 12:14:10 LiftoverVcf no successfully lifted variants
WARNING 2018-10-12 12:14:10 LiftoverVcf 0 variants with a swapped REF/ALT were identified, but were not recovered. See RECOVER_SWAPPED_REF_ALT and associated caveats.
INFO 2018-10-12 12:14:10 LiftoverVcf Writing out sorted records to final VCF.
[Fri Oct 12 12:14:10 EDT 2018] picard.vcf.LiftoverVcf done. Elapsed time: 0.20 minutes.
Runtime.totalMemory()=3609722880
It is creating the output files too.

Any suggestions, or a link that could help, would be appreciated.
Thank You,
With Regards,
Tanushree

Chromosome larger than 512M problem in GATK


I use GATK on the wheat survey genome, which has a 3B chromosome (700+ Mb). With default parameters it can't call SNPs from 512 Mb to 700 Mb. How can I fix this?
