Hi,
I ran RealignerTargetCreator with multiple '-known file.vcf' arguments. One of the files is causing a problem, but the error message is not very helpful because it does not say which file is at fault:
GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T RealignerTargetCreator --num_threads 4 --num_cpu_threads_per_data_thread 4 -R ftp.broadinstitute.org/bundle/hg38/hg38bundle/Homo_sapiens_assembly38.fasta -I normal.bam -I cancer.bam -known ftp.broadinstitute.org/bundle/hg38/hg38bundle/Homo_sapiens_assembly38.known_indels.vcf -known ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/ALL_20141222.dbSNP142_human_GRCh38.snps.vcf -known ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/Mills_and_1000G_gold_standard.indels.b38.primary_assembly.vcf -known ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf -known ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/VCF/GATK/common_all_20160527.vcf -known ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12877/NA12877.vcf -known ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12878/NA12878.vcf -o sample.forIndelRealigner.intervals
...
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Your input file has a malformed header: Unexpected tag POS in line <ID=POS=POS-1,Number=0,Type=Flag,Description="POS has been adjusted due to missing REF in NCBI VCF file">
ERROR ------------------------------------------------------------------------------------------
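Since the message does not name the offending file, one way to narrow it down is to scan the header of each -known VCF for a structured line whose ID value itself contains '=' (as in ID=POS=POS-1 above). The following is only a minimal sketch of that idea: the function names are made up for illustration, the check is a heuristic and not GATK's own header parser, and it assumes the files are plain-text or gzip-compressed VCFs available locally.

```python
import gzip
import re

# Captures the ID value of a structured header line such as
# ##INFO=<ID=DP,Number=1,...>; the value runs up to the first ',' or '>'.
ID_VALUE = re.compile(r"^##\w+=<ID=([^,>]*)")


def suspicious_header_lines(lines):
    """Return (line_number, text) for header lines whose ID value contains '='.

    Heuristic only (an assumption, not GATK's parser): it flags IDs like
    'POS=POS-1', the pattern seen in the error message.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = ID_VALUE.match(line)
        if match and "=" in match.group(1):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan one VCF (hypothetical helper); handles .gz by file extension."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return suspicious_header_lines(handle)


def report(paths):
    """Print every suspicious header line, prefixed with file and line number."""
    for path in paths:
        for number, text in scan_file(path):
            print(f"{path}:{number}: {text}")
```

Calling report() with the list of paths passed to -known should print the file and line number of any header line matching this pattern. A file that produces no output may still be rejected by GATK for other reasons.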
Further, the documentation at https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_indels_IndelRealigner.php does not say that GVF files cannot be used. I thought I could include them as well, since they also contain annotated indels.
For GRCh38 I am thinking of:
ftp.ensembl.org/pub/release-85/variation/gvf/homo_sapiens/Homo_sapiens_structural_variations.gvf.gz
ftp.ensembl.org/pub/release-85/variation/gvf/homo_sapiens/Homo_sapiens.gvf.gz
For hg19 I am thinking of:
ftp.ensembl.org/pub/release-75/variation/gvf/homo_sapiens/1000GENOMES-phase_1_EUR.gvf.gz
ftp.ensembl.org/pub/release-75/variation/gvf/homo_sapiens/Homo_sapiens_structural_variations.gvf.gz
ftp.ensembl.org/pub/release-75/variation/gvf/homo_sapiens/Homo_sapiens.gvf.gz
I admit I am not familiar with their contents, but I thought GATK's RealignerTargetCreator, IndelRealigner and BaseRecalibrator would cope with any redundancy in them and pick out the lines each tool needs.
Thank you for your thoughts,
Martin