I have been trying to use GATK4's CalculateContamination but the output is not as expected:
level      contamination  error
whole_bam  0.0            1.0
The GATK log contained warnings that there were not enough data points to segment and that no hom alt sites were found.
Using GATK jar /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/gatk4-4.0.4.0-0/gatk-package-4.0.4.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx16g -jar /mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/gatk4-4.0.4.0-0/gatk-package-4.0.4.0-local.jar CalculateContamination -I out/BC002-03042014_A_getpileupsummaries.table -O out/BC002-03042014_A_calculatecontamination.table
Picked up _JAVA_OPTIONS: -XX:+UseSerialGC
09:46:05.758 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/projects/dlho/tancrc/bcbio_pipeline/anaconda/share/gatk4-4.0.4.0-0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
09:46:05.872 INFO CalculateContamination - ------------------------------------------------------------
09:46:05.872 INFO CalculateContamination - The Genome Analysis Toolkit (GATK) v4.0.4.0
09:46:05.872 INFO CalculateContamination - For support and documentation go to https://software.broadinstitute.org/gatk/
09:46:05.872 INFO CalculateContamination - Executing as dlho@n086.default.domain on Linux v2.6.32-431.el6.x86_64 amd64
09:46:05.872 INFO CalculateContamination - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_102-b14
09:46:05.873 INFO CalculateContamination - Start Date/Time: May 14, 2018 9:46:05 AM SGT
09:46:05.873 INFO CalculateContamination - ------------------------------------------------------------
09:46:05.873 INFO CalculateContamination - ------------------------------------------------------------
09:46:05.873 INFO CalculateContamination - HTSJDK Version: 2.14.3
09:46:05.873 INFO CalculateContamination - Picard Version: 2.18.2
09:46:05.873 INFO CalculateContamination - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:46:05.873 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:46:05.873 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:46:05.873 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:46:05.873 INFO CalculateContamination - Deflater: IntelDeflater
09:46:05.874 INFO CalculateContamination - Inflater: IntelInflater
09:46:05.874 INFO CalculateContamination - GCS max retries/reopens: 20
09:46:05.874 INFO CalculateContamination - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
09:46:05.874 INFO CalculateContamination - Initializing engine
09:46:05.874 INFO CalculateContamination - Done initializing engine
09:46:05.935 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
09:46:05.961 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
09:46:05.961 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points
09:46:06.083 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
09:46:06.090 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (3) to segment; using all data points to calculate kernel matrix.
09:46:06.090 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (3). Local changepoint costs will not be calculated for this window size.
09:46:06.090 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points
09:46:06.091 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
09:46:06.091 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
09:46:06.092 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
09:46:06.092 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points
09:46:06.092 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
09:46:06.093 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
09:46:06.093 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
09:46:06.093 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points
09:46:06.093 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
09:46:06.113 WARN CalculateContamination - No hom alt sites found! Perhaps GetPileupSummaries was run on too small of an interval, or perhaps the sample was extremely inbred or haploid.
09:46:06.116 WARN CalculateContamination - No hom alt sites found! Perhaps GetPileupSummaries was run on too small of an interval, or perhaps the sample was extremely inbred or haploid.
09:46:06.117 WARN CalculateContamination - No hom alt sites found! Perhaps GetPileupSummaries was run on too small of an interval, or perhaps the sample was extremely inbred or haploid.
To get the pileup file required for CalculateContamination, I ran GetPileupSummaries and restricted the region with -L to a BED file containing 77 genes of interest. The pileup file looks normal and contains 311 variants. Is this not enough for CalculateContamination? Can CalculateContamination not be run on small targeted sequencing panels? I would appreciate any help!
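A quick way to compare the pileup table with the "No hom alt sites found" warning is to count how many sites have an alt-allele fraction close to 1. The following is a minimal sketch, not CalculateContamination's own logic. It assumes the table is tab-separated with the columns contig, position, ref_count, alt_count, other_alt_count and allele_frequency, and that any metadata lines start with "#". The function name summarize_pileup and the thresholds (alt fraction >= 0.8, depth >= 10) are hypothetical choices for illustration only.

```python
import csv
import io


def summarize_pileup(handle, hom_alt_min_fraction=0.8, min_depth=10):
    """Return (total_sites, hom_alt_like_sites) for a pileup summary table.

    The thresholds are illustrative assumptions, not the criteria that
    CalculateContamination applies internally.
    """
    # Drop blank lines and any metadata/comment lines starting with '#'.
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")

    total = 0
    hom_alt_like = 0
    for row in reader:
        ref = int(row["ref_count"])
        alt = int(row["alt_count"])
        other = int(row["other_alt_count"])
        depth = ref + alt + other
        total += 1
        # A site "looks" hom alt when nearly all reads carry the alt allele.
        if depth >= min_depth and alt / depth >= hom_alt_min_fraction:
            hom_alt_like += 1
    return total, hom_alt_like


# Small made-up table, used only to show the call; not real data.
example = (
    "contig\tposition\tref_count\talt_count\tother_alt_count\tallele_frequency\n"
    "1\t1000\t50\t0\t0\t0.05\n"
    "1\t2000\t24\t26\t0\t0.10\n"
    "2\t3000\t1\t48\t1\t0.02\n"
)
print(summarize_pileup(io.StringIO(example)))
```

To run this on the real file, open out/BC002-03042014_A_getpileupsummaries.table and pass the file handle instead of the io.StringIO example. If the second number comes back as zero or very small for the 311 sites, that would be consistent with the warning in the log.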