Dear GATK Team,
I ran GATK4 variant calling, following the Best Practices, on one WGS sample that was sequenced across two lanes.
Steps followed to produce the merged BAM: I aligned the FASTQ files for each lane separately, removed duplicates, merged the lane-level BAMs, and ran MarkDuplicates again on the merged BAM. Variant calling was then run on the merged BAM. I followed the reference below.
I also ran variant calling on each lane-level BAM separately, in order to compare the two lane-level g.vcf files with the g.vcf from the merged BAM.
When I compare the g.vcf files generated from the individual lane BAMs with the one from the merged BAM, there is a huge difference in size.
Sample    # of lines    GVCF size (GB)
lane-1    658655987     7.6
lane-2    442845977     5.6
Merged    83563153      1.3
I see a much smaller difference when I convert the g.vcf files to VCF using gvcftools extract_variants. But at the g.vcf level, I am not sure why there is such a large difference in file sizes.
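To narrow down where the extra lines come from, one option would be to count reference-block records and variant records separately in each g.vcf. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes the usual GATK gVCF layout in which a reference block has `<NON_REF>` as its only ALT allele and carries an `END=` entry in the INFO column.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed g.vcf as text (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def classify_gvcf_lines(lines):
    """Count header lines, reference blocks, and variant records.

    Assumption: a record whose ALT column is exactly "<NON_REF>" is a
    reference block; any other ALT value is treated as a variant record.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            counts["header"] += 1
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            counts["malformed"] += 1
            continue
        pos, alt, info = int(fields[1]), fields[4], fields[7]
        if alt == "<NON_REF>":
            counts["ref_block"] += 1
            # A block spans POS..END; without an END entry it covers one base.
            end = pos
            for item in info.split(";"):
                if item.startswith("END="):
                    end = int(item[4:])
            counts["ref_block_bases"] += end - pos + 1
        else:
            counts["variant"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_gvcf.py lane1.g.vcf.gz lane2.g.vcf.gz merged.g.vcf.gz
    for path in sys.argv[1:]:
        with open_text(path) as handle:
            print(path, dict(classify_gvcf_lines(handle)))
```

Dividing `ref_block_bases` by `ref_block` gives the average reference-block length for each file. If the variant counts are similar across the three files but the lane-level files contain many more, much shorter reference blocks, that would be consistent with the smaller difference I see after extract_variants.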
Could you please help me understand this?
Thanks in advance,
Fazulur Rehaman