Hi,
I tried using GenomicsDBImport for our data. As a test case, I imported chromosome 1 for 223 samples. Since most of our samples are panels and we have only a few genomes and exomes, I thought it would be best to always call everything together.
My command line:
opt/gatk/4.0.0.0/gatk --java-options "-Xmx8G -Xms8G" GenomicsDBImport \
    --sample-name-map [...]/all_samples.sample_map \
    --genomicsdb-workspace-path [...]/germline_snp_database_1 \
    --batch-size 50 \
    -L NC_000001 \
    --reader-threads 5
I use only 5 reader threads because I plan to parallelize with scatter-gather later on. The command has been running for 14 hours on a local server. Is something wrong, or is there something I can do to make it reasonably fast? Our existing GATK 3.8 pipeline is much faster.
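For reference, the scatter step I have in mind looks roughly like this sketch. It is only a dry run that prints one GenomicsDBImport command per shard of the contig and does not execute anything. The equal-sized shards, the separate workspace per shard, and the names `SAMPLE_MAP`, `WORKSPACE_PREFIX`, `make_intervals` and `print_import_commands` are my own hypothetical choices, not anything taken from the GATK documentation.

```shell
#!/bin/sh
# Dry-run scatter sketch for GenomicsDBImport: it only PRINTS commands.
# Assumptions (mine, not from the GATK docs): equal-sized shards, one
# workspace per shard, and the hypothetical placeholder paths below.
set -eu

# Hypothetical placeholders; override them via the environment.
SAMPLE_MAP=${SAMPLE_MAP:-all_samples.sample_map}
WORKSPACE_PREFIX=${WORKSPACE_PREFIX:-germline_snp_database}

# Print at most SHARDS intervals "CONTIG:START-END" (1-based, inclusive)
# that together cover positions 1..LENGTH of one contig.
make_intervals() {
  contig=$1; length=$2; shards=$3
  size=$(( (length + shards - 1) / shards ))   # ceiling division
  start=1
  while [ "$start" -le "$length" ]; do
    end=$(( start + size - 1 ))
    if [ "$end" -gt "$length" ]; then end=$length; fi
    echo "${contig}:${start}-${end}"
    start=$(( end + 1 ))
  done
}

# Print one GenomicsDBImport command per interval, each writing to its own
# workspace (WORKSPACE_PREFIX_1, WORKSPACE_PREFIX_2, ...).
print_import_commands() {
  make_intervals "$1" "$2" "$3" | {
    i=0
    while read -r interval; do
      i=$(( i + 1 ))
      printf '%s %s %s %s %s\n' \
        'gatk --java-options "-Xmx8G -Xms8G" GenomicsDBImport' \
        "--sample-name-map $SAMPLE_MAP" \
        "--genomicsdb-workspace-path ${WORKSPACE_PREFIX}_$i" \
        '--batch-size 50 --reader-threads 5' \
        "-L $interval"
    done
  }
}
```

Usage would be something like `print_import_commands NC_000001 248956422 8 > import_jobs.txt`, where 248,956,422 bp is the GRCh38 length of chromosome 1 (take the real length from the reference `.fai` if the build differs), and the resulting lines would then go to the local job scheduler. Printing instead of running keeps the sharding easy to inspect before committing hours of compute, but note that every shard as written still asks for an 8 GB heap.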
Thanks & best regards,
Daniel