Hi,
I am having trouble submitting a WDL workflow to the cloud when a subworkflow is involved, and I have not been able to find anything in the forums or the spec that explains how to do it. I am attempting to run a variant calling workflow across a large cohort of whole exome data. The workflow is constructed as a scatter-gather over the individual samples. To parallelize the variant calling step with HaplotypeCaller (with intervals), I needed a nested scatter-gather, so that step is invoked as a subworkflow.
Currently, I am submitting the jobs with the following command:
gcloud alpha genomics pipelines run \
    --pipeline-file wdl_pipeline.yaml \
    --zones us-east1-b \
    --logging gs://dfci-testgenomes/logging \
    --inputs-from-file WDL=VariantCalling.cloud.wdl \
    --inputs-from-file WORKFLOW_INPUTS=VariantCalling.cloud.inputs.json \
    --inputs-from-file WORKFLOW_OPTIONS=VariantCalling.cloud.options.json \
    --inputs WORKSPACE=gs://dfci-testgenomes/workspace \
    --inputs OUTPUTS=gs://dfci-testgenomes/outputs
The following files are located in the same directory as the invocation of the above command:
1) VariantCalling.cloud.wdl
2) VariantCalling.cloud.inputs.json
3) VariantCalling.cloud.options.json
4) subHaplotypeCaller.cloud.wdl
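To rule out a simple local path problem, a check along the following lines could confirm that every file imported by the main WDL exists relative to the submission directory. This is only a minimal sketch: the function name check_wdl_imports is hypothetical, and it assumes each import sits on its own line in the form import "file.wdl" and that paths contain no spaces. It says nothing about whether the file actually reaches /wdl_runner on the cloud side.

```shell
#!/bin/sh
# Hypothetical helper: report whether each file imported by a WDL exists
# relative to the current directory.
# Assumes one import per line, written as: import "path.wdl" [as Alias]
# Paths containing whitespace are not handled.
check_wdl_imports() {
    wdl="$1"
    status=0
    # Extract the quoted path from every import line.
    for f in $(sed -n 's/^[[:space:]]*import[[:space:]]*"\([^"]*\)".*/\1/p' "$wdl"); do
        if [ -f "$f" ]; then
            echo "found: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    # Non-zero if any imported file was missing.
    return "$status"
}
```

It would be run from the submission directory as: check_wdl_imports VariantCalling.cloud.wdl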
subHaplotypeCaller.cloud.wdl is my subworkflow. In my main workflow (VariantCalling.cloud.wdl), it is imported and called as follows:
import "subHaplotypeCaller.cloud.wdl" as HaplotypeCaller
...
call HaplotypeCaller.HaplotypeCallerAndGatherVCFs {
  input:
    input_bam = ApplyBQSR.recalibrated_bam,
    input_bam_index = ApplyBQSR.recalibrated_bam_index,
    ref_fasta = ref_fasta,
    ref_fasta_index = ref_fasta_index,
    ref_dict = ref_dict,
    gvcf_basename = inputs[1],
    scattered_calling_intervals = scattered_calling_intervals
}
However, Cromwell reports the following error upon submission:
2017-03-06 22:30:52,742 cromwell-system-akka.actor.default-dispatcher-7 ERROR - WorkflowManagerActor: Workflow failed submission: Workflow input processing failed.
Unable to load namespace from workflow: /wdl_runner/subHaplotypeCaller.cloud.wdl
cromwell.engine.workflow.MaterializeWorkflowDescriptorActor$$anonfun$receive$1$$anon$1: Workflow input processing failed.
Unable to load namespace from workflow: /wdl_runner/subHaplotypeCaller.cloud.wdl
at cromwell.engine.workflow.MaterializeWorkflowDescriptorActor$$anonfun$receive$1.applyOrElse(MaterializeWorkflowDescriptorActor.scala:69) ~[cromwell.jar:0.19]
at akka.actor.Actor$class.aroundReceive(Actor.scala:467) ~[cromwell.jar:0.19]
at cromwell.engine.workflow.MaterializeWorkflowDescriptorActor.aroundReceive(MaterializeWorkflowDescriptorActor.scala:59) ~[cromwell.jar:0.19]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [cromwell.jar:0.19]
at akka.actor.ActorCell.invoke(ActorCell.scala:487) [cromwell.jar:0.19]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [cromwell.jar:0.19]
at akka.dispatch.Mailbox.run(Mailbox.scala:220) [cromwell.jar:0.19]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) [cromwell.jar:0.19]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [cromwell.jar:0.19]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [cromwell.jar:0.19]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [cromwell.jar:0.19]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [cromwell.jar:0.19]
It seems Cromwell cannot locate the subworkflow file: it looks for it at /wdl_runner/subHaplotypeCaller.cloud.wdl and fails to load it. What is the appropriate way to submit this workflow to gcloud together with its subworkflow? Please let me know if there is any further information I can provide.
-- Derrick DeConti