Channel: Recent Discussions — GATK-Forum

[WDL][Cromwell] Mounting a directory into the Docker container for access.


Hi,

I am attempting to run Gemini inside a Docker container through WDL and Cromwell. I installed Gemini without its annotation data, since the data is too large to bake into a Docker image (and doing so is bad practice anyway). So I need to download the data elsewhere and link it in so that the gemini binary can access it. Locally, on my own machine without WDL, I might run the following:

docker run --rm -v /path/to/local/gemini/data:/path/to/container/gemini/data -i gemini load -t VEP -v my.vcf my.db

At the bottom I have outlined my submission script for google genomics pipelines run, along with the YAML configuration, for background. The crux of my problem is that I am unsure what the mount procedure for the Docker container is in the Broad wdl_runner image.

According to the WDL documentation, for local backends Cromwell invokes Docker as follows by default:

docker run --rm -v <cwd>:<docker_cwd> -i <docker_image> /bin/bash < <script>

Now suppose I have my data in a Google bucket at gs://my_bucket/data_for_gemini. How would I write the WDL to mount that bucket directory so that gemini inside the container can access it?

Example WDL:

task Gemini {
    File my_vcf
    # how to pass an entire google bucket directory as a target site?

    command {
        # define mounts in here somehow?
        gemini load -t VEP -v ${my_vcf} out.db
    }
    runtime {
        # define mounts in here?
        docker: "gcr.io/my_containers/gemini"
        memory: "4 GB"
        cpu: "1"
    }
    output {
        File gemini_db = "out.db"
    }
}
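For context on what already works for me: as I understand it, with Cromwell's Google backend a File input given as a gs:// path is localized into the container automatically, so the single-file input above only needs an entry like the following in the inputs JSON (the fully qualified name here is my assumption, matching the VariantCalling workflow from my submission script):

```json
{
  "VariantCalling.Gemini.my_vcf": "gs://my_bucket/my.vcf"
}
```

It is the whole-directory case, not the single-file case, that I cannot figure out.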

One inelegant solution I have thought of would be to run Docker-in-Docker and mount the directory that way, but I wanted to know whether there is a better, more elegant approach.
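Another workaround I have been sketching, for what it's worth: enumerate the bucket objects as explicit File inputs so Cromwell localizes each one, and then assemble them into a directory inside the command block. This is only a sketch under my assumptions (draft-2 WDL syntax; that Cromwell localizes every declared File input to an absolute path; gemini_data and data_dir are placeholder names of my own), and it would be painful if the data directory contains many files:

```wdl
task Gemini {
    File my_vcf
    # Assumption: listing the gs://my_bucket/data_for_gemini objects as
    # explicit File inputs, so Cromwell localizes each one into the
    # container without any manual docker -v mount.
    Array[File] gemini_data

    command {
        # Collect the localized data files into one directory for gemini
        # (assumes the localized paths are absolute, so symlinks resolve).
        mkdir -p data_dir
        ln -s ${sep=' ' gemini_data} data_dir/
        gemini load -t VEP -v ${my_vcf} out.db
    }
    runtime {
        docker: "gcr.io/my_containers/gemini"
        memory: "4 GB"
        cpu: "1"
    }
    output {
        File gemini_db = "out.db"
    }
}
```

Even if that works, gemini may expect its data at a specific path, so I would still need a way to point the binary at data_dir.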

-- Derrick DeConti

My submission script is:

gcloud alpha genomics pipelines run \
        --pipeline-file wdl_pipeline.yaml \
        --zones us-east1-b \
        --logging gs://dfci-cccb-pipeline-testing/logging \
        --inputs-from-file WDL=VariantCalling.cloud.wdl  \
        --inputs-from-file WORKFLOW_INPUTS=VariantCalling.cloud.inputs.json \
        --inputs-from-file WORKFLOW_OPTIONS=VariantCalling.cloud.options.json \
        --inputs WORKSPACE=gs://dfci-cccb-pipeline-testing/workspace \
        --inputs OUTPUTS=gs://dfci-cccb-pipeline-testing/outputs

The resultant yaml is as follows:

name: WDL Runner
description: Run a workflow defined by a WDL file

inputParameters:
- name: WDL
  description: Workflow definition
- name: WORKFLOW_INPUTS
  description: Workflow inputs
- name: WORKFLOW_OPTIONS
  description: Workflow options

- name: WORKSPACE
  description: Cloud Storage path for intermediate files
- name: OUTPUTS
  description: Cloud Storage path for output files

docker:
  imageName: gcr.io/broad-dsde-outreach/wdl_runner

  cmd: >
    /wdl_runner/wdl_runner.sh

resources:
  minimumRamGb: 1

