Mounting directories using the Docker operator on Airflow is not working

I’m trying to use the DockerOperator to automate the execution of some scripts with Airflow.

Airflow version: apache-airflow==1.10.12

What I want to do is "copy" all of my project’s files and folders into the container using the code below.

The following file, ml-intermediate.py, lives at ~/airflow/dags/ml-intermediate.py:

"""
Template to convert a Ploomber DAG to Airflow
"""
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

from ploomber.spec import DAGSpec
from soopervisor.script.ScriptConfig import ScriptConfig

script_cfg = ScriptConfig.from_path('/home/letyndr/airflow/dags/ml-intermediate')
# Replace the project root to reflect the new location - or maybe just
# write a soopervisor.yaml, then we can get rid of this line
script_cfg.paths.project = '/home/letyndr/airflow/dags/ml-intermediate'

# TODO: use lazy_import from script_cfg
dag_ploomber = DAGSpec('/home/letyndr/airflow/dags/ml-intermediate/pipeline.yaml',
                       lazy_import=True).to_dag()
dag_ploomber.name = "ML Intermediate"

default_args = {
    'start_date': days_ago(0),
}

dag_airflow = DAG(
    dag_ploomber.name.replace(' ', '-'),
    default_args=default_args,
    description='Ploomber dag',
    schedule_interval=None,
)

script_cfg.save_script()

from airflow.operators.docker_operator import DockerOperator
for task_name in dag_ploomber:
    DockerOperator(task_id=task_name,
        image="continuumio/miniconda3",
        api_version="auto",
        auto_remove=True,
        # command="sh /home/letyndr/airflow/dags/ml-intermediate/script.sh",
        command="sleep 600",
        docker_url="unix://var/run/docker.sock",
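        # Each entry uses the "host_path:container_path:mode" bind format,
        # the same syntax as docker run -v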
        volumes=[
            "/home/letyndr/airflow/dags/ml-intermediate:/home/letyndr/airflow/dags/ml-intermediate:rw",
            "/home/letyndr/airflow-data/ml-intermediate:/home/letyndr/airflow-data/ml-intermediate:rw"
        ],
        working_dir=script_cfg.paths.project,
        dag=dag_airflow,
        container_name=task_name,
    )



for task_name in dag_ploomber:
    task_ploomber = dag_ploomber[task_name]
    task_airflow = dag_airflow.get_task(task_name)

    for upstream in task_ploomber.upstream:
        task_airflow.set_upstream(dag_airflow.get_task(upstream))

dag = dag_airflow

When I execute this DAG with Airflow, I get an error saying that Docker cannot find the /home/letyndr/airflow/dags/ml-intermediate/script.sh script. I changed the DockerOperator's command to sleep 600 so that I could enter the container and check whether the files are there under the correct paths.

Once inside the container I can go to the path /home/letyndr/airflow/dags/ml-intermediate/, for example, but the files that are supposed to be there are missing.
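One way to double-check from the host whether the bind ever reaches the container is to inspect the container that the sleep 600 command keeps alive. This is only a debugging sketch using the Docker SDK; it assumes the container is still running and that "task-name" is replaced with whatever container_name the operator used:

import docker

client = docker.APIClient()

# Inspect the container created by the DockerOperator (it is named after the task)
info = client.inspect_container("task-name")

# If the bind made it to the Docker daemon, it shows up in both of these
print(info["HostConfig"]["Binds"])
print(info["Mounts"])

If Binds comes back empty there, the paths never made it from the operator to the Docker daemon.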

I tried to replicate how Airflow uses the Docker SDK for Python by checking that part of the package (the Airflow docker operator source), specifically the place where it creates the Docker container.

This is my own replication of the Docker implementation:

import docker

client = docker.APIClient()

# binds = {
#         "/home/letyndr/airflow/dags": {
#             "bind": "/home/letyndr/airflow/dags",
#             "mode": "rw"
#         },
#         "/home/letyndr/airflow-data/ml-intermediate": {
#             "bind": "/home/letyndr/airflow-data/ml-intermediate",
#             "mode": "rw"
#         }
#     }

binds = ["/home/letyndr/airflow/dags:/home/letyndr/airflow/dags:rw",
"/home/letyndr/airflow-data/ml-intermediate:/home/letyndr/airflow-data/ml-intermediate:rw"]

container = client.create_container(
    image="continuumio/miniconda3",
    command="sleep 600",
    volumes=["/home/letyndr/airflow/dags", "/home/letyndr/airflow-data/ml-intermediate"],
    host_config=client.create_host_config(binds=binds),
    working_dir="/home/letyndr/airflow/dags",
    name="simple_example",
)

client.start(container=container.get("Id"))

What I found is that mounting volumes only works if both host_config and volumes are set; the problem is that the Airflow implementation only sets host_config, not volumes. Once I added that parameter to the create_container call, it worked.
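For reference, the same kind of change can also be expressed without editing the installed package, by subclassing the operator. This is only a sketch of that idea, not the code I actually ran: it assumes that in this Airflow version _run_image is the method that calls create_container and that self.cli is already set when it runs, and PatchedDockerOperator is just a name made up for illustration:

from airflow.operators.docker_operator import DockerOperator


class PatchedDockerOperator(DockerOperator):
    """Also declare the bind targets as volumes when creating the container."""

    def _run_image(self):
        # self.cli is the docker.APIClient that execute() builds before calling
        # _run_image(); wrap its create_container so every call also declares
        # the container-side half of each "host:container:mode" bind string.
        original_create = self.cli.create_container

        def create_container_with_volumes(*args, **kwargs):
            kwargs.setdefault(
                "volumes",
                [entry.split(":")[1] for entry in self.volumes if ":" in entry],
            )
            return original_create(*args, **kwargs)

        self.cli.create_container = create_container_with_volumes
        try:
            return super()._run_image()
        finally:
            self.cli.create_container = original_create

The loop above would then instantiate PatchedDockerOperator instead of DockerOperator.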

Do you know if I’m using the DockerOperator correctly, or is this an issue with Airflow?
