Invalid arguments Error when using Airflow DockerOperator

  airflow, docker, dockeroperator, python

I am getting an error when using DockerOperator from Airflow:

raise AirflowException(
airflow.exceptions.AirflowException: Invalid arguments were passed to DockerOperator (task_id: etl_in_ch). Invalid arguments were:
**kwargs: {'volumes': ['./CH_ETL/src:/usr/src/copy_data', './pyproject.toml:pyproject.toml']}

My code:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.docker_operator import DockerOperator
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    'owner': 'airflow',
    'description': 'Extract data from different sources into CH and train model with it',
    'depends_on_past': False,
    'start_date': datetime(2021, 7, 20),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

with DAG('docker_operator_demo', default_args=default_args, schedule_interval="00 23 * * *", catchup=False) as dag:
    start_dag = DummyOperator(
        task_id='start_dag'
        )

    end_dag = DummyOperator(
        task_id='end_dag'
        )

    t1 = DockerOperator(
        task_id='etl_in_ch',
        image='python:3.9.2-slim',
        container_name='etl_in_ch',
        api_version='auto',
        auto_remove=True,
        command=(
            "apt-get update && apt-get install -y cron && apt-get install -y libxml2 libxslt-dev wget bzip2 gcc "
            "&& pip install --no-cache-dir --upgrade pip "
            "&& pip install --no-cache-dir poetry==1.1.5 "
            "&& poetry config virtualenvs.create false "
            "&& poetry install --no-interaction --no-ansi "
            "&& chmod +x /src/__main__.py "
            "&& python __main__.py"
        ),
        docker_url="unix://var/run/docker.sock",
        network_mode="bridge",
        environment={"PYTHONDONTWRITEBYTECODE": 1, "PYTHONUNBUFFERED": 1},
        working_dir="/usr/src/copy_data",
        volumes=['./CH_ETL/src:/usr/src/copy_data', './pyproject.toml:pyproject.toml']
    )


    start_dag >> t1

    t1 >> end_dag

What am I doing wrong here? I am trying to run a dockerized Python service with Apache Airflow, and to be honest I do not see why "volumes" should suddenly be an invalid argument.
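
A possible explanation, assuming the installed apache-airflow-providers-docker package is version 2.0.0 or later: in that release the "volumes" argument of DockerOperator was replaced by "mounts", which expects a list of docker.types.Mount objects rather than 'host:container' strings. The sketch below only converts the old-style strings into keyword arguments for Mount. The helper name volumes_to_mount_kwargs and the base_dir parameter are made up for illustration and are not part of Airflow or the Docker SDK. It assumes POSIX paths, and it rejects relative container paths because Docker, as far as I know, requires the mount target to be absolute (the 'pyproject.toml' target in the question would not pass).

```python
import os
from typing import Any, Dict, List


def volumes_to_mount_kwargs(volumes: List[str], base_dir: str = ".") -> List[Dict[str, Any]]:
    """Convert 'host:container[:mode]' strings into kwargs for docker.types.Mount.

    Hypothetical helper for illustration; assumes POSIX-style paths.
    """
    specs = []
    for volume in volumes:
        parts = volume.split(":")
        if len(parts) not in (2, 3):
            raise ValueError(f"Expected 'host:container[:mode]', got {volume!r}")
        source, target = parts[0], parts[1]
        if not target.startswith("/"):
            # Assumption: Docker only accepts absolute paths inside the container.
            raise ValueError(f"Container path must be absolute: {target!r}")
        specs.append(
            {
                # Bind mounts need an absolute path on the Docker host.
                "source": os.path.abspath(os.path.join(base_dir, source)),
                "target": target,
                "type": "bind",
                "read_only": len(parts) == 3 and parts[2] == "ro",
            }
        )
    return specs
```

With the docker package installed, the result could be passed on roughly like this: mounts=[Mount(**kw) for kw in volumes_to_mount_kwargs([...], base_dir='/path/to/project')], with Mount imported from docker.types and mounts=mounts given to DockerOperator in place of volumes=[...]. The base_dir value here is a placeholder. Note that the host paths are resolved on the machine running the Docker daemon, not inside the Airflow container.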

Source: Docker Questions
