I'm trying to get into some new topics (especially Docker and Airflow), so I've come up with the following mini-project:
I pull data from the web, transform it, and visualize it in a Shiny app. I want to orchestrate these three steps with Airflow.
Now I'm not sure how to structure the whole thing, and I have the following considerations:
My plan is to define three containers (for loading, transforming, and visualizing) and arrange them in a DAG using three DockerOperators. My problem:
I don't yet have an elegant way to exchange data between the containers.
Therefore I have…
a) I've read about XCom, but that is really meant for exchanging small bits of metadata, not for transferring (large) amounts of data between tasks.
b) I thought about Docker Compose, but for that I would have to write my own DockerCompose operator, right? With the standard DockerOperator I can only start a single container, correct? In particular, I don't want to run Airflow itself in a container, only the individual tasks.
c) I thought about writing the loaded data to a mounted volume, mounting the same volume again in the transformation step, and once more in the Shiny app to visualize the transformed data, but somehow I don't find this approach elegant.
d) Of course I could put everything into one big container, but since I want to base the individual steps on different images, I would like to keep them separate.
Do you have an idea for me? Am I missing something or am I using the wrong tools?
Thanks a lot in advance!