How can I modify library versions in the docker image used by dask workers?

  dask, dask-distributed, dockerfile

I am trying to run a distributed computation using Dask on an AWS Fargate cluster (using the dask.cloudprovider API), and I am running into the exact same issue as this question. Based on the partial answers to the linked question, and on things like this, I strongly suspect it is due to the pandas version in my workers being outdated; indeed, the official Dask Dockerfile specifies an old-ish version of pandas.

By contrast, when I run my computation locally (using a distributed.LocalCluster) with pandas 1.2.2, it works fine. Btw, in the Fargate cluster case, it is a call to the categorize method on a Dask DataFrame that triggers the error.
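
To confirm the suspected mismatch, one option is to ask every worker which pandas version it has and compare that with the local one. The snippet below is only a sketch: the helper names are made up for illustration, and `client` is assumed to be a `distributed.Client` already connected to the Fargate cluster.

```python
from importlib import metadata


def installed_version(package="pandas"):
    """Return the installed version of `package`, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def mismatched_workers(local, workers):
    """Keep only the workers whose reported version differs from the local one."""
    return {addr: ver for addr, ver in workers.items() if ver != local}


def compare_with_cluster(client, package="pandas"):
    """Compare the local version of `package` with the one on each worker.

    `client` is assumed to be a distributed.Client connected to the cluster.
    """
    local = installed_version(package)
    # Client.run executes the function on every worker and
    # returns a dict mapping worker address to the result.
    workers = client.run(installed_version, package)
    return mismatched_workers(local, workers)
```

An empty dict from `compare_with_cluster(client)` would mean the workers already match the local pandas version, in which case the cause is probably elsewhere.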

As a workaround, I would like to specify the pandas version in the image deployed to the workers myself, either by rewriting the Dockerfile or through some other method. Is there a way to achieve this?
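
For illustration, here is a minimal sketch of one possible approach that avoids rewriting the Dockerfile. It rests on several assumptions: that `FargateCluster` accepts `image` and `environment` keyword arguments, that the image is the official `daskdev/dask` one, whose startup script installs whatever is listed in the `EXTRA_PIP_PACKAGES` environment variable, and that the import path `dask_cloudprovider.aws` matches the installed dask-cloudprovider version (older releases may expose the class elsewhere). The helper names and the `daskdev/dask:latest` tag are placeholders.

```python
def fargate_kwargs(pandas_version="1.2.2", image="daskdev/dask:latest"):
    """Build keyword arguments that pin pandas on the scheduler and workers.

    Assumes the image honors EXTRA_PIP_PACKAGES at container startup,
    as the official daskdev/dask image is expected to.
    """
    return {
        "image": image,
        "environment": {"EXTRA_PIP_PACKAGES": f"pandas=={pandas_version}"},
    }


def start_cluster(**overrides):
    """Start a Fargate cluster with the pinned pandas version.

    The import is done lazily so this module can be loaded without
    dask-cloudprovider installed; the import path is an assumption.
    """
    from dask_cloudprovider.aws import FargateCluster

    return FargateCluster(**{**fargate_kwargs(), **overrides})
```

Installing packages at startup makes every worker slower to come up. The alternative is a custom image built from a Dockerfile that starts `FROM daskdev/dask` and runs `pip install pandas==1.2.2`, pushed to a registry the cluster can pull from and passed as `image`. In either case, the Dask version inside the image should match the one on the client.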

Source: Dockerfile Questions
