Amazon EMR spark-submit doesn’t allow a Docker image name with a sha256 digest

I am using Amazon EMR, and spark-submit fails with the following error:

Image name '<account_id>.dkr.ecr.ap-northeast-2.amazonaws.com/<repository>@sha256:3d3a07135...........0.87563e51ef5c841841a5d1a6dde9c' doesn't match docker image name pattern

Release label: emr-6.2.0
Hadoop distribution: Amazon 3.2.1
Applications: Spark 3.0.1, Hive 3.1.2, JupyterHub 1.1.0, Ganglia 3.7.2, Zeppelin 0.9.0, Livy 0.7.0, Hue 4.8.0, PrestoSQL 343

The reason I have to use a sha256 digest is that I previously hardcoded the :latest tag of my PySpark image in an Airflow job, which itself runs from a container image stored in ECR.
When the Airflow container runs an EMR operator (an SSHOperator, to be precise) that calls spark-submit from the CLI, it pulls the :latest Spark image, which for some reason is not updated.
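
To make the setup concrete, here is a minimal sketch of how such a spark-submit command line might be assembled. It assumes the image is passed through the YARN Docker runtime properties (YARN_CONTAINER_RUNTIME_TYPE and YARN_CONTAINER_RUNTIME_DOCKER_IMAGE) for both the application master and the executors, and that the job runs in cluster deploy mode. The function name, the application path job.py, and the repository placeholder are hypothetical and not taken from the actual Airflow job.

```python
from typing import List


def build_spark_submit(image: str, app: str) -> List[str]:
    """Build a spark-submit argv that asks YARN to run Spark in a Docker image."""
    # Assumption: the same image is used for the application master and executors.
    conf = {
        "spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
        "spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    # Assumption: cluster deploy mode; the real job may use client mode.
    argv = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(app)
    return argv


if __name__ == "__main__":
    # Hypothetical image reference using the hardcoded :latest tag.
    image = "<account_id>.dkr.ecr.ap-northeast-2.amazonaws.com/<repository>:latest"
    print(" ".join(build_spark_submit(image, "job.py")))
```

Swapping the `:latest` suffix for an `@sha256:...` digest in the `image` argument is the change that triggers the error above.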

This is strange, because when I SSH into a core instance I can pull the image from ECR by its sha256 digest, and pulling :latest there also picks up the update whenever something has changed (and so the digest has changed).

I think something in the Spark configuration, or in the Spark source that AWS ships, prohibits the digest name pattern, but I can’t debug this because I don’t have access to Amazon’s Spark source. I would appreciate any answers.
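
The suspicion above can be illustrated with a small sketch. The regular expression below is an assumption: it is modeled on the kind of tag-only image name pattern that a container runtime might validate against, and it is not copied from Amazon's build. The account ID and the repository name pyspark are hypothetical. It shows how a name ending in :tag can pass such a check while the same name with an @sha256: digest fails, because "@" is not in any of the allowed character classes.

```python
import re

# ASSUMPTION: a tag-only image name pattern, for illustration only.
# Shape: [registry[:port]/]repository[:tag], with no "@sha256:<digest>" part.
TAG_ONLY_PATTERN = re.compile(
    r"^(([a-zA-Z0-9.-]+)(:\d+)?/)?([a-z0-9_./-]+)(:[\w.-]+)?$"
)


def matches_tag_only_pattern(image: str) -> bool:
    """Return True if the image name fits the assumed tag-only pattern."""
    return TAG_ONLY_PATTERN.match(image) is not None


# Hypothetical account ID and repository name.
REGISTRY = "123456789012.dkr.ecr.ap-northeast-2.amazonaws.com"
tagged = f"{REGISTRY}/pyspark:latest"
by_digest = f"{REGISTRY}/pyspark@sha256:" + "a" * 64

print(matches_tag_only_pattern(tagged))     # True
print(matches_tag_only_pattern(by_digest))  # False
```

If the validation on the cluster works this way, docker pull succeeding over SSH would be consistent with it, since the Docker CLI itself accepts digest references and the rejection would happen before any pull is attempted.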

Many thanks,

Source: Docker Questions
