Azure: Error 404 AciDeploymentFailed / Error 400 ACI Service request failed

I am trying to deploy a machine learning model through an ACI (Azure Container Instances) service. I am working in Python and followed the code from the official documentation (https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=azcli).

The entry script file is the following (score.py):

import json
import os

import dill
import joblib

def init():
    global model
    # Get the path where the deployed model can be found
    model_path = os.getenv('AZUREML_MODEL_DIR')

    # Load existing model from the deployed model directory
    model = joblib.load(os.path.join(model_path, 'model.pkl'))

# Handle request to the service
def run(data):
    try:
        # Pick out the text property of the JSON request
        # Expected JSON details {"text": "some text to evaluate"}
        data = json.loads(data)
        prediction = model.predict(data['text'])
        return prediction
    except Exception as e:
        error = str(e)
        return error
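
The request handling in score.py can be exercised locally before any deployment. The following is only a minimal sketch, not the actual entry script: StubModel is a hypothetical stand-in for the model loaded from model.pkl, and it is an assumption that the model accepts a list of texts and returns something list-like.

```python
import json


class StubModel:
    """Hypothetical stand-in for the real model loaded from model.pkl."""

    def predict(self, texts):
        # Return the length of each input text as a fake "prediction".
        return [len(t) for t in texts]


model = StubModel()


def run(data):
    """Mirror the request handling of score.py for a local check."""
    try:
        # Expected JSON details {"text": "some text to evaluate"}
        payload = json.loads(data)
        prediction = model.predict([payload["text"]])
        # Convert to a plain list so the result is JSON-serializable.
        return list(prediction)
    except Exception as e:
        # Same behaviour as score.py: return the error text to the caller.
        return str(e)


if __name__ == "__main__":
    print(run(json.dumps({"text": "some text to evaluate"})))
    print(run("not valid json"))
```

With the real model, a NumPy array returned by predict would probably need a similar conversion (for example with tolist()) before it can be sent back as JSON.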

And the model deployment workflow is as follows:

from azureml.core import Workspace
# Connect to workspace
ws = Workspace(subscription_id="my-subscription-id",
               resource_group="my-ressource-group-name",
               workspace_name="my-workspace-name")


from azureml.core.model import Model
model = Model.register(workspace = ws,
                       model_path= 'model.pkl',
                       model_name = 'my-model',
                       description = 'my-description')


from azureml.core.environment import Environment
# Name environment and call requirements file
# requirements: numpy, tensorflow
myenv = Environment.from_pip_requirements(name = 'myenv', file_path = 'requirements.txt')

from azureml.core.model import InferenceConfig
# Create inference configuration
inference_config = InferenceConfig(environment=myenv, entry_script='score.py')

from azureml.core.webservice import AciWebservice #AksWebservice
# Set the virtual machine capabilities
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 0.5, memory_gb = 3)


from azureml.core.model import Model
# Deploy ML model (Azure Container Instances)
service = Model.deploy(workspace=ws,
                       name='my-service-name',
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)

service.wait_for_deployment(show_output = True)

I succeeded once with the previous code. I noticed that during the deployment, Model.deploy created a container registry with a specific name (6e07ce2cc4ac4838b42d35cda8d38616).
The API was working well and I wanted to deploy another model from scratch. I deleted the API service and the model from Azure ML Studio, and the container registry from my Azure resources.

Unfortunately, I can no longer deploy anything.

Everything goes fine until the last step (Model.deploy), where I get the following error message:

Service deployment polling reached non-successful terminal state, current service state: Unhealthy

Operation ID: 46243f9b-3833-4650-8d47-3ac54a39dc5e

More information can be found here: https://machinelearnin2812599115.blob.core.windows.net/azureml/ImageLogs/46245f8b-3833-4659-8d47-3ac54a39dc5e/build.log?sv=2019-07-07&sr=b&sig=45kgNS4sbSZrQH%2Fp29Rhxzb7qC5Nf1hJ%2BLbRDpXJolk%3D&st=2021-10-25T17%3A20%3A49Z&se=2021-10-27T01%3A24%3A49Z&sp=r

Error:

{

"code": "AciDeploymentFailed",

"statusCode": 404,

"message": "No definition exists for Environment with Name: myenv Version: Autosave_2021-10-25T17:24:43Z_b1d066bf Reason: Container registry 6e07ce2cc4ac4838b42d35cda8d38616.azurecr.io not found. If private link is enabled in workspace, please verify ACR is part of private link and retry..",

"details": []

}

I do not understand why a new container registry was created automatically the first time, whereas now the deployment looks for the deleted one (the message says that the container registry named 6e07ce2cc4ac4838b42d35cda8d38616 is missing). I could not find a way to force the creation of a new container registry resource from Python, nor a way to specify its name in AciWebservice.deploy_configuration or Model.deploy.
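
The registry name in the error most likely comes from the workspace itself, which keeps a reference to a single container registry. Below is a minimal sketch for checking that reference. It assumes that Workspace.get_details() returns a dictionary whose containerRegistry entry holds the Azure resource ID of the registry; the helper names and the example value are hypothetical.

```python
def registry_name_from_details(details):
    """Extract the registry name from a workspace details dictionary.

    Assumes the 'containerRegistry' entry is an ARM resource ID ending in
    .../registries/<name>; returns None if the entry is missing or empty.
    """
    resource_id = details.get("containerRegistry")
    if not resource_id:
        return None
    return resource_id.rstrip("/").split("/")[-1]


def linked_registry_name(ws):
    """Return the registry name an azureml.core.Workspace object refers to."""
    return registry_name_from_details(ws.get_details())


if __name__ == "__main__":
    # Hypothetical example value, shaped like an ARM resource ID.
    example = {
        "containerRegistry": (
            "/subscriptions/my-subscription-id/resourceGroups/"
            "my-ressource-group-name/providers/"
            "Microsoft.ContainerRegistry/registries/"
            "6e07ce2cc4ac4838b42d35cda8d38616"
        )
    }
    print(registry_name_from_details(example))
```

Calling linked_registry_name(ws) with the ws object from the deployment code would show whether the workspace still points at the deleted registry.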

I tried to create the container registry by hand, but this time it is the container that cannot be created. The Python output of Model.deploy is the following:

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.

Running

2021-10-25 19:25:10+02:00 Creating Container Registry if not exists.

2021-10-25 19:25:10+02:00 Registering the environment.

2021-10-25 19:25:13+02:00 Building image..

2021-10-25 19:30:45+02:00 Generating deployment configuration.

2021-10-25 19:30:46+02:00 Submitting deployment to compute.

Failed

Service deployment polling reached non-successful terminal state, current service state: Unhealthy

Operation ID: 93780de6-7662-40d8-ab9e-4e1556ef880f

Current sub-operation type not known, more logs unavailable.

Error:

{

"code": "InaccessibleImage",

"statusCode": 400,

"message": "ACI Service request failed. Reason: The image ‘6e07ce2cc4ac4838b42d35cda8d38616.azurecr.io/azureml/azureml_684133370d8916c87f6230d213976ca5’ in container group ‘my-service-name-LM4HbqzEBEi0LTXNqNOGFQ’ is not accessible. Please check the image and registry credential.. Refer to https://docs.microsoft.com/azure/container-registry/container-registry-authentication#admin-account and make sure Admin user is enabled for your container registry."

}
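
The output above suggests get_logs() for debugging. A minimal sketch of wrapping the wait step so that the container logs are collected when the deployment fails is shown here. The helper name wait_and_collect_logs is hypothetical, and service is assumed to be the object returned by Model.deploy. With an InaccessibleImage error the logs may well be empty, since the container never started.

```python
def wait_and_collect_logs(service, num_lines=200):
    """Wait for a deployment; on failure, return the container logs.

    `service` is assumed to be the object returned by Model.deploy.
    Returns (succeeded, logs), where logs is None on success.
    """
    try:
        service.wait_for_deployment(show_output=True)
        return True, None
    except Exception as exc:  # azureml raises WebserviceException here
        print(f"Deployment failed: {exc}")
        # get_logs() returns the container logs, if any are available.
        return False, service.get_logs(num_lines=num_lines)
```

Usage would be along the lines of `ok, logs = wait_and_collect_logs(service)` in place of the final service.wait_for_deployment call.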

I tried to follow the recommendation in the last message and enabled the Admin user for the container registry. All I saw in the Azure interface is that a username and password appeared once the Admin user was enabled.

Unfortunately, the same error message appears again when I relaunch my code, and I am stuck here…

Could anyone help me move forward with this? I think the best solution would be to get rid of this 6e07ce2cc4ac4838b42d35cda8d38616 container registry entirely, but I can’t find where the reference to it is set, so Model.deploy keeps failing to find it.

Another solution would be to force Model.deploy to generate a new container registry, but I couldn’t find out how to do that.

I have been stuck on this for two days and I really need your help!

PS: I am not a DevOps/MLOps person at all. I do data science and build good models, but infrastructure and deployment are not really my thing, so please be gentle on this part! 🙂
