custom-speech-to-text slow on first invocation after inactivity

On first launch, and again after any period of inactivity, our custom-speech-to-text Docker container takes a long time to reply (10+ seconds, versus sub-second response times for subsequent invocations).

Here are the docker-compose lines used to run the container:

  msasr:
    image: mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:2.6.0-amd64
    volumes:
      - ./MS_ASR_20201202:/usr/local/models
    restart: always
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
        reservations:
          cpus: '4'
          memory: 4G
    command: Eula=accept Billing=${MS_ENDPOINT_URI} ApiKey=${MS_API_KEY}

docker-compose is run with the --compatibility flag so that the memory allocations under deploy take effect.

This problem appeared after moving from cognitive-services-custom-speech-to-text:2.2.0-amd64-preview to custom-speech-to-text:2.6.0-amd64.

I suppose (no proof here) that some elements need to be loaded into RAM and are discarded when not in use, which has a large QoS impact on the next request. Is there a way to tell the instance to keep them loaded, or to re-prime the system?
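
To make the "re-prime" idea concrete, here is a minimal sketch of a keep-warm loop that fires a dummy request at a fixed interval so the container never sits idle. This is an untested assumption, not a confirmed fix: `send_dummy_request` is a hypothetical placeholder for whatever call actually exercises the recognizer (for example, a short audio clip sent through the speech client), and the interval is a guess because the idle timeout is unknown.

```python
import threading
import time
from typing import Callable


def keep_warm(prime: Callable[[], None], interval_s: float, stop: threading.Event) -> int:
    """Call `prime` every `interval_s` seconds until `stop` is set.

    Returns the number of priming attempts made.
    """
    calls = 0
    while not stop.is_set():
        start = time.monotonic()
        try:
            prime()
        except Exception as exc:
            # A failed priming request should not kill the loop.
            print(f"priming request failed: {exc}")
        calls += 1
        print(f"priming request took {time.monotonic() - start:.2f}s")
        # Event.wait returns early if `stop` is set during the pause.
        stop.wait(interval_s)
    return calls


def send_dummy_request() -> None:
    """Hypothetical placeholder: replace with a real short recognition request."""
    time.sleep(0.01)
```

It could be run in a background thread, e.g. `threading.Thread(target=keep_warm, args=(send_dummy_request, 60.0, stop_event), daemon=True).start()`, where the 60-second interval is only a starting guess. Logging the duration of each priming call would also show whether the slow path is still being hit.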

Source: Docker Questions
