I’m working on a machine learning application in Python that will run in a Docker container on a Kubernetes cluster. For this implementation I’ll be using Google Cloud Platform.
A user will hit the endpoint with a POST request containing the data they want to score against the model. The data could arrive as 10k separate requests with one row each, or as a single request containing a block of 10k rows. The API should handle either case, with only a small difference in performance.
For the 10k requests sent row by row, the built-in load balancing and autoscaling of additional container instances will handle the workload just fine.
I’m not sure how to handle the single request that sends 10k rows at once. Is there an easy way to split one large request across the other container instances and then combine the results before sending the response back to the user?
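For context, the kind of fan-out I have in mind looks roughly like this sketch. It assumes the pods sit behind a single Kubernetes Service that load-balances each sub-request; the `send` callable is a stand-in for whatever HTTP POST the gateway would make to that Service (the function name, chunk size, and worker count are all illustrative, not a real API):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(rows, send, chunk_size=1000, max_workers=10):
    """Split one large batch into chunks, score each chunk via `send`
    (a stand-in for an HTTP POST to the model's Kubernetes Service,
    which spreads the sub-requests across pods), and recombine the
    predictions in the original row order."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves chunk order, so the flattened result
        # lines up with the input rows.
        per_chunk = pool.map(send, chunks)
    return [pred for preds in per_chunk for pred in preds]

# Example with a stub scorer standing in for the real model endpoint:
predictions = fan_out(list(range(10_000)), lambda chunk: [x * 2 for x in chunk])
```

The gateway pod doing the splitting would itself be one of the replicas, so only the chunking/merging lives there while the actual scoring is distributed by the Service.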
Source: Docker Questions