I’m moving data from a Postgres database to AWS Redshift using Python 3.7. I wrote an SQL query with a WHERE clause that filters by ID, so each time I execute it I pass a different ID and get back a different result set.
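For reference, "one query per ID" usually means binding the ID as a query parameter rather than formatting it into the SQL string. A minimal sketch, assuming a hypothetical `source_table` with `id` and `payload` columns; `sqlite3` is swapped in only so the example runs without a Postgres server (with psycopg2 the placeholder would be `%s` instead of `?`).

```python
import sqlite3
from typing import List, Tuple

# Hypothetical table/column names. sqlite3 stands in for Postgres here;
# with psycopg2 the placeholder would be %s instead of ?.
QUERY = "SELECT id, payload FROM source_table WHERE id = ? ORDER BY payload"


def fetch_rows_for_id(conn: sqlite3.Connection, record_id: int) -> List[Tuple[int, str]]:
    """Run the same query for one ID, passing the ID as a bound parameter."""
    cur = conn.execute(QUERY, (record_id,))
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_table (id INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO source_table VALUES (?, ?)",
        [(1, "a"), (1, "b"), (2, "c")],
    )
    print(fetch_rows_for_id(conn, 1))  # [(1, 'a'), (1, 'b')]
```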
I’m going to run this script in a Flask Docker container, which will ultimately run on Kubernetes.
The Dockerized Flask app exposes a POST endpoint that receives a list of IDs; for each ID, the script needs to query the database and move the resulting data to Redshift.
Because a single POST request could contain a lot of IDs, I want to use multithreading to execute multiple queries and move the data concurrently.
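The multithreaded fan-out described above could look roughly like this with the standard library's `concurrent.futures.ThreadPoolExecutor`. This is a sketch, not a full implementation: `move_id_to_redshift` is a hypothetical placeholder for the per-ID query-and-load step, and `time.sleep` stands in for the database round trips.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def move_id_to_redshift(record_id: int) -> str:
    # Hypothetical placeholder for "query Postgres for one ID, then load into Redshift".
    # time.sleep stands in for the network round trips. A real version would take a
    # connection from a pool (for example psycopg2.pool.ThreadedConnectionPool)
    # instead of sharing one cursor between threads.
    time.sleep(0.1)
    return "done:{}".format(record_id)


def move_all(ids: List[int], max_workers: int = 8) -> Dict[int, str]:
    """Run move_id_to_redshift for every ID on a bounded pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they line up with ids.
        return dict(zip(ids, pool.map(move_id_to_redshift, ids)))


if __name__ == "__main__":
    print(move_all([1, 2, 3]))  # {1: 'done:1', 2: 'done:2', 3: 'done:3'}
```

Capping `max_workers` keeps a large POST payload from opening an unbounded number of simultaneous database connections.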
However, since I’m using Python 3.7, I’ve learned that the GIL will be a bottleneck: no matter how many threads I run, only one thread will be executing at any given time.
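One way to check how much the GIL limits this particular workload is to time it. A minimal sketch, assuming `time.sleep` is a fair stand-in for waiting on the database (a simplification, not a measurement of a real driver): it runs a blocking task and a pure-Python CPU-bound task on one thread and then on eight.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def blocking_task(_: int) -> None:
    # Stands in for waiting on a database or network call; time.sleep releases the GIL.
    time.sleep(0.1)


def cpu_task(_: int) -> int:
    # Pure-Python arithmetic holds the GIL while it runs (on standard CPython builds).
    return sum(i * i for i in range(200_000))


def timed(func: Callable[[int], object], n: int, workers: int) -> float:
    """Return wall-clock seconds to run func n times on a pool of `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(func, range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for name, func in [("blocking", blocking_task), ("cpu", cpu_task)]:
        one = timed(func, 8, 1)
        eight = timed(func, 8, 8)
        print("%s: 1 thread %.2fs, 8 threads %.2fs" % (name, one, eight))
```

On a standard CPython build, the blocking case should finish roughly eight times faster with eight threads, while the CPU-bound case typically shows little or no speedup. Which of the two the real query-and-load step resembles is what decides whether threads are enough.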
How can I overcome this and execute the SQL queries against the database in parallel, in a way that also works on Kubernetes?
Should I use multiprocessing, or is there a better way to achieve this?