After months of research, we are here hoping someone has some insight into this issue:
On a GKE cluster, our Node.js pods are having trouble connecting to our external Oracle business database.
To be more precise, ~70% of our connection attempts end in this error:
ORA-12545: Connect failed because target host or object does not exist
The remaining 30% work well and don't reset or end prematurely: once a connection is established, everything works from there.
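Because the failures happen up front and established connections stay healthy, retrying the connection attempt could serve as a stopgap while the root cause is investigated. This is a minimal sketch, not a fix. It assumes node-oracledb exposes the Oracle error number on `err.errorNum` (12545 here). The helper `connectWithRetry` and its `attempts`/`baseDelayMs` options are hypothetical. The connect function is passed in, so it could wrap e.g. `() => oracledb.getConnection(config)`.

```javascript
// Hypothetical helper: retry a connection factory only when it fails with ORA-12545.
const ORA_12545 = 12545;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function connectWithRetry(connect, { attempts = 5, baseDelayMs = 200 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastErr = err;
      // Assumption: node-oracledb sets errorNum; any other error is rethrown at once.
      if (err.errorNum !== ORA_12545) throw err;
      // Exponential backoff between attempts: baseDelayMs, 2x, 4x, ...
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  // All attempts failed with ORA-12545: surface the last error.
  throw lastErr;
}
```

Retrying only hides the symptom. Logging each retried failure, with a timestamp, would still be useful for correlating with cluster events.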
- Our data flows (jobs) are handled by containers based on a node:12.15.0-slim image, to which we add libaio1 and an Oracle Instant Client (v12.2). We use oracledb v5.0.0 as the Node module.
- These Node containers run as Kubernetes CronJob pods behind a ClusterIP service on a GKE cluster (1.16.15-gke.4300).
- Our external Oracle database is on a private network (which our cluster has access to), running Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 (64-bit), behind a load balancer.
I can provide more details if needed.
What we have already tried:
- We tried connecting directly to the database, bypassing the load balancer: no effect.
- We ran a CronJob pod pinging the database server every minute for a day: no errors, while our job pods were still hitting the ORA-12545 error.
- We rewrote all our code, connecting to the database differently, and upgraded the oracledb Node module (v4 to v5): no effect.
- We monitored the load on the Oracle database and spread our jobs over the whole night instead of a 1-hour window: no effect.
- Before GKE, we ran our own Kubernetes cluster directly in our private network, and it produced exactly the same error.
- We had an audit by Kubernetes experts, who neither found the issue nor saw any critical problem in our cluster/k8s configuration.
- All our other pods (some querying a MySQL database, microservices, web front ends) work fine.
- All our business tools (dozens of them, including Talend and some custom software) use the Oracle database without issue.
- Our own Node job containers work fine with the Oracle database as long as they run in plain Docker rather than in Kubernetes.
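The ping test only shows that the server answers ICMP. It does not exercise what the driver does on each attempt. ORA-12545 generally points at the host part of the connect address (the name could not be resolved or the host could not be reached). A probe run from inside a pod could narrow this down: it repeats name resolution and a TCP connect, and records which stage fails.

This is a minimal Node.js sketch, to match the job containers. It assumes the listener is on the default port 1521. The `DB_HOST`, `DB_PORT` and `ATTEMPTS` environment variables and all helper names are hypothetical. `dns.lookup` goes through the system resolver (`getaddrinfo`), so it should honour the pod's `/etc/resolv.conf` (search domains, `ndots`).

```javascript
// Hypothetical diagnostic: resolve the DB host, then open a TCP connection to the
// listener, recording whether a failure happened at the DNS or the TCP stage.
const dns = require("dns").promises;
const net = require("net");

// Resolve on connect; reject on socket error or after timeoutMs without a connection.
function tcpConnect(host, port, timeoutMs) {
  return new Promise((resolve, reject) => {
    const socket = net.connect({ host, port });
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => { socket.destroy(); resolve(); });
    socket.once("timeout", () => { socket.destroy(); reject(new Error("timeout")); });
    socket.once("error", (err) => { socket.destroy(); reject(err); });
  });
}

// One attempt: returns { stage: "ok" } or { stage: "dns" | "tcp", code }.
async function probeOnce(host, port, timeoutMs) {
  let address;
  try {
    ({ address } = await dns.lookup(host));
  } catch (err) {
    return { stage: "dns", code: err.code };
  }
  try {
    await tcpConnect(address, port, timeoutMs);
    return { stage: "ok" };
  } catch (err) {
    return { stage: "tcp", code: err.code || err.message };
  }
}

// Count outcomes per stage.
function summarize(results) {
  const counts = { ok: 0, dns: 0, tcp: 0 };
  for (const r of results) counts[r.stage] += 1;
  return counts;
}

async function main() {
  const host = process.env.DB_HOST;                  // hypothetical env var
  const port = Number(process.env.DB_PORT || 1521);  // assumed default listener port
  const attempts = Number(process.env.ATTEMPTS || 100);
  const results = [];
  for (let i = 0; i < attempts; i++) {
    const r = await probeOnce(host, port, 5000);
    if (r.stage !== "ok") console.log(new Date().toISOString(), r);
    results.push(r);
  }
  console.log(summarize(results));
}

// Only run the probe when a target host is configured.
if (process.env.DB_HOST) {
  main().catch((err) => { console.error(err); process.exit(1); });
}
```

If failures cluster at the `dns` stage only inside Kubernetes, that would point towards the cluster's name resolution rather than the database. If every attempt succeeds while the driver still fails, the host the driver actually contacts may differ from the one configured.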
To summarize: we have a mysterious issue when connecting to an Oracle database from a Kubernetes environment, where pods are randomly unable to reach the database.
We are looking for any hints we can get.
Source: Docker Questions