Random error on external Oracle database connection with Kubernetes

After months of research, we are posting here in the hope that someone has some insight into this issue:
On a GKE cluster, our pods (Node.js) are having trouble connecting to our external Oracle business database.

To be more precise, ~70% of our connection attempts end with this error:

ORA-12545: Connect failed because target host or object does not exist

The remaining ~30% work fine and are not reset or closed prematurely. Once a connection is established, everything works from there.

Our stack:

  • Our data flows are handled by containers based on the node:12.15.0-slim image, to which we add libaio1 and an Oracle Instant Client (v12.2). We use oracledb v5.0.0 as the Node module.
  • We use cron job pods to run our Node containers, in a ClusterIP service on a GKE cluster (1.16.15-gke.4300).
  • Our external Oracle database is on a private network (which our cluster has access to), running Oracle Database 10g Enterprise Edition (64-bit), behind a load balancer.
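
To make the failure pattern easier to reason about, here is a minimal sketch of how each connection attempt could be instrumented. It is not our production code: `oraCode` and `connectWithLog` are hypothetical helper names, and the `connect` argument stands for whatever function opens the connection (for example a wrapper around `oracledb.getConnection`). The retry count and delay are arbitrary.

```javascript
// Hypothetical helper: pull the ORA-xxxxx code out of an error message.
function oraCode(err) {
  const match = /ORA-\d{5}/.exec(String(err && err.message));
  return match ? match[0] : null;
}

// Call `connect` up to `attempts` times and record the outcome of each try.
// `connect` is any function returning a promise, for example:
//   () => oracledb.getConnection({ user, password, connectString })
async function connectWithLog(connect, attempts = 5, delayMs = 1000) {
  const log = [];
  for (let i = 1; i <= attempts; i++) {
    const startedAt = new Date().toISOString();
    try {
      const connection = await connect();
      log.push({ attempt: i, startedAt, ok: true, code: null });
      return { connection, log };
    } catch (err) {
      log.push({ attempt: i, startedAt, ok: false, code: oraCode(err) });
      // Wait before the next try, except after the last one.
      if (i < attempts) await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  return { connection: null, log };
}
```

The returned `log` can be written to the pod output so that failed and successful attempts can be compared by time and by node.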

I can give more details if needed.

What we have already tried:

  • We tried connecting directly to the database, bypassing the load balancer: no effect.
  • We ran a cron job pod that pinged the database server every minute for a day: no errors, even though the data flow pods still encountered the ORA-12545 error.
  • We rewrote all our code, changing how we connect to the database and upgrading our oracledb Node module (v4 to v5): no effect.
  • We monitored the load on the Oracle database and spread our data flows over the whole night instead of a 1-hour window: no effect.
  • Before GKE, we had our own Kubernetes cluster directly in our private network, and it produced exactly the same error.
  • We had our setup audited by Kubernetes experts, who found neither the cause nor any critical issue in our cluster/k8s configuration.
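
One limit of the ping test above is that ICMP ping checks neither name resolution nor the listener port. A probe closer to what a client connection involves would resolve the hostname and then open a TCP connection to the listener. Below is a minimal sketch of such a probe, assuming Node's built-in `dns` and `net` modules; `probe` is a hypothetical name, and the host and port passed in (1521 is the default Oracle listener port) are placeholders to adapt.

```javascript
const dns = require('dns');
const net = require('net');

// Resolve `host`, then open a TCP connection to `port`.
// The result says which stage failed: 'dns' or 'tcp'.
async function probe(host, port, timeoutMs = 5000) {
  let address;
  try {
    ({ address } = await dns.promises.lookup(host));
  } catch (err) {
    return { ok: false, stage: 'dns', detail: err.code };
  }
  return new Promise((resolve) => {
    const socket = net.connect({ host: address, port });
    // Always release the socket before reporting the result.
    const finish = (result) => {
      socket.destroy();
      resolve(result);
    };
    socket.setTimeout(timeoutMs, () =>
      finish({ ok: false, stage: 'tcp', detail: 'timeout' }));
    socket.once('connect', () =>
      finish({ ok: true, stage: 'tcp', detail: address }));
    socket.once('error', (err) =>
      finish({ ok: false, stage: 'tcp', detail: err.code }));
  });
}

// Example (placeholder host name):
//   probe('oracle-db.example.internal', 1521).then(console.log);
```

Run from the same cron job pods every minute, this would show whether the failures line up with DNS lookups or with TCP connections to the listener.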

What works:

  • All our other pods (some querying a MySQL database, microservices, web front ends) are working fine.
  • All our business tools (a dozen of them, including Talend and some custom software) use the Oracle database without issue.
  • Our own data flow Node containers work fine with the Oracle database as long as they run in a Docker environment rather than a Kubernetes one.

To summarize: we have a mysterious issue when connecting to an Oracle database from a Kubernetes environment, where pods are randomly unable to reach the database.

Any hint would be appreciated.

Source: Docker Questions