Node.js web application in k8s gets OOM

  docker, fastify, kubernetes, nestjs, node.js

I’m running a NestJS web application, built on Fastify, on Kubernetes.

I split my application into multiple zones and deploy it to Kubernetes clusters in different physical locations (Cluster A & Cluster B).

Everything goes well except for Zone X in Cluster A, which has the highest traffic of all zones.
(Here is a 2-day metrics dashboard for Zone X during normal operation.)

2-Day Metrics During Normal Time

The problem only happens in Zone X in Cluster A and never in any other zone or cluster.

At first, some 499 responses appear in Cluster A’s Ingress dashboard, and soon after, the memory usage of the pods suddenly grows to the memory limit, one pod after another.

Metrics During Abnormal Time

It seems that the 499 status is caused by the pods not sending responses back to the caller.

At the same time, other zones in Cluster A work normally.

To avoid affecting users, I switched all traffic to Cluster B and everything worked properly, which rules out bad data as the cause.

I tried killing and redeploying all pods of Zone X in Cluster A, but when I switched traffic back to Cluster A, the problem occurred again. However, if I wait 2-3 hours before switching the traffic back, the problem disappears!

Since I don’t know the cause, the only thing I can do is switch traffic and check whether everything is back to normal.

I’ve investigated several common causes of Node.js memory issues, but none of them seems to explain this problem. Any ideas or suggestions?
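
To narrow this down, it may help to log how the process memory breaks down while the problem is happening. Below is a minimal sketch using Node's built-in `process.memoryUsage()`. The helper names `sampleMemory` and `startMemoryLogging` and the 10-second interval are hypothetical and not part of the application described here. If `heapUsedMiB` stays flat while `rssMiB` climbs toward the pod limit, the growth is probably outside the V8 heap (for example in buffers), but that is only a heuristic.

```typescript
// Hypothetical diagnostic helpers; not taken from the application in question.
interface MemorySample {
  timestamp: string;
  rssMiB: number;       // resident set size: what the kernel counts against the pod
  heapUsedMiB: number;  // live JavaScript objects on the V8 heap
  heapTotalMiB: number; // memory V8 has reserved for the heap
  externalMiB: number;  // C++ objects bound to JS objects
}

// Convert bytes to MiB, rounded to one decimal place.
const toMiB = (bytes: number): number =>
  Math.round((bytes / 1024 / 1024) * 10) / 10;

// Take one snapshot of the current process memory.
function sampleMemory(): MemorySample {
  const m = process.memoryUsage();
  return {
    timestamp: new Date().toISOString(),
    rssMiB: toMiB(m.rss),
    heapUsedMiB: toMiB(m.heapUsed),
    heapTotalMiB: toMiB(m.heapTotal),
    externalMiB: toMiB(m.external),
  };
}

// Log a snapshot as one JSON line every `intervalMs` milliseconds.
function startMemoryLogging(intervalMs: number = 10000) {
  const timer = setInterval(() => {
    console.log(JSON.stringify(sampleMemory()));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}
```

Calling `startMemoryLogging()` once at application startup would be enough. The JSON lines can then be compared against the pod metrics from the abnormal period.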

Name            Version
nestjs          v6.1.1
fastify         v2.11.0
Docker Image    node:12-alpine (v12.18.3)
Ingress         v0.30.0
Kubernetes      v1.18.12

kernel dmesg log

Source: Docker Questions