Should micro-services kill themselves?
With the rising use of containers and orchestrators, we can rethink how micro-services maintain their running state. Health checks play an essential role in service orchestration: detecting an unhealthy container/instance allows the orchestrator to order a replacement. Now consider a service that is in a state where some execution path is constantly failing. While monitoring and logging are still important, the container can choose to perform "harakiri": it terminates itself and lets the health check and orchestrator notice and provision a new, healthy replica. Consider, for example, a service that performs some kind of initialization and transitions from a "before init" state to an "after init" state. This transition could hiccup (a failed connection setup, mount, etc.). If the instance whose transition failed performs "harakiri", a new instance gets a chance at a clean transition where the first one failed.
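A minimal sketch of the init example, assuming the orchestrator restarts or replaces a container whose main process exits with a non-zero code. The `Service` class, the `init_step` callable, and the retry parameters are hypothetical and not taken from any framework. The instance retries its "before init" to "after init" transition a few times, then gives up and exits instead of lingering in a half-initialized state.

```python
import enum
import logging
import sys
import time
from typing import Callable


class InitState(enum.Enum):
    BEFORE_INIT = "before init"
    AFTER_INIT = "after init"


class Service:
    """Hypothetical service that must complete a one-time init transition.

    If the transition keeps failing, the instance terminates itself
    ("harakiri") so the orchestrator can provision a fresh replica.
    """

    def __init__(
        self,
        init_step: Callable[[], None],
        max_attempts: int = 3,
        backoff_seconds: float = 1.0,
        terminate: Callable[[int], None] = sys.exit,  # injectable for testing
    ) -> None:
        self.state = InitState.BEFORE_INIT
        self._init_step = init_step
        self._max_attempts = max_attempts
        self._backoff = backoff_seconds
        self._terminate = terminate

    def start(self) -> None:
        for attempt in range(1, self._max_attempts + 1):
            try:
                self._init_step()  # e.g. open a connection, mount a volume
            except Exception as exc:
                logging.warning("init attempt %d/%d failed: %s",
                                attempt, self._max_attempts, exc)
                if attempt < self._max_attempts:
                    time.sleep(self._backoff)
                continue
            self.state = InitState.AFTER_INIT
            return
        # Still stuck in BEFORE_INIT: log loudly, then exit non-zero so the
        # orchestrator replaces this instance with a clean one.
        logging.error("init never succeeded; terminating this instance")
        self._terminate(1)
```

Retrying a few times before quitting is a design choice: it absorbs transient hiccups locally and reserves self-termination for transitions that look genuinely stuck.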
But shouldn't health checks detect these sick containers?
Health checks should remain simple and detect "Dead" containers. "Dead" should be defined by the service type, that is, by externally observable behavior such as response time and response type. It should not be defined by the implementation details of the service, as that would tightly couple the service implementation to the health check. In terms of the illustration above, the health check should not have to decide whether a one-legged cow on crutches counts as healthy or not. Instead, the container should use its private information and decide when it is sick enough to pull the plug on itself.
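To make the separation concrete, here is a hypothetical sketch using only Python's standard library. The `/healthz` path, host, and port are assumptions. The endpoint answers 200 without consulting any internal state, so "Dead" simply means "did not answer with 200 in time". `probe` approximates what an orchestrator's HTTP liveness check does; any decision to self-terminate stays inside the service.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Liveness endpoint that deliberately inspects no internal state."""

    def do_GET(self) -> None:
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args) -> None:
        pass  # keep probe traffic out of the logs


def serve_health(port: int = 8080, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Start the health endpoint on a background daemon thread."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url: str, timeout: float = 1.0) -> bool:
    """Roughly what an orchestrator's HTTP liveness probe checks."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError (e.g. 404) and timeouts
        return False
```

In Kubernetes, for example, an `httpGet` liveness probe pointed at this path would play the role of `probe`.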
- Obviously, this "harakiri" pattern only applies to services that are scaled out enough that a single instance terminating itself has no noticeable adverse effect while the orchestrator provisions a replacement. An algorithm implementing the ideas above, based on the strategy pattern, is a work in progress. Suggestions and ideas are most welcome.
- The 'state' mentioned above differs from the state in "stateless services". The latter refers to the state of the interaction between consumer and service; being 'stateless' makes it easy to balance load across multiple instances. The 'state' in this post refers to the internal state of the service, which is unrelated to the former. If none of your services hold any internal state, then your stack must be entirely based on serverless (lambda) computing, and is way cooler than this post :)
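The notes above mention a strategy-pattern algorithm as a work in progress. What follows is not that algorithm; it is a hypothetical Python sketch of how such a design might look. A pluggable `HarakiriStrategy` decides, from the service's private record of successes and failures on one execution path, when the instance is sick enough to quit. The class names, thresholds, and the injected `terminate` callback are all assumptions. In a real container, `terminate` would need to end the main process; note that `sys.exit` called from a worker thread only ends that thread.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Any, Callable, Deque


class HarakiriStrategy(ABC):
    """Decides, from private failure history, when the instance should quit."""

    @abstractmethod
    def record(self, success: bool) -> None: ...

    @abstractmethod
    def should_terminate(self) -> bool: ...


class ConsecutiveFailures(HarakiriStrategy):
    """Quit after `threshold` failures in a row."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.streak = 0

    def record(self, success: bool) -> None:
        self.streak = 0 if success else self.streak + 1

    def should_terminate(self) -> bool:
        return self.streak >= self.threshold


class FailureRateInWindow(HarakiriStrategy):
    """Quit when the failure ratio over the last `window` calls exceeds `max_rate`."""

    def __init__(self, window: int, max_rate: float) -> None:
        self.results: Deque[bool] = deque(maxlen=window)
        self.max_rate = max_rate

    def record(self, success: bool) -> None:
        self.results.append(success)

    def should_terminate(self) -> bool:
        if len(self.results) < self.results.maxlen:
            return False  # not enough evidence yet
        return self.results.count(False) / len(self.results) > self.max_rate


class ExecutionPath:
    """Wraps one execution path and consults the strategy after each call."""

    def __init__(self, fn: Callable[..., Any], strategy: HarakiriStrategy,
                 terminate: Callable[[int], None]) -> None:
        self.fn = fn
        self.strategy = strategy
        self.terminate = terminate

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        try:
            result = self.fn(*args, **kwargs)
        except Exception:
            self.strategy.record(False)
            if self.strategy.should_terminate():
                self.terminate(1)  # pull the plug; the orchestrator replaces us
            raise
        self.strategy.record(True)
        return result
```

Keeping the decision behind a strategy interface lets each service pick a policy that matches its own failure modes, without teaching the health check anything about them.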