Description |
(rep by Kevin MacClay)
We are experiencing a problem where the Resin front-end load balancer fails to detect that one of the back-end Resin instances is not processing requests. The back-end instance accepts connections but never actually processes the request. Because the front-end LB does not mark the back-end instance down, it continues directing traffic there, and users who are directed to the "bad" instance hang.
We are currently using Resin 3.0.14 for this application. I noticed that Resin 3.0.20 adds a "client-connect-timeout" enhancement to the LB, but I'm not sure it would help here, since a connection to the back-end instance can still be established; it just hangs after that.
In addition, we would like to create a Nagios monitoring script that can notify us if a back-end instance enters this state. Do you have any recommendations on how to test whether a back-end "srun" instance is accepting _and handling_ requests?
|
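One possible shape for such a Nagios check is sketched below. It is not a Caucho-recommended method, and it does not speak the srun protocol itself: it assumes the back-end instance also exposes a plain HTTP listener serving some lightweight page, and the URL passed in is hypothetical. The point it illustrates is that the timeout applies to each blocking socket operation, including waiting for the response, so an instance that accepts the connection but never answers is reported as CRITICAL rather than looking healthy.

```python
import http.client
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backend(url, timeout=5.0):
    """Return (exit_code, message) for one HTTP probe of a back-end instance.

    `url` is assumed to point at an HTTP listener on the back end itself,
    not at the front-end load balancer.
    """
    # Bypass any configured proxy so the probe hits the back end directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # The timeout covers connecting and waiting for the response, so a
        # back end that accepts the connection and then hangs is caught.
        with opener.open(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        # The back end answered, but with an error status (4xx or 5xx).
        return CRITICAL, "CRITICAL: HTTP %d from %s" % (err.code, url)
    except (OSError, http.client.HTTPException) as err:
        # Connection refused, timed out, reset, or a malformed reply.
        return CRITICAL, "CRITICAL: no response from %s (%s)" % (url, err)
    return OK, "OK: HTTP %d from %s" % (status, url)


def main(argv):
    """Parse arguments, print one status line, and return the exit code."""
    if len(argv) < 2:
        print("UNKNOWN: usage: check_backend.py URL [TIMEOUT_SECONDS]")
        return UNKNOWN
    timeout = float(argv[2]) if len(argv) > 2 else 5.0
    code, message = check_backend(argv[1], timeout)
    print(message)
    return code
```

To use this as a plugin, the script's entry point would call `sys.exit(main(sys.argv))` so that Nagios sees the exit code. The timeout should be set well below the Nagios plugin timeout so that a hung instance yields CRITICAL instead of a plugin timeout.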