Mantis - Resin
Viewing Issue Advanced Details
3645 minor always 08-21-09 13:44 08-24-09 10:53
closed 3.1.9  
none 4.0.2  
0003645: syn_recv between mod_caucho and Resin
(rep by Daniel Wigenfors)

After a couple of hours, around noon, all Apache threads were suddenly used up, and the caucho-status page showed many of the Resin instances marked red and unavailable. When we used telnet to connect to port 6801 of the Resin servers, there was often no response, only a timeout, which is probably what mod_caucho saw as well.
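The manual telnet check described above could be scripted roughly as follows. This is only a sketch: the `probe` helper is hypothetical, the host name `resin-host` is a placeholder, and the 2-second timeout is chosen here to mirror the connect timeout mentioned in this report, not taken from mod_caucho itself.

```python
import socket


def probe(host, port, timeout=2.0):
    """Try one TCP connect and classify the outcome.

    Returns "open", "timeout", "refused", or "error: ..." for anything else.
    """
    try:
        # create_connection performs the TCP handshake, waiting at most `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer to the connect attempt within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return "error: %s" % exc


if __name__ == "__main__":
    # "resin-host" is a placeholder; 6801 is the port from this report.
    for attempt in range(5):
        print(attempt, probe("resin-host", 6801))
```

A run of "timeout" results on 6801 while the HTTP port still answers would match the behaviour reported here.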

We have an HTTP port, 8081, configured on the Resin instances, and it was possible to reach that port and access the /resin-status page. There we could see that the instance was more or less idling, with a minimal number of active threads. A thread dump of the instance showed no threads running; they were just waiting for new connections.

A netstat on the Resin machine revealed a lot of (up to around 1000) connections in SYN_RECV, all from the web server (web64). We first recompiled mod_caucho with the connect timeout lowered back to the default 2s and restarted everything. After a while, some of the Resin servers started to become unresponsive again, and the number of SYN_RECV connections increased. We then rolled back to the 3.1.8 mod_caucho with the default connect timeout.
After a while we suffered from the same problems again, and we decided to roll back to Resin 3.1.8 on the Resin servers as well.
This seems to have fixed the problems, as we haven't seen any more of them since the application servers were restarted with 3.1.8.
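For anyone wanting to track the SYN_RECV build-up over time, the netstat check can be tallied per remote host along these lines. This is a sketch that assumes the Linux `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The `count_syn_recv` helper is hypothetical and the addresses in the sample are made up for illustration, not taken from our machines.

```python
from collections import Counter


def count_syn_recv(netstat_output):
    """Count SYN_RECV connections per remote host in `netstat -tn` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "SYN_RECV":
            # Strip the port from "host:port" to group by remote host.
            remote_host = fields[4].rsplit(":", 1)[0]
            counts[remote_host] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample output, for illustration only.
    sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:6801           10.0.0.9:41234          SYN_RECV
tcp        0      0 10.0.0.5:6801           10.0.0.9:41235          SYN_RECV
tcp        0      0 10.0.0.5:8081           10.0.0.7:50000          ESTABLISHED
"""
    for host, n in count_syn_recv(sample).items():
        print(host, n)
```

In practice the input would be the captured output of `netstat -tn` on the Resin machine, sampled every few seconds.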

The most likely conclusion is that something is wrong with the communication between Resin and mod_caucho in 3.1.9. Have you had any other reports of this kind of problem?

08-24-09 10:53   
The issue is related to a low thread-max (200) combined with a heavy load that uses all threads.