Mantis - Resin
Viewing Issue Advanced Details
3528 minor always 05-21-09 18:23 06-08-09 19:09
ferg  
ferg  
normal  
closed 3.1.9  
fixed  
none    
none 4.0.1  
0003528: watchdog/httpd issues
(rep by Rob Lockstone)


It looks like this is another manifestation of the same issue with the watchdog process not properly shutting things down before a new httpd.exe process starts up. I've been playing around with this, watching the Windows Task Manager on the machine running Resin while I remotely (although it could be done locally as well) stop, query, and start the Resin Windows service using the service controller (sc) command.

It looks like the problem is that the Windows service only cares about the httpd.exe process, but the watchdog process is embedded within Java itself and runs within the javaw process. There doesn't seem to be any communication between httpd and Java during shutdowns. Interestingly, a java.exe process gets spawned whenever httpd.exe is started or stopped, but that process is very short-lived.

Here's what I see happening:

1. sc \\machine stop resin

The httpd.exe process exits pretty quickly, within a second or two.

2. sc \\machine query resin

Once the httpd.exe process exits, the query returns "STOPPED" because, as far as the service controller is concerned, the service is stopped: all it's tied to is the httpd.exe process.

3. Meanwhile, depending on what Java is doing, the javaw.exe process (actually two of them, since one is the resin-admin process) continues to run. In the case of a JSP page that calls Thread.sleep(300000);, the javaw.exe processes continue to run for up to 30 seconds before they finally disappear.

4. If the sc \\machine start resin command is issued while the javaw.exe processes are still running, the new httpd.exe process will start but won't be able to actually start java due to the watchdog IllegalStateException, as noted here in the watchdog-manager.log:

[2009/05/21 16:15:49.182] java.lang.IllegalStateException: Can't start new task because of old task 'WatchdogTask[Watchdog[]]'
[2009/05/21 16:15:49.182] at com.caucho.boot.Watchdog.start(Watchdog.java:296)
[2009/05/21 16:15:49.182] at com.caucho.boot.WatchdogManager.startServer(WatchdogManager.java:295)
[2009/05/21 16:15:49.182] at com.caucho.boot.WatchdogServlet.start(WatchdogServlet.java:75)
[2009/05/21 16:15:49.182] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2009/05/21 16:15:49.182] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[2009/05/21 16:15:49.182] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[2009/05/21 16:15:49.182] at java.lang.reflect.Method.invoke(Method.java:585)
[2009/05/21 16:15:49.182] at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:180)
[2009/05/21 16:15:49.182] at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:109)
[2009/05/21 16:15:49.182] at com.caucho.hessian.server.HessianServlet.service(HessianServlet.java:396)
[2009/05/21 16:15:49.182] at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:103)
[2009/05/21 16:15:49.182] at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:187)
[2009/05/21 16:15:49.182] at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:265)
[2009/05/21 16:15:49.182] at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:436)
[2009/05/21 16:15:49.182] at com.caucho.server.port.TcpConnection.run(TcpConnection.java:682)
[2009/05/21 16:15:49.182] at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:743)
[2009/05/21 16:15:49.182] at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:662)
[2009/05/21 16:15:49.182] at java.lang.Thread.run(Thread.java:595)
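
A restart script can check for this failure after issuing the start command instead of assuming the start succeeded. The following is a minimal sketch that scans log lines for the exception message shown above. The function name is hypothetical, and the timestamp format is assumed to match the excerpt; it is not part of Resin.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpt, e.g. "[2009/05/21 16:15:49.182]".
# Assumption: every log line of interest starts with this prefix.
_STAMP = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$")

# Message text copied from the first line of the stack trace.
_MARKER = "IllegalStateException: Can't start new task because of old task"


def find_failed_starts(lines):
    """Return the timestamps of log lines that report the 'old task' failure."""
    hits = []
    for line in lines:
        match = _STAMP.match(line.strip())
        if match and _MARKER in match.group(2):
            hits.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f"))
    return hits
```

Passing the lines of watchdog-manager.log (for example from an open file object) returns one timestamp per failed start; the stack frames that follow each exception are ignored.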


TEMP SOLUTION

If I set the shutdown-max-wait timeout to a very low value, say 3s, then my experiments show that the javaw.exe processes do exit within 5 seconds of the httpd.exe process going away. So I'm going to build a delay of 10s into my shutdown/restart routine. Can you tell me if there are any inherent problems in setting the shutdown timeout so low?
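
As an alternative to a fixed delay, the restart routine could poll until the old javaw.exe processes are gone before starting the service again. This is only a sketch under several assumptions: the function names and default values are hypothetical, `machine` is a host name without leading backslashes, and any javaw.exe on that host is treated as belonging to Resin, which would be wrong if other Java applications run there. It uses the standard Windows `sc` and `tasklist` commands.

```python
import subprocess
import time


def javaw_running(machine, run=subprocess.run):
    """Return True if tasklist on `machine` still lists a javaw.exe process."""
    result = run(
        ["tasklist", "/S", machine, "/FI", "IMAGENAME eq javaw.exe", "/NH"],
        capture_output=True,
        text=True,
    )
    return "javaw.exe" in result.stdout.lower()


def restart_service(machine, service="resin", timeout=60.0, poll=1.0,
                    run=subprocess.run, is_running=None,
                    sleep=time.sleep, clock=time.monotonic):
    """Stop the service, wait for javaw.exe to exit, then start the service.

    Returns False, without starting, if javaw.exe is still running once
    `timeout` seconds have passed.
    """
    if is_running is None:
        is_running = lambda: javaw_running(machine, run)
    target = "\\\\" + machine  # sc expects the \\machine form
    run(["sc", target, "stop", service], capture_output=True, text=True)
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            return False
        sleep(poll)
    run(["sc", target, "start", service], capture_output=True, text=True)
    return True
```

The `run`, `is_running`, `sleep`, and `clock` parameters exist only so the routine can be exercised without a Windows host; in normal use the defaults apply and a call such as `restart_service("machine")` is enough.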

I think the real solution has to involve setting up some kind of communication channel between httpd.exe and the watchdog so that httpd.exe doesn't exit until the javaw process(es) really exit. As it stands now, httpd.exe seems to be completely disconnected from the running javaw processes. Yes, the watchdog does eventually stop java from running, but httpd is allowed to exit independent of the watchdog.
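
The idea of such a channel can be illustrated with a toy model: the launcher sends a stop request and does not return until the server confirms that it has finished shutting down. This is not Resin's actual mechanism or API. The names are hypothetical, and in-process queues and a thread stand in for the two processes and whatever transport connects them.

```python
import queue
import threading
import time


def server(inbox, outbox, shutdown_seconds):
    """Stand-in for the Java server: wait for 'stop', clean up, then confirm."""
    message = inbox.get()
    if message == "stop":
        time.sleep(shutdown_seconds)  # pretend to finish in-flight work
        outbox.put("stopped")


def stop_and_wait(inbox, outbox, timeout):
    """Stand-in for the launcher: send 'stop' and block until confirmation.

    Returns True if the server confirmed within `timeout` seconds.
    """
    inbox.put("stop")
    try:
        return outbox.get(timeout=timeout) == "stopped"
    except queue.Empty:
        return False


if __name__ == "__main__":
    to_server, from_server = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=server, args=(to_server, from_server, 0.1))
    worker.start()
    print("confirmed:", stop_and_wait(to_server, from_server, timeout=5.0))
    worker.join()
```

With this shape, the launcher only reports the service as stopped after the confirmation arrives, so a subsequent start cannot overlap with the old process. The timeout keeps the launcher from hanging forever if the server never answers.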


Notes
(0004067)
ferg   
06-08-09 19:09   
changed Resin/Watchdog communication to use BAM/HMTP service. Now the watchdog will send a stop message to Resin and wait for the result.