Mantis - Resin
309 minor always 07-13-05 00:00 11-30-05 14:43
stefanp  
 
immediate  
closed 2.1.x  
2.1.x fixed  
none    
none 2.1.x  
0000309: ServletServer.restart(): bad synchronization leads to broken server
RSN-352
Hi,

This bug causes resin to end up in an unconfigured state if a class change is made while the server is under (even light) load.

The visible effect is that some vhosts cannot be found: resin starts serving the contents of its RESIN_HOME and returns 404 errors for everything else.

To reproduce, start two simple shell scripts that request a page in a loop, and print ok or error depending on status (200/404). Cause a restart (in my case I change the .class file of a global resource). Prior to 2.1.14 (and in 2.1.14 without the problematic ServletServer patch), resin will lock for a bit, eventually serve a few 500 errors, but will end up serving the good files once the restart is complete. In 2.1.14, resin will start serving 404 errors.
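
For reference, a minimal sketch of such a request loop, written in Java rather than as a shell script. The class name StatusPoller, the default URL and the 200 ms pause are placeholders chosen for illustration, not part of the original setup. Run two copies in parallel and trigger the restart:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical polling client: prints "ok" for HTTP 200, "error ..." otherwise. */
class StatusPoller {
  /** Maps an HTTP status code to the line that gets printed. */
  static String classify(int status) {
    return status == 200 ? "ok" : "error " + status;
  }

  /** Requests the URL once and returns the HTTP status code. */
  static int fetchStatus(String url) throws IOException {
    HttpURLConnection conn =
        (HttpURLConnection) URI.create(url).toURL().openConnection();
    conn.setConnectTimeout(5000);
    conn.setReadTimeout(5000);
    try {
      return conn.getResponseCode();
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Placeholder URL; pass the real page to poll as the first argument.
    String url = args.length > 0 ? args[0] : "http://localhost:8080/index.jsp";
    while (true) {
      try {
        System.out.println(classify(fetchStatus(url)));
      } catch (IOException e) {
        // Connection refused, timeout, etc. while the server is restarting.
        System.out.println("error " + e.getMessage());
      }
      Thread.sleep(200);
    }
  }
}
```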

(Some Vhost configuration seems to disappear in the process:
  [13:11:27.799] java.lang.NullPointerException
        at com.caucho.server.http.ServletServer.getHost(ServletServer.java:1362)
        at com.caucho.server.http.ServletServer.getHost(ServletServer.java:1339)
        at com.caucho.server.http.ServletServer.getInvocation(ServletServer.java:1244)
        at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:250)
        at com.caucho.server.http.HttpRequest.handleConnection(HttpRequest.java:170)
        at com.caucho.server.TcpConnection.run(TcpConnection.java:139)
        at java.lang.Thread.run(Thread.java:595)
)

Note that in our case, the problem is easy to reproduce since restarting takes a long time (approx. 30s): 15 vhosts with 4 web apps each, lots of resources that need DB interaction to initialize in each vhost, etc. So the chance of two requests entering restart() in parallel is higher than with a setup that restarts more quickly.


This bug appeared in 2.1.14, and is due to the following change:

diff -r resin-2.1.13/src/com/caucho/server/http/ServletServer.java resin-2.1.14/src/com/caucho/server/http/ServletServer.java
1888c1888
< synchronized void restart()
---
> void restart()
1890,1891c1890,1892
< if (! isModifiedFull() || ! _isInitComplete || _isInitStarted)
< return;
---
> synchronized (this) {
> if (! isModifiedFull() || ! _isInitComplete || _isInitStarted)
> return;
1893,1895c1894,1897
< _isInitStarted = true;
< _isInitComplete = false;
< _configException = null;
---
> _isInitStarted = true;
> _isInitComplete = false;
> _configException = null;
> }

If the restart method is made synchronized again, the bug disappears.
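
To illustrate one plausible reading of the diff (an assumption, not a confirmed analysis): with only the flag check inside the synchronized block, a second request that enters restart() while a restart is running returns at once instead of waiting, and then looks up hosts in a half-built configuration. The sketch below is a self-contained stand-in. RestartRaceSketch, its host map and the two latches are hypothetical and exist only to make the interleaving deterministic; none of it is Resin's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for a server with a block-guarded restart(). */
class RestartRaceSketch {
  private volatile Map<String, String> hosts = new HashMap<>();
  private boolean modified = false;
  private boolean initStarted = false;
  private boolean initComplete = true;

  // Demo-only hooks that pin down the thread interleaving.
  private final CountDownLatch rebuildEntered = new CountDownLatch(1);
  private final CountDownLatch allowRebuildToFinish = new CountDownLatch(1);

  RestartRaceSketch() {
    hosts.put("example.test", "/var/www/example");
  }

  synchronized void markModified() {
    modified = true;
  }

  /** Same shape as the 2.1.14 diff: only the flag check holds the lock. */
  void restart() {
    synchronized (this) {
      if (!modified || !initComplete || initStarted)
        return; // a concurrent caller leaves here without waiting
      initStarted = true;
      initComplete = false;
    }
    hosts = null; // old configuration dropped, new one not yet built
    rebuildEntered.countDown();
    try {
      allowRebuildToFinish.await(); // stands in for a slow (approx. 30s) rebuild
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    Map<String, String> fresh = new HashMap<>();
    fresh.put("example.test", "/var/www/example");
    hosts = fresh;
    synchronized (this) {
      modified = false;
      initStarted = false;
      initComplete = true;
    }
  }

  /** Request path: check for a restart, then resolve the host. */
  String lookup(String name) {
    restart();
    Map<String, String> snapshot = hosts;
    return snapshot == null ? "404" : snapshot.get(name);
  }

  /** Returns the lookup result before, during and after a restart. */
  static String[] demo() {
    RestartRaceSketch s = new RestartRaceSketch();
    String before = s.lookup("example.test");
    s.markModified();
    Thread first = new Thread(() -> s.lookup("example.test"));
    first.start();
    try {
      s.rebuildEntered.await(); // first request is now mid-restart
      String during = s.lookup("example.test"); // second request slips through
      s.allowRebuildToFinish.countDown();
      first.join();
      String after = s.lookup("example.test");
      return new String[] {before, during, after};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String[] r = demo();
    System.out.println("before=" + r[0] + " during=" + r[1] + " after=" + r[2]);
  }
}
```

In this sketch the request made during the restart gets "404", while the requests before and after it resolve normally. With restart() declared synchronized, the second request would instead block until the rebuild finished.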

Stefan
Linux. Pretty complex configuration, lots of vhosts and webapps.

Notes
(0000350)
ferg   
07-13-05 00:00   
Fixed in the snapshot. The proposed solution of setting the method as synchronized doesn't work because that reintroduces a deadlock.