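Mantis - Resin
0000061: wrapper.pl kill enhancement
ID: 61    Severity: minor    Reproducibility: always
Date submitted: 03-30-05 00:00    Last update: 01-25-06 11:28
Reporter: sam    Assigned to: ferg
Priority: normal    Status: closed    Resolution: fixed
Product version: 3.0.12    Fixed in version: 3.0.18
Projection: none    ETA: none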
RSN-52
(rep by B Bernstein)

We've recently found a small problem with the "stop" part of
wrapper.pl, and I'd like to see if there are any suggestions on how to
deal with it.

We found that a third-party component we were using in a servlet was
freezing our webapp. Of course we're working on getting that
corrected, but in the meantime, the wrapper could not kill the Java
(Resin) process while it was in that hung state.

We use the Perl wrapper to launch Resin, so when we sent the "stop"
command, it ran its normal code, which kills the Java child process. In
our case, that kill timed out after 60 seconds, the wrapper exited, and
the Java process was left running as an orphan. We then had to
"kill -9" that orphan manually.

What I'd really like the wrapper to do as a fallback (if the normal
kill fails) is to send SIGKILL, the equivalent of the shell's "kill -9":

kill('KILL', $child)

Alternatively, have the wrapper return an error if the kill failed, so
that our script can do something about it.

Our script that calls the wrapper to stop or restart the process does
its own checking and waits until the PID goes away, but it only knows
about the wrapper's PID. So when the wrapper fails to kill its child,
it just exits without letting any external process know that it
failed, and our calling script thinks the stop succeeded.
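A minimal sketch of the "return an error" alternative, assuming the stop logic runs in the process that forked the Java child. The names `stop_child` and `child_exited` are hypothetical, not part of wrapper.pl. It uses `waitpid` with `WNOHANG` rather than `kill(0, ...)` to detect exit, because a dead but unreaped child (a zombie) still answers `kill(0, ...)`.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Hypothetical helper: true once our child $pid has exited. Reaps it, so
# it cannot linger as a zombie that kill(0, $pid) would report as alive.
sub child_exited {
    my ($pid) = @_;
    my $r = waitpid($pid, WNOHANG);
    return $r == $pid || $r == -1;   # reaped now, or no such child left
}

# Hypothetical stop routine: send SIGTERM, wait up to $timeout seconds,
# and report whether the child really went away instead of exiting
# silently. Returns true on success, false if the child is still running.
sub stop_child {
    my ($pid, $timeout) = @_;
    kill('TERM', $pid);
    for (1 .. $timeout) {
        return 1 if child_exited($pid);
        sleep(1);
    }
    return child_exited($pid);
}
```

The wrapper could then end with something like `exit(stop_child($child, 60) ? 0 : 1);`, so a calling script sees a nonzero exit status whenever the Java process may still be running.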

We're using Resin 2.1.x, but I see that the same code is in the 3.0.x
versions of the Perl wrapper.

Here is something like what I have in mind for that code excerpt (not
tested):

    if ($child > 0) {
        my $def_kill_time = $kill_time;

        # let it die gracefully within $kill_time seconds (60)
        while ($kill_time-- > 0 and kill(0, $child)) {
            sleep(1);
        }

        if (kill(0, $child)) {
            print("Resin proc $child didn't die, sending TERM to group -$child\n");
            # timed out: signal the process group (negative pid); this only
            # reaches anything if $child leads its own process group
            kill('TERM', -$child);
            $kill_time = $def_kill_time;
            while ($kill_time-- > 0 and kill(0, $child)) {
                sleep(1);
            }
        }

        if (kill(0, $child)) {
            print("Resin proc $child STILL didn't die, trying kill -9 $child\n");
            kill('KILL', $child);   # same as the shell's "kill -9"
        }
    }


Unix

Notes
(0000064)
user98   
03-30-05 00:00   
It would also be good if the shutdown sequence blocked the invoking shell
until the process has died. The current practice for our operations staff is
either:

   A: Typing "httpd.sh stop" followed by a lot of "ps f -A | grep java"
      to find out when the JVM has shut down.

   B: Writing an elaborate shell script that finds all the child pids of
      wrapper.pl and loops through killing them until they're all gone.
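One way to get the blocking behaviour without option B's kill loop is to snapshot the wrapper's PID and its children before sending "stop", then wait until all of them are gone. This is a sketch under assumptions: the helper names are hypothetical, it relies on a POSIX `ps` that accepts `-e -o pid= -o ppid=`, and it ignores PID reuse. Signal 0 only tests whether a process exists, and `EPERM` still means "exists, owned by someone else".

```perl
use strict;
use warnings;
use Errno qw(EPERM);

# Hypothetical helper: PIDs whose parent is $ppid, read from POSIX ps.
sub child_pids {
    my ($ppid) = @_;
    my @kids;
    open(my $ps, '-|', 'ps', '-e', '-o', 'pid=', '-o', 'ppid=')
        or die "cannot run ps: $!";
    while (my $line = <$ps>) {
        my ($pid, $parent) = split(' ', $line);
        push(@kids, $pid) if defined $parent && $parent == $ppid;
    }
    close($ps);
    return @kids;
}

# True if $pid still exists (signal 0 delivers nothing, it only checks).
sub alive {
    my ($pid) = @_;
    return kill(0, $pid) || $! == EPERM;
}

# Block until every PID in @pids is gone. No timeout, on purpose: a hung
# shutdown stays visible instead of returning "success".
sub wait_for_all {
    my @pids = @_;
    while (grep { alive($_) } @pids) {
        sleep(1);
    }
}
```

Usage would look like `my @pids = ($wrapper_pid, child_pids($wrapper_pid));`, then `system('httpd.sh', 'stop');`, then `wait_for_all(@pids);`, where `$wrapper_pid` is assumed to come from wherever your control script already records it. Taking the snapshot before "stop" matters, because once the wrapper exits its orphaned children are reparented and no longer list it as their parent.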
(0000065)
sam   
03-30-05 00:00   
(rep by B Bernstein)

I agree that it would be nice if this happened directly in the wrapper.
Currently we use our own elaborate script for controlling Resin,
including a loop that watches the Resin PID after sending a stop. But
since the parent can die and leave orphaned children behind, that
script was failing for us in some situations: the PID we watched (the
wrapper's) would disappear when the wrapper timed out after 60 seconds,
even though the Java process was still running.

We would prefer to have our controller just hang indefinitely, so that
we know there's a problem, rather than have it exit neatly after 60
seconds only to find out later that orphan processes are still around
eating resources. Of course, we'd really prefer that after the timeout
the wrapper try escalating signals (whatever those would be), followed
eventually by kill -9.
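The "increasing level of kill" idea can be expressed as a small data-driven ladder. In this sketch the names `escalate_kill`, `reaped` and `@LADDER` are hypothetical, and so is the default ladder of SIGTERM for 60 seconds followed by SIGKILL. The note leaves the intermediate signals open, so any extra steps would go between those two entries. As before, it assumes the caller is the child's parent and detects exit with `waitpid`.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Hypothetical default ladder: [signal, seconds to wait after sending it].
# Intermediate steps, if any, would be inserted between TERM and KILL.
my @LADDER = (['TERM', 60], ['KILL', 5]);

# True once our child $pid has exited (and been reaped).
sub reaped {
    my ($pid) = @_;
    my $r = waitpid($pid, WNOHANG);
    return $r == $pid || $r == -1;
}

# Walk the ladder until the child exits. Returns the signal that finally
# worked, or undef if even the last step did not remove it in time.
sub escalate_kill {
    my ($pid, @ladder) = @_;
    @ladder = @LADDER unless @ladder;
    for my $step (@ladder) {
        my ($sig, $wait) = @$step;
        kill($sig, $pid);
        for (1 .. $wait) {
            return $sig if reaped($pid);
            sleep(1);
        }
        return $sig if reaped($pid);
    }
    return;
}
```

A wrapper could call `my $sig = escalate_kill($child);` and then `exit(defined $sig ? 0 : 1);`, which also covers the request that a failed stop be reported to the calling script.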