0005286: Threads can become 'stuck' forever

Mantis - Resin

ID:	Category:	Severity:	Reproducibility:	Date Submitted:	Last Update:
5286		block	random	11-21-12 02:33	01-09-13 12:10

Reporter:	paul	Platform:
Assigned To:	ferg	OS:
Priority:	normal	OS Version:
Status:	closed	Product Version:	3.1.9
Product Build:		Resolution:	fixed
Projection:	none
ETA:	none	Fixed in Version:	4.0.32

Summary:	0005286: Threads can become 'stuck' forever
Description:	Noticed a few times that the thread count on a server (in a 3 server cluster) can become 'stuck'. Looking through the resin.log reveals the error message

[2012-11-09 04:56:41.419] java.io.IOException: failed to add EPOLL for fd=512
[2012-11-09 04:56:41.419]
[2012-11-09 04:56:41.419] 	at com.caucho.server.port.JniSelectManager.removeNative(Native Method)
[2012-11-09 04:56:41.419] 	at com.caucho.server.port.JniSelectManager.remove(JniSelectManager.java:476)
[2012-11-09 04:56:41.419] 	at com.caucho.server.port.JniSelectManager.run(JniSelectManager.java:376)
[2012-11-09 04:56:41.419] 	at java.lang.Thread.run(Thread.java:662)

Taking a Threaddump reveals a lot of threads (up to 200 sometimes) all with a stacktrace similar to

"hmux-192.168.0.3:6800-20399$1859226681" - Thread t@40
   java.lang.Thread.State: RUNNABLE
	at com.caucho.server.port.JniSelectManager.addNative(Native Method)
	at com.caucho.server.port.JniSelectManager.keepalive(JniSelectManager.java:229)
	at com.caucho.server.port.TcpConnection.keepalive(TcpConnection.java:448)
	at com.caucho.server.port.TcpConnection.run(TcpConnection.java:739)
	at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:743)
	at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:662)
	at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
	- None

Looking at the native code, guessing that the failure causes the method to exit without first releasing the mutex that it took, and thus leaves any subsequent callers stuck waiting for the same mutex to become available.
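The suspected failure mode (an error path that returns while still holding the mutex) can be illustrated with a minimal C sketch. This is not Resin's JNI source: `select_manager_t`, `register_fd`, `add_fd_buggy` and `add_fd_safe` are hypothetical names, and `register_fd` merely stands in for whatever call reports the "failed to add EPOLL" condition. It only shows the general pattern of a leaked lock and a single-exit fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a native select manager; NOT Resin's code. */
typedef struct {
    pthread_mutex_t lock;
    int registered;          /* number of fds registered so far */
} select_manager_t;

void select_manager_init(select_manager_t *mgr)
{
    assert(mgr != NULL);
    pthread_mutex_init(&mgr->lock, NULL);
    mgr->registered = 0;
}

/* Hypothetical registration step that can fail (a real implementation
 * might wrap epoll_ctl here). In this sketch a negative fd is the failure. */
static int register_fd(select_manager_t *mgr, int fd)
{
    if (fd < 0) {
        return -1;
    }
    mgr->registered++;
    return 0;
}

/* Suspected bug pattern: the early return on failure skips the unlock,
 * so the mutex stays held and every later caller blocks in lock(). */
int add_fd_buggy(select_manager_t *mgr, int fd)
{
    assert(mgr != NULL);
    pthread_mutex_lock(&mgr->lock);
    if (register_fd(mgr, fd) != 0) {
        return -1;           /* mutex is never released on this path */
    }
    pthread_mutex_unlock(&mgr->lock);
    return 0;
}

/* One possible fix: record the result and release the mutex on every path. */
int add_fd_safe(select_manager_t *mgr, int fd)
{
    int rc;

    assert(mgr != NULL);
    pthread_mutex_lock(&mgr->lock);
    rc = register_fd(mgr, fd);
    pthread_mutex_unlock(&mgr->lock);   /* reached on success and failure */
    return rc;
}
```

With `add_fd_safe`, a failed registration still leaves the mutex available, so a later `pthread_mutex_trylock` on `mgr.lock` succeeds. After a failed `add_fd_buggy` call the mutex remains locked, which would match threads piling up inside a native method as in the thread dump above.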
Steps To Reproduce:
Additional Information:	Checked the JNI src for 3.1.9Pro, 3.1.12Pro and 3.1.13Pro and all seem to be the same.
Relationships
Attached Files:

There are no notes attached to this issue.