Mantis - Resin
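ID: 2555 · Severity: minor · Reproducibility: always
Date Submitted: 03-26-08 09:52 · Last Update: 03-27-08 10:31
Reporter: paulberto
Assigned To: ferg
Priority: normal
Status: closed · Product Version: 3.1.5
Resolution: fixed
Projection: none
ETA: none · Fixed in Version: 3.1.6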
0002555: Load balancers report keepalive connection max error
Our logs are filling up with the following error on our production system after upgrading from 3.1.3 to 3.1.5:

[13:48:18.902] TcpConnection[id=http-192.168.1.1:80-1924,socket=JniSocketImpl$10386758[-1330613472,fd=99],port=Port[192.168.1.1:80]] failed keepalive connection max 505
[13:48:18.912] TcpConnection[id=http-192.168.1.1:80-1857,socket=JniSocketImpl$8624359[156736064,fd=355],port=Port[192.168.1.1:80]] failed keepalive connection max 504
[13:48:18.967] TcpConnection[id=http-192.168.1.1:80-1936,socket=JniSocketImpl$10674590[-1330534616,fd=406],port=Port[192.168.1.1:80]] failed keepalive connection max 503
[13:48:18.972] TcpConnection[id=http-192.168.1.1:80-2139,socket=JniSocketImpl$8182872[156827208,fd=230],port=Port[192.168.1.1:80]] failed keepalive connection max 502
[13:48:18.978] TcpConnection[id=http-192.168.1.1:80-2115,socket=JniSocketImpl$7536165[156731968,fd=148],port=Port[192.168.1.1:80]] failed keepalive connection max 502
[13:48:18.986] TcpConnection[id=http-192.168.1.1:80-2129,socket=JniSocketImpl$19123775[156839496,fd=111],port=Port[192.168.1.1:80]] failed keepalive connection max 501
[13:48:19.084] TcpConnection[id=http-192.168.1.1:80-2046,socket=JniSocketImpl$604400[-1330596576,fd=240],port=Port[192.168.1.1:80]] failed keepalive connection max 498
[13:48:19.196] TcpConnection[id=http-192.168.1.1:80-2143,socket=JniSocketImpl$23482138[156821064,fd=142],port=Port[192.168.1.1:80]] failed keepalive connection max 500
[13:48:19.196] TcpConnection[id=http-192.168.1.1:80-2090,socket=JniSocketImpl$12715534[157210304,fd=146],port=Port[192.168.1.1:80]] failed keepalive connection max 499
[13:48:19.197] TcpConnection[id=http-192.168.1.1:80-1818,socket=JniSocketImpl$655613[156774976,fd=357],port=Port[192.168.1.1:80]] failed keepalive connection max 499
[13:48:19.345] TcpConnection[id=http-192.168.1.1:80-2136,socket=JniSocketImpl$30621423[156830280,fd=409],port=Port[192.168.1.1:80]] failed keepalive connection max 498
[13:48:19.351] TcpConnection[id=http-192.168.1.1:80-2010,socket=JniSocketImpl$11434871[-1330572504,fd=360],port=Port[192.168.1.1:80]] failed keepalive connection max 497
[13:48:19.357] TcpConnection[id=http-192.168.1.1:80-2132,socket=JniSocketImpl$18977449[156836424,fd=468],port=Port[192.168.1.1:80]] failed keepalive connection max 496
[13:48:19.496] TcpConnection[id=http-192.168.1.1:80-2120,socket=JniSocketImpl$30074295[156723776,fd=410],port=Port[192.168.1.1:80]] failed keepalive connection max 501
[13:48:19.498] TcpConnection[id=http-192.168.1.1:80-1620,socket=JniSocketImpl$970294[-1330611936,fd=501],port=Port[192.168.1.1:80]] failed keepalive connection max 500
[13:48:19.499] TcpConnection[id=http-192.168.1.1:80-2142,socket=JniSocketImpl$29947460[156821576,fd=226],port=Port[192.168.1.1:80]] failed keepalive connection max 499
[13:48:19.499] TcpConnection[id=http-192.168.1.1:80-2112,socket=JniSocketImpl$2728006[156739136,fd=189],port=Port[192.168.1.1:80]] failed keepalive connection max 498
[13:48:19.590] TcpConnection[id=http-192.168.1.1:80-1777,socket=JniSocketImpl$27847924[157248704,fd=251],port=Port[192.168.1.1:80]] failed keepalive connection max 498

At a minimum, I need to stop this error from showing up in the log, because it is filling the disk at a rapid pace (look at the timestamps). keepalive-max is set to 1024 in resin.conf.

Please advise!

Notes
(0002904)
paulberto   
03-26-08 10:07   
In an attempt to get more info, I simulated a keepalive connection to the LB in question and was cut off right away: keepalive requests are not working.

The same simulation works on the nodes that talk to the LB, so I take the error to mean that the LB hit some kind of limit (maybe a hard limit).

I tried increasing keepalive-max, then increasing it further, to see if that would have an effect. It had none.

This is a very high-traffic site; I need a solution to this.
(0002905)
ferg   
03-26-08 10:08   
Check connection-max in resin.conf. The default is 512, which is the limit you're hitting.

When you increase the keepalive-max, you'll also need to increase the connection-max.
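A minimal sketch of raising both limits together. It assumes the placement ferg gives further down in this thread (connection-max as an attribute of <http>, not a <server> child element); the server id and address are copied from the configuration posted later, and the numbers are illustrative only, not tuning advice.

```xml
<!-- Sketch only: values are illustrative. -->
<server-default>
  <!-- raising this alone did not help in this report -->
  <keepalive-max>1024</keepalive-max>
</server-default>

<server id="web1-lb" address="192.168.1.1" port="6700">
  <!-- raise connection-max above keepalive-max as well; the default (512)
       is what produced "failed keepalive connection max" in the log -->
  <http port="80" connection-max="2048"/>
</server>
```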
(0002907)
paulberto   
03-26-08 11:05   
I have spent the last 10 minutes trying to add connection-max. Resin gives config tag errors wherever I put it; I tried <server-default>, <server>, <http>, everywhere. This tag is broken. Please advise!
(0002908)
paulberto   
03-26-08 11:10   
Also, thread-max is set very high, higher than keepalive-max. Perhaps connection-max has been replaced by thread-max? In that case, this does not fix the problem either.
(0002909)
ferg   
03-26-08 11:26   
The connection-max is only on the <http> or <cluster-port>, e.g.

  <http port="80" connection-max="1024"/>

(In other words, we also need to add connection-max support at the <server> level, to match the other tags.)

The error message in the log is definitely from the connection-max check.

This is not related to the error message or this bug report, but you'll want to double-check your maximum number of file descriptors. The default on a Linux system, for example, is very low (1024).
(0002910)
paulberto   
03-26-08 11:35   
OK, adding it to <http> works now. I had tried this before, but I had forgotten to remove the previous tag from <server>, so it kept giving me the error. Anyway, this is now set correctly.

But now I get a new error:

[15:33:16.181] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.182] Tcp[web1-lb,6600] failed keepalive (select)
[15:33:16.184] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.185] Tcp[web1-lb,5054] failed keepalive (select)
[15:33:16.213] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.218] Tcp[web1-lb,5941] failed keepalive (select)
[15:33:16.236] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.246] Tcp[web1-lb,6792] failed keepalive (select)
[15:33:16.654] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.654] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.654] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.654] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.655] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.655] Tcp[web1-lb,5894] failed keepalive (select)
[15:33:16.655] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.655] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.655] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.656] Tcp[web1-lb,6180] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6695] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6412] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6746] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6742] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6728] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6812] failed keepalive (select)
[15:33:16.658] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.658] Tcp[web1-lb,6244] failed keepalive (select)

Please advise.
(0002911)
paulberto   
03-26-08 13:44   
Please help us: we are a paying customer, and our site is suffering badly right now. Hitting the nodes directly gives a very fast page load. Going through the LB is very slow, and the logs are filled with the errors mentioned previously:

[15:33:16.656] Tcp[web1-lb,6412] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6746] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6742] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6728] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6812] failed keepalive (select)
[15:33:16.658] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.658] Tcp[web1-lb,6244] failed keepalive (select)

....

In desperation, I tried turning off the select mechanism, but that rendered the LB totally unusable.

Please advise.

Thanks
(0002912)
ferg   
03-26-08 13:52   
The select configuration is keepalive-select-max in the <server> block.

If you're a licensed customer, you should use the customer support email address rather than the bug tracker for support.
(0002913)
paulberto   
03-26-08 14:09   
Nowhere on your website, or in the 3 emails we received when purchasing the license, is a support email address given.

I should have been clearer regarding the select mechanism: turning it off makes Resin spiral out of control.

Can't you comment on the current error we are having?

[15:33:16.656] Tcp[web1-lb,6412] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6746] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6742] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6728] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6812] failed keepalive (select)
[15:33:16.658] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.658] Tcp[web1-lb,6244] failed keepalive (select)


<server-default>
  <!-- Maximum number of threads. -->
  <thread-max>2024</thread-max>

  <!-- Configures the socket timeout -->
  <socket-timeout>15s</socket-timeout>

  <!-- Configures the keepalive -->
  <keepalive-max>1500</keepalive-max>
  <keepalive-timeout>5s</keepalive-timeout>
  <keepalive-select-enable>true</keepalive-select-enable>
</server-default>

<server id="web1-lb" address="192.168.1.1" port="6700">
  <http host="192.168.1.1" port="80" connection-max="5024"/>
</server>


I've been tweaking all these settings in an attempt to fix the issue.
(0002914)
ferg   
03-26-08 14:14   
<server>
  <!-- raises the select keepalive limit; the log showed max=512 -->
  <keepalive-select-max>1024</keepalive-select-max>
</server>
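Pulling the thread together, a sketch of the configuration paulberto posted with both fixes applied. It assumes keepalive-select-max can sit in the same <server> block as the <http> element, as ferg indicates; all values are the ones posted in this thread, not recommendations.

```xml
<server-default>
  <thread-max>2024</thread-max>
  <socket-timeout>15s</socket-timeout>
  <keepalive-max>1500</keepalive-max>
  <keepalive-timeout>5s</keepalive-timeout>
  <keepalive-select-enable>true</keepalive-select-enable>
</server-default>

<server id="web1-lb" address="192.168.1.1" port="6700">
  <!-- fixes "JniSelectManager[] keepalive overflow 512 max=512" -->
  <keepalive-select-max>1024</keepalive-select-max>

  <!-- in 3.1.5, connection-max is accepted on <http>, not on <server> -->
  <http host="192.168.1.1" port="80" connection-max="5024"/>
</server>
```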
(0002916)
paulberto   
03-26-08 14:43   
THANK YOU, FERG! That worked wonderfully, and we're back to normal.
(0002925)
ferg   
03-27-08 10:31   
server/2792