Mantis Bugtracker
  

ID: 0002555
Category: [Resin]
Severity: minor
Reproducibility: always
Date Submitted: 03-26-08 09:52
Last Update: 03-27-08 10:31
Reporter: paulberto
View Status: public
Assigned To: ferg
Priority: normal
Resolution: fixed
Status: closed
Projection: none
ETA: none
Fixed in Version: 3.1.6
Product Version: 3.1.5
Platform, OS, OS Version, Product Build: (not specified)
Summary: 0002555: Load balancers report keepalive connection max error
Description: Logs are filling up with the following error after upgrading from 3.1.3 to 3.1.5 on a production system:

[13:48:18.902] TcpConnection[id=http-192.168.1.1:80-1924,socket=JniSocketImpl$10386758[-1330613472,fd=99],port=Port[192.168.1.1:80]] failed keepalive connection max 505
[13:48:18.912] TcpConnection[id=http-192.168.1.1:80-1857,socket=JniSocketImpl$8624359[156736064,fd=355],port=Port[192.168.1.1:80]] failed keepalive connection max 504
[13:48:18.967] TcpConnection[id=http-192.168.1.1:80-1936,socket=JniSocketImpl$10674590[-1330534616,fd=406],port=Port[192.168.1.1:80]] failed keepalive connection max 503
[13:48:18.972] TcpConnection[id=http-192.168.1.1:80-2139,socket=JniSocketImpl$8182872[156827208,fd=230],port=Port[192.168.1.1:80]] failed keepalive connection max 502
[13:48:18.978] TcpConnection[id=http-192.168.1.1:80-2115,socket=JniSocketImpl$7536165[156731968,fd=148],port=Port[192.168.1.1:80]] failed keepalive connection max 502
[13:48:18.986] TcpConnection[id=http-192.168.1.1:80-2129,socket=JniSocketImpl$19123775[156839496,fd=111],port=Port[192.168.1.1:80]] failed keepalive connection max 501
[13:48:19.084] TcpConnection[id=http-192.168.1.1:80-2046,socket=JniSocketImpl$604400[-1330596576,fd=240],port=Port[192.168.1.1:80]] failed keepalive connection max 498
[13:48:19.196] TcpConnection[id=http-192.168.1.1:80-2143,socket=JniSocketImpl$23482138[156821064,fd=142],port=Port[192.168.1.1:80]] failed keepalive connection max 500
[13:48:19.196] TcpConnection[id=http-192.168.1.1:80-2090,socket=JniSocketImpl$12715534[157210304,fd=146],port=Port[192.168.1.1:80]] failed keepalive connection max 499
[13:48:19.197] TcpConnection[id=http-192.168.1.1:80-1818,socket=JniSocketImpl$655613[156774976,fd=357],port=Port[192.168.1.1:80]] failed keepalive connection max 499
[13:48:19.345] TcpConnection[id=http-192.168.1.1:80-2136,socket=JniSocketImpl$30621423[156830280,fd=409],port=Port[192.168.1.1:80]] failed keepalive connection max 498
[13:48:19.351] TcpConnection[id=http-192.168.1.1:80-2010,socket=JniSocketImpl$11434871[-1330572504,fd=360],port=Port[192.168.1.1:80]] failed keepalive connection max 497
[13:48:19.357] TcpConnection[id=http-192.168.1.1:80-2132,socket=JniSocketImpl$18977449[156836424,fd=468],port=Port[192.168.1.1:80]] failed keepalive connection max 496
[13:48:19.496] TcpConnection[id=http-192.168.1.1:80-2120,socket=JniSocketImpl$30074295[156723776,fd=410],port=Port[192.168.1.1:80]] failed keepalive connection max 501
[13:48:19.498] TcpConnection[id=http-192.168.1.1:80-1620,socket=JniSocketImpl$970294[-1330611936,fd=501],port=Port[192.168.1.1:80]] failed keepalive connection max 500
[13:48:19.499] TcpConnection[id=http-192.168.1.1:80-2142,socket=JniSocketImpl$29947460[156821576,fd=226],port=Port[192.168.1.1:80]] failed keepalive connection max 499
[13:48:19.499] TcpConnection[id=http-192.168.1.1:80-2112,socket=JniSocketImpl$2728006[156739136,fd=189],port=Port[192.168.1.1:80]] failed keepalive connection max 498
[13:48:19.590] TcpConnection[id=http-192.168.1.1:80-1777,socket=JniSocketImpl$27847924[157248704,fd=251],port=Port[192.168.1.1:80]] failed keepalive connection max 498

Steps To Reproduce
Additional Information: At a minimum I need to stop this error from showing up in the log, because it is filling the disk at a rapid pace (look at the timestamps). keepalive-max is set to 1024 in resin.conf.

Please advise!!
Attached Files

- Relationships

- Notes
(0002904)
paulberto
03-26-08 10:07

In an attempt to get more info: simulating a keepalive connection to the LB in question results in getting cut off right away. Keepalive requests are not working.

The same simulation works on the nodes that talk to the LB, so I take the error to mean that the LB hit a limit of some sort (maybe a hard limit).

I tried increasing keepalive-max, and then increasing it further, to see whether that would have an effect. It had none.

This is a very high-traffic site. I need a solution to this.
 
(0002905)
ferg
03-26-08 10:08

Check the connection-max in the resin.conf. The default is 512, which is what you're hitting.

When you increase the keepalive-max, you'll also need to increase the connection-max.
 
(0002907)
paulberto
03-26-08 11:05

I have spent the last 10 minutes trying to add connection-max. Resin gives config tag errors wherever I put it. I tried server-default, server, http ... EVERYWHERE. This tag is broken. Please advise!
 
(0002908)
paulberto
03-26-08 11:10

Also, thread-max is set very high, higher than keepalive-max. Perhaps connection-max has become thread-max? In that case this does not fix the problem either.
 
(0002909)
ferg
03-26-08 11:26

The connection-max is only on the <http> or <cluster-port>, e.g.

  <http port="80" connection-max="1024"/>

(i.e. we also need to add connection-max at the <server> level to match the other tags.)

The error message in the log is definitely from the connection-max check.

Not related to the error message or this bug report, but you'll want to double check your file descriptor max. The default on a Linux system, for example, is very low (1024).
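
[Editor's sketch, not part of the original note.] The descriptor limit mentioned above can be read from a process with Python's standard `resource` module (Unix only). The helper names `fd_limits` and `fd_headroom` are hypothetical, and the assumption that each connection holds exactly one descriptor is a simplification: a real server also uses descriptors for log files, jars and backend sockets.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def fd_headroom(connection_max, soft_limit):
    """Descriptors left over if every allowed connection held one descriptor.

    Simplifying assumption: one descriptor per connection, nothing else open.
    A negative result means the limit is too low for that connection-max.
    """
    return soft_limit - connection_max


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit={soft} hard limit={hard}")
    # 5024 is the connection-max value quoted later in this thread.
    print("headroom for connection-max=5024:", fd_headroom(5024, soft))
```

The limits reported are those of the process running the script, so the check is only meaningful when run as the same user, and from the same kind of shell or init script, that starts Resin.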
 
(0002910)
paulberto
03-26-08 11:35

OK, adding it to http works now. I had tried this before, but I had forgotten to remove the previous tag from <server>, so it kept giving me the error. Anyway, this is now set correctly.

But now I get a new error:

[15:33:16.181] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.182] Tcp[web1-lb,6600] failed keepalive (select)
[15:33:16.184] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.185] Tcp[web1-lb,5054] failed keepalive (select)
[15:33:16.213] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.218] Tcp[web1-lb,5941] failed keepalive (select)
[15:33:16.236] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.246] Tcp[web1-lb,6792] failed keepalive (select)
[15:33:16.654] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.654] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.654] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.654] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.655] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.655] Tcp[web1-lb,5894] failed keepalive (select)
[15:33:16.655] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.655] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.655] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.656] Tcp[web1-lb,6180] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6695] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6412] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6746] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6742] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6728] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6812] failed keepalive (select)
[15:33:16.658] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.658] Tcp[web1-lb,6244] failed keepalive (select)

Please advise
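
[Editor's sketch, not part of the original note.] Because the two error families in this thread come from different limits, it can help to tally them separately before changing any setting. The regular expressions below are written against the log lines quoted in this report only; the function name `summarize` and the counter keys are hypothetical, and other Resin versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this report.
OVERFLOW = re.compile(r"keepalive overflow (\d+) max=(\d+)")
SELECT_FAIL = re.compile(r"failed keepalive \(select\)")
CONN_MAX = re.compile(r"failed keepalive connection max (\d+)")


def summarize(lines):
    """Count each kind of keepalive error and collect the reported select max."""
    counts = Counter()
    select_max = set()
    for line in lines:
        m = OVERFLOW.search(line)
        if m:
            counts["select_overflow"] += 1
            select_max.add(int(m.group(2)))
        elif SELECT_FAIL.search(line):
            counts["select_failed"] += 1
        elif CONN_MAX.search(line):
            counts["connection_max"] += 1
    return counts, select_max


if __name__ == "__main__":
    import sys
    # Usage: python summarize.py < resin.log  (file name is a placeholder)
    counts, select_max = summarize(sys.stdin)
    print(dict(counts), "select max values seen:", sorted(select_max))
```

In this thread, `connection_max` hits were resolved by the connection-max attribute on <http>, while `select_overflow` hits were resolved by keepalive-select-max.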
 
(0002911)
paulberto
03-26-08 13:44

Please help us. We are a paying customer and our site is suffering badly right now. Hitting the nodes directly results in a very fast page load. Going through the LB is very slow, and the logs are filled with the errors mentioned previously:

[15:33:16.656] Tcp[web1-lb,6412] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6746] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6742] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6728] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6812] failed keepalive (select)
[15:33:16.658] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.658] Tcp[web1-lb,6244] failed keepalive (select)

....

I tried turning off the select mechanism, but that rendered the LB totally unusable. I am desperate to fix this.

Please advise.

Thanks
 
(0002912)
ferg
03-26-08 13:52

The select configuration is keepalive-select-max in the <server> block.

If you're a licensed customer, you should be using the customer support emails, not bug tracking feedback for support.
 
(0002913)
paulberto
03-26-08 14:09

Nowhere on your website, or in the 3 emails we received when purchasing the license, is a support email mentioned.

I should have been clearer regarding the select mechanism: turning it off makes Resin spiral out of control.

Can't you comment on the current error we are having?

[15:33:16.656] Tcp[web1-lb,6412] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6746] failed keepalive (select)
[15:33:16.656] Tcp[web1-lb,6742] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6728] failed keepalive (select)
[15:33:16.657] Tcp[web1-lb,6812] failed keepalive (select)
[15:33:16.658] JniSelectManager[] keepalive overflow 512 max=512
[15:33:16.658] Tcp[web1-lb,6244] failed keepalive (select)


      <server-default>

        <!-- Maximum number of threads. -->
        <thread-max>2024</thread-max>

        <!-- Configures the socket timeout -->
        <socket-timeout>15s</socket-timeout>

        <!-- Configures the keepalive -->
        <keepalive-max>1500</keepalive-max>
        <keepalive-timeout>5s</keepalive-timeout>
        <keepalive-select-enable>true</keepalive-select-enable>

      </server-default>

      <server id="web1-lb" address="192.168.1.1" port="6700">
        <http host="192.168.1.1" port="80" connection-max="5024"/>
      </server>


I've been tweaking all these settings in an attempt to fix the issue.
 
(0002914)
ferg
03-26-08 14:14

<server>
  <keepalive-select-max>1024</keepalive-select-max>
  ...
</server>
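
[Editor's sketch, not part of the original note.] The three limits discussed in this thread (keepalive-max, connection-max on <http>, and keepalive-select-max in <server>) can be sanity-checked together with a short script. The fragment below is assembled from the values quoted in the notes above. The <cluster> wrapper without a namespace is an assumption made so the fragment parses on its own; a real resin.conf typically declares an XML namespace, which would require namespace-qualified lookups. `check_limits` and `int_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Values taken from this thread; the bare <cluster> wrapper is an assumption.
FRAGMENT = """
<cluster>
  <server-default>
    <thread-max>2024</thread-max>
    <keepalive-max>1500</keepalive-max>
  </server-default>
  <server id="web1-lb" address="192.168.1.1" port="6700">
    <keepalive-select-max>1024</keepalive-select-max>
    <http host="192.168.1.1" port="80" connection-max="5024"/>
  </server>
</cluster>
"""


def int_text(root, tag):
    """Integer text of the first element named `tag`, or None if absent."""
    node = root.find(f".//{tag}")
    return int(node.text) if node is not None and node.text else None


def check_limits(xml_text):
    """Return warnings for the limit problems that came up in this thread."""
    root = ET.fromstring(xml_text)
    keepalive_max = int_text(root, "keepalive-max")
    warnings = []
    for http in root.iter("http"):
        conn_max = http.get("connection-max")
        if conn_max is None:
            warnings.append(f"<http port={http.get('port')}> has no connection-max")
        elif keepalive_max is not None and int(conn_max) < keepalive_max:
            warnings.append("connection-max is below keepalive-max")
    if int_text(root, "keepalive-select-max") is None:
        warnings.append("keepalive-select-max is not set")
    return warnings


if __name__ == "__main__":
    print(check_limits(FRAGMENT) or "no warnings")
```

The script only flags the two situations reported here: a connection-max that is missing or lower than keepalive-max, and an unset keepalive-select-max. It does not claim to know the correct ratio between the values.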
 
(0002916)
paulberto
03-26-08 14:43

THANK YOU FERG! that worked wonderfully and we're back to normal.
 
(0002925)
ferg
03-27-08 10:31

server/2792
 

- Issue History
Date Modified Username Field Change
03-26-08 09:52 paulberto New Issue
03-26-08 10:07 paulberto Note Added: 0002904
03-26-08 10:08 ferg Note Added: 0002905
03-26-08 11:05 paulberto Note Added: 0002907
03-26-08 11:10 paulberto Note Added: 0002908
03-26-08 11:26 ferg Note Added: 0002909
03-26-08 11:35 paulberto Note Added: 0002910
03-26-08 13:44 paulberto Note Added: 0002911
03-26-08 13:52 ferg Note Added: 0002912
03-26-08 14:09 paulberto Note Added: 0002913
03-26-08 14:14 ferg Note Added: 0002914
03-26-08 14:43 paulberto Note Added: 0002916
03-27-08 10:31 ferg Note Added: 0002925
03-27-08 10:31 ferg Assigned To  => ferg
03-27-08 10:31 ferg Status new => closed
03-27-08 10:31 ferg Resolution open => fixed
03-27-08 10:31 ferg Fixed in Version  => 3.1.6
03-27-08 10:31 ferg Description Updated
03-27-08 10:31 ferg Additional Information Updated

