Mantis Bugtracker
  

Viewing Issue Simple Details
ID: 0003877
Category: [Resin]
Severity: minor
Reproducibility: always
Date Submitted: 02-04-10 18:00
Last Update: 02-22-10 12:00
Reporter: alex
View Status: public
Assigned To: ferg
Priority: normal
Resolution: fixed
Status: closed
Product Version: 4.0.3
Summary: 0003877: Uneven distribution of requests across a cluster with dead nodes
Description:
Configuration:
  Mac OS X, dual CPU
  cluster: a, b, c, d, e, f, g
  inactive-nodes: a, d
  apache: 2.2.14
     11 processes started
  10000 requests issued

Expected Results: even distribution (2000 requests for each live node)

Actual Results:

a 0 - node is down
b 2799
c 1439
d 0 - node is down
e 2895
f 1456
g 1411
Additional Information
Attached Files

- Relationships

- Notes
(0004415)
alex
02-04-10 18:07
edited on: 02-04-10 18:10

It appears that each thread/process has its own copy of the cluster, so active_count on a particular srun never tracks the total active_socket count.

With the cost at 0 for every srun, the node that follows a run of failed nodes picks up their share as well as its own, so its load grows in proportion to the number of preceding dead nodes.

With a and b down, server c gets a's and b's share on top of its own, serving triple the load:
a 0
b 0
c 4298
d 1444
e 1424
f 1413
g 1421
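
The counts above are consistent with a plain round-robin in which a request aimed at a dead node falls through to the next live one. A minimal Python sketch under that assumption follows; the function name and the model are illustrative only and are not taken from the Resin or mod_caucho source.

```python
def simulate_fallthrough(nodes, dead, requests):
    """Round-robin over all nodes; a pick that lands on a dead node
    falls through to the next live node in cluster order."""
    counts = {name: 0 for name in nodes}
    n = len(nodes)
    for i in range(requests):
        idx = i % n
        # Walk forward until a live node is found (assumes at least one is live).
        while nodes[idx] in dead:
            idx = (idx + 1) % n
        counts[nodes[idx]] += 1
    return counts

nodes = list("abcdefg")
print(simulate_fallthrough(nodes, {"a", "d"}, 10000))
print(simulate_fallthrough(nodes, {"a", "b"}, 10000))
```

With a and d down, this model gives b and e roughly 2/7 of the requests each (about 2857) and c, f, g roughly 1/7 each (about 1429). With a and b down, c gets roughly 3/7 (about 4287). Both are close to the measured numbers.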

 
(0004418)
ferg
02-05-10 09:03

The backup calculation was using the old 3.1 session encoding, and needed to be updated to the 4.0 encoding.
 
(0004421)
alex
02-09-10 09:34

Retested the case with a build off the trunk:
debian-5-64-bit
apache 2.2.14

The problem appears to be in the select_host code, where active_sockets is invariably 0, so all servers have equal cost and the next node after the failed ones takes their load.

a 0
b 0
c 2164
d 726
e 713
f 720
g 724
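
As a rough check of that reading, the retest counts can be compared with what a "next live node takes the dead nodes' share" model predicts. The Python sketch below is only an illustration: `predicted_counts` is a hypothetical helper, and the equal-share-per-slot model is an assumption, not code from select_host.

```python
def predicted_counts(nodes, dead, total):
    """Expected per-node counts if every node slot gets an equal share and
    each dead node's share goes to the next live node in cluster order."""
    n = len(nodes)
    share = total / n
    counts = {name: 0.0 for name in nodes}
    for i in range(n):
        idx = i
        # Skip forward past dead nodes (assumes at least one live node).
        while nodes[idx] in dead:
            idx = (idx + 1) % n
        counts[nodes[idx]] += share
    return counts

# Counts reported in this note, with a and b down.
observed = {"a": 0, "b": 0, "c": 2164, "d": 726, "e": 713, "f": 720, "g": 724}
predicted = predicted_counts(list("abcdefg"), {"a", "b"}, sum(observed.values()))

for name in observed:
    print(name, observed[name], round(predicted[name]))
```

The 5047 requests recorded here split into seven slots of 721, so the model predicts 2163 for c and 721 for each of d through g, which is within a few requests of the measured values.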
 
(0004446)
alex
02-22-10 12:00

Fix verified with Resin 4.0.4 and Resin 3.1.10.
 

- Issue History
Date Modified Username Field Change
02-04-10 18:00 alex New Issue
02-04-10 18:07 alex Note Added: 0004415
02-04-10 18:07 alex Note Edited: 0004415
02-04-10 18:08 alex Note Edited: 0004415
02-04-10 18:10 alex Note Edited: 0004415
02-05-10 09:03 ferg Note Added: 0004418
02-05-10 09:03 ferg Assigned To  => ferg
02-05-10 09:03 ferg Status new => closed
02-05-10 09:03 ferg Resolution open => fixed
02-05-10 09:03 ferg Fixed in Version  => 4.0.4
02-09-10 09:34 alex Status closed => feedback
02-09-10 09:34 alex Resolution fixed => reopened
02-09-10 09:34 alex Note Added: 0004421
02-22-10 12:00 alex Status feedback => closed
02-22-10 12:00 alex Note Added: 0004446
02-22-10 12:00 alex Resolution reopened => fixed

