Mantis Bugtracker
  

Viewing Issue Advanced Details
ID: 0003877
Category: [Resin]
Severity: minor
Reproducibility: always
Date Submitted: 02-04-10 18:00
Last Update: 02-22-10 12:00
Reporter: alex
View Status: public
Assigned To: ferg
Priority: normal
Resolution: fixed
Platform:
Status: closed
OS:
Projection: none
OS Version:
ETA: none
Fixed in Version: 4.0.4
Product Version: 4.0.3
Product Build:
Summary 0003877: Uneven distribution of requests across a cluster with dead nodes
Description Configuration:
  Mac OS X, dual CPU
  cluster: a, b, c, d, e, f, g
  inactive-nodes: a, d
  apache: 2.2.14
     11 processes started
  10000 requests issued

Expected Results: even distribution (2000 requests each)

Actual Results:

a 0 - node is down
b 2799
c 1439
d 0 - node is down
e 2895
f 1456
g 1411
Steps To Reproduce
Additional Information
Attached Files

- Relationships

- Notes
(0004415)
alex
02-04-10 18:07
edited on: 02-04-10 18:10

It appears that each thread/process has its own copy of the cluster, so active_count on a particular srun never tracks the total active_socket count.

With the cost at 0 for every srun, the node following one or more failed nodes gets selected at a rate proportional to the number of dead nodes preceding it.

With a and b down, server c picks up the shares of both a and b, serving triple the load:
a 0
b 0
c 4298
d 1444
e 1424
f 1413
g 1421
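
The skew described in this note can be reproduced with a small model. This is only a sketch under assumptions, not the mod_caucho code: it assumes the starting node is picked uniformly (round-robin) and that a pick landing on a dead node falls through to the next live node in ring order. The function name failover_shares is made up for illustration.

```python
from fractions import Fraction


def failover_shares(nodes, dead):
    """Share of requests per node when a pick that lands on a dead node
    falls through to the next live node in ring order (assumed model)."""
    if set(nodes) <= set(dead):
        raise ValueError("no live nodes")
    n = len(nodes)
    shares = {name: Fraction(0) for name in nodes}
    for start in range(n):
        i = start
        # Walk forward (wrapping around) until a live node is found.
        while nodes[i % n] in dead:
            i += 1
        shares[nodes[i % n]] += Fraction(1, n)
    return shares


nodes = list("abcdefg")
for dead in ({"a", "d"}, {"a", "b"}):
    shares = failover_shares(nodes, dead)
    # Scale the shares to the 10000 requests used in the report.
    print(sorted(dead), {k: round(float(v) * 10000) for k, v in shares.items()})
```

Under this model b and e each get 2/7 of the traffic when a and d are down, and c gets 3/7 when a and b are down, which is close to the counts reported in this issue.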

 
(0004418)
ferg
02-05-10 09:03

The backup calculation was using the old 3.1 session encoding, and needed to be updated to the 4.0 encoding.
 
(0004421)
alex
02-09-10 09:34

Retested the case with a build off the trunk:
debian-5-64-bit
apache 2.2.14

The problem appears to be in the select_host code, where active_sockets is invariably 0. All servers therefore have equal cost, and the next node after the failed ones takes their load.

a 0
b 0
c 2164
d 726
e 713
f 720
g 724
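
As a quick check, the retest counts can be normalized and compared against the 3/7 and 1/7 shares that the "next live node inherits the load" explanation predicts. The model shares are an assumption derived from that explanation; the request total is not stated in this note, so the sum of the listed counts is used.

```python
# Counts from the retest above (a and b report 0 and are omitted).
observed = {"c": 2164, "d": 726, "e": 713, "f": 720, "g": 724}
total = sum(observed.values())

# Assumed model: c serves its own slot plus the slots of a and b.
model = {"c": 3 / 7, "d": 1 / 7, "e": 1 / 7, "f": 1 / 7, "g": 1 / 7}

for node, count in observed.items():
    share = count / total
    print(f"{node}: observed {share:.3f}, model {model[node]:.3f}")
```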
 
(0004446)
alex
02-22-10 12:00

Fix verified with Resin 4.0.4 and Resin 3.1.10.
 

- Issue History
Date Modified Username Field Change
02-04-10 18:00 alex New Issue
02-04-10 18:07 alex Note Added: 0004415
02-04-10 18:07 alex Note Edited: 0004415
02-04-10 18:08 alex Note Edited: 0004415
02-04-10 18:10 alex Note Edited: 0004415
02-05-10 09:03 ferg Note Added: 0004418
02-05-10 09:03 ferg Assigned To  => ferg
02-05-10 09:03 ferg Status new => closed
02-05-10 09:03 ferg Resolution open => fixed
02-05-10 09:03 ferg Fixed in Version  => 4.0.4
02-09-10 09:34 alex Status closed => feedback
02-09-10 09:34 alex Resolution fixed => reopened
02-09-10 09:34 alex Note Added: 0004421
02-22-10 12:00 alex Status feedback => closed
02-22-10 12:00 alex Note Added: 0004446
02-22-10 12:00 alex Resolution reopened => fixed

