Mantis - Quercus
Viewing Issue Advanced Details
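ID: 0002707
Severity: minor
Reproducibility: always
Date Submitted: 05-29-08 09:08
Last Update: 05-29-08 09:22
Reporter: ferg
Priority: normal
Status: new
Product Version: 3.1.6
Resolution: open
Projection: none
ETA: none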
0002707: mysql/mediawiki transaction timeout
(reported by Paul Fischer)

We keep running into sporadic database errors on certain MediaWiki pages. Most are MySQL error 1213 (a lock deadlock); the others are database timeouts. The errors seem to be related to MediaWiki's object caching and appear to occur only when data related to objcache is being deleted.
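MySQL's message for error 1213 ends with "try restarting transaction", and InnoDB rolls back the whole victim transaction when it detects a deadlock, so one common mitigation is to rerun the entire transaction a few times when 1213 comes back. The sketch below assumes a hypothetical `DatabaseError` exception that carries the MySQL error number; real drivers (and MediaWiki's own database layer) report the number differently, so this only illustrates the retry pattern and is not a patch for MediaWiki.

```python
import time

# Hypothetical stand-in for a driver exception; real MySQL drivers expose the
# server error number in their own way (e.g. an errno attribute or args[0]).
class DatabaseError(Exception):
    def __init__(self, errno, message):
        super().__init__(f"{errno}: {message}")
        self.errno = errno

ER_LOCK_DEADLOCK = 1213  # "Deadlock found when trying to get lock; try restarting transaction"

def run_with_deadlock_retry(transaction, attempts=3, backoff=0.05, sleep=time.sleep):
    """Run transaction(); if it fails with a deadlock, rerun it up to `attempts` times total.

    `transaction` must perform the WHOLE transaction (begin ... commit), because
    InnoDB has already rolled back everything the failed attempt did.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DatabaseError as err:
            # Only deadlocks are retried; any other error, or a deadlock on the
            # final attempt, is re-raised to the caller.
            if err.errno != ER_LOCK_DEADLOCK or attempt == attempts:
                raise
            sleep(backoff * attempt)  # linear backoff before restarting

# Usage: an operation that deadlocks twice, then succeeds.
calls = {"n": 0}

def flaky_delete():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DatabaseError(ER_LOCK_DEADLOCK, "Deadlock found when trying to get lock")
    return "deleted"

print(run_with_deadlock_retry(flaky_delete, sleep=lambda s: None))  # prints "deleted"
```

Retrying only hides deadlocks that are rare; if two clustered instances regularly delete the same cache rows, the retries will keep colliding and the root cause still needs fixing.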

I have a few theories on what might cause this:

1. Clustering: multiple MediaWiki instances deadlock when two of them try to delete the same content at the same time.
2. A database configuration issue.
3. Logging synchronization. When the issue occurs, I look at resin-admin to see what is going on, and there often seem to be many threads in a BLOCKED state waiting on logging code. I wonder whether some synchronization in the logging code is blocking threads, and whether that waiting cascades into the database connection pool, since connections held by the blocked threads can't be returned.
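Theory 3 describes a cascade: a thread checks out a database connection, then blocks on a synchronized logger, and the connection cannot go back to the pool until the log write finishes. The toy model below reproduces that mechanism with Python primitives, using a `BoundedSemaphore` as a stand-in for the connection pool and a `Lock` as a stand-in for the logger. It does not model Resin's actual pool or logging code; it only shows how a stuck lock that has nothing to do with the database can surface as a database timeout.

```python
import threading

POOL_SIZE = 2
pool = threading.BoundedSemaphore(POOL_SIZE)   # stand-in for the DB connection pool
log_lock = threading.Lock()                     # stand-in for a synchronized logger
all_holding = threading.Barrier(POOL_SIZE + 1)  # demo only: workers + main thread

def handle_request():
    pool.acquire()               # check out a DB connection
    try:
        all_holding.wait()       # demo only: signal that this worker holds a connection
        with log_lock:           # BLOCKED here while the logging lock is held elsewhere
            pass                 # (write the log line)
    finally:
        pool.release()           # the connection returns only after logging completes

def demo():
    log_lock.acquire()           # simulate a slow or stuck log write
    workers = [threading.Thread(target=handle_request) for _ in range(POOL_SIZE)]
    for w in workers:
        w.start()
    all_holding.wait()           # every pooled connection is now checked out
    # A new request cannot get a connection: this is what shows up as a DB timeout.
    starved = not pool.acquire(timeout=0.1)
    log_lock.release()           # logging unblocks...
    for w in workers:
        w.join()
    recovered = pool.acquire(timeout=1.0)  # ...and connections flow back to the pool
    if recovered:
        pool.release()
    return starved, recovered

print(demo())  # prints (True, True)
```

If resin-admin thread dumps show threads BLOCKED in logging while also holding pooled connections, that would support this theory over the clustering one.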

Even if the issue keeps occurring, we could keep it from being seen by always returning a 500 error when it happens. The problem is that we have a controller that delegates to Quercus/PHP, and even when a 500 error is returned, the resulting response is simply added to the model and displayed in one section of a page. In other words, each page is composed of multiple requests to PHP (via the controller), which makes it hard to detect an error condition. If we could detect a 500 error on any of these "embedded" requests, we could make sure a 500 is sent for the actual browser request. Since all requests come through Akamai, these bad responses would then never be seen: Akamai never caches or displays 500 errors and simply serves the last cached non-error page.
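For the embedded-request problem, the controller needs each sub-request's status and must mark the whole page as failed if any one of them failed. If the controller dispatches with `RequestDispatcher.include`, note that the Servlet specification ignores status-code changes made by an included resource, which could explain why the 500 disappears; a response wrapper that records `setStatus`/`sendError` calls is a commonly used workaround, though whether it sees the status depends on how Resin and Quercus report PHP errors. Assuming the statuses can be captured, the aggregation rule itself is small; the `Fragment` type and `page_status` function below are hypothetical names, not part of Quercus.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    """Result of one embedded PHP sub-request (hypothetical shape)."""
    name: str
    status: int
    body: str

def page_status(fragments, default=200):
    """Return the status the outer browser response should carry.

    If any embedded sub-request failed with a 5xx, the whole page is reported
    as 500, so the caching layer in front keeps serving its last good copy
    instead of caching a page with a broken section.
    """
    if any(f.status >= 500 for f in fragments):
        return 500
    return default

fragments = [
    Fragment("sidebar", 200, "<div>...</div>"),
    Fragment("article", 500, "A database query syntax error has occurred..."),
]
print(page_status(fragments))  # prints 500
```

The controller would then set this status on the real response before rendering, so one failing fragment fails the whole page rather than being rendered into it.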

If you have any suggestions on how to detect an error condition in one of these responses, that would be very helpful. Of course, fixing the underlying issue would be ideal. The problem happens sporadically, which makes it very difficult to debug, but because these pages get cached, each error is compounded, and it looks quite bad on the site.

Notes
(0003109) ferg, 05-29-08 09:22
Here is an example of the error we are seeing:

A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:

(SQL query hidden)

from within function "MediaWikiBagOStuff::_doquery". MySQL returned error "1213: Deadlock found when trying to get lock; try restarting transaction (foo51-03)".
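If some failures still reach the controller as ordinary 200 responses carrying MediaWiki's error page, one fallback is to look for the MySQL error number in the body. The pattern below is keyed to the exact wording of the error quoted above, so it is brittle and would break if MediaWiki changes that message; `mysql_error_code` is a hypothetical helper, not part of MediaWiki or Quercus.

```python
import re

# Matches the 'MySQL returned error "NNNN: ...' phrase in MediaWiki's DB error page.
MYSQL_ERROR = re.compile(r'MySQL returned error "(\d+):')

DEADLOCK = 1213

def mysql_error_code(body):
    """Return the MySQL error number found in a MediaWiki error page, or None."""
    match = MYSQL_ERROR.search(body)
    return int(match.group(1)) if match else None

sample = ('from within function "MediaWikiBagOStuff::_doquery". MySQL returned error '
          '"1213: Deadlock found when trying to get lock; try restarting transaction (foo51-03)".')
code = mysql_error_code(sample)
print(code, code == DEADLOCK)  # prints 1213 True
```

A fragment whose body yields a non-None code could be treated like a 500 when deciding the status of the outer page.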