Bug #4173

open

Bug #4166: [ce] Rhodecode crashing after MySQL error

[ce, ee] mysql recycle pool timeout not working

Added by Daniel D over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 12.08.2016
Due date:
% Done: 0%
Estimated time:
Sorting:
Commit Number:
Affected Version:

Description

Replication:

  • new config using mysql database
  • set mysql's wait_timeout to 6
  • threads = 1 in ini
  • sqlalchemy.db1.pool_recycle = 3 in ini
  • git push a repo (no changes needed, just a push)
  • wait 7 seconds
  • git push again
  • get - OperationalError: (OperationalError) (2006, 'MySQL server has gone away')

This will also break subsequent requests with StatementError: Can't reconnect until invalid transaction is rolled back

Actions #1

Updated by Daniel D over 7 years ago

The root cause of the problem is here:
https://internal-code.rhodecode.com/rhodecode-enterprise-ce/files/a92da8f244901a192e22530d3a8190a09dd4fef8/rhodecode/lib/middleware/simplevcs.py#L410

Some interesting points:

  1. the recycle check only happens when checking out a connection from the pool
  2. when no connection is checked out, meta.Session() checks out a new connection from the pool and the recycle check runs. Otherwise it reuses the connection that is already checked out, and no recycle check is done.
  3. meta.Session.remove() is needed to close the connection and check it back in to the pool
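
The three points above can be illustrated with a toy model. This is only a sketch, not SQLAlchemy's implementation: ToyPool, ToyConnection and the explicit `now` arguments are hypothetical names introduced here, and the numbers mirror the replication steps (pool_recycle = 3, wait_timeout = 6).

```python
class ToyConnection:
    """Hypothetical stand-in for a connection the server drops when idle too long."""

    def __init__(self, now, wait_timeout):
        self.created_at = now
        self.last_used = now
        self.wait_timeout = wait_timeout

    def execute(self, now):
        # The client only notices the dropped connection when it tries to use it.
        if now - self.last_used > self.wait_timeout:
            raise RuntimeError("(2006, 'MySQL server has gone away')")
        self.last_used = now
        return "ok"


class ToyPool:
    """Hypothetical single-connection pool: the age check runs only in checkout()."""

    def __init__(self, pool_recycle, wait_timeout):
        self.pool_recycle = pool_recycle
        self.wait_timeout = wait_timeout
        self._idle = None

    def checkout(self, now):
        conn, self._idle = self._idle, None
        # Recycle check: replace a missing or too-old connection with a fresh one.
        if conn is None or now - conn.created_at > self.pool_recycle:
            conn = ToyConnection(now, self.wait_timeout)
        return conn

    def checkin(self, conn):
        self._idle = conn


def request_with_checkin():
    """Connection is checked in between requests, so the second checkout recycles it."""
    pool = ToyPool(pool_recycle=3, wait_timeout=6)
    conn = pool.checkout(now=0)
    conn.execute(now=0)
    pool.checkin(conn)
    conn = pool.checkout(now=7)
    return conn.execute(now=7)


def request_with_held_connection():
    """Connection is never checked in, so no recycle check happens before reuse."""
    pool = ToyPool(pool_recycle=3, wait_timeout=6)
    conn = pool.checkout(now=0)
    conn.execute(now=0)
    try:
        conn.execute(now=7)
    except RuntimeError as exc:
        return str(exc)
    return "ok"
```

In this model the first function succeeds and the second one returns the "gone away" error text, because the held connection is reused after 7 seconds without ever passing through checkout().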

The problem is that although we call meta.Session.remove() in the pylons tween (https://internal-code.rhodecode.com/rhodecode-enterprise-ce/files/a92da8f244901a192e22530d3a8190a09dd4fef8/rhodecode/tweens.py#L76), we call meta.Session() again in the finally clause of the simplevcs response iterator. This 'bypasses' the tween's meta.Session.remove() finally clause: the connection is reopened and stays open until the next request comes in.

On the next request there is no 'checkout', because the connection is already open. So if the time between the last request (a push operation) and the next request exceeds MySQL's wait_timeout, no recycle happens and we crash.
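
The ordering problem can be sketched in plain Python. This is a simplified model under assumptions, not the RhodeCode code: ToySessionRegistry stands in for meta.Session, and vcs_app, tween and simulate_push are hypothetical names. It relies only on the fact that a generator's finally clause runs when the response body is consumed, which is after the function that returned the generator has already run its own finally clause.

```python
class ToySessionRegistry:
    """Hypothetical stand-in for meta.Session: calling it opens a connection if needed."""

    def __init__(self):
        self.connection_open = False
        self.events = []

    def __call__(self):
        if not self.connection_open:
            self.connection_open = True
            self.events.append("checkout")
        return self

    def remove(self):
        if self.connection_open:
            self.connection_open = False
            self.events.append("checkin")


def vcs_app(session):
    """Handles the push and returns a lazy response body."""
    session()  # database work while handling the request

    def body():
        try:
            yield b"pack data"
        finally:
            # Database work after streaming the response reopens the connection.
            session()
            session.events.append("iterator finally")

    return body()


def tween(session):
    """Cleans up the session once the app has returned its response object."""
    try:
        return vcs_app(session)
    finally:
        session.remove()
        session.events.append("tween finally")


def simulate_push():
    """Plays the server's role: get the response, then consume its body."""
    session = ToySessionRegistry()
    for _chunk in tween(session):
        pass
    return session
```

After simulate_push() the recorded events are checkout, checkin, tween finally, checkout, iterator finally, and connection_open is still True: the second checkout is never followed by a checkin.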

Actions #2

Updated by Marcin Kuzminski [CTO] over 7 years ago

Nice write-up. Maybe we could move the cache invalidation to another place?

Actions #4

Updated by Daniel D over 7 years ago

The problem is that this bug can easily happen anywhere: anything that calls meta.Session() will check out a connection.

So we should wrap anything that could possibly use the database, and this is not limited to requests. The make_pyramid_app() function, for example, checks out a connection and leaves it hanging, and there also seemed to be a subrequest issued when an HTTP hooks daemon hook was being called.
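
One possible shape for such a wrapper is a context manager that always calls remove() when the block ends. This is only a sketch, not the fix that was applied: released_session, FakeRegistry and startup_task are hypothetical names, and the wrapper assumes nothing about the registry beyond what is described above, namely that it is callable and has a remove() method like meta.Session.

```python
from contextlib import contextmanager


@contextmanager
def released_session(session_registry):
    """Yield a session and always hand its connection back afterwards."""
    try:
        yield session_registry()
    finally:
        # Runs on normal exit and on exceptions, so nothing is left checked out.
        session_registry.remove()


class FakeRegistry:
    """Hypothetical stand-in for meta.Session, used only to exercise the wrapper."""

    def __init__(self):
        self.checked_out = False

    def __call__(self):
        self.checked_out = True
        return self

    def remove(self):
        self.checked_out = False


def startup_task(registry):
    """Example of non-request code (such as app setup) that touches the database."""
    with released_session(registry) as session:
        return session.checked_out
```

Inside the block the connection is checked out, and once the block exits (normally or through an exception) the registry's remove() has been called, so the next use goes through a fresh checkout and its recycle check.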

Actions #5

Updated by Daniel D over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Daniel D