Database connection pooling with PHP and gearman


Handling Database connections with PHP is pretty straightforward. Just open database connection, perform the actions we need, and close it. There’s a little problem. We cannot create a pull of database connections. We need to create the connection in every request. Create and destroy, again and again. There are some third-party solutions like SQL relay or pgpool2 (if you use PostgreSQL like me). In this post I’m going to try to explain a personal experiment to create a connection pooling using gearman. Another purpose of this experiment is create a prepared statements’ cache not only for the current request but also for all ones. Let’s start.

This is an example of SELECT statement with PDO and PHP

$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'pass');
$stmt = $dbh->prepare($sql);
$stmt->execute();
$data = $stmt->fetchAll();

Basically the idea is to use a gearman worker to perform every database operations. As far as I known we cannot pass PDO instances from gearman worker to gearman client. Even with object serialization (PDO objects cannot be serialized). That’s a problem.

My idea is use the same interface than using PDO but let the database work to the worker and obtaining a connection id instead of a real PDO connection.

That’s is the configuration class. We can see It’s defined one database connection and two gearman servers at the same host:

class PoolConf
{
    const PG1 = 'PG1';
    static $DB = array(
        self::PG1 => array(
            'dsn'      => "pgsql:dbname=gonzalo;host=localhost",
            'username' => 'user',
            'password' => 'pass',
            'options'  => null),
    );

    static $SERVERS = array(
        array('127.0.0.1', 4730),
        array('127.0.0.1', 4731),
    );
}

We start the workers:

gearmand -d --log-file=/var/log/gearman --user=gonzalo -p=4730
gearmand -d --log-file=/var/log/gearman --user=gonzalo -p=4731

How many workers we need to start? Depends on your needs. We must realize gearman is not a pool. It’s a queue. But we can start as many servers as we want  (OK it’s depends on our RAM) and create a poll of queues. We need to remember that if we start only one gearman server we only can handle only one database operation each time (it’s a queue) and it will be a huge bottleneck if the application scales. So you need to assess your site and evaluate how many concurrent operations you normally have and start as many gearman server as you need.

Maybe is difficult to explain but the final outcome will be something like that:

use Pool\Client;
$conn = Client::singleton()->getConnection(PoolConf::PG1);

$sql = "SELECT * FROM TEST.TBL1";
$stmt = $conn->prepare($sql);

$stmt->execute();
$data = $stmt->fetchall();
echo "<p>count: " . count($data) . "</p>";

We must take into account that $stmt is not a “real” PHP statement. The real PHP statement is stored into a static variable within the worker. Our $stmt is an instance of Pool\Server\Stmt class. This class has some public methods with the same name than the real statement (because of that it behaves as a real statement), and internally those methods are calls to gearman worker. The same occurs with $conn variable. It’s not a real PDO connection. It’s am instance of Pool\Server\Connection Class.

// Pool/Server/Stmt.php
namespace Pool\Server;

class Stmt
{
    private $_smtpId = null;
    private $_cid    = null;
    private $_client = null;

    function __construct($stmtId, $cid, $client)
    {
        $this->_stmt = $stmtId;
        $this->_cid = $cid;
        $this->_client = $client;
    }

    public function execute($parameters=array())
    {
        $out = $this->_client->do('execute', serialize(array(
            'parameters' => $parameters,
            'stmt'       => $this->_stmt,
            )));
        $error = unserialize($out);
        if (is_a($error, '\Pool\Exception')) {
            throw $error;
        }
        $this->_stmt = $out;
        return $this;
    }

    public function fetchAll()
    {
        $data = $this->_client->do('fetchAll', serialize(array(
            'stmt' => $this->_stmt,
            )));
        return unserialize($data);
    }
}

The worker. The following code is an extract. You can see the full code here

# Create our worker object.
$worker= new GearmanWorker();
foreach (PoolConf::$SERVERS as $server) {
    $worker->addServer($server[0], $server[1]);
}

\Pool\Server::init();

$worker->addFunction('getConnection', 'getConnection');
$worker->addFunction('prepare', 'prepare');
$worker->addFunction('execute', 'execute');
$worker->addFunction('fetchAll', 'fetchAll');
$worker->addFunction('info', 'info');
$worker->addFunction('release', 'release');
$worker->addFunction('beginTransaction', 'beginTransaction');
$worker->addFunction('commit', 'commit');
$worker->addFunction('rollback', 'rollback');

while (1) {
    try {
        $ret = $worker->work();
        if ($worker->returnCode() != GEARMAN_SUCCESS) {
            break;
        }
    } catch (Exception $e) {
        echo $e->getMessage();
    }
}

function fetchAll($job)
{
    echo __function__."\n";
    $params = unserialize($job->workload());
    $stmtId = $params['stmt'];
    return serialize(\Pool\Server::fetchAll($stmtId));
}

function execute($job)
{
    echo __function__."\n";
    $params = unserialize($job->workload());
    $stmtId = $params['stmt'];
    $parameters = $params['parameters'];
    return \Pool\Server::execute($stmtId, $parameters);
}
...

The heart of the worker is \Pool\Server class. This class performs every real PDO operations and stores statements and connections into static private variables.

And now we can use the database pool reusing connections and prepared statements. You can see here a small performance test of reusing prepared statements in an older post.

I’ve also implemented a small error handling. Errors in the worker are serialized and thrown on the client simulating the normal operation of standard PDO usage.

Now a set of examples:

Simple queries. And a simple error handling:

include('../conf/PoolConf.php');
include('../lib/Pool/Client.php');
include('../lib/Pool/Server.php');
include('../lib/Pool/Exception.php');
include('../lib/Pool/Server/Connection.php');
include('../lib/Pool/Server/Stmt.php');

use Pool\Client;
$conn = Client::singleton()->getConnection(PoolConf::PG1);

$sql = "SELECT * FROM TEST.TBL1";
$stmt = $conn->prepare($sql);

$stmt->execute();
$data = $stmt->fetchall();
echo "<p>count: " . count($data) . "</p>";

try {
    $sql = "SELECT * TEST.NON_EXISTENT_TABLE";
    $stmt = $conn->prepare($sql);

    $stmt->execute();
    $data = $stmt->fetchall();
    echo "<p>count: " . count($data) . "</p>";

} catch (Exception $e) {
    echo "ERROR: " . $e->getMessage();
}

print_r(Client::singleton()->info(PoolConf::PG1));

Now with bind parameters:

<?php
include('../conf/PoolConf.php');
include('../lib/Pool/Client.php');
include('../lib/Pool/Server.php');
include('../lib/Pool/Exception.php');
include('../lib/Pool/Server/Connection.php');
include('../lib/Pool/Server/Stmt.php');

use Pool\Client;
$conn = Client::singleton()->getConnection(PoolConf::PG1);

$data = $conn->prepare("SELECT * FROM TEST.TBL1 WHERE SELECCION=:S")->execute(array('S' => 1))->fetchall();

echo count($data);

print_r(Client::singleton()->info(PoolConf::PG1));

And now a transaction:

<?php
include('../conf/PoolConf.php');
include('../lib/Pool/Client.php');
include('../lib/Pool/Server.php');
include('../lib/Pool/Exception.php');
include('../lib/Pool/Server/Connection.php');
include('../lib/Pool/Server/Stmt.php');

use Pool\Client;
$conn = Client::singleton()->getConnection(PoolConf::PG1);

$conn->beginTransaction();
$data = $conn->prepare("SELECT * FROM TEST.TBL1 WHERE SELECCION=:S")->execute(array('S' => 1))->fetchall();
$conn->rollback();

print_r(Client::singleton()->info(PoolConf::PG1));

Conclusion.

That’s a personal experiment. It works, indeed, but probably it’s crowed by bugs. You can use it in production if you are a brave developer :).

Source code.
The full sourcecode is available here at google code

13 thoughts on “Database connection pooling with PHP and gearman

  1. That’s an thinking out-of-the-box solution. However I’m not sure PHP works well with long-running connections, as I have not used a persistent connection for year. Will PDO::ATTR_PERSISTENT work in the same way?

    1. As far as I know it’s not possible to do it with PDO. I want to store open transactions inside the pool. PDO::ATTR_PERSISTENT’s connection dies and roll-backs my transaction if it’s not commited (I cannot recover the same one in the next http request). I known my implementation it is too overbold to use on production. It’s a prototype. It works but it’s a prototype.

      I’d heard in a PHP conference that the new Oracle database allows connection pooling natively but. Maybe in PHP we need to use something like sql_relay.

      By the way I think something like that can be used to keep objects in memory between requests (A poor man solution). But as you said I don’t known if PHP works well with long-running connections (but gearman workers work like a charm)

  2. What about pconnect?.It searches for opened conecction in apache if one is opened it uses it but Is this really a connection pooling? .
    I have also read about sql_relay and it supports connection pooling but have you test it?

    my email:logica_razon@hotmailcom

    1. pconnect doesn’t create a connecion pooling indeed. My connection pooling with gearman is maybe too exotic (it’s just an experiment). I don’t recommed to use it in producction (unless you are a brave developer). sqlrelay is a good solution. If you use postgresql you can use pgpool2. Oracle also another solution (I don’t remember the name now)

  3. No lo he probado, solo ojeado, pero parece que cada worker tiene su propio array de conexiones, osea un cliente pediria una conexion y cuando haga una operacion, gearman podria asignarsele un worker que no fue el que la creo y no encontria la conexion en el array supongo. No se si lo he visto mal. Saludos.

    1. La idea no es esa. Si funciona así es claramente un bug :). La idea es crear un pool de conexiones global a todos los workers y que estos lo usen. Del mismo modo las sentencias preparadas. Gracias a esto podemos reutilizar la conexión e incluso compartir la misma transacción entre diferentes peticiones.

      Jugando con la misma idea he creado otro experimento. Esta vez usando react (node.php) en vez de Gearman: https://gonzalo123.wordpress.com/2012/05/21/database-connection-pooling-with-php-and-react-node-php/

      1. Como las conexiones no pueden viajar serializadas, no queda otra que un solo worker que funciona como un servicio y es el que mantiene ese array global, o hacer que cada peticion vaya al worker que tiene su conexion, pero esto ultimo me da que va en contra de como funciona gearman. En el ejemplo de React no veo pool de conexiones, creo que se te olvido poner el array y gestionarlo usando las clases del ejemplo de gearman de Pool/Server, aun asi solo puede funcionar un “worker” react que despacha toda la pool.

  4. I checked this on a recent version of php 7 and mysql. The basic code installation looks fine.

    however I do not understand how the connection pooling works.

    In the example client.php If I call 100 requests in the loop , it goes till 10 and then exits. If I have to test connection pooling , how exactly can I do it.

    with worker.php started, it opens the number of connections defined in the server.php file.

    Is there any updated code that I can try out?

  5. Have you been able to make improvements on it. Do you have another suggestion for a stable connection pool?

    couple of things I have noticed with this implementation.
    1) There is no queue mechanism. If the connection is not available the process continues.
    2) sometimes the queries are slower using this approach rather than the traditional pdo connect approach.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.