5. Slony-I Maintenance

Slony-I does a lot of its necessary maintenance itself, in a “cleanup” thread.

5.1.  Watchdogs: Keeping Slons Running

There are a couple of “watchdog” scripts available that monitor the slon processes and restart them should they happen to die for some reason, such as a network “glitch” that causes loss of connectivity.

You might want to run them...
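
A minimal sketch of what such a watchdog does is shown below; the cluster name, conninfo string, and log path are placeholders for your own setup, and the scripts shipped with Slony-I are rather more elaborate than this:

    #!/bin/sh
    # Minimal watchdog sketch: restart slon whenever it exits.
    # CLUSTER, CONNINFO, and LOG are placeholders, not values from the
    # distributed watchdog scripts.
    CLUSTER=testcluster
    CONNINFO="dbname=mydb host=db1 user=slony"
    LOG=/var/log/slony/slon-$CLUSTER.log

    while true; do
        # slon blocks here until it dies (crash, network glitch, ...)
        slon $CLUSTER "$CONNINFO" >> $LOG 2>&1
        echo "`date`: slon exited; restarting in 10 seconds" >> $LOG
        sleep 10
    done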

5.2. Parallel to Watchdog: generate_syncs.sh

A new script for Slony-I 1.1 is generate_syncs.sh, which addresses the following kind of situation.

Suppose you have a somewhat flaky server where the slon daemon may not run all the time; you might return from a weekend away only to discover the following situation.

On Friday night, something went “bump” and while the database came back up, none of the slon daemons survived. Your online application then saw nearly three days worth of reasonably heavy transaction load.

When you restart slon on Monday, it hasn't done a SYNC on the master since Friday, so that the next “SYNC set” comprises all of the updates between Friday and Monday. Yuck.

If you run generate_syncs.sh as a cron job every 20 minutes, it will force in a periodic SYNC on the origin, which means that between Friday and Monday, the numerous updates are split into more than 100 syncs, which can be applied incrementally, making the cleanup a lot less unpleasant.

Note that if SYNCs are running regularly, this script won't bother doing anything.
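
For instance, a crontab entry along the following lines would run it every 20 minutes. The installation path is an assumption about where you put the tools, and whatever configuration or arguments your copy of generate_syncs.sh expects still apply:

    # m    h   dom mon dow   command
    */20   *    *   *   *    /usr/local/slony1/tools/generate_syncs.sh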

5.3. Replication Test Scripts

In the tools directory you will find four scripts that may be used to monitor Slony-I instances:

  • test_slony_replication is a Perl script to which you can pass connection information to get to a Slony-I node. It then queries sl_path and other information on that node in order to determine the shape of the requested replication set.

    It then injects some test queries into a test table called slony_test, which is defined as follows and which needs to be added to the set of tables being replicated:

    CREATE TABLE slony_test (
        description text,
        mod_date timestamp with time zone,
        "_Slony-I_testcluster_rowID" bigint DEFAULT nextval('"_testcluster".sl_rowid_seq'::text) NOT NULL
    );

    The last column in that table was added by Slony-I because the table lacks a primary key of its own; it provides the unique row identifier that replication requires.

    This script generates a line of output for each Slony-I node that is active for the requested replication set in a file called cluster.fact.log.

    There is an additional finalquery option that allows you to pass in an application-specific SQL query that can determine something about the state of your application.

  • log.pm is a Perl module that manages logging for the Perl scripts.

  • run_rep_tests.sh is a “wrapper” script that runs test_slony_replication.

    If you have several Slony-I clusters, you might set up configuration in this file to connect to all those clusters.

  • nagios_slony_test is a script constructed to query the log files; the idea is that you run the replication tests every so often (we run them every 6 minutes), and a system monitoring tool such as Nagios can then be set up to use this script to check the state indicated in those logs.

    It seemed rather more efficient to have a cron job run the tests and have Nagios check the results rather than having Nagios run the tests directly. The tests can exercise the whole Slony-I cluster at once rather than Nagios invoking updates over and over again.
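
    A crontab sketch of that arrangement follows; the installation path under /usr/local/slony1/tools is an assumption, so adjust the path and the interval to suit your site:

      # Run the replication tests every 6 minutes; Nagios then checks the
      # resulting cluster.fact.log via nagios_slony_test rather than
      # running the tests itself.
      */6  *  *  *  *  /usr/local/slony1/tools/run_rep_tests.sh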

5.4.  Log Files

slon daemons generate some more-or-less verbose log files, depending on what debugging level is turned on. You might assortedly wish to:

  • Use a log rotator like Apache rotatelogs to have a sequence of log files so that no one of them gets too big;

  • Purge out old log files, periodically.
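
As an illustrative sketch of both points, assuming slon's output is piped through Apache's rotatelogs and old logs accumulate under /var/log/slony (the cluster name, conninfo, and paths are placeholders):

    # Rotate to a new, date-stamped log file every 86400 seconds (daily),
    # so that no single file grows without bound.
    slon testcluster "dbname=mydb host=db1" 2>&1 \
        | rotatelogs /var/log/slony/slon-testcluster.%Y%m%d.log 86400 &

    # Periodically purge log files untouched for more than 30 days,
    # e.g. from a daily cron job.
    find /var/log/slony -name 'slon-*.log' -mtime +30 -exec rm {} \;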