Spread/mod_log_spread

 




What is Spread?

Spread is a toolkit for group communication developed at the Johns Hopkins University Center for Networking and Distributed Systems (CNDS). It is both a protocol and a communication method. To use Spread, each machine runs a spread daemon which is aware, through its configuration, of all the other spread daemons that might talk to it. There is a notion of 'groups', implemented such that if machine 'A' has joined group 'x', machine 'B' has joined group 'x', and machine 'C' has joined group 'y', then 'A' and 'B' will see each other's messages but not 'C's, and 'C' will see nothing but its own. When sending messages, various levels of reliability can be used, from unreliable (the Spread protocol is not TCP, so unreliable really is unreliable), to reliable, to various degrees of orderedness (for example, if you want to guarantee that when machine 'A' sends a message before machine 'B', every member of the group sees those messages in that order). More information on Spread is available at the CNDS website.
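The group rule is easier to see with a toy model. The following is only a sketch in Python of the visibility rule described above; the GroupBus class is hypothetical, lives entirely in memory, and does not talk to a spread daemon or implement any of Spread's reliability or ordering levels.

```python
from collections import defaultdict


class GroupBus:
    """Toy in-memory stand-in for group delivery (not the real Spread protocol)."""

    def __init__(self):
        self.members = defaultdict(set)   # group name -> names of joined members
        self.inbox = defaultdict(list)    # member name -> (sender, group, text) tuples

    def join(self, member, group):
        self.members[group].add(member)

    def send(self, sender, group, text):
        # Every current member of the group gets a copy, the sender included
        # if the sender has joined that group.
        for member in self.members[group]:
            self.inbox[member].append((sender, group, text))


bus = GroupBus()
bus.join("A", "x")
bus.join("B", "x")
bus.join("C", "y")

bus.send("A", "x", "hello from A")
bus.send("B", "x", "hello from B")
bus.send("C", "y", "hello from C")

for name in ("A", "B", "C"):
    print(name, [text for _, _, text in bus.inbox[name]])
```

'A' and 'B' end up with both group 'x' messages, while 'C' only ever sees its own message to group 'y'.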

What is mod_log_spread?

mod_log_spread is a module for the Apache web server which uses Spread for reliable logging to the network. Some benefits of using Spread are:

It's reliable. Network logging via syslog is not, and would result in dropped logs.

It allows for redundancy. In Spread there are no privileged hosts. The logging host is eliminated as a single point of failure, as bringing up a new logging host is as simple as turning one on.

It's flexible. Since mod_log_spread was built on top of mod_log_config, the entire advanced feature set of Apache logging (including env masks and per-directory/per-vhost logging) is available. Further, breaking down different services into different log files is as simple as changing the group that service logs to.

Information/source for mod_log_spread is available here.

What is wrong with the way things were ....

The reason I wrote mod_log_spread was that a popular commercial log-writing product was hard to support, non-scalable, and broke frequently. The scalability concerns stemmed from its basic design. The particular product I was saddled with was a (Java-based) packet sniffer. It sniffs for HTTP transactions and recreates them from TCP sessions. This presents immediate scalability concerns. How do you sniff a network pushing 70Mb of traffic with a single non-clustering packet sniffer? You don't. mod_log_spread backs up this assertion by demonstrably recording 10-15% more traffic. Sniffers drop logs; Spread, the underlying protocol behind mod_log_spread, is designed to be unable to drop messages. This particular commercial sniffer is also a single point of failure. mod_log_spread can run two (or any number of) logging hosts simultaneously with no additional network overhead. Further, it is not a black-box product: mod_log_spread is an open-source project.

So why not just write logs locally?

There is a 20-30% performance hit, and you have never known pain until you have tried to manage local logging across 60 machines.  Trust me.

Ok, so you've convinced me it's cool.  How does it work?

Both Spread and mod_log_spread have decent documentation.  Check them out.  Here's a brief rundown.  Spread's main configuration file is /etc/spread.conf.  It looks something like:
 

#/etc/spread.conf from bp23
1
120     225.0.1.4       7777
bp1             192.168.1.1
bp2             192.168.1.2
bp3             192.168.1.3
bp4             192.168.1.4
bp5             192.168.1.5
bp6             192.168.1.6
bp7             192.168.1.7
bp8             192.168.1.8
bp9             192.168.1.9
bp10            192.168.1.10
bp11            192.168.1.11
bp12            192.168.1.12
bp13            192.168.1.13
bp14            192.168.1.14
bp15            192.168.1.15
bp16            192.168.1.16
bp17            192.168.1.17
bp18            192.168.1.18
....
bp119           192.168.1.119
bp120           192.168.1.120
 
This says that bp23 sees 1 spread ring, identified as having up to 120 members listening on port 7777 at multicast address 225.0.1.4.  The possible members of the ring are listed in the lines that follow.  All machines that listen on this IP/port need to have this exact configuration file.
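If a script needs the ring's address, port, or host list, the layout described above is easy to read mechanically. Here is a minimal sketch in Python. It assumes the file has exactly the layout of the example (a ring count, then for each ring one header line followed by host lines); that layout is inferred from the example rather than from Spread's own documentation, and parse_spread_conf is a hypothetical helper, not part of Spread.

```python
def parse_spread_conf(text):
    """Parse a spread.conf with the layout of the example above.

    Assumed layout (inferred from the example): line 1 is the number of
    rings; each ring is a header line "<member count> <multicast address>
    <port>" followed by "<hostname> <ip>" lines.
    """
    # Drop blank lines and '#' comments.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    ring_count = int(lines[0])
    rings = []
    pos = 1
    for _ in range(ring_count):
        limit, address, port = lines[pos].split()
        pos += 1
        hosts = {}
        # Host lines have exactly two fields; anything else ends the ring.
        while pos < len(lines) and len(hosts) < int(limit):
            fields = lines[pos].split()
            if len(fields) != 2:
                break
            hosts[fields[0]] = fields[1]
            pos += 1
        rings.append({"max_members": int(limit), "address": address,
                      "port": int(port), "hosts": hosts})
    return rings


# Shortened copy of the example file: only the first three hosts are listed.
SAMPLE = """\
#/etc/spread.conf from bp23
1
120     225.0.1.4       7777
bp1             192.168.1.1
bp2             192.168.1.2
bp3             192.168.1.3
"""

rings = parse_spread_conf(SAMPLE)
print(rings[0]["address"], rings[0]["port"], sorted(rings[0]["hosts"]))
```

Since every machine on the ring needs the same file, comparing the parsed result from two hosts is one cheap way to spot a stale copy.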

mod_log_spread is an Apache DSO.  Its shared object (mod_log_spread.so) must be in /web/XX/adm/libexec/, and the module is enabled with lines like the following in httpd.conf:
 

#/path/to/apache/conf/httpd.conf  from bp23
LoadModule spread_log_module  libexec/mod_log_spread.so
AddModule mod_log_spread.c
SpreadDaemon 7777
CustomLog     $test    common
This tells mod_log_spread where to find the local spread daemon (you can contact a remote one, but you shouldn't) and tells it to log CLF (Common Log Format) logs to the group 'test'.

You can verify that your configuration is working by looking in the Apache error log.  You should get lines like:

[Sun Jul 30 05:53:19 2000] [notice] set_spread_daemon(7777)
[Sun Jul 30 05:53:19 2000] [notice] Apache/1.3.9 (Unix) PHP/3.0.11 configured -- resuming normal operations
If you get a bunch of these:
 
[Mon Jul 31 13:44:38 2000] [notice] SP_multicast error(-11) in config_log_transaction
[Mon Jul 31 13:44:44 2000] [notice] SP_multicast error(-11) in config_log_transaction
something is wrong - perhaps your spread daemon is not listening on the port you specified in your httpd.conf.  You may get a few of these, especially if spread restarts.  Don't worry about them.
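To tell "a few" from "a bunch" without eyeballing the file, you can count them. This is a minimal sketch in Python that matches only the message format shown above; count_spread_errors is a hypothetical helper, and the sample lines are copied from the log excerpts in this document.

```python
import re
from collections import Counter

# Matches lines like:
# [Mon Jul 31 13:44:38 2000] [notice] SP_multicast error(-11) in config_log_transaction
ERROR_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[notice\] SP_multicast error\((?P<code>-?\d+)\)"
)


def count_spread_errors(lines):
    """Return a Counter mapping SP_multicast error code -> number of occurrences."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            counts[int(match.group("code"))] += 1
    return counts


SAMPLE_LOG = [
    "[Sun Jul 30 05:53:19 2000] [notice] set_spread_daemon(7777)",
    "[Mon Jul 31 13:44:38 2000] [notice] SP_multicast error(-11) in config_log_transaction",
    "[Mon Jul 31 13:44:44 2000] [notice] SP_multicast error(-11) in config_log_transaction",
]

error_counts = count_spread_errors(SAMPLE_LOG)
print(error_counts)
```

For a real log, pass an open file object instead of SAMPLE_LOG; iterating over it yields one line at a time.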

There are other tools for evaluating the health and happiness of your spread daemons as well.  /usr/local/bin/user (available on most machines and binary-portable) is a command line spread client.  You can use it to monitor the raw spread traffic.  An example session is:

 
[root@bp23 ~]# user -s 7777
Spread library version is 3.12
User: connected to 7777 with private group #user#bp23

==========
User Menu:
----------

        j <group> -- join a group
        l <group> -- leave a group

        s <group> -- send a message
        b <group> -- send a burst of messages

        r -- receive a message (stuck)
        p -- poll for a message
        e -- enable asynchonous read (default)
        d -- disable asynchronous read

        q -- quit

User> j test

User>
============================
Received REGULAR membership for group test with 2 members, where I am member 0\
:
        #user#bp23
        #sld-09221#bp87
grp id is -1062731504 964738633 2
Due to the JOIN of #user#bp23

User>
============================
received RELIABLE message from #ap-30445#bp26, of type 1, (endian 0) to 1 group\
s
(82 bytes): 143.231.34.236 - - [31/Jul/2000:13:52:47 -0400] "GET /Members/ HTTP\
/1.1" 200 1097

....
User>
============================
received RELIABLE message from #ap-08888#bp23, of type 1, (endian 0) to 1 group\
s
(81 bytes): 205.188.198.36 - - [31/Jul/2000:13:52:48 -0400] "POST /auth.html HT\
TP/1.0" 302 0

User> q
Bye.
[root@bp23 ~]#

Here, we connect user to the local spread daemon (your port may vary) and join the group test to see what's going on.  Then we quit.  If user hangs when you invoke it, something bad is going on.  Spreadwatch.pl will take care of it for you (it will kill the spread daemon and re-invoke it).  Never, under any circumstances, do an 's test': this will write trash into our logs!  You can also make up your own group if you want.
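Each message body in the session above is one CLF line, so anything that receives from the group can split it into fields. This is a minimal sketch in Python, assuming the 'common' format exactly as it appears in the transcript; parse_clf is a hypothetical helper, and the sample line is the first message from the session with its wrapped halves rejoined.

```python
import re
from datetime import datetime

# host ident user [time] "request" status size
CLF_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line):
    """Split one Common Log Format line into a dict, or return None if it does not match."""
    match = CLF_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Month names are parsed with the current locale; English/C is assumed here.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


line = '143.231.34.236 - - [31/Jul/2000:13:52:47 -0400] "GET /Members/ HTTP/1.1" 200 1097'
record = parse_clf(line)
print(record["host"], record["request"], record["status"], record["size"])
```

Lines that do not match, such as the trash a stray 's test' would inject, come back as None and can be skipped or reported.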

There is also the utility /usr/local/bin/monitor.  With the additional automated watchers, you should rarely need it.  I'll include a little transcript:

 
[root@bp23 ~]# monitor -c /etc/spread.conf -n bp23
/===========================================================================\
| The Spread Group Communication Toolkit.                                   |
| Copyright (c) 1994-1999 Yair Amir, Michal Miskin-Amir, Jonathan Stanton.  |
| All rights reserved.                                                      |
|                                                                           |
| The Spread package is licensed under the Spread Non-Commercial License.   |
| You may only use this software in compliance with the License.            |
| A copy of the license can be found at http://www.spread.org/license       |
|                                                                           |
| This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF     |
| ANY KIND, either express or implied.                                      |
|                                                                           |
| Spread is developed at the Center for Networking and Distributed Systems, |
| The Johns Hopkins University.                                             |
|                                                                           |
| Creators:                                                                 |
|    Yair Amir             yairamir@cs.jhu.edu                              |
|    Michal Miskin-Amir    michal@spread.org                                |
|    Jonathan Stanton      jonathan@cs.jhu.edu                              |
|                                                                           |
| Contributors:                                                             |
|    Dan Schoenblum   dansch@cnds.jhu.edu - Java Interface Developer.       |
|    John Schultz     jschultz@cnds.jhu.edu - contribution to process group |
|                                             membership.                   |
|                                                                           |
| Special thanks to the following for providing ideas and/or code:          |
|    Ken Birman, Danny Dolev, David Shaw, Robbert VanRenesse.               |
|                                                                           |
| WWW    : http://www.spread.org  and  http://www.cnds.jhu.edu              |
| Contact: spread@spread.org                                                |
|                                                                           |
| Version 3.12, Built 27/Jul/1999                                           |
\===========================================================================/

=============
Monitor Menu:
-------------
        0. Activate/Deactivate Status {all, none, Proc, CR}

        1. Define Partition
        2. Send   Partition
        3. Review Partition
        4. Cancel Partition Effects

        5. Define Flow Control
        6. Send   Flow Control
        7. Review Flow Control

        8. Terminate Spread Daemons {all, none, Proc, CR}

        9. Exit
Monitor> 0

=============
Activate Status
-------------

        Enter Proc Name: bp23
        Enter Proc Name:
Monitor: send status query

Monitor>
============================
Status at bp23 V3.12 (state 1, gstate 1) after 488025 seconds :
Membership  :  7  procs in 1 segments, leader is bp16
rounds   : 352222065    tok_hurry :     127     memb change:      10
sent pack: 1323028      recv pack : 6816735     retrans    :  165393
u retrans:       0      s retrans :  165393     b retrans  :       0
My_aru   : 4970943      Aru       : 4970942     Highest seq: 4970943
Sessions :      39      Groups    :       1     Window     :      60
Deliver M: 8084658      Deliver Pk: 8139763     Pers Window:      15
Delta Mes: 8084658      Delta Pack: 4970942     Delta sec  :  488025
==================================

Monitor> q
Bye.
[root@bp23 ~]#


There's lots of interesting info here.  Unfortunately, it's not terribly well documented.  :)