The Lire::DlfConverter interface requires two kinds of methods. First, it requires methods that give the framework information about your converter. Second, it requires methods that actually implement the conversion process. It is this interface that this section documents.
The method name() should return the name of our DLF converter. It is this name that is passed to the lr_log2report command. The name must be unique among all registered converters, and it should be restricted to alphanumeric characters (hyphens, periods and underscores can also be used).
We will name our converter common_syslog:
sub name { return "common_syslog"; }
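The naming rule above can be checked mechanically. The following is a minimal sketch only: is_valid_converter_name() is a hypothetical helper written for this example, not part of Lire, and it simply encodes the character restriction stated above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lire): accepts names made only of
# alphanumeric characters, hyphens, periods and underscores.
sub is_valid_converter_name {
    my ($name) = @_;
    return ( defined($name) && $name =~ /\A[A-Za-z0-9._-]+\z/ ) ? 1 : 0;
}

print is_valid_converter_name("common_syslog") ? "ok\n" : "invalid\n";
print is_valid_converter_name("common syslog") ? "ok\n" : "invalid\n";
```

Uniqueness among registered converters cannot be verified this way; that check belongs to the framework.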
The next two required methods give users more verbose information about your converter. The converter's title() and description() can be used to display information about your converter in the user interface or to generate documentation.
The title() method should simply return a string:
sub title { return "Common Log Format embedded in Syslog DLF Converter"; }
The description() method should return a DocBook fragment describing your converter and the log formats it supports. If you don't know DocBook, just restrict yourself to using para elements to make paragraphs:
sub description {
    return <<EOD;
<para>This DLF Converter extracts web server's requests and error
information from a syslog file.
</para>
<para>The requests and errors should be logged under the
<literal>httpd</literal> program name. The errors are mapped to the
<type>syslog</type> schema, the requests are mapped to the
<type>www</type> schema.
</para>
<para>Syslog records from another program than
<literal>httpd</literal> are ignored.
</para>
EOD
}
Two other metadata methods are used by the framework itself. The first one specifies which DLF schemas your DLF converter converts to:
sub schemas { return ( "www", "syslog" ); }
In our case, we are converting to the syslog and www schemas. As stated in our converter's description, we map the web server's error messages to the syslog schema and the request logs to the www schema. Alternatives would have been to map only the request information to the www schema, or to map all the non-request records to the syslog schema. The rationale behind the current choice (besides this being an example) is that it makes it convenient to process one log file and obtain a report containing both the requests and the errors from our web server. For that use case, it is best to ignore records unrelated to the web server.
The other method affects how the conversion process is handled. Lire offers two modes of conversion: the line-oriented one and the file-oriented one. (Both are described in the next section.) If your log file is line-oriented (each line is one log record), as most log files are, you should use the line-oriented conversion mode:
sub handle_log_lines { return 1; }
The actual conversion process is handled through three methods: init_dlf_converter(), finish_conversion() and either process_log_file() or process_log_line(), depending on the conversion mode (as determined by the return value of handle_log_lines()).
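To make the calling order concrete, here is a rough sketch of how a driver could dispatch on handle_log_lines(). It is an illustration under stated assumptions, not Lire's actual implementation: run_conversion() and the MyConverter class are invented for this example, and an empty hash stands in for the process object.

```perl
use strict;
use warnings;

package MyConverter;

sub new {
    my ($class) = @_;
    return bless { calls => [] }, $class;
}
sub handle_log_lines   { return 1 }
sub init_dlf_converter { my ($self) = @_; push @{ $self->{calls} }, "init";   return }
sub finish_conversion  { my ($self) = @_; push @{ $self->{calls} }, "finish"; return }

sub process_log_line {
    my ( $self, $process, $line ) = @_;
    push @{ $self->{calls} }, "line:$line";
    return;
}

package main;

# Hypothetical driver, for illustration only.
sub run_conversion {
    my ( $converter, $process, $fh ) = @_;

    $converter->init_dlf_converter($process);
    if ( $converter->handle_log_lines() ) {
        # Line-oriented mode: the driver reads and splits the lines.
        while ( defined( my $line = <$fh> ) ) {
            chomp $line;
            $converter->process_log_line( $process, $line );
        }
    } else {
        # File-oriented mode: the converter reads the handle itself.
        $converter->process_log_file( $process, $fh );
    }
    $converter->finish_conversion($process);
    return;
}

my $log = "first\nsecond\n";
open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my $converter = MyConverter->new();
run_conversion( $converter, {}, $fh );
close $fh;
print join( ", ", @{ $converter->{calls} } ), "\n";
```

The recorded call order shows init_dlf_converter() running once before any line is seen and finish_conversion() running once after the last one.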
The method init_dlf_converter() is called once before the log file is processed. It should be used to initialize the state of your converter. Since our DLF converter needs neither initialization nor configuration, the method is simply empty:
sub init_dlf_converter { my ( $self, $process ) = @_; return; }
The $process parameter, which is passed to all the processing methods, is an instance of Lire::DlfConverterProcess. This is the object that drives the conversion process, and it defines several methods that you will use in the actual conversion.
The method finish_conversion() is called once after the log file has been completely processed. This method is mostly of use to stateful converters, that is, DLF converters which generate DLF records from more than one line. Since this is not our case, we simply leave the method empty:
sub finish_conversion { my ( $self, $process ) = @_; return; }
Whether you are using the file-oriented or the line-oriented conversion mode, the principles are the same. You extract information from the log file and create DLF records from it. Your DLF converter communicates with the framework by calling methods on the Lire::DlfConverterProcess object that is passed as a parameter to your methods.
Here is the complete code of our conversion method:
use Lire::Apache qw/parse_common/;

sub process_log_line {
    my ( $self, $process, $line ) = @_;

    # Parse the syslog envelope; eval traps any parsing error.
    my $sys_rec = eval { $self->{syslog_parser}->parse( $line ) };
    if ( $@ ) {
        $process->error( $@, $line );
        return;
    } elsif ( $sys_rec->{process} ne 'httpd' ) {
        $process->ignore_log_line( $line, "not an httpd record" );
        return;
    } else {
        my $common_dlf = {};
        eval { parse_common( $sys_rec->{content}, $common_dlf ) };
        if ( $@ ) {
            # Not a request line: report it as a server error message.
            $sys_rec->{message} = $sys_rec->{content};
            $process->write_dlf( "syslog", $sys_rec );
        } else {
            $process->write_dlf( "www", $common_dlf );
        }
    }
}
The first thing to note is that, in the line-oriented conversion mode, the method process_log_line() is called once for each line in the log file.
Secondly, the actual parsing of the line is done using two functions: parse_common and Lire::Syslog's parse. These functions simply use regular expressions to extract the appropriate information from the line and put it in a hash reference. What is important is that they already use the schema's field names as key names.
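To illustrate the idea of a regular expression filling a hash reference, here is a minimal sketch of a parser for a traditional BSD-style syslog line. It is not Lire::Syslog's implementation: parse_syslog_line() is a hypothetical function, and every key other than process and content (the two used by our converter) is an assumed name.

```perl
use strict;
use warnings;

# Illustrative parser only. Dies when the line is not a syslog record,
# so that the caller can trap the failure with eval.
sub parse_syslog_line {
    my ($line) = @_;

    my ( $timestamp, $hostname, $process, $pid, $content ) = $line =~ m{
        \A (\w{3} \s+ \d{1,2} \s \d{2}:\d{2}:\d{2})   # month, day and time
        \s+ (\S+)                                     # host name
        \s+ ([^\s\[:]+)                               # program name
        (?: \[ (\d+) \] )?                            # optional PID
        : \s* (.*) \z                                 # message text
    }x or die "cannot parse syslog line\n";

    return {
        timestamp => $timestamp,
        hostname  => $hostname,
        process   => $process,
        pid       => $pid,
        content   => $content,
    };
}

my $rec = eval { parse_syslog_line("Oct 11 22:14:15 web1 httpd[1234]: GET /index.html") };
print "process=$rec->{process} content=$rec->{content}\n";

my $bad = eval { parse_syslog_line("not a syslog line") };
print "error: $@" if $@;
```

Because the hash keys are chosen up front, the caller can hand the hash to the framework without renaming anything except fields whose names differ from the schema's.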
Finally, you can see that the code calls three different methods on the $process object, at four points, to report different kinds of information:
The example uses the eval statement to trap errors during the syslog record parsing. If the line cannot be parsed as a valid syslog record, it is an error and it is reported through the error() method. The first parameter is the error message and the second one is the line with which the error is associated. This last parameter is optional.
When the syslog event doesn't come from the httpd process, we ignore the line. Ignored lines are reported to the framework by using the ignore_log_line() method. The first parameter is the line which is ignored. The second, optional parameter gives the reason why the line was ignored.
Finally, DLF records are created by using the write_dlf() method. Its first parameter is the schema with which the DLF record complies. This schema must be one that is listed by your converter's schemas() method. The second parameter is the DLF data, contained in a hash reference. The DLF record is created by taking, for each field in the schema, the value stored under the same name in the hash. (In the syslog schema, the field that contains the actual log message is called message; this is why we assign the content value to the message key.) Missing fields, or fields whose value is undef, will contain the special LR_NA missing value marker. Keys in the hash that don't map to a field of the schema are simply ignored.
In our example, we distinguish between the server's error messages (mapped to the syslog schema) and the request information (mapped to the www schema) based on whether parse_common succeeded in parsing the line.
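The mapping from hash keys to schema fields described for write_dlf() can be pictured with the following sketch. It is only a model of the documented behaviour, not the framework's code: build_record() is hypothetical, the field list is invented for the example, and the string 'LR_NA' merely stands in for Lire's missing value marker.

```perl
use strict;
use warnings;

# Assumptions for this example: an invented field list and a plain
# string standing in for the LR_NA marker.
my @fields    = qw( timestamp hostname process message );
my $NA_MARKER = 'LR_NA';

# Takes, for each schema field, the value stored under the same name in
# the hash; missing or undefined values become the marker.
sub build_record {
    my ( $fields, $dlf ) = @_;
    return [ map { defined $dlf->{$_} ? $dlf->{$_} : $NA_MARKER } @$fields ];
}

my $record = build_record(
    \@fields,
    {
        timestamp => undef,                    # undefined: becomes the marker
        hostname  => 'web1',
        process   => 'httpd',
        message   => 'File does not exist',
        content   => 'File does not exist',    # not a schema field: ignored
    },
);
print join( " | ", @$record ), "\n";
```

This is why our converter copies content into message before calling write_dlf(): a key that does not match a field name never reaches the record.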
Another possibility, not shown in our example, is to ask that the line be saved for later processing. This is mostly of use to converters that maintain state between lines. In those cases, it often happens that related lines are missing from the end of the log file. You can then save the lines, and they will automatically be seen by the next run of your converter on the same DLF store. This option is only available in the line-oriented mode of conversion.
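As a rough sketch of that idea, the following stateful converter pairs lines and saves the ones left unmatched. Everything here is an assumption made for illustration: the START/END log format, the "example" schema and the MockProcess class are invented, and save_log_line is only an assumed method name, so check the Lire::DlfConverterProcess documentation for the real one and for where it may be called.

```perl
use strict;
use warnings;

# Mock process object; 'save_log_line' is an assumed method name.
package MockProcess;
sub new { my ($class) = @_; return bless { saved => [], dlf => [] }, $class }
sub save_log_line { my ( $self, $line ) = @_; push @{ $self->{saved} }, $line; return }
sub write_dlf { my ( $self, $schema, $dlf ) = @_; push @{ $self->{dlf} }, $dlf; return }

package PairingConverter;
sub new { my ($class) = @_; return bless { pending => {} }, $class }

# Invented format: "START <id>" opens a transaction, "END <id>" closes it.
sub process_log_line {
    my ( $self, $process, $line ) = @_;
    my ( $kind, $id ) = split ' ', $line;
    if ( $kind eq 'START' ) {
        $self->{pending}{$id} = $line;
    } elsif ( delete $self->{pending}{$id} ) {
        $process->write_dlf( "example", { id => $id } );
    }
    return;
}

# Save the lines that are still waiting for their END line.
sub finish_conversion {
    my ( $self, $process ) = @_;
    $process->save_log_line($_) for sort values %{ $self->{pending} };
    return;
}

package main;

my $process   = MockProcess->new();
my $converter = PairingConverter->new();
$converter->process_log_line( $process, $_ ) for "START a", "START b", "END a";
$converter->finish_conversion($process);
print "saved: @{ $process->{saved} }\n";
```

Only the START line without a matching END is saved; the completed pair produces a record right away.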
The same principles apply when you are using the file-oriented mode of conversion. This mode will usually be used for binary log formats or for formats that aren't line-oriented, like XML.
For demonstration purposes, the following code could be added to transform our line-oriented converter into a file-oriented one:
sub handle_log_lines { return 0; }

sub process_log_file {
    my ( $self, $process, $fh ) = @_;

    my $line;
    while ( defined( $line = <$fh> ) ) {
        chomp $line;
        $self->process_log_line( $process, $line );
    }
}
The difference between the above code and using the line-oriented mode is that the framework won't be aware of the number of log lines processed, and your converter might have trouble processing log files that use a different line-ending convention than the host you are running on. The bottom line is that you should use the line-oriented conversion mode when your log format is line-oriented.
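If you do read lines yourself in file-oriented mode, one way to reduce the line-ending problem is to strip the terminator explicitly instead of relying on chomp. This is a minimal sketch: strip_line_ending() is a hypothetical helper, and it only handles LF and CRLF terminators.

```perl
use strict;
use warnings;

# Hypothetical helper: removes a trailing LF or CRLF, whatever the
# platform's own convention is. Lines without a terminator are returned
# unchanged.
sub strip_line_ending {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

for my $raw ( "GET /index.html\r\n", "GET /index.html\n", "GET /index.html" ) {
    my $clean = strip_line_ending($raw);
    print length($clean), "\n";
}
```

On a Unix host, chomp alone would leave the carriage return of a CRLF-terminated line in place, and it would end up at the end of the last parsed field.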