In some cases, a merged report doesn't display the right information. We outline some worst-case scenarios and justify our implementation.
Suppose log file 1 (“requests” with “sizes”) looks like:
request | size |
---|---|
A | 12 |
B | 11 |
C | 10 |
while log file 2 looks like:
request | size |
---|---|
D | 3 |
E | 2 |
F | 1 |
We report on the 2 biggest requests, so the report from log 1 looks like:
request | size |
---|---|
A | 12 |
B | 11 |
while the report from log 2 would look like:
request | size |
---|---|
D | 3 |
E | 2 |
Now we change the superservice.cfg file to list the 4 biggest requests and merge the two stored reports. A naive merge would lead to:
request | size |
---|---|
A | 12 |
B | 11 |
D | 3 |
E | 2 |
Of course, a report built from the complete logs would have been:
request | size |
---|---|
A | 12 |
B | 11 |
C | 10 |
D | 3 |
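The difference between the two results can be reproduced with a few lines of Python. This is only an illustrative sketch of the naive approach, not Lire's code; the helper `top_n` and the variable names are made up for this example.

```python
def top_n(rows, n):
    """Return the n largest (request, size) rows, biggest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

log1 = [("A", 12), ("B", 11), ("C", 10)]
log2 = [("D", 3), ("E", 2), ("F", 1)]

# Each stored report keeps only the top 2 of its own log.
report1 = top_n(log1, 2)
report2 = top_n(log2, 2)

# Naive merge: combine the truncated reports and take the new top 4.
naive = top_n(report1 + report2, 4)

# Reference: the top 4 computed from the complete logs.
complete = top_n(log1 + log2, 4)

print(naive)     # [('A', 12), ('B', 11), ('D', 3), ('E', 2)]
print(complete)  # [('A', 12), ('B', 11), ('C', 10), ('D', 3)]
```

Request C is lost because it was already cut from the first report before the limit was raised.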
The truncation effect shown above does not occur as long as the top limit keeps the same value. However, when the report does not list distinct values from the log but sums them, worse things can happen. Suppose we want to report on the total size by client, again keeping the top 2. The logs look like:
client | size |
---|---|
a | 12 |
b | 11 |
c | 10 |
and
client | size |
---|---|
d | 4 |
e | 4 |
c | 3 |
Reports from these logs would look like:
client | size |
---|---|
a | 12 |
b | 11 |
and
client | size |
---|---|
d | 4 |
e | 4 |
After naively merging, one would get:
client | size |
---|---|
a | 12 |
b | 11 |
In fact, the complete report should look like this, because client c totals 10 + 3 = 13 across both logs:
client | size |
---|---|
c | 13 |
a | 12 |
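The summing case can be sketched the same way. Again, this is an illustration under assumed names (`top_n_totals` is hypothetical), not the way Lire stores or merges its reports.

```python
from collections import Counter

def top_n_totals(totals, n):
    """Return the n clients with the largest total size, biggest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

log1 = {"a": 12, "b": 11, "c": 10}
log2 = {"d": 4, "e": 4, "c": 3}

# Each stored report keeps only its own top 2, so client c is dropped twice.
report1 = dict(top_n_totals(log1, 2))
report2 = dict(top_n_totals(log2, 2))

# Naive merge: sum the truncated reports per client, then take the top 2.
naive = top_n_totals(Counter(report1) + Counter(report2), 2)

# Reference: sum the complete logs per client, then take the top 2.
complete = top_n_totals(Counter(log1) + Counter(log2), 2)

print(naive)     # [('a', 12), ('b', 11)]
print(complete)  # [('c', 13), ('a', 12)]
```

Here the naive merge does not merely miss an entry: it misses the client that should be ranked first.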
Luckily, the Lire merging algorithm is not this naive: the XML reports store a few more records than are strictly needed. This heuristic leads to sane merged reports in most cases. However, since it is only a heuristic, it offers no watertight guarantee.
See the description of the guess_extra_entries routine in the Lire::AsciiDlf::Group manpage for more implementation details.
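To see why storing extra records helps, consider the following sketch. It is not the guess_extra_entries algorithm: the `extra` parameter and both function names are assumptions made for this illustration, and the sketch says nothing about how Lire actually chooses the number of extra entries.

```python
from collections import Counter

def stored_report(log, limit, extra):
    """Keep the top `limit + extra` totals of one log (hypothetical padding)."""
    ranked = sorted(log.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:limit + extra])

def merge_reports(reports, limit):
    """Sum the stored totals per client and return the top `limit`."""
    totals = Counter()
    for report in reports:
        totals.update(report)  # adds sizes for clients seen before
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]

logs = [{"a": 12, "b": 11, "c": 10}, {"d": 4, "e": 4, "c": 3}]

for extra in (0, 1):
    reports = [stored_report(log, limit=2, extra=extra) for log in logs]
    print(extra, merge_reports(reports, limit=2))
# 0 [('a', 12), ('b', 11)]
# 1 [('c', 13), ('a', 12)]
```

With one extra record per report, client c survives in both stored reports and the merge is correct. With larger logs, a client could still fall outside the extra records in every report, which is why the approach remains a heuristic.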