Here follows a description of
the known limitations you should consult before creating a bug report
for dar:
Fixed Limits
- The size of SLICES may be limited by the file system or the
kernel (for example, the maximum file size is 2 GB with a Linux 2.2.x
kernel),
- the number of SLICES is only limited by the length of the
filenames: with a basename of 10 chars, and considering your file
system supports at most 256 chars per filename, you could already get
up to 10^241 SLICES (a 1 followed by 241 zeros). As soon as your file
system supports bigger files or longer filenames, dar follows
without change.
- dar_manager can gather up to 65534 different backups. This
limit should be high enough not to be a problem.
System variable limits
Memory
Dar uses virtual memory (= RAM + swap) to hold the list of saved
files (the catalogue) that is appended at the end of each archive. Dar
uses its own integer type (called "infinint") that has no bound (unlike
32-bit or 64-bit integers). This already makes dar able to manage
volumes of Zettabytes and beyond, even if systems cannot yet handle
such file sizes. Nevertheless, this comes with a memory and CPU
overhead, added to the C++ overhead for the data structure. All
together, dar needs an average of 650 bytes of virtual memory per saved
file. Thus, for example, if you have 110,000 files to save, whatever
the total amount of data to save, dar will require around 67 MB of
virtual memory.
Now, when doing catalogue extraction or a differential backup, dar
holds two catalogues in memory, so the amount of memory needed doubles
(134 MB in the example). Why? Because for a differential backup, dar
starts with the catalogue of the archive of reference, which it needs
to decide which files to save and which to skip, while on the other
hand it builds the catalogue of the new archive along the process. As
for catalogue extraction, the process is equivalent to making a
differential backup just after a full backup.
This memory requirement is not a limit in itself, but you need
enough virtual memory to be able to save your data (if necessary you
can still add swap space).
Integers
To overcome the memory issue explained above, dar can be built in
another mode. In this mode, "infinint" is replaced by 32-bit or 64-bit
integers, as selected by the --enable-mode=32 or --enable-mode=64
option given to the configure script. The executables built this way
(dar, dar_xform, dar_slave and dar_manager) run faster and use much
less memory than the "full" versions using "infinint". But yes, there
are drawbacks: slice size, file size, dates, number of files to backup,
total archive size (sum of all slices), etc., are bounded by the
maximum value of the integer used, which is 4,294,967,295 (2^32 - 1)
for 32-bit and 18,446,744,073,709,551,615 (2^64 - 1) for 64-bit
integers. Concretely, the 32-bit version cannot handle dates after year
2106 nor file sizes over 4 GB, while the 64-bit version cannot handle
dates later than around 584 billion years from now (far longer than the
estimated age of the Universe, about 14 billion years) nor files larger
than around 18 EB (18 exabytes).
What happens when such a limit is reached? For compatibility with
the rest of the code, fixed-length integers (32-bit or 64-bit for now)
cannot be used as-is; they are wrapped in a C++ class that reports
overflow in arithmetic operations. Archives generated by all the
different versions of dar stay compatible with each other, but the
32-bit and 64-bit versions will not be able to read or produce every
possible archive. In that case, the dar suite programs abort with an
error message asking you to use the "full" version of dar.
Command line
On several systems, long command-line
options are not available. This is because dar relies on
GNU getopt. Systems like FreeBSD do not ship GNU getopt by default,
and the getopt function provided by their standard library supports
neither long options nor optional arguments. On such systems you will
have to use short options only, and to overcome the lack of optional
arguments you need to give the argument explicitly: for example, use
"-z 9" in place of "-z", and so on. All of dar's features are
available with FreeBSD's getopt, just using short options and explicit
arguments.
Alternatively, you can install GNU getopt
as a
separate library called libgnugetopt. If the include file
<getopt.h> is also available, the configure script will detect it
and use this library. This way you can have long options on FreeBSD,
for example.
Dates
Unix files have three dates:
- last modification date (mtime)
- last access date (atime)
- last inode change date (ctime)
In dar, dates are stored as integers (the number of seconds elapsed
since Jan 1st, 1970). As seen above, the limitation comes not from dar
but from the integer type used, so with infinint you should be able to
store dates as far in the future as you want. Of course dar cannot
store dates before Jan the 1st of 1970, but that should not be a very
big problem. ;-)
There is no standard way under Unix
to change the ctime. So dar does not save the ctime (except when
saving EA) nor does it try to restore it. Reading a file (when doing a
backup or a comparison) updates the atime of the files read. Some
applications, like leafnode (an NNTP cache), base their expiry time on
the atime. So since the beginning, and still by default, dar sets back
the atime of the files it reads. But changing the atime updates the
ctime, which cannot be set back to its original value. As the ctime was
not taken into account, this was not a problem, until I got feedback
that some security tools base their inspection on the ctime (a file
whose ctime changed has most probably been modified, especially if the
atime has not changed, which may indicate an attempt at concealment).
So there is now a special mode in which dar does not set back the atime
of the files it reads, which preserves their ctime.