If you think you have found a bug, please check the manual and the HACKING file to see if it is a known restriction. If not, please send a clear and detailed report to Martin Pool <mbp@samba.org>. (For a clear and detailed description of "clear and detailed", see Simon Tatham's advice on reporting bugs, http://www.chiark.greenend.org.uk/~sgtatham/bugs.html.)
distcc has a test suite written in Python using the PyUnit framework. It does not yet exercise all functionality, but is improving. If you discover a bug, or write new functionality, please try to add corresponding tests to make sure that the fix keeps working in the future.
There are no known cases where distcc will produce incorrect code, but they may exist. There are some restrictions on distcc, and some possible optimizations that are not yet implemented.
An important general goal is that the code should stay as simple as possible, and secondarily be portable to reasonably current Unix-like systems. Complicating the code or adding large dependencies is undesirable unless there's an overwhelming advantage.
Handle $COMPILER_PATH and $GCC_EXEC_PREFIX in some sensible way, if there is one. Not urgent, because I have never heard of them being used.

Perhaps pass -b and -V options to the compiler, based on whatever is present on the current machine? Or perhaps the user should just do this.

Strip out -D, -I and any other options that we're sure are handled only by the preprocessor before the command is sent to the server. This would make the server logs slightly clearer and more readable, and possibly be a very tiny performance boost.
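A sketch of the kind of filtering this would involve (strip_cpp_options is a hypothetical helper, for illustration only; real compiler option parsing has more cases):

    #include <string.h>

    /* Drop -D and -I options (and their arguments) from a compiler
     * command line before sending it to the server.  Hypothetical
     * helper: edits argv in place and returns the new argc. */
    static int strip_cpp_options(int argc, char **argv)
    {
        int in, out = 0;

        for (in = 0; in < argc; in++) {
            if (strncmp(argv[in], "-D", 2) == 0 ||
                strncmp(argv[in], "-I", 2) == 0) {
                /* "-DFOO" is one word; "-D FOO" is two. */
                if (argv[in][2] == '\0')
                    in++;           /* also skip the separate argument */
                continue;
            }
            argv[out++] = argv[in];
        }
        return out;
    }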
Compress source files in transit: gzip -3 is cheaper in CPU than cpp and gives a substantial reduction in the size of a .i file.
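A rough sketch of what this might look like, assuming distcc took on a zlib dependency (compress2() at level 3 is roughly equivalent in cost to gzip -3, though it produces zlib rather than gzip framing; compress_for_wire is a hypothetical helper):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    /* Compress a preprocessed source buffer at level 3 before
     * putting it on the wire. */
    static int compress_for_wire(const unsigned char *src, uLong src_len,
                                 unsigned char **out, uLong *out_len)
    {
        uLong bound = compressBound(src_len);
        unsigned char *buf = malloc(bound);

        if (buf == NULL)
            return -1;
        if (compress2(buf, &bound, src, src_len, 3) != Z_OK) {
            free(buf);
            return -1;
        }
        *out = buf;
        *out_len = bound;
        return 0;
    }

    int main(void)
    {
        const char *i_file = "int main(void) { return 0; } /* ... */";
        unsigned char *packed;
        uLong packed_len;

        if (compress_for_wire((const unsigned char *) i_file,
                              strlen(i_file), &packed, &packed_len))
            return 1;
        printf("%lu bytes -> %lu bytes\n",
               (unsigned long) strlen(i_file), (unsigned long) packed_len);
        free(packed);
        return 0;
    }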
Do connect() in non-blocking mode, bounded by a select, so that a dead or unreachable volunteer times out rather than hanging the client.
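A sketch of the usual pattern (connect_with_timeout is a hypothetical helper; distcc would call it where it currently calls connect()):

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    /* Connect with a timeout: put the socket in non-blocking mode,
     * start the connect, and wait for writability with select().
     * Returns 0 on success, -1 on error or timeout. */
    static int connect_with_timeout(int fd, const struct sockaddr *sa,
                                    socklen_t salen, int timeout_secs)
    {
        fd_set wfds;
        struct timeval tv;
        int err = 0;
        socklen_t errlen = sizeof err;

        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        if (connect(fd, sa, salen) == 0)
            return 0;                      /* connected at once */
        if (errno != EINPROGRESS)
            return -1;                     /* immediate hard failure */

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;
        if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
            return -1;                     /* timed out, or select failed */

        /* Writable only means the attempt finished; check how it went. */
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0 || err)
            return -1;
        return 0;
    }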
Running over ssh or some other security mechanism might be possible, but will cause some performance loss.

If a Makefile does not use $(CC), it may have to be updated to work properly. ccache handles these situations by allowing itself to be installed in place of gcc: it examines the name under which it was invoked and decides to run another compiler. It may be possible for distcc to piggy-back on that.
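A minimal sketch of that argv[0] trick (the /usr/bin/gcc path and the dispatch rule are hypothetical, purely for illustration):

    #include <libgen.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Behave differently depending on the name we were invoked under,
     * as ccache does when installed in place of gcc. */
    int main(int argc, char **argv)
    {
        char *invoked_as = basename(argv[0]);

        (void) argc;
        if (strcmp(invoked_as, "gcc") == 0 || strcmp(invoked_as, "cc") == 0) {
            /* Called through a symlink named after the compiler: hand
             * the whole command line on to the real compiler. */
            argv[0] = "/usr/bin/gcc";      /* hypothetical real path */
            execv(argv[0], argv);
            perror("execv");               /* only reached on failure */
            return 1;
        }
        fprintf(stderr, "%s: don't know how to act as this compiler\n",
                invoked_as);
        return 1;
    }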
cc argument parsing is complex and not completely standardized.
$DISTCC_LOG.

Sometimes cc is used just for assembly. This too could be done remotely, by handling the .s extension as preprocessed source, and .S as unpreprocessed source.
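A sketch of how the extension table might be extended (needs_local_cpp is a hypothetical helper; .i and .ii are the extensions conventionally used for already-preprocessed C and C++):

    #include <string.h>

    /* Decide whether a source file still needs the local preprocessor
     * before it can be shipped to a volunteer.  ".s" is plain assembly
     * (already past cpp); ".S" must go through cpp first. */
    static int needs_local_cpp(const char *filename)
    {
        const char *dot = strrchr(filename, '.');

        if (dot == NULL)
            return 1;                      /* unknown: be conservative */
        if (strcmp(dot, ".i") == 0 ||      /* preprocessed C */
            strcmp(dot, ".ii") == 0 ||     /* preprocessed C++ */
            strcmp(dot, ".s") == 0)        /* preprocessed assembly */
            return 0;
        return 1;                          /* .c, .cc, .S, ... */
    }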
Normally the client can run gcc directly, but on some volunteers it might be necessary to specify a more detailed description of the compiler to get the appropriate cross tool. This might be insufficient for Makefiles that need to call several different compilers, perhaps gcc and g++ or different versions of gcc. Perhaps they can make do with changing the DISTCC host settings at appropriate times.
Check incoming connections against /etc/hosts.allow. That's a moderately good level of security: certainly much cheaper than SSH. Unfortunately this may suddenly break daemons, because many machines are configured to disallow everything by default. We need to either make it a configure option, or just put a big warning in the documentation.
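If this were done with libwrap (an assumption; distcc could equally reimplement the checks itself), the daemon-side screening might look roughly like this; link with -lwrap:

    #include <syslog.h>
    #include <tcpd.h>

    /* libwrap expects the linking program to define these. */
    int allow_severity = LOG_INFO;
    int deny_severity = LOG_WARNING;

    /* Screen an accepted connection against /etc/hosts.allow and
     * /etc/hosts.deny, keyed on the daemon name "distccd".
     * Returns nonzero if the peer may proceed. */
    static int peer_allowed(int fd)
    {
        struct request_info req;

        request_init(&req, RQ_DAEMON, "distccd", RQ_FILE, fd, 0);
        fromhost(&req);            /* look up the peer's address and name */
        return hosts_access(&req); /* 0 means access denied */
    }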
Add a --ping client option to contact all the remote servers, and perhaps return some kind of interesting information. This is almost certainly just chrome, though.
Volunteers could be located using DNS SRV records (RFC 2052), or perhaps multi-RR A records. For example, compile.ozlabs.foo.com would resolve to all relevant machines. Another possibility would be to use SLP, the Service Location Protocol, but that adds a larger dependency and it seems not to be widely deployed.

distcc in its present form works well on small numbers of nearby machines owned by the same people. It might be an interesting project to investigate scaling up to large numbers of machines, which potentially do not trust each other. This would make distcc somewhat more like other "peer-to-peer" systems such as Freenet and Napster.
Running distcc across OpenSSH has several security advantages and should be supported in the future. Some relevant points:
Using SSH is greatly preferable to developing and maintaining a custom security protocol.
If the client or volunteer is subverted, then the other party is not protected. (For example, if the administrator of the volunteer is malicious, or if the volunteer has been compromised, then compilation results might contain trojans.) However, this is the case for practically every Internet protocol.
Using SSH will consume some CPU cycles on both client and volunteer.
A first implementation would be trivial, since the daemon already works on stdin/stdout. However, it might perform poorly, because SSH takes quite a long time to open a connection.
Connections should be hoarded by the client. If the client doesn't already have an ssh connection to the server, distcc should fork, with a background task holding the connection open and coordinating access.
When running a job locally (such as cpp or ld), distcc ought to count that against the load of localhost. At the moment it is biased towards putting too much load on the local machine.
distcc needs a way to know that some machines have multiple CPUs, and should accept a proportionally larger number of jobs at the same time. It's not clear whether multiprocessor machines should be completely filled before moving on to another machine.
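One plausible way for the daemon to discover the CPU count is sysconf(), sketched below; _SC_NPROCESSORS_ONLN is not required by POSIX, but is widely available (Linux, the BSDs, Solaris):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Use the number of online CPUs as the default job limit,
         * falling back to one if the system can't tell us. */
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

        if (ncpus < 1)
            ncpus = 1;
        printf("would accept up to %ld simultaneous jobs\n", ncpus);
        return 0;
    }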
If there are more parallel invocations of distcc than available CPUs, it's not clear what behaviour would be best. Options include having the remaining children sleep; distributing multiple jobs across available machines; or running all the overflow jobs locally.
In fact, on Linux it seems that running two tasks on a CPU is not much slower than running a single task, because the task-switching overhead is pretty low.
Problems tend to occur when we run more jobs than will fit into available physical memory. It might be nice if there were a "batch mode" scheduler that would finish one job before running the next, but in the absence of that we have to do it ourselves. I can't see any clean and portable way to determine when the compiler is using too much memory: it would depend on the RSS of the compiler (which depends on the source file), on the amount of memory and swap, and on what other tasks are running. In addition, on some small boxes compiling large code, you may actually want (or need) to have it swap sometimes.
In addition, it might be nice to have a --max-load option, as for GNU Make, to tell it not to accept more than one job (or more than zero?) when the machine's load average is above that number. We can try calling getloadavg(), which should exist on Linux and BSD, but apparently not on Solaris. Can take patches later.
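A sketch of the check (load_too_high is a hypothetical helper; the threshold would come from the proposed --max-load option):

    #include <stdlib.h>

    /* Refuse new work while the 1-minute load average exceeds the
     * user's threshold.  getloadavg() is declared in <stdlib.h> on
     * glibc and the BSDs; where it doesn't exist, or fails, we just
     * accept the job. */
    static int load_too_high(double max_load)
    {
        double avg[1];

        if (getloadavg(avg, 1) < 1)
            return 0;              /* can't measure: don't refuse work */
        return avg[0] > max_load;
    }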
A server-side administrative restriction on the number of concurrent tasks would probably be a sufficient approximation.
Oscar Esteban suggests that when the server is limiting accepted jobs, it may be better to have it accept source, but defer compiling it. This implies not using fifos, even if they would otherwise be appropriate. This may smooth out network utilization. There may be some undesirable transient effects where we're waiting for one small box to finish all the jobs it has queued.