4. Bugs and Future Work

4.1 Reporting Bugs

If you think you have found a bug, please check the manual and the HACKING file to see if it is a known restriction. If not, please send a clear and detailed report to Martin Pool mbp@samba.org. (For a clear and detailed description of "clear and detailed", see Simon Tatham's advice on reporting bugs, http://www.chiark.greenend.org.uk/~sgtatham/bugs.html.)

4.2 Test Suite

distcc has a test suite written in Python using the PyUnit framework. It does not yet exercise all functionality, but is improving. If you discover a bug, or write new functionality, please try to add corresponding tests to make sure that the fix keeps working in the future.

4.3 Known Bugs and Restrictions

There are no known cases where distcc will produce incorrect code, but they may exist. There are some restrictions on distcc, and some possible optimizations that are not yet implemented.

An important general goal is that the code should stay as simple as possible and, secondarily, be portable to reasonably current Unix-like systems. Complicating the code or adding large dependencies is undesirable unless there is an overwhelming advantage.

4.4 Large-scale Distribution

distcc in its present form works well on small numbers of nearby machines owned by the same people. It might be an interesting project to investigate scaling up to large numbers of machines that potentially do not trust each other. This would make distcc somewhat more like "peer-to-peer" systems such as Freenet and Napster.

4.5 Execution across SSH

Running distcc across OpenSSH has several security advantages and should be supported in the future. They include:

  1. Volunteer machines will not need to open an additional network-facing service.
  2. Only authenticated users can use a volunteer machine.
  3. Clients have some guarantees that their connections to a volunteer are not being spoofed.

Using SSH is greatly preferable to developing and maintaining a custom security protocol.

If the client or volunteer is subverted, then the other party is not protected. (For example, if the administrator of the volunteer is malicious, or if the volunteer has been compromised, then compilation results might contain trojans.) However, this is the case for practically every Internet protocol.

Using SSH will consume some CPU cycles for encryption on both client and volunteer.

A simple implementation would be trivial, since the daemon already works on stdin/stdout. However, this might perform poorly because SSH takes quite a long time to open a connection.

Connections should be hoarded by the client. If the client doesn't already have an SSH connection to the volunteer, distcc should fork, with a background task holding the connection open and coordinating access to it.
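As a rough illustration of the simple approach, the client could run the daemon on the volunteer through ssh and talk to it over the pipe. The sketch below only builds the command line. The remote command ("distccd --inetd") and the helper name ssh_command are assumptions made for illustration, not part of distcc as described here. The optional connection sharing uses OpenSSH's own ControlMaster, ControlPath and ControlPersist options, which is one possible way to hoard connections and is a different mechanism from the forked background task proposed above.

```python
from typing import List, Optional, Sequence


def ssh_command(host: str,
                remote_cmd: Sequence[str] = ("distccd", "--inetd"),
                control_path: Optional[str] = None) -> List[str]:
    """Build an argv that runs the daemon on `host` with stdin/stdout piped over SSH.

    remote_cmd is an assumed way of starting the daemon in stdin/stdout mode.
    If control_path is given, OpenSSH is asked to share one connection between
    invocations and keep it open for 60 seconds after the last one exits.
    """
    # BatchMode stops ssh from prompting for a password in the middle of a build.
    argv = ["ssh", "-o", "BatchMode=yes"]
    if control_path is not None:
        argv += [
            "-o", "ControlMaster=auto",
            "-o", "ControlPath=" + control_path,
            "-o", "ControlPersist=60",
        ]
    argv.append(host)
    argv.extend(remote_cmd)
    return argv
```

The resulting list can be passed to subprocess.Popen with stdin and stdout set to subprocess.PIPE, so that the request and response travel over the SSH channel.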

4.6 Load Balancing

When running a job locally (such as cpp or ld), distcc ought to count that against the load of localhost. At the moment it is biased towards too much local load.

distcc needs a way to know that some machines have multiple CPUs, and should accept a proportionally larger number of jobs at the same time. It's not clear whether multiprocessor machines should be completely filled before moving on to another machine.
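One possible policy, sketched below, is to send each new job to the machine with the fewest running jobs per CPU. This spreads work across machines instead of filling a multiprocessor first, which is only one of the two behaviours discussed above. The function name assign_jobs and the per-host CPU table are hypothetical; distcc has no such configuration at present.

```python
from typing import Dict, List


def assign_jobs(cpus_per_host: Dict[str, int], n_jobs: int) -> List[str]:
    """Return the host chosen for each of n_jobs, in submission order.

    Each job goes to the host with the lowest jobs-per-CPU ratio so far,
    so a 4-CPU machine ends up with about four times the jobs of a 1-CPU one.
    Ties go to the host listed first.
    """
    if not cpus_per_host or any(c < 1 for c in cpus_per_host.values()):
        raise ValueError("need at least one host, each with at least one CPU")
    running = {host: 0 for host in cpus_per_host}
    placement = []
    for _ in range(n_jobs):
        # min() returns the first host among equals, so list order breaks ties.
        host = min(running, key=lambda h: running[h] / cpus_per_host[h])
        running[host] += 1
        placement.append(host)
    return placement
```

The sketch assumes all jobs are outstanding at once; a real scheduler would also decrement the count when a job finishes.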

If there are more parallel invocations of distcc than available CPUs it's not clear what behaviour would be best. Options include having the remaining children sleep; distributing multiple jobs across available machines; or running all the overflow jobs locally.

In fact, on Linux it seems that running two tasks on a CPU is not much slower than running a single task, because the task-switching overhead is pretty low.

Problems tend to occur when we run more jobs than will fit into available physical memory. It might be nice if there were a "batch mode" scheduler that would finish one job before running the next, but in the absence of that we have to do it ourselves. I can't see any clean and portable way to determine when the compiler is using too much memory: it would depend on the RSS of the compiler (which depends on the source file), on the amount of memory and swap, and on what other tasks are running. In addition, on some small boxes compiling large code, you may actually want (or need) to have it swap sometimes.

It might also be nice to have a --max-load option, as in GNU Make, to tell the server not to accept more than one job (or more than zero?) when the machine's load average is above that number. We can try calling getloadavg(), which should exist on Linux and BSD, but apparently not on Solaris; patches for other platforms can be taken later.
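A minimal sketch of such a check follows, using Python's os.getloadavg() as a stand-in for the C getloadavg() call. It takes the "more than zero" reading and refuses every new job above the threshold. The function names are hypothetical, and treating an unavailable load average as "accept" is an assumption, chosen so that platforms without the call keep working.

```python
import os
from typing import Optional


def one_minute_load() -> Optional[float]:
    """Return the 1-minute load average, or None where it cannot be obtained."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        # AttributeError: the platform has no getloadavg at all.
        # OSError: the call exists but the value is unobtainable.
        return None


def should_accept(max_load: float, load: Optional[float]) -> bool:
    """Accept a new job unless the load is known and above max_load."""
    return load is None or load <= max_load
```

A server loop would call should_accept(max_load, one_minute_load()) before taking each connection.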

A server-side administrative restriction on the number of concurrent tasks would probably be a sufficient approximation.

Oscar Esteban suggests that when the server is limiting accepted jobs, it may be better to have it accept the source but defer compiling it. This implies not using FIFOs, even if they would otherwise be appropriate. Deferring may smooth out network utilization, but there may be some undesirable transient effects where we are waiting for one small box to finish all the jobs it has queued.
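The accept-then-defer idea can be sketched as a small queue: every job is accepted immediately, but only a fixed number are compiling at once. The class and method names are hypothetical, and the sketch leaves out the network and compiler entirely.

```python
from collections import deque
from typing import Deque, List


class DeferredQueue:
    """Accept every job at once, but start at most max_running at a time."""

    def __init__(self, max_running: int) -> None:
        if max_running < 1:
            raise ValueError("max_running must be at least 1")
        self.max_running = max_running
        self.running = 0
        self.pending: Deque[str] = deque()

    def accept(self, job: str) -> List[str]:
        """Queue a received source file; return the jobs that may start now."""
        self.pending.append(job)
        return self._start_ready()

    def finished(self) -> List[str]:
        """Record one completed compile; return the jobs that may start now."""
        if self.running == 0:
            raise ValueError("no job is running")
        self.running -= 1
        return self._start_ready()

    def _start_ready(self) -> List[str]:
        started = []
        # Start jobs in arrival order until the limit is reached.
        while self.pending and self.running < self.max_running:
            started.append(self.pending.popleft())
            self.running += 1
        return started
```

With a limit of one, a second job is accepted but held until the first finishes, which is the transient waiting effect mentioned above.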


distcc User Manual