2 How to interpret the Erlang crash dumps

This document describes the erl_crash.dump file generated upon abnormal exit of the Erlang runtime system.

The system writes the crash dump to the current directory of the emulator, or to the file named by the environment variable ERL_CRASH_DUMP if it is set (how environment variables are set depends on the operating system). A crash dump can only be written if a writable file system is mounted.
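When hunting for a dump after a crash, the location rule above can be expressed as a small helper. This is a minimal Python sketch, not part of the runtime system: the function name crash_dump_path is made up for illustration, and treating an empty ERL_CRASH_DUMP value as unset is an assumption of the sketch.

```python
import os

# Default file name, as given at the top of this document.
DEFAULT_NAME = "erl_crash.dump"

def crash_dump_path(environ, cwd):
    """Return where the dump is expected for a given environment and directory.

    environ: a mapping of environment variables (for example os.environ).
    cwd:     the current directory of the emulator.
    """
    override = environ.get("ERL_CRASH_DUMP")
    # Assumption of this sketch: an empty value counts as "not set".
    if override:
        return override
    return os.path.join(cwd, DEFAULT_NAME)
```

A typical call would be crash_dump_path(os.environ, os.getcwd()), run with the same environment and working directory as the crashed node.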

Crash dumps are written mainly for one of two reasons: either the built-in function erlang:halt/1 is called explicitly with a string argument from running Erlang code, or the runtime system has detected an error that it cannot handle. The most common reason that the system cannot handle an error is an external limitation, such as running out of memory. A crash dump due to an internal error may be caused by the system reaching limits in the emulator itself (such as the number of atoms in the system, or too many simultaneous ets tables). Usually the emulator or the operating system can be reconfigured to avoid the crash, which is why interpreting the crash dump correctly is important.

2.1 Reasons for crash dumps

The reason for the dump is noted at the beginning of the file as Slogan: <reason> (the word "slogan" has historical roots). If the system is halted by the BIF erlang:halt/1, the slogan is the string parameter passed to the BIF; otherwise it is a description generated by the emulator or the (Erlang) kernel. Normally the message is enough to understand the problem, but some messages are nevertheless described here. Note, however, that the suggested reasons for the crash are only suggestions. The exact reasons for the errors may vary depending on the local applications and the underlying operating system.

Errors other than the ones mentioned above may occur, as the erlang:halt/1 BIF may generate any message. If the message is not generated by the BIF and does not occur in the list above, it may be due to an error in the emulator. There may, however, be unusual messages not mentioned here that are still connected to an application failure. The crash dump contains much more information, so reading it more thoroughly may reveal the reason for the crash. The size of processes, the number of ets tables and the Erlang data on each process stack can be useful for tracking down the problem.
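As a first step when triaging a dump, the slogan can be pulled out programmatically. The following Python sketch assumes only what is stated in this section, namely that a line starting with "Slogan: " appears near the beginning of the file. The function name read_slogan, the max_lines cut-off and the sample lines are all hypothetical and do not come from a real dump.

```python
def read_slogan(lines, max_lines=50):
    """Return the text after 'Slogan: ' from the first lines of a dump, or None.

    lines may be any iterable of strings, such as an open file object.
    max_lines is an arbitrary cut-off chosen for this sketch.
    """
    prefix = "Slogan: "
    for index, line in enumerate(lines):
        if index >= max_lines:
            break
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None

# Made-up sample input; real dumps contain more header information.
sample = [
    "(hypothetical header line)\n",
    "Slogan: example reason passed to erlang:halt/1\n",
]
slogan = read_slogan(sample)
```

For a real file, the same function can be applied to the result of open("erl_crash.dump"), since a file object yields its lines one by one.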

2.2 Process information

The general information in the crash dump (the date, slogan and version information) is followed by a listing of every living Erlang process in the system, as well as zombie processes. The process information for one process may look like this (line numbers have been added):

(1)  <0.2.0> Waiting. Registered as: erl_prim_loader
(2)  Spawned as: erl_prim_loader:start_it/4
(3)  Message buffer data: 262 words
(4)  Link list: [<0.0.0>,<0,1>]
(5)  Dictionary: [{fake, entry}]
(6)  Reductions 2194 stack+heap 987 old_heap_sz=987 
(7)  Heap unused=85 OldHeap unused=987
(8)  Stack dump:
(9)  program counter = 0x1875e4 (erl_prim_loader:loop/3 + 52)
(10) cp = 0xed830 (<terminate process normally>)
(11) arity = 0
(12)
(13) 1d4ae0   Return addr 0xED830 (<terminate process normally>)
(14) y(0)     ["/usr/local/product/releases/otp_beam_sunos5_r7b_patched/lib/kernel-2.6.1.6/ebin","/usr/local/product/releases/otp_beam_sunos5_r7b_patched/lib/stdlib-1.9.3/ebin"]
(15) y(1)     <0.1.0>
(16) y(2)     {state,[],none,get_from_port_efile,stop_port,exit_port,#Port<0.2>,infinity,dummy_in_handler}
(17) y(3)     infinity

Each line of the output should be interpreted as follows:

When interpreting the data for a process, it is helpful to know that anonymous function objects (funs) are given a name constructed from the name of the function in which they are created, and a number (starting with 0) indicating the number of that fun within that function.
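To compare many processes at once, for instance to find the largest ones, the header and size lines of a process entry can be parsed mechanically. The Python sketch below is written against the sample listing in this section only, with the added line-number prefixes removed. The regular expressions and the name parse_process are assumptions made for illustration, and the layout may differ in other OTP releases.

```python
import re

# Patterns derived from lines (1) and (6) of the sample listing only.
HEADER_RE = re.compile(r"^(<[\d.]+>)\s+(\w+)\.(?:\s+Registered as:\s+(\S+))?")
SIZES_RE = re.compile(
    r"Reductions\s+(\d+)\s+stack\+heap\s+(\d+)\s+old_heap_sz=(\d+)"
)

def parse_process(block_lines):
    """Collect pid, state, registered name and size figures for one process."""
    info = {}
    for line in block_lines:
        header = HEADER_RE.match(line)
        if header:
            info["pid"] = header.group(1)
            info["state"] = header.group(2)
            info["registered_name"] = header.group(3)  # None if unregistered
            continue
        sizes = SIZES_RE.search(line)
        if sizes:
            info["reductions"] = int(sizes.group(1))
            info["stack_heap"] = int(sizes.group(2))
            info["old_heap_sz"] = int(sizes.group(3))
    return info

# The first lines of the sample listing, without the "(n)" prefixes.
sample = [
    "<0.2.0> Waiting. Registered as: erl_prim_loader",
    "Spawned as: erl_prim_loader:start_it/4",
    "Message buffer data: 262 words",
    "Reductions 2194 stack+heap 987 old_heap_sz=987",
    "Heap unused=85 OldHeap unused=987",
]
info = parse_process(sample)
```

Sorting a list of such dictionaries by the stack_heap value is one simple way to spot processes that have grown unusually large.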

2.3 Port information

This section lists the open ports, their owners, any linked processes, and the name of their driver or external process.

2.4 Internal table information

This section mostly contains information for runtime system developers. The following fields can be of interest:

The rest of the information is only of interest for runtime system developers.

2.5 ETS tables

This section contains information about all the ETS tables in the system. The following fields are interesting for each table:

2.6 Timers

This section contains information about all the timers started with the BIFs erlang:start_timer/3 and erlang:send_after/3. Each line includes the message to be sent, the pid to receive the message and how many milliseconds were left until the message would have been sent.

2.7 Distribution information

If the Erlang node was alive, i.e., set up for communicating with other nodes, this section lists the connections that were active.

2.8 Loaded module information

This is a list of all loaded modules, together with the memory usage of each module, in bytes. Note that loaded code is usually larger than the packed format in the beam files.

At the end of the list, the memory usage by loaded code is summarized. There is one field for "Current code" which is code that is the current latest version of the modules. There is also a field for "Old code" which is code where there exists a newer version in the system, but the old version is not yet purged.

2.9 Atoms

This section lists all the atoms in the system. It is only of interest if you suspect that dynamic generation of atoms is a problem; otherwise it can be ignored.

2.10 Disclaimer

The format of the crash dump evolves between releases of OTP. Some information here may not apply to your version. A description such as this will never be complete; it is meant as a general explanation of the crash dump and as an aid when trying to find application errors, not as a complete specification.


Copyright © 1991-2003 Ericsson Utvecklings AB