Fault
This section summarizes the operations of the Fault module and their argument types. Please refer to the Distribution Tutorial for a full specification of the operations and examples of how to use them. This section indicates where the current release is incomplete with respect to the specification (called a limitation) or behaves differently (called a modification).
The argument types for the operations in the Fault module are as follows.
Entity
A reference to any Oz language entity that has distributed fault modes, namely any object, cell, lock, port, or logic variable.
Level
Either site or 'thread'(T), where T is a thread reference or the atom this. [1]
FStates
A set of fault states, i.e., a list that can contain at most one of each of the elements tempFail, permFail, remoteProblem(tempSome), remoteProblem(permSome), remoteProblem(tempAll), and remoteProblem(permAll).
OP
A record that indicates which attempted operation caused the exception or handler invocation. The value of OP is one of:
bind(T), wait, isDet (for logic variables).
cellExchange(Old New), cellAssign(New), cellAccess(Old) (for cells).
'lock' (for locks).
send(Msg) (for ports).
objectExchange(Attr Old New), objectAssign(Attr New), objectAccess(Attr Old), objectFetch (for objects). A limitation of the current release is that an attempted operation on an object cannot be retried.
HandlerProc
A handler, i.e., a three-argument procedure that is called as {HandlerProc Entity FStates OP}, where FStates is a set of currently active fault states. A handler replaces an attempted operation on an entity.
WatcherProc
A watcher, i.e., a two-argument procedure that is called in its own thread as {WatcherProc Entity FStates}, where FStates is a set of currently active fault states. A watcher is invoked as soon as the site detects a fault.
When there is a distribution problem, three items of information are made available:
Entity: the faulty entity.
ActualFStates: the fault states that are currently active. This is always a subset of the states that the entity is set up to detect. For objects, cells, and locks, the fault states tempFail(info:I) and permFail(info:I) are possible, where I is in {state, owner}. This tells whether the fault is due to a lost state pointer (state) or a crashed owner (owner).
OP: the operation that was attempted but did not succeed.
The system can be configured (see below) so that these three items appear in one or more of the following three ways:
In an exception with format system(dp(entity:Entity conditions:FStates op:OP) ...).
As arguments to a handler call, {HandlerProc Entity FStates OP}.
As arguments to a watcher call, {WatcherProc Entity FStates}.
A limitation of the current release is that the Entity argument is undefined for an object operation. For handlers and watchers, this limitation can be bypassed by giving the handler and watcher procedures a reference to the object.
The Fault module contains the following operations. All operations return a boolean flag B that is true if the operation succeeds and false otherwise. All enable and install operations succeed if nothing was enabled or installed at that level. An entity with a successful enable or install at a given level is said to have fault detection at that level. All disable and deInstall operations succeed if nothing was disabled or deinstalled at that level. The system starts up as if {Fault.defaultEnable [tempFail permFail] _} was executed.
All the following operations that have an Entity argument do nothing if the entity does not have distributed fault modes. If a logic variable with fault detection is bound to a nonvariable entity, then the fault detection is transferred to the entity, provided the latter has no fault detection at that level.
{Fault.defaultEnable FStates ?B}
Sets the default fault detection to FStates on the current site. When an operation is attempted on an entity and there is no fault detection on the site or thread level for the entity, the default fault detection is used. This always succeeds.
{Fault.defaultDisable ?B}
Sets the default fault detection to nil on the current site. This always succeeds.
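A minimal sketch of changing the site-wide default, assuming the Fault module is in scope (for instance through import Fault in a functor). The variable names B1 and B2 are illustrative only.

```oz
declare B1 B2 in
% Restrict the default detection on this site to permanent failures only.
{Fault.defaultEnable [permFail] B1}
% Turn default detection off again (the default fault states become nil).
{Fault.defaultDisable B2}
% Both operations are documented above as always succeeding.
{Show defaults(B1 B2)}
```

Note that after this sequence the site no longer has the startup default of [tempFail permFail]; a further defaultEnable call would be needed to restore it.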
{Fault.enable Entity Level FStates ?B}
Enables fault detection on a given entity at a given level for a given set of fault states. An exception is raised if a fault is detected when an operation is attempted on the entity.
{Fault.disable Entity Level ?B}
Disables fault detection on a given entity at a given level.
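The following sketch shows how enable, the exception format described earlier, and disable might fit together. It assumes the Fault module is in scope; the cell C is created locally only to keep the fragment self-contained, so no fault would actually occur here. In practice C would be a cell shared with another site.

```oz
declare C B1 B2 in
C = {NewCell 0}   % illustrative stand-in for a distributed cell
% Ask for an exception when tempFail or permFail is detected at site level.
{Fault.enable C site [tempFail permFail] B1}
try
   {Assign C 1}   % a fault here would be reported with op:cellAssign(1)
catch system(dp(entity:E conditions:FS op:Op) ...) then
   % E, FS and Op carry the three items of information described above.
   {Show distributionFault(E FS Op)}
end
% Remove the site-level detection again.
{Fault.disable C site B2}
```

The open record pattern (the trailing ...) is used because the exception may carry further fields beyond dp(...).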
{Fault.install Entity Level FStates HandlerProc ?B}
Installs a handler for fault detection on a given entity at a given level for a given set of fault states. The handler {HandlerProc Entity AFStates OP} is called if a fault is detected when an operation is attempted on the entity, where AFStates is the set of currently active fault states. A modification of the current release with respect to the specification is that handlers installed on variables always retry the operation after they return.
{Fault.deInstall Entity Level ?B}
Deinstalls a handler for fault detection on a given entity at a given level.
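As a hedged illustration of install and deInstall, the sketch below installs a handler on a port. It assumes the Fault module is in scope; Handler, P, B1 and B2 are hypothetical names, and the local port stands in for a port owned by another site.

```oz
declare Handler P B1 B2 in
proc {Handler Entity FStates Op}
   % Runs in place of the failed operation; here it only logs the fault.
   {Show portFault(FStates Op)}
end
P = {NewPort _}   % illustrative stand-in for a remote port
% Call Handler instead of raising an exception on a permanent failure.
{Fault.install P site [permFail] Handler B1}
{Send P hello}    % on a fault, Handler would receive send(hello) as Op
% Remove the handler so that another detection can be set at site level.
{Fault.deInstall P site B2}
```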
{Fault.installWatcher Entity FStates WatcherProc ?B}
Installs a watcher for fault detection on a given entity for a given set of fault states. Any number of watchers can be installed on an entity. It is always possible to install a watcher, so this always succeeds. The watcher {WatcherProc Entity AFStates} is called in its own thread as soon as the site detects a fault.
{Fault.deInstallWatcher Entity WatcherProc ?B}
Deinstalls the given watcher on a given entity. This call succeeds if WatcherProc was installed on the entity. If there is more than one instance of WatcherProc installed on the entity, then exactly one is deinstalled.
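A small sketch of the watcher operations, assuming the Fault module is in scope. Watcher, C, B1 and B2 are illustrative names, and the local cell again stands in for a distributed entity.

```oz
declare Watcher C B1 B2 in
proc {Watcher Entity FStates}
   % Called in its own thread as soon as the site detects a fault.
   {Show watcherSaw(FStates)}
end
C = {NewCell unit}
% Installing a watcher is documented above as always succeeding.
{Fault.installWatcher C [tempFail permFail] Watcher B1}
% The same procedure value is passed again to remove the watcher.
{Fault.deInstallWatcher C Watcher B2}
```

Because deInstallWatcher identifies the watcher by its procedure value, it is worth keeping a reference to the procedure that was installed.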
On a given entity at the global level, at most one enable can be done or one handler installed. For a given entity, the site level can have at most one fault detection per site. The 'thread'(T) level can have at most one fault detection per thread. To have another fault detection, it is necessary to do a disable or deinstall first.
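The one-detection-per-level rule can be sketched as follows, assuming the Fault module is in scope. The flag names are illustrative, and the outcomes mentioned in the comments are what the text above suggests, not verified output.

```oz
declare L B1 B2 B3 B4 B5 in
L = {NewLock}
% First detection at site level.
{Fault.enable L site [permFail] B1}
% A second enable at the same level: the rule above suggests B2 is false.
{Fault.enable L site [tempFail permFail] B2}
% Disable first, then enable with the new set of fault states.
{Fault.disable L site B3}
{Fault.enable L site [tempFail permFail] B4}
% The thread level is separate; note the quoted atom 'thread'.
{Fault.enable L 'thread'(this) [tempFail] B5}
{Show [B1 B2 B3 B4 B5]}
```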
The current release has the following limitations and modifications with respect to the failure model specification. A limitation is an operation that is specified but not possible in the current release. A modification is an operation that is specified but behaves differently in the current release.
Most of the limitations and modifications listed here will be removed in future releases.
The limitations are:
The fault state tempFail is indicated only after a long delay. In future releases, the delay will be very short and based on adaptive observation of actual network behavior.
If an exception is raised or a handler or watcher is invoked for an object, then the Entity argument is undefined. For handlers and watchers, this limitation can be bypassed by giving the handler and watcher procedures a reference to the object.
If an exception is raised or a handler is invoked for an object, then the attempted object operation cannot be retried.
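One possible way to work around the undefined Entity argument for objects is to close over the object when building the watcher. The sketch below assumes the Fault module is in scope; the class Counter and the helper MakeWatcher are hypothetical and exist only for illustration.

```oz
declare Counter Obj MakeWatcher W B in
class Counter
   attr n:0
   meth init skip end
end
% Build a watcher that carries its own reference to the object.
fun {MakeWatcher O}
   proc {$ _ FStates}
      % The Entity argument is ignored; the captured reference O is used.
      {Show objectFault(O FStates)}
   end
end
Obj = {New Counter init}
W = {MakeWatcher Obj}
{Fault.installWatcher Obj [permFail] W B}
```

Keeping W in a variable makes it possible to pass the same procedure to Fault.deInstallWatcher later.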
The modifications are:
A handler installed on a variable will retry the operation (i.e., bind or wait) after it returns. That is, the handler is inserted before the operation instead of replacing the operation.
[1] Because thread is already used as a keyword in the language, it has to be quoted to make it an atom.