Sandstorm Tutorial and User's Guide

Matt Welsh, mdw@cs.berkeley.edu
Last updated 11 July 2002

[Back to SEDA Release Documentation]

Introduction

This is an installation and user's guide for Sandstorm, a Java-based platform for highly-concurrent server applications. Sandstorm is the main software component of the Staged Event-Driven Architecture (SEDA) project.

Sandstorm's goal is to support server applications which demand massive concurrency (on the order of tens of thousands of simultaneous client connections), and to behave well under huge variations in load. In the staged event-driven architecture (SEDA), upon which Sandstorm is based, an Internet service is decomposed into a set of stages connected by queues. This design avoids the high overhead associated with thread-based concurrency models, and decouples event and thread scheduling from application logic. SEDA enables services to be well-conditioned to load, preventing resources from being overcommitted when demand exceeds service capacity. Decomposing services into a set of stages also enables modularity and code reuse, as well as the development of debugging tools for complex event-driven applications.

Requirements

Sandstorm is implemented in Java and has been tested primarily on Linux systems running the IBM JDK 1.1.8 and 1.3, as well as Sun JDK 1.3 and 1.4. Note that despite being written in Java, Sandstorm has been found to match or surpass the performance of systems built in C/C++ -- see the SEDA project pages for more details.

Sandstorm should work on most operating systems and JVMs, provided that two basic facilities are available: native threads (as opposed to the now-outdated "green threads", which are no longer supported by most JVMs anyway), and native nonblocking I/O support. Nonblocking I/O can be provided in one of two ways: using either JDK 1.4 (which provides nonblocking I/O through the java.nio package), or using our own NBIO package, which is included in the SEDA software release. NBIO provides nonblocking I/O for a wide range of UNIX and Win2k JVMs, and is the precursor to JDK 1.4's java.nio library.

Note that NBIO requires native threads --- not "green threads" --- because of interactions between nonblocking I/O and the JVM's thread management. When using Sandstorm and nonblocking I/O, you should be able to write applications with a very small number of threads, so the scalability limits of some native threads packages (e.g., under Linux) should not be a problem.

If you wish to use NBIO with Sandstorm, please read the NBIO release documentation for compilation instructions for that package. Note that Sandstorm is compiled and configured to use NBIO by default; if you wish to use JDK 1.4 instead, see below.

The use of nonblocking I/O is vital for obtaining good performance and high concurrency. Sandstorm does not support "threadpool" emulations of nonblocking socket I/O, and I have no plans to do so. I have demonstrated that emulation of nonblocking I/O using threads does not scale beyond a few hundred socket connections. The Sandstorm disk layer currently uses a threadpool model, because nonblocking file I/O is not supported by most operating systems.

Setup and Compilation

All of the SEDA code is in the top-level Java package seda. Sandstorm is in the package seda.sandStorm, and NBIO is in seda.nbio. To compile or use the Sandstorm software, you need to configure the following environment variables:

  1. Be sure that your CLASSPATH contains the following two entries: the seda/src directory of the SEDA release, and the current directory (.).

    For example, under Bourne-style UNIX shells, type:

      CLASSPATH=$CLASSPATH:/path/to/seda/src:.
      export CLASSPATH
    
    Under C-style shells:
      setenv CLASSPATH "$CLASSPATH":/path/to/seda/src:.
    
    Replace /path/to/seda with the directory where you unpacked the SEDA release.

  2. If you are using NBIO, be sure that your LD_LIBRARY_PATH contains the "seda/lib" directory in the SEDA release. This is necessary for the JVM to locate the NBIO native libraries. If you are using JDK 1.4 instead of NBIO, you can skip this step.

    Under Bourne-style shells,

      LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/seda/lib
      export LD_LIBRARY_PATH
    
    Under C-style shells,
      setenv LD_LIBRARY_PATH "$LD_LIBRARY_PATH":/path/to/seda/lib
    

  3. Place seda/bin on your PATH. This directory contains the sandstorm script for starting the Sandstorm runtime, as well as other tools.

    Under Bourne-style shells,

      PATH=$PATH:/path/to/seda/bin
      export PATH
    
    Under C-style shells,
      setenv PATH "$PATH":/path/to/seda/bin
    
Building the code and Javadoc documentation

To compile all of the code in the SEDA tree, including NBIO, Sandstorm, and related utility code, simply cd to the seda/src/seda directory and type make. If there are any problems with compilation, be absolutely sure that seda/src and "." are in fact on your CLASSPATH. This is the most common problem when building the code.

To build the Javadoc documentation for the SEDA tree, cd to seda/docs/javadoc and type make. You can then browse the Javadocs by pointing your browser at javadoc/index.html.

Testing

You can now test the installation by changing to the seda/src/sandStorm/test/basic directory and running:

  sandstorm sandstorm.cfg
You should see output resembling the following:

Sandstorm v2.2 
  Starting at Thu Mar 14 15:14:10 PST 2002

ThreadPoolController: Started, delay 2000 ms, threshold 1000, autoMaxDetect false
Sandstorm: Starting aSocket layer
aSocket layer using JDK1.4 java.nio package
SelectSource created, do_balance = true
TP <aSocket ReadStage>: Adding 1 threads to pool, size 1
TP <aSocket ReadStage>: Starting 1 threads, maxBatch=-1
SelectSource created, do_balance = true
TP <aSocket ListenStage>: Adding 1 threads to pool, size 1
TP <aSocket ListenStage>: Starting 1 threads, maxBatch=-1
SelectSource created, do_balance = true
TP <aSocket WriteStage>: Adding 1 threads to pool, size 1
TP <aSocket WriteStage>: Starting 1 threads, maxBatch=-1
Sandstorm: Starting aDisk layer
ThreadPoolController: Started, delay 2000 ms, threshold 20, autoMaxDetect false
TP <AFileTPTM Stage>: Adding 1 threads to pool, size 1
TP <AFileTPTM Stage>: Starting 1 threads, maxBatch=-1
Sandstorm: Loaded GenStage1 from GenericHandler
Sandstorm: Loaded SinkStage from DevNullHandler
Sandstorm: Loaded TimerStage from TimerHandler
Sandstorm: Initializing stages
-- Initializing <GenStage1>
sleep=100
TP : initial 1, min 1, max 10, blockTime 1000, idleTime 1000
TP : Adding 1 threads to pool, size 1
TP : Starting 1 threads, maxBatch=-1
-- Initializing <SinkStage>
Started
TP : initial 1, min 1, max 10, blockTime 1000, idleTime 1000
TP : Adding 1 threads to pool, size 1
TP : Starting 1 threads, maxBatch=-1
-- Initializing <TimerStage>
delay=500
TP : initial 1, min 1, max 10, blockTime 1000, idleTime 1000
TP : Adding 1 threads to pool, size 1
TP : Starting 1 threads, maxBatch=-1
Sandstorm: Waiting for all components to start...
TimerHandler: GOT QEL: seda.sandStorm.core.BufferElement@b1f2275
GenericHandler: GOT QEL: seda.sandStorm.core.BufferElement@b1f2275

Sandstorm: Ready.

DevNullHandler: GOT QEL: seda.sandStorm.core.BufferElement@b1f2275
TimerHandler: GOT QEL: seda.sandStorm.core.BufferElement@b1f2275
GenericHandler: GOT QEL: seda.sandStorm.core.BufferElement@b1f2275
DevNullHandler: GOT QEL: seda.sandStorm.core.BufferElement@b1f2275

If any exceptions are thrown or other error messages are displayed, then something is wrong; you should try to track these down yourself or contact me.

Support for JDK 1.4 java.nio

Sandstorm supports nonblocking I/O using either NBIO or JDK 1.4's java.nio package. By default Sandstorm is compiled and configured to use NBIO; to use java.nio instead, follow the instructions below.

  1. First, edit seda/src/seda/sandStorm/lib/aSocket/Makefile and change the line
       SUBDIRS = nbio test
    
    to
       SUBDIRS = nio test
    

    You should be able to type make in the seda/src directory to rebuild your Sandstorm tree with java.nio support enabled. (Note that you are welcome to compile both NBIO and java.nio support into your tree; you can select which library to use at runtime as described in the next step.)

  2. To enable java.nio support, be sure your Sandstorm configuration file (sandstorm.cfg) includes the lines
    <global>
       <aSocket>
         provider NIO
       </aSocket>
    </global>
    
    (That is, add the provider NIO line to the <aSocket> section of the file.)

    You can reenable NBIO support simply by changing this line to

         provider NBIO
    

When Sandstorm starts, one of the first lines it displays should indicate whether NBIO or java.nio is being used.

Sandstorm Programming API

This section discusses the basic programming interfaces used by Sandstorm applications. Consult the Javadoc API documentation for Sandstorm for more details on the specific APIs. (This documentation is generated as part of the build process for Sandstorm; if you have not yet compiled the code, you can also find it on the SEDA project Web pages.) Most of the application-level APIs are in the package seda.sandStorm.api.

Stages and Event Handlers

Sandstorm applications are constructed as a set of stages connected by event queues. Threads are used to drive stage execution. However, the programming model hides the details of stages, queues, and threads from the application. The only component that the application implements is a set of event handlers. An event handler implements the core event-handling logic of a stage. Therefore an event handler is the application-level code which is "embedded" in a stage.

An event handler (represented by the interface EventHandlerIF) implements the following set of methods:

  public void handleEvent(QueueElementIF elem) 
        throws EventHandlerException;

  public void handleEvents(QueueElementIF elemarr[])
        throws EventHandlerException;

  public void init(ConfigDataIF config) throws Exception;

  public void destroy() throws Exception;

A QueueElementIF is the basic representation of an event; it is an empty interface, and the application provides implementations of it to represent the different events in the system.
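
For example, an application-defined event is simply a class that implements this interface (a minimal sketch; the class and field names here are illustrative):

  import seda.sandStorm.api.QueueElementIF;

  // A hypothetical application event carrying a message string.
  public class MyMessageEvent implements QueueElementIF {
    public final String message;

    public MyMessageEvent(String message) {
      this.message = message;
    }
  }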

handleEvent and handleEvents are the basic event-handling code. These two methods are invoked by the runtime when events are pending for a stage. init and destroy are used for initialization and cleanup, respectively.

Note that event handlers do not directly manage threads or consume events from their incoming event queue. This is a deliberate design decision, which decouples the details of thread and queue management from the application logic. The runtime is responsible for allocating and scheduling threads, performing queue operations, and so forth.

Event queues, SinkIF, and StageIF

Each stage has one or more associated event queues, but only one event handler which processes events from those queues. As stated above, an event handler does not directly dequeue events from its incoming event queues; rather, the runtime system manages this and invokes the handler's handleEvent or handleEvents method with the pending events.

An event queue is represented by two ends -- a "source" and a "sink". The source allows dequeue operations and is visible only to the Sandstorm internals. The sink allows enqueue operations, and is visible to event handlers which wish to deliver events to a given stage. The event sink is represented by the interface SinkIF, which supports several enqueue operations: enqueue, enqueue_many, and enqueue_lossy. The first two methods push one or more events onto the queue; they may throw an exception if the sink is full, or if it is no longer being serviced (i.e., the sink is "closed"). enqueue_lossy also enqueues events but never throws an exception; instead, the events are dropped if the sink is full or closed. SinkIF also supports operations which return the number of pending events on the queue, as well as a "split-phase" enqueue operation. See the Javadocs for details.
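
A brief sketch of the difference (assuming a SinkIF handle named nextSink and an application event; SinkFullException appears again below under the queueThreshold option, and SinkException is assumed to be its general superclass):

  try {
    nextSink.enqueue(event);            // may fail if the sink is full or closed
  } catch (SinkFullException sfe) {
    // queue threshold reached; back off, retry later, or drop the event
  } catch (SinkException se) {
    // sink closed or otherwise unable to accept the event
  }

  // enqueue_lossy never throws; the event is silently dropped on failure
  nextSink.enqueue_lossy(event);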

An event handler communicates with another stage by obtaining a handle to that stage's StageIF. This is accomplished through the ManagerIF interface, which is passed in as part of the ConfigDataIF to the event handler's init method. The StageIF allows an event handler to get a handle to that stage's SinkIF. Note that a stage may have more than one SinkIF associated with it; this is to support differentiated event queues of various kinds. For the most part applications make use of the "main" SinkIF associated with a stage.

Event Handler Example

Here is the code for a very simple event handler which prints a message for every received event, and passes the event along to another stage:

  import seda.sandStorm.api.*;

  public class printEventHandler implements EventHandlerIF {

    private SinkIF nextStageSink;

    public void init(ConfigDataIF config) throws Exception {
      
      // Get system manager
      ManagerIF mgr = config.getManager();

      // Get stage with name "nextStage"
      StageIF nextStage = mgr.getStage("nextStage");

      // Get nextStage's event sink
      nextStageSink = nextStage.getSink();

      System.err.println("printEventHandler initialized");
    }

    public void destroy() {
      System.err.println("printEventHandler shutting down");
    }

    public void handleEvent(QueueElementIF elem) 
      throws EventHandlerException {

      System.err.println("printEventHandler: Got event "+elem);

      // Pass the event along
      nextStageSink.enqueue(elem);
    }

    public void handleEvents(QueueElementIF elemarr[]) 
      throws EventHandlerException {

      // Just call handleEvent in FIFO order
      for (int i = 0; i < elemarr.length; i++) {
        handleEvent(elemarr[i]);
      }
    }
  }

Threading and Concurrency Issues

Each stage contains a thread pool (hidden from the application) that processes events from the stage's incoming event queue. The pool contains 1 thread by default, but the thread pool sizing controller will grow and shrink the thread pool based on demand. See below for information on configuring the parameters of the thread pool size controller.

Essentially this means that multiple threads may be executing a stage's event handler at once, so you need to be aware of synchronization and concurrency issues within an event handler. Any global data structures should be protected using the standard Java synchronized primitive. You may also make the entire handleEvent() or handleEvents() method synchronized; however, this prevents multiple threads from executing within your event handler at once, limiting parallelism and performance. In addition, multiple event handlers may execute simultaneously, so if several stages share a common data structure, you must use synchronized to access it as well.
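
For example, a handler that maintains a shared table across events might protect the compound update as follows (a minimal sketch in plain Java; only the locking pattern is the point here):

  // Shared among all threads executing this event handler
  private final java.util.Hashtable counts = new java.util.Hashtable();

  public void handleEvent(QueueElementIF elem) throws EventHandlerException {
    String key = elem.toString();
    synchronized (counts) {
      // The compound read-modify-write must be atomic
      Integer old = (Integer)counts.get(key);
      counts.put(key, new Integer((old == null) ? 1 : old.intValue() + 1));
    }
    // ... the rest of the event processing runs without holding the lock ...
  }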

In general these concurrency issues are not difficult to work around, and are much simpler than under a thread-per-request style of concurrency (since synchronization is typically limited to within each stage). However, if your stage code is too complex to make thread-safe and you wish to force only one thread to execute within it at a time, you have three options:

  1. Make the handleEvent() and/or handleEvents() methods synchronized. This approach is not recommended, as the thread pool sizing controller may allocate multiple threads to your stage without realizing that only one can execute at a time.
  2. Edit the Sandstorm configuration file to limit the thread pool to a single thread for that stage; see below for details. This is also not recommended as it is possible to create stages at runtime for which the thread pool controller parameters have not been specified in the configuration file.
  3. The preferred method is to have the event handler class implement SingleThreadedEventHandlerIF in addition to EventHandlerIF. This is an empty interface that indicates to the Sandstorm runtime that the stage should not be allocated more than 1 thread.
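
For example, option 3 amounts to a one-line change to the class declaration:

  import seda.sandStorm.api.*;

  // The marker interface tells the runtime to give this stage at most one thread.
  public class mySerialHandler implements EventHandlerIF, SingleThreadedEventHandlerIF {
    public void init(ConfigDataIF config) throws Exception { /* ... */ }
    public void destroy() throws Exception { /* ... */ }
    public void handleEvent(QueueElementIF elem) throws EventHandlerException { /* ... */ }
    public void handleEvents(QueueElementIF elemarr[]) throws EventHandlerException { /* ... */ }
  }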

A related concurrency issue arises when there are multiple threads in a stage and there is some ordering constraint on the events being processed by that stage. Say you have events A, B, and C that must be processed in that order. If A, B, and C are dispatched to separate threads, they could be processed in any order. In general you will have to solve this problem yourself, that is, create the appropriate synchronization primitives to ensure this ordering (such as a shared queue of events that threads pull from in the "right" order).

This issue comes up most commonly when processing incoming data on a TCP socket using the aSocket library described below. To ensure that your stages process incoming TCP packets in the correct order, each ATcpInPacket event has an associated sequence number that is incremented for each packet received on a connection; you can use this field to reorder the packets for processing. aSocketInputStream is a convenience class that automatically reorders incoming TCP packets and presents them as a stream of contiguous bytes to simplify packet processing.
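
If you do the reordering yourself, a sketch like the following works for a single connection (this assumes the sequence number is read via ATcpInPacket.getSequenceNumber() and starts at 1, and that processInOrder() is your own application code; check the Javadocs for the exact accessor):

  private long nextSeq = 1;                      // next expected sequence number (assumed to start at 1)
  private final java.util.TreeMap pending = new java.util.TreeMap();

  private void handlePacket(ATcpInPacket pkt) {
    synchronized (pending) {
      pending.put(new Long(pkt.getSequenceNumber()), pkt);
      // Release buffered packets in order as long as the next expected one is present
      while (pending.containsKey(new Long(nextSeq))) {
        ATcpInPacket next = (ATcpInPacket)pending.remove(new Long(nextSeq));
        processInOrder(next);                    // application-specific processing
        nextSeq++;
      }
    }
  }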

Asynchronous Sockets

The package seda.sandStorm.lib.aSocket provides an asynchronous socket communication facility for Sandstorm applications. This is very similar to the original ninja2.core.io_core.aSocket API, if you are familiar with that. You should use the Sandstorm version of this library, not the original "io_core" version (the former is integrated into Sandstorm's runtime system and has better performance). This library supports TCP (stream-based) sockets, UDP (datagram) sockets, and multicast UDP sockets.

An application creates an outgoing socket connection by creating an ATcpClientSocket, providing the hostname, port number, and the SinkIF of the stage requesting the connection. When the connection is established, an ATcpConnection object is pushed to the given SinkIF. To send packets over the connection, the application enqueues a BufferElement to the ATcpConnection object. (A BufferElement is just a wrapper for a byte array.) When new packets are received, an ATcpInPacket object is pushed to the application; this contains a BufferElement with the contents of the packet, a pointer to the ATcpConnection from which the packet was received, and a sequence number indicating the order in which the packet was received on the connection.
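
A minimal client-side sketch follows. The ATcpClientSocket constructor and the enqueue of a BufferElement are as described above; the startReader() call (which tells the connection where to deliver incoming packets) and config.getStage().getSink() (assumed here to return the handler's own sink) should be checked against the aSocket Javadocs, and the hostname is of course just an example.

  // In init(): request an outgoing connection.  The established ATcpConnection
  // is later delivered as an event to the sink given here.
  mySink = config.getStage().getSink();
  ATcpClientSocket sock = new ATcpClientSocket("www.example.com", 80, mySink);

  // In handleEvent():
  if (elem instanceof ATcpConnection) {
    ATcpConnection conn = (ATcpConnection)elem;
    conn.startReader(mySink);                     // deliver ATcpInPacket events here
    byte req[] = "GET / HTTP/1.0\r\n\r\n".getBytes();
    try {
      conn.enqueue(new BufferElement(req));       // send data on the connection
    } catch (SinkException se) {
      // connection closed or outgoing queue full
    }
  } else if (elem instanceof ATcpInPacket) {
    ATcpInPacket pkt = (ATcpInPacket)elem;
    // pkt carries a BufferElement with the received bytes plus a sequence number
  }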

Server sockets are created by instantiating an ATcpServerSocket, which pushes incoming connections (as ATcpConnection objects) to the application.

The classes for UDP and multicast UDP sockets are similar in structure, and are called AUdpSocket and AMcastSocket, respectively.

Asynchronous Files

The package seda.sandStorm.lib.aDisk provides an asynchronous file library. The primary way to access this mechanism is to create an object of type AFile; you can enqueue I/O requests to the file through its read, write, and seek methods. When the I/O operations complete, AFileIOCompleted events are pushed to the client's SinkIF.
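
A rough sketch of the usage pattern is shown below; the AFile constructor arguments (create and read-only flags) and the exact read() signature are assumptions to check against the aDisk Javadocs, and exception handling is omitted for brevity.

  // Open the file, specifying the sink to which completion events are delivered
  AFile file = new AFile("/tmp/data.bin", mySink, false /* create */, true /* readOnly */);

  // Enqueue an asynchronous read into a buffer; the result arrives later as an
  // AFileIOCompleted event on mySink.
  BufferElement buf = new BufferElement(new byte[8192]);
  file.read(buf);

  // In handleEvent():
  if (elem instanceof AFileIOCompleted) {
    AFileIOCompleted comp = (AFileIOCompleted)elem;
    // ... examine the completed request and the filled buffer ...
  }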

The current implementation makes use of blocking file I/O and a small thread pool; however, we are currently working on a true asynchronous implementation using the POSIX.4 AIO mechanism.

See the Javadoc documentation for this library for more detail.

Timers

An application can request that events be delivered to a SinkIF at some time in the future using the seda.sandStorm.core.ssTimer library. This class supports registerEvent and cancelEvent calls and is fairly self-explanatory.
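
For example (a minimal sketch; the registerEvent() argument order -- delay in milliseconds, event, destination sink -- is an assumption, and TimerTickEvent stands in for any application-defined QueueElementIF):

  seda.sandStorm.core.ssTimer timer = new seda.sandStorm.core.ssTimer();

  // Deliver a TimerTickEvent to mySink roughly 500 ms from now
  timer.registerEvent(500, new TimerTickEvent(), mySink);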

Other Libraries

Other useful libraries are provided in the sandStorm/lib directory. The seda.sandStorm.lib.http package provides an asynchronous HTTP protocol implementation based on aSocket. Likewise, seda.sandStorm.lib.Gnutella provides a Gnutella protocol implementation. In the test directory within these two packages you will find simple HTTP and Gnutella servers implemented using these libraries; the Javadoc documentation also makes it clear how to use them.

Sandstorm Configuration File

The Sandstorm configuration file (usually called sandstorm.cfg) uses an XML-like format, consisting of nested sections delimited by tags of the form

  <sectionName> ... </sectionName>
The exact contents of this file are bound to change in future Sandstorm releases, but the basic format should remain the same.

The file test/basic/sandstorm.cfg provides an example Sandstorm configuration file. Note that this file contains the complete set of options in order to document their use, but in general a config file should only contain those options which you need to specify. A simple configuration file is shown below:

# Example Sandstorm configuration file
<sandstorm>

  <stages>

    <firstStage>
      class seda.test.firstStageHandler
      <initargs>
        somearg1 somevalue1
        somearg2 somevalue2
      </initargs>
    </firstStage>

    <secondStage>
      class seda.test.secondStageHandler
    </secondStage>

  </stages>

  <!include options.cfg>

</sandstorm>

Comments are started by # characters and extend to the end of the line. Files may be included (nested includes are supported) using the <!include filename> directive.

The main section of the file must be named <sandstorm>, and consists of a <stages> section which contains one subsection per stage. There is also an optional <global> section used for defining global options; see below. The contents of each section consist of key-value pairs of the format

  key1 value1
  key2 value2
  ...
  keyN valueN
where key is a single (whitespace-delimited) word and value is an arbitrary string which may contain whitespace. The value field extends to the end of the line.

<stages> section

Each subsection of the <stages> section describes one stage and is formatted as follows:

<stage-name>
Specifies the name of the stage as it will be registered in the system. This name is used to perform stage lookups through the ManagerIF interface.

class class-name
(Mandatory) Specifies the fully-qualified class name of the stage's event handler.

queueThreshold value
(Optional) Specifies the threshold for this stage's incoming event queue. value must be a positive integer. If the event queue reaches this threshold then new entries cannot be enqueued onto it; a SinkFullException will be thrown instead. This field is optional; the default value is -1, which indicates an infinite event queue threshold.

<initargs> arguments </initargs>
(Optional) The <initargs> subsection contains the initial arguments passed to the event handler's init() method. It consists of a set of key-value pairs which the event handler accesses through the ConfigDataIF interface. The initargs are optional.
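
For example, given the <initargs> shown in the sample configuration file above, the event handler could read its arguments in its init() method like this (getString() is the ConfigDataIF accessor mentioned later in this document; see the Javadocs for other typed accessors):

  public void init(ConfigDataIF config) throws Exception {
    String val1 = config.getString("somearg1");   // "somevalue1"
    String val2 = config.getString("somearg2");   // "somevalue2"
    // ...
  }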

The initialization arguments for each stage are derived from three different sources. In order of precedence, they are:

  1. Arguments passed on the sandstorm command line;
  2. Arguments contained in the <initargs> subsection for the corresponding <stage>; and
  3. Arguments contained in the <initargs> subsection in the <global> section of the configuration file.
This precedence order allows you to, say, override any initialization arguments directly on the command line, while providing defaults in the configuration file.

<global> section

The config file may also contain a <global> section which defines global options. This section is optional and in general should not be used by applications; the default values should be acceptable. If you wish to use it, however, here is the format:

<initargs> arguments </initargs>
(Optional) Contains initial arguments passed to every stage in the application. If the stage has its own <initargs> section, that will override the global initargs specified here.

<threadPool> options </threadPool>
(Optional) Defines options for thread pool management. These options are:

<sizeController> options </sizeController>
(Optional) Defines options for the thread pool sizing controller. These options are:

enable boolean
(Optional) Enable or disable the sizing controller, which resizes stage thread pools when a stage's incoming event queue reaches some threshold. The controller automatically allocates threads based on the observed load on a stage, in an attempt to avoid overload conditions. The value of boolean must be true or false. Default is false.

threshold value
(Optional) Specifies the queue threshold which triggers a thread being added by the controller. Default is 1000.

delay delay
(Optional) Specifies the sampling delay (in milliseconds) for the controller. Default is 2000 ms.

idleTimeThreshold delay
(Optional) Specifies the idle time which causes a thread to be removed from the pool. Default is 1000 ms.

maxThreads value
(Optional) Specifies the maximum number of threads in a thread pool. Default is 20.

minThreads value
(Optional) Specifies the minimum number of threads in a thread pool. Default is 1.

initialThreads value
(Optional) Specifies the initial number of threads in a thread pool. Default is 1.

blockTime value
(Optional) Specifies the time which each thread blocks waiting for events to arrive. Default is 1000 ms.
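
Put together, a thread pool section of the <global> section might look like the following. This is one plausible layout only; the sandstorm.cfg file shipped in test/basic shows the authoritative nesting of these options.

  <global>
    <threadPool>
      initialThreads 1
      minThreads 1
      maxThreads 20
      blockTime 1000
      <sizeController>
        enable true
        delay 2000
        threshold 1000
        idleTimeThreshold 1000
      </sizeController>
    </threadPool>
  </global>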

<profile> options </profile>
(Optional) Defines options for the Sandstorm profiler. These options are:

enable boolean
(Optional) Specifies whether the Sandstorm system profiler should be enabled. The profiler generates a log file containing measurements of event queue lengths and memory usage in the system. See Using the profiler, below, for more details. boolean must be true or false; the default value is false. Use of the -profile option on the sandstorm command line overrides this value (if set) to true.

delay value
(Optional) Specifies the sampling delay (in milliseconds) for the profiler. Default is 1000 ms.

filename value
(Optional) Specifies the filename to which the profile will be written. Default is ./sandstorm-profile.txt.

graph boolean
(Optional) Specifies whether the profiler should generate a graph of stage connectivity at runtime. Default value is false.

sockets boolean
(Optional) Specifies whether the outgoing queue length for sockets should be included in the profile. Default value is false.
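
For example, enabling the profiler with a 500 ms sampling interval and a custom output file would look like this:

  <global>
    <profile>
      enable true
      delay 500
      filename /tmp/my-profile.txt
      graph true
      sockets false
    </profile>
  </global>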

The programmatic interface for configuring the Sandstorm system is given by the class seda.sandStorm.main.SandstormConfig. This class allows the system to manipulate arbitrary key/value pairs (where both key and value are specified as strings, but may be converted to and from booleans, ints, and doubles as well). See the source code for this class to understand how configuration options are specified and used by the runtime.

Running Sandstorm

The basic way to run Sandstorm is to use the sandstorm script, found in the seda/bin directory. This script uses the following syntax:

  sandstorm [-profile] <configfile> [initargs ...]
where configfile is the name of the Sandstorm configuration file (described in detail above). Use of the -profile option turns on the Sandstorm profiler (described below), and overrides the profile option (if any) in the configuration file.

Any setting in the configuration file can be overridden using command-line arguments when you run sandstorm. These are of the form name=value and are specified on the command line after the configuration file name. For example,

  sandstorm myconfiguration.cfg global.threadPool.minThreads=2
sets the global.threadPool.minThreads parameter to 2, regardless of its setting in the configuration file.

If you specify a command-line argument name that does not contain a dot (.), it will be passed as an initialization argument to every stage, as if it were specified in the <initargs> subsection of every <stage> section. For example,

  sandstorm myconfiguration.cfg foo=bar
sets the initialization parameter named foo to the value bar for every stage; stages can retrieve this value by calling ConfigDataIF.getString().

In addition to invoking Sandstorm from the command line, it is possible to embed Sandstorm in another Java application; see below for details.

Other Features

There are many other features in Sandstorm that we haven't documented here, including queue admission controllers, adaptive response time controllers, the batching controller, and so forth. Most of these features are not important for simple application development, but you may wish to use them to condition your service to heavy load. Each stage's event queue can have an associated admission control algorithm that determines whether an event is accepted or not; admission controllers implementing simple thresholding and token-bucket rate limiting are provided. Several response time controllers are provided to tune admission control parameters to meet a response time target. All of this code and the associated interfaces can be found in the seda/sandStorm/internal directory.

Using the Profiler

The Sandstorm profiler is extremely valuable for understanding the performance of applications and for identifying bottlenecks. When enabled (by setting enable true in the <profile> subsection of the <global> section of the configuration file, or by using the -profile option on the sandstorm command line), the profiler samples queue lengths and the Java heap size at a regular interval (1000 ms by default) and appends a record to the file sandstorm-profile.txt in the current directory. (Both the sampling interval and the filename can be changed in the configuration file.)

If you have Gnuplot installed, you can visualize the Sandstorm profiler output by running the script ssprofile-graph in the Sandstorm bin directory; this script takes a single argument which is the name of the profile logfile. ssprofile-graph draws a graph of the queue lengths and heap size over time. You can edit this script to customize the output if you like. Here is an example of the display produced by ssprofile-graph:


[Figure: queue lengths and heap size over time, as plotted by ssprofile-graph]

At runtime you can get a handle to the system profiler using the ManagerIF.getProfiler() method. ProfilerIF defines the profiler API. The add() method allows you to add an object to the profiler's queue-length trace; anything which implements ProfilableIF can be profiled in this way.
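
A brief sketch is shown below. The add() signature (a display name plus the ProfilableIF object) and the profileSize() method name are assumptions to verify against the Javadocs.

  // Anything that can report a size can be added to the queue-length trace
  class MyWorkList implements ProfilableIF {
    private final java.util.Vector items = new java.util.Vector();
    public int profileSize() { return items.size(); }
  }

  // In init(), where mgr is the ManagerIF:
  ProfilerIF prof = mgr.getProfiler();
  if (prof != null) {                      // guard in case profiling is disabled (assumption)
    prof.add("MyWorkList", new MyWorkList());
  }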

The addGraphEdge() and dumpGraph() methods allow the profiler to generate a graph depicting the connectivity between stages. Calling addGraphEdge() adds a pair of nodes to the graph with a directed edge between them. dumpGraph() dumps the graph to an output file which you can visualize using the graphviz tool from AT&T Research. Here is an example graph generated using this approach:


[Figure: stage connectivity graph generated with dumpGraph() and graphviz]

This graph has been beautified somewhat by hand, but you get the idea. Soon I will be packaging up the tools to make this process somewhat more automatic.

Embedding Sandstorm

Sandstorm is designed to be embedded into other applications. In this way you can create a Sandstorm instance within some other application and control it programmatically, creating and destroying stages, and so forth.

To do this, simply create an object of type seda.sandStorm.main.Sandstorm. In general you should only create one Sandstorm per Java Virtual Machine, since multiple instances may interfere with one another in terms of sockets, thread scheduling, and so forth.

There are three constructors for the Sandstorm class:

public Sandstorm()
Create a Sandstorm with default configuration parameters.
public Sandstorm(String filename)
Create a Sandstorm and read the configuration from the given filename. This file is simply a Sandstorm configuration file as described above.
public Sandstorm(SandstormConfig config)
Create a Sandstorm with the configuration parameters specified by the given SandstormConfig object. See the Javadocs for details on using SandstormConfig.

The Sandstorm class exports two other methods which you can use to control the Sandstorm instance from your application:

public ManagerIF getManager()
Returns a handle to the ManagerIF for this Sandstorm instance. This interface allows you to create stages, obtain handles to stages, and so forth.
public SystemManagerIF getSystemManager()
Returns a handle to the SystemManagerIF for this Sandstorm instance. This is a lower-level interface allowing both stages and thread managers to be created. It should only be used if you need to use a thread manager other than the default.
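
A minimal embedding sketch (the createStage() call is shown with an assumed signature -- stage name, event handler instance, initialization arguments -- so consult the ManagerIF Javadocs for the exact form):

  import seda.sandStorm.api.*;
  import seda.sandStorm.main.*;

  public class EmbedExample {
    public static void main(String args[]) throws Exception {
      // Start a Sandstorm instance from an existing configuration file
      Sandstorm ss = new Sandstorm("sandstorm.cfg");

      // Create an additional stage at runtime, reusing the printEventHandler
      // from the example earlier in this guide; null means no initargs here.
      ManagerIF mgr = ss.getManager();
      mgr.createStage("printStage", new printEventHandler(), null);
    }
  }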

Sandstorm Programming Style Guide

A great deal of programming in Sandstorm is a matter of style. I have tried to design the APIs to encourage a particular way of doing things, and to discourage "abuses" of Java functionality which can lead to poor performance or badly-structured applications. Still, I think it's a good idea to provide a list of "do's and don'ts" about programming in Sandstorm.

Don't allocate or manage threads in your application.
This is the single most important aspect of programming in Sandstorm. Threads are only created and managed by the runtime system, not by applications. If you find yourself wanting or needing to use threads in your system, ask yourself why you are doing this.

Programming in Sandstorm requires a fundamental shift in thought -- you are no longer using a single thread to process a single request through the system; rather, you are breaking the request into multiple stages through which events flow. Anything you can do with threads you can do with this model, although it may seem a bit strange at first.

The problem with applications managing their own threads is that they will inevitably interfere with the thread allocation and scheduling policies performed by the Sandstorm runtime system. If you are unhappy with the performance or behavior of Sandstorm's two standard thread managers, then consider implementing your own. The appropriate interface is seda.sandStorm.api.ThreadManagerIF and there are two implementations in the internal directory which you can use as a guide. In general I do not recommend taking this approach, but it is an option.

Don't share memory between stages.
This one is a bit trickier and has to do with preventing deadlocks. If two stages directly access a shared object, then they must typically use Java's synchronization primitives to implement correct concurrent access to that object. The problem here is that one stage may grab a lock and hold it for a very long time while another stage attempts to acquire the same lock. This can lead to both deadlock (where there is a cyclic dependency between stages acquiring locks) and livelock (where one stage is holding the lock and making progress, but in doing so is preventing other stages from making progress).

I strongly recommend that stages pass data by value rather than by reference. It is fine for an event passed between stages to contain a pointer to an object, as long as the pointer is not used concurrently by more than one stage. If you can adopt a "pass by value" programming style then you will avoid these problems.

Don't use blocking I/O.
In general stages should never sleep. Doing so causes a serious problem for concurrency through the system, since more threads must be allocated to deal with overlapping requests to the sleeping stage. The more threads, the lower the performance. Your goal in designing a stage should be to avoid blocking as much as possible. Among other things this means not using the standard blocking I/O libraries found in the JDK.

Sandstorm provides nonblocking, asynchronous network I/O in the form of the aSocket library. For TCP, UDP, and multicast sockets you should never have to resort to using the original blocking Socket and ServerSocket APIs from the JDK. Asynchronous file access is provided by the seda.sandStorm.lib.aDisk library; this actually makes use of blocking I/O but is incorporated into the Sandstorm APIs and thread pool control mechanisms so as to be extremely efficient.

Note that the thread pool controller automatically allocates new threads (up to some limit) to a stage which appears to be saturated. The idea here is that if a stage is sleeping for some reason then the system automatically doles out new threads to it. This can be used to avoid bottlenecks due to sleeping, but only up to a certain point (that is, only until we allocate so many threads that performance starts to suffer). This feature is still experimental and not meant for production use; it is much better to avoid blocking than to rely on this mechanism.

Don't sleep.
This is related to the above point: stages should attempt to never sleep. If you need to schedule work to be done at some time in the future, use the seda.sandStorm.core.ssTimer class to send a timer event to your stage instead.

Debugging Hints and Other Tips

Many of the classes making up the Sandstorm implementation support a simple debugging feature: at the top of the class there is a definition of the form

   private static final boolean DEBUG = false;
By changing the value of DEBUG to true, and recompiling the corresponding class, that class will emit a lot of debugging information to stderr at runtime. This is useful if you suspect a bug inside of Sandstorm somewhere (although I am sure it is 100% bug-free :-)). The same can be done for the NBIO library. Be sure to use the Sandstorm profiler as well, since it can be used to find a number of bugs, including performance bottlenecks and memory leaks.

If you do have a problem with Sandstorm please don't hesitate to get in touch with me. It would be helpful if you can isolate the problem you are seeing to a simple reproducible case.


Matt Welsh, mdw@cs.berkeley.edu