Using SEE, the Simple ECMAScript Engine

by David Leonard, 2005
for SEE version 1.3

The impatient may want to jump straight to the §4.1 code example.

Introduction

The Simple ECMAScript Engine ('SEE') is a parser and runtime library for the popular ECMAScript language. ECMAScript is the official name for what most people call JavaScript:

[ECMAScript] is based on several originating technologies, the most well known being JavaScript (Netscape) and JScript (Microsoft). The language was invented by Brendan Eich at Netscape and first appeared in that company's Navigator 2.0 browser. It has appeared in all subsequent browsers from Netscape and in all browsers from Microsoft starting with Internet Explorer 3.0. (ECMA-262 standard, 1999)

SEE complies [almost] fully with ECMAScript Edition 3 and with JavaScript 1.5. It has compatibility modes that allow it to run scripts developed under earlier versions of JavaScript, Microsoft's JScript and LiveScript.

This documentation is intended for developers wishing to incorporate SEE into their applications. It explains how you can use SEE to create interpreters, manage memory, run programs, and work with ECMAScript values and objects.

This documentation does not explain the ECMAScript language, nor discuss how to build the library on your system.

SEE includes an example application, called see-shell, which allows interactive use of the interpreter and demonstrates how to write host function objects.

Document conventions

I will use the phrase host application to mean your application, or any application that uses the SEE runtime environment as an auxiliary to some primary purpose. Examples of a host application are web browsers and scriptable XML processors.

Throughout this documentation, references are made to the C functions and macros provided by the SEE library. To avoid definitional redundancy and to improve precision, the reader is encouraged to examine the SEE header files to find the precise definitions and arguments of each function or macro. Signatures for C macros are given, but you should understand that the compiler cannot normally typecheck your use of those macros.

Where literal C code is used, it is typeset in a monospace font, like this:

if (failed) { abort(); }

Similarly, ECMAScript code is typeset in a sans serif font, like this:

window.location = "about:blank";

Elided code is indicated with an ellipsis: ...

1 Requirements

Compiling SEE requires an ANSI C compiler. Although the SEE library is essentially self-contained, it does depend on you (the host application developer) providing the following:

an IEEE 754 floating point type
    Most modern compilers have this, but if you are developing for some obscure architecture, you should check.
a garbage-collecting memory allocator
    The free Boehm gc is highly recommended (see also §3.1).

SEE uses scripts from GNU autoconf to determine if these are available, and also to determine other system-dependent properties. Host applications should #include <see/see.h> to access all the macros and function prototypes.

⚠ Note: A future release of SEE will use IBM's ICU library for Unicode support.

(As a developer you may find the need to edit header files and configure scripts to make SEE compile on your system. I would be interested in hearing what changes were needed so that future releases can supply this automatically for other users. Please send mail to leonard@users.sourceforge.net.nospam.)

2 Creating interpreters

The first step in executing ECMAScript program text with SEE is to create an interpreter instance. Each interpreter represents a reusable execution context. When created, an interpreter is initialised with all the standard ECMAScript objects (such as Math and String).

First, have your application allocate storage for a SEE_interpreter structure and then call SEE_interpreter_init() to initialise that structure.

void SEE_interpreter_init(struct SEE_interpreter *interp);

A pointer to the initialised SEE_interpreter structure is required for almost every function that SEE provides.

Here is an example where the storage has been allocated on the stack, and consequently the interpreter only exists until the function returns.

void
example()
{
    struct SEE_interpreter interp_storage;

    SEE_interpreter_init(&interp_storage);
    /* now the interpreter is ready */
}

There is no mechanism for explicitly destroying an initialised interpreter; instead, SEE relies on the garbage collector to reclaim all unreferenced storage. If you want finalization semantics, you will need to arrange that yourself.

⚠ Note: A future release of SEE may provide hooks for host object finalisation.

2.1 Multiple simultaneous interpreters

SEE supports multiple independent interpreter instances. This is useful, for example, in an HTML web browser application, where each window may need its own interpreter instance because the variables and bindings to built-in objects must be different and separate in each one.

SEE's functions are not inherently thread-safe, but multiple different interpreters can be safely used by different threads. This is because all data used by the library is attached to the SEE_interpreter structure; there are no mutable global data structures. Interpreters can remain completely independent of each other in this way if you:

2.2 Fatal error handlers

If SEE encounters an internal error (such as memory exhaustion, memory corruption, or a bug), it calls the global function pointer SEE_system.abort, passing it a pointer to the interpreter in context (or NULL), and a short descriptive message. The SEE_system.abort hook initially points to a wrapper function that simply calls the C library function abort(). You can set the hook early if you want to handle errors more gracefully. Its signature is:

extern struct {
    ...
    void (*abort)(struct SEE_interpreter *interp, const char *msg) _SEE_dead;
    ...
} SEE_system;

A convenience macro, SEE_ABORT(), is provided for applications to call through this hook.

extern void (*SEE_ABORT)(struct SEE_interpreter *interp, const char *msg);
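
To make the hook idea concrete, here is a minimal, self-contained sketch of the pattern. It does not use SEE at all: demo_system, default_abort(), recovering_abort() and run_guarded() are hypothetical stand-ins that I have invented for illustration, and only the shape (a global function pointer whose default calls abort(), replaced early by the application) follows the description above.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hook table; only the shape matters. */
struct demo_interp;                       /* opaque, never dereferenced here */
static void default_abort(struct demo_interp *interp, const char *msg);

static struct {
    void (*abort)(struct demo_interp *interp, const char *msg);
} demo_system = { default_abort };

/* Default behaviour: report the message and terminate the process. */
static void default_abort(struct demo_interp *interp, const char *msg)
{
    (void)interp;
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

/* A gentler replacement: remember the message and unwind to a safe point. */
static jmp_buf recovery_point;
static const char *last_fatal_msg;

static void recovering_abort(struct demo_interp *interp, const char *msg)
{
    (void)interp;
    last_fatal_msg = msg;
    longjmp(recovery_point, 1);           /* never returns to the caller */
}

/* Returns 1 if the guarded work hit a fatal error, 0 otherwise. */
int run_guarded(int simulate_failure)
{
    demo_system.abort = recovering_abort; /* install before any other use */
    if (setjmp(recovery_point) != 0)
        return 1;
    if (simulate_failure)
        demo_system.abort(NULL, "memory exhausted");
    return 0;
}
```

Note that the replacement hook still never returns to its caller. After a real fatal error the interpreter's state should be treated as unusable, so a handler like this is best limited to saving work and shutting down cleanly.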

3 Memory management

SEE uses a garbage-collecting memory allocator, which it reaches through global function pointers that the host application can configure.

SEE manages memory by calling through the following function pointers stored in the global structure SEE_system. Your host application can replace them, but it must do so before it creates any interpreters.

extern struct {
    ...
    void * (*malloc)(struct SEE_interpreter *interp, SEE_size_t size);
    void * (*malloc_string)(struct SEE_interpreter *interp, SEE_size_t size);
    void   (*free)(struct SEE_interpreter *interp, void *ptr);
    void   (*mem_exhausted)(struct SEE_interpreter *interp);
    ...
} SEE_system;

These hooks are invoked through the following functions:

void * SEE_malloc(struct SEE_interpreter *interp, SEE_size_t size);
void * SEE_malloc_string(struct SEE_interpreter *interp, SEE_size_t size);
void SEE_free(struct SEE_interpreter *interp, void **datap);

Notice that SEE_free() takes a pointer-to-a-pointer, unlike its counterpart in the SEE_system structure. The pointer will be set to NULL after freeing. Freeing a NULL pointer with SEE_free() has no effect.
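
The pointer-to-a-pointer contract can be sketched in a few lines of plain C. This is not SEE's implementation: demo_free() is a hypothetical helper built on the C library's free(), written only to show the behaviour described above.

```c
#include <stdlib.h>

/* Hypothetical helper mirroring the contract described for SEE_free(). */
void demo_free(void **datap)
{
    if (*datap == NULL)
        return;             /* freeing a NULL pointer has no effect */
    free(*datap);
    *datap = NULL;          /* the caller's pointer cannot dangle afterwards */
}
```

A caller holding a void *p would write demo_free(&p). Because p is reset to NULL, an accidental second call becomes harmless rather than a double free.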

If SEE was compiled with Boehm-gc support, SEE_system.malloc is initialised to point to a wrapper around the GC_malloc() function, SEE_system.malloc_string is initialised to point to a wrapper around GC_malloc_atomic(), and SEE_system.free is initialised to point to a wrapper around GC_free(). Otherwise, the functions will use the system malloc().

If you intend to hook in your own memory allocator, be aware that any of these hooks may be called with a NULL interpreter argument, indicating unknown context. The malloc hooks must not throw exceptions, but should return NULL on failure.

Instead of calling the hooks directly, application code should use these convenience macros to allocate storage:

T * SEE_NEW(struct SEE_interpreter *interp, type T);
T * SEE_NEW_ARRAY(struct SEE_interpreter *interp, type T, int length);
T * SEE_NEW_STRING_ARRAY(struct SEE_interpreter *interp, type T, int length);
T * SEE_ALLOCA(struct SEE_interpreter *interp, int length, type T);

A usage example is:

char *buffer = SEE_NEW_STRING_ARRAY(interp, char, 30);

These macros check for a memory allocation failure indicated by the system allocator returning NULL. In this event they will assume an out-of-memory condition and call SEE_system.mem_exhausted(). This hook defaults to a function that simply calls SEE_ABORT(). Your application may prefer to change the mem_exhausted hook to handle this situation more gracefully.

It is worth familiarizing yourself with the macro definitions to see what they do. See <see/mem.h> for the definitions.
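
If you have not seen this style of macro before, the following sketch shows the general idea: a typed wrapper that allocates and reports failure through a hook, so that call sites need no NULL checks of their own. These are not the definitions from <see/mem.h>. DEMO_NEW, DEMO_NEW_ARRAY, demo_malloc() and demo_mem_exhausted() are hypothetical names, and the allocator deliberately fails for large requests so that the failure path can be exercised.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the malloc and mem_exhausted hooks. */
static int exhausted_calls;

static void *demo_malloc(size_t size)
{
    /* Pretend that requests above 1024 bytes cannot be satisfied. */
    return size > 1024 ? NULL : malloc(size);
}

static void demo_mem_exhausted(void)
{
    exhausted_calls++;      /* a real hook would normally not return */
}

/* Allocate, reporting failure through the hook rather than at each call. */
static void *demo_alloc_checked(size_t size)
{
    void *p = demo_malloc(size);

    if (p == NULL)
        demo_mem_exhausted();
    return p;
}

/* Typed wrappers in the spirit of SEE_NEW and SEE_NEW_ARRAY. */
#define DEMO_NEW(T)           ((T *)demo_alloc_checked(sizeof (T)))
#define DEMO_NEW_ARRAY(T, n)  ((T *)demo_alloc_checked((size_t)(n) * sizeof (T)))
```

The cast inside the macro is what gives each allocation its type, which is also why the compiler cannot check much else about how such macros are used.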

3.1 On memory allocators

Why is SEE so dependent on a garbage collector? Why doesn't it use reference counting?

This subsection is a short diversion on answering this good question. I have asked myself the same thing about other applications that use garbage collectors. I'll justify SEE's reliance on a garbage collector with the following reasons:

4 Running programs

SEE's ultimate purpose is to execute user scripts. A full script, or a self-contained fragment of a script, is referred to as program text. You should execute program text using the following general strategy:

  1. obtain a reference to an (initialised) SEE_interpreter (§2);
  2. construct a SEE_input unicode stream reader (§4.2) to transport the ECMAScript program text to SEE's parser;
  3. establish a try-catch context (§4.3);
  4. call the function SEE_Global_eval() to parse and evaluate the stream;
  5. handle any exceptions caught in the try-catch context (§4.3);
  6. examine the value result returned (§5) (optional)

The SEE_Global_eval() function is able to execute program text and then store the value associated with the last executed statement in a location given by a value pointer. In a non-interactive environment, this last statement's value is usually meaningless, and the value result return pointer ('res') given to SEE_Global_eval() may be safely given as NULL.

void SEE_Global_eval(struct SEE_interpreter *interp, 
                struct SEE_input *input, 
                struct SEE_value *res);

The program text is first parsed and then executed with this function. If the evaluated text contains function definitions, the function-objects created inside the interpreter will contain a 'precompiled' copy of the function text. This means it is safe to destroy the input immediately after it has been passed to SEE_Global_eval().

4.1 Example

Although the rest of this document explains the library API in detail, a complete, but simple example of using the SEE interpreter follows:

#include <stdio.h>
#include <stdlib.h>
#include <see/see.h>

/* Simple example of using the interpreter */
int
main(void)
{
        struct SEE_interpreter interp_storage, *interp;
        struct SEE_input *input;
        SEE_try_context_t try_ctxt;
        struct SEE_value result;
        char *program_text = "Math.sqrt(3 + 4 * 7) + 9";

        /* Initialise an interpreter */
        SEE_interpreter_init(&interp_storage);
        interp = &interp_storage;

        /* Create an input stream that provides program text */
        input = SEE_input_utf8(interp, program_text);

        /* Establish an exception context */
        SEE_TRY(interp, try_ctxt) {
                /* Call the program evaluator */
                SEE_Global_eval(interp, input, &result);

                /* Print the result */
                if (SEE_VALUE_GET_TYPE(&result) == SEE_NUMBER)
                        printf("The answer is %f\n", result.u.number);
                else
                        printf("Unexpected answer\n");
        }

        /* Finally: */
        SEE_INPUT_CLOSE(input);

        /* Catch any exceptions */
        if (SEE_CAUGHT(try_ctxt))
                printf("Unexpected exception\n");

        exit(0);
}

When this program is compiled, linked against the SEE library and the garbage collector library, and run, it should respond with:

The answer is 14.567764

This works because the value of the last executed statement in the program_text is stored in result. Calling SEE_Global_eval() is essentially the same as using ECMAScript's built-in eval() function.

4.2 Inputs

SEE uses Unicode character stream sources known as 'inputs' to consume (scan and parse) ECMAScript program text. An input is a stream of 32-bit Unicode UCS-4 characters. The stream is read, one character at a time, through its 'get next character' callback function.

The SEE library provides some useful stream constructors. Each constructor creates a new SEE_input structure, initialised for reading the source it is supplied.

struct SEE_input *SEE_input_file(struct SEE_interpreter *interp, 
                FILE *f, const char *filename, const char *encoding);
struct SEE_input *SEE_input_utf8(struct SEE_interpreter *interp,
                const char *s);
struct SEE_input *SEE_input_string(struct SEE_interpreter *interp,
                struct SEE_string *s);

If these constructors do not adequately meet your needs, you are encouraged to develop your own. They're quite easy to do, if a bit fiddly. I recommend you find the source to one of the above and modify it to do what you want.

The rest of this section describes the input API in detail, with a view towards custom input streams.

4.2.1 Input provider API

Why streams instead of strings? SEE uses a stream API for inputs rather than (say) a simple UCS-4 or UTF-8 string API, because Unicode-compliant applications will usually have a much better understanding of the encodings they are using than will SEE. With only a small amount of effort, streams provide this flexibility while avoiding unnecessary duplication or text storage.

Inputs are described by SEE_input structures. These are functionally similar to stdio's FILE type or Java's Reader classes, except that they stream fully-decoded Unicode characters. The SEE_input structure is the focus of the API: it maintains the input's stream state and provides a pointer to its access (callback) methods.

struct SEE_input {
        struct SEE_inputclass *inputclass;
        SEE_boolean_t          eof;
        SEE_unicode_t          lookahead;
        ...
};

struct SEE_inputclass {
        SEE_unicode_t   (*next)(struct SEE_input *input);
        void            (*close)(struct SEE_input *input);
};

The inputclass member indicates the access methods. It is a pointer to a SEE_inputclass structure. This class structure contains function pointers to the two methods next() and close().

The next() method should advance the input pointer, update the eof and lookahead members of the SEE_input structure, and return the old value of lookahead. SEE's scanner calls next() repeatedly, until the eof member becomes true. When eof is true, the value of lookahead becomes meaningless (but should be set to -1). Generally, the stream's constructor will internally call its next() function once initially, to 'prime' the lookahead field.

If the next() method encounters an encoding error, it should set lookahead to SEE_INPUT_BADCHAR and try to recover. It can throw an exception if it wants to, but SEE does not attempt to handle that: the application or user program will receive it. If you don't particularly care about Unicode, it is helpful to know that 7-bit ASCII is a direct subset of Unicode, so you can just pass each of your ASCII chars as a 32-bit SEE_unicode_t masked with 0x7f. (See the Unicode standards.)

The close() method should deallocate any operating system resources acquired during the input stream's construction. By convention, SEE will not call the close() method of any application-supplied input. The onus is on the caller to close the inputs supplied to SEE library functions. For this reason, you should use the 'finally' behaviour described in §4.3 to clean up a possibly failed stream.

The SEE_input structure represents the current state of the input stream. Most importantly, the lookahead field must always reflect the next character that a call to next() would return. Once initialised, the filename, first_lineno and interpreter members of the SEE_input structure should not be changed. The lookahead and eof members should also be initialised before the structure is given to SEE.

You are encouraged to read the source code to the three constructors listed at the beginning of this section.
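
As a warm-up before reading those sources, here is a self-contained sketch of the lookahead protocol for a plain ASCII buffer. It does not use SEE's types: struct demo_input, demo_input_next() and demo_input_init() are hypothetical, simplified analogues that I made up to show how next(), eof and lookahead fit together, including the 'priming' call made by the constructor.

```c
#include <stddef.h>

typedef unsigned long demo_unicode_t;   /* stand-in for a 32-bit character */

/* Hypothetical, simplified analogue of an input for an ASCII buffer. */
struct demo_input {
    int            eof;        /* true once the stream is exhausted */
    demo_unicode_t lookahead;  /* the character the next call will return */
    const char    *pos;        /* private state: next unread byte */
};

/* Advance the stream, refresh eof/lookahead, and return the old lookahead. */
demo_unicode_t demo_input_next(struct demo_input *in)
{
    demo_unicode_t old = in->lookahead;

    if (*in->pos == '\0') {
        in->eof = 1;
        in->lookahead = (demo_unicode_t)-1;  /* meaningless once eof is set */
    } else {
        /* 7-bit ASCII maps directly onto Unicode code points. */
        in->lookahead = (demo_unicode_t)(*in->pos++ & 0x7f);
    }
    return old;
}

/* Constructor: calls next() once to prime the lookahead field. */
void demo_input_init(struct demo_input *in, const char *text)
{
    in->eof = 0;
    in->lookahead = 0;
    in->pos = text;
    (void)demo_input_next(in);
}
```

A consumer loops with while (!in.eof) c = demo_input_next(&in); and sees every character exactly once. An empty buffer sets eof during construction, so the loop body never runs.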

4.2.2 Input client API

Callers will use these convenience macros to call input methods on a constructed input stream, rather than calling through the class structure directly:

SEE_unicode_t SEE_INPUT_NEXT(struct SEE_input *input);
void SEE_INPUT_CLOSE(struct SEE_input *input);

4.3 Try-catch contexts

SEE's exceptions are implemented using C's setjmp()/longjmp() mechanism. SEE provides macros that establish a try-catch context, and test later if a try block terminated abnormally (i.e. due to a thrown exception). Typical code that uses try-catch looks like this:

struct SEE_interpreter *interp;
struct SEE_value *e;
SEE_try_context_t c; /* storage for the try-catch context */

...

SEE_TRY(interp, c) {

        /*
         * Now inside a protected "try block".
         * The following calls may throw exceptions if they want,
         * causing the try block to exit immediately.
         */
        do_something();
        do_something_else();

        /* 
         * Because the SEE_TRY macro expands into a 'for' loop,
         * avoid using 'break', or 'return' statements.
         * If you must leave the try block, use 'continue;',
         * or throw an exception.
         */
}

/* Code placed here always runs. */
do_cleanup();

if ((e = SEE_CAUGHT(c))) {
        /* Handle the thrown exception 'e', somehow. */
        handle_exception(e);

        /* or you can throw it up to the next try-catch like so: */
        SEE_THROW(interp, e);
}

...

Do not return, goto or break out of a try block; the macro does not check for this, and the try-catch context may not be restored properly, causing all sorts of havoc.

Exceptions thrown outside of any try-catch context will cause the interpreter to abort.
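
To see why leaving a try block early is so dangerous, it can help to look at a miniature of the technique. The sketch below is not SEE's implementation: DEMO_TRY, DEMO_CAUGHT, demo_throw() and struct demo_try are hypothetical, and exceptions are plain C strings. It only assumes what is stated above, namely that the try macro expands into a 'for' loop around setjmp().

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a setjmp-based try context. */
struct demo_try {
    jmp_buf               env;
    struct demo_try      *previous;  /* enclosing context, restored on exit */
    const char *volatile  thrown;    /* set by demo_throw(); NULL otherwise */
    int                   done;      /* ends the one-iteration 'for' loop */
};

static struct demo_try *current_try; /* innermost active context */

static void demo_push(struct demo_try *c)
{
    c->previous = current_try;
    c->thrown = NULL;
    c->done = 0;
    current_try = c;
}

static void demo_pop(struct demo_try *c)
{
    current_try = c->previous;
    c->done = 1;
}

/* Unwind to the innermost try block; with no context, give up. */
void demo_throw(const char *what)
{
    struct demo_try *c = current_try;

    if (c == NULL)
        abort();
    c->thrown = what;
    longjmp(c->env, 1);
}

/*
 * The body is the 'else' branch inside a one-iteration 'for' loop. Only the
 * loop's update expression pops the context, so 'continue' leaves the block
 * safely while 'break', 'goto' or 'return' would leave a stale context behind.
 */
#define DEMO_TRY(c)                                               \
    for (demo_push(&(c)); !(c).done; demo_pop(&(c)))              \
        if (setjmp((c).env) != 0) { /* resumed after a throw */ } \
        else

#define DEMO_CAUGHT(c) ((c).thrown)

/* Returns the thrown message, or NULL if the block completed normally. */
const char *run_block(int should_throw, int *cleanup_ran)
{
    struct demo_try c;

    DEMO_TRY(c) {
        if (should_throw)
            demo_throw("boom");
    }
    *cleanup_ran = 1;            /* 'finally' code: runs on both paths */
    return DEMO_CAUGHT(c);
}
```

The thrown member is volatile because it is modified between the setjmp() and the longjmp(); without that qualifier its value after the jump would be indeterminate for a context held in a local variable.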

If you are not interested in catching exceptions, and only want the 'finally' behaviour, use the following idiom:

SEE_TRY(interp, c) {
        do_something();
}
do_finally();    /* optional */
SEE_DEFAULT_CATCH(interp, c);

The signatures of these macros are:

SEE_TRY(struct SEE_interpreter *interp, SEE_try_context_t ctxt) { stmt... }
struct SEE_value *SEE_CAUGHT(SEE_try_context_t ctxt);
void SEE_THROW(struct SEE_interpreter *interp, struct SEE_value *exception);
void SEE_DEFAULT_CATCH(struct SEE_interpreter *interp, SEE_try_context_t ctxt);

5 Values

Eventually, your host application will want to pass numbers, strings and complex value objects about, through the SEE interpreter, to and from the user code. This section describes the C interface to ECMAScript values.

The ECMAScript language has exactly six types of value. They are: undefined, null, boolean, number, string and object.

The SEE_value structure can represent values of all of these types.

struct SEE_value {
    enum { ... }            _type;
    union {
        SEE_boolean_t       boolean;
        SEE_number_t        number;
        struct SEE_string * string;
        struct SEE_object * object;
        ...
    } u;
};

The first member, _type, is the discriminator, and must be one of the enumerated values SEE_UNDEFINED, SEE_NULL, SEE_BOOLEAN, SEE_NUMBER, SEE_STRING or SEE_OBJECT. You should access the _type member using the SEE_VALUE_GET_TYPE() macro.

enum { ... } SEE_VALUE_GET_TYPE(struct SEE_value *value);

Depending on the type, you can directly access the corresponding value of a SEE_value. If the value variable is declared as:

struct SEE_value v;

then the value that it holds is directly accessed through its union member, v.u. The following table shows when the union fields of v.u are valid:

SEE_VALUE_GET_TYPE(&v)   Valid member   Member's type
SEE_UNDEFINED            n/a            n/a
SEE_NULL                 n/a            n/a
SEE_BOOLEAN              v.u.boolean    SEE_boolean_t
SEE_NUMBER               v.u.number     SEE_number_t
SEE_STRING               v.u.string     struct SEE_string *
SEE_OBJECT               v.u.object     struct SEE_object *

Two other types (SEE_COMPLETION and SEE_REFERENCE) are only used internally to SEE and are not documented here.

To convert/coerce values into values of a different type, use the utility functions described in §5.1.

To create new values in struct SEE_value structures, use the following initialisation macros. They first set the _type field and then copy the second parameter into the appropriate union field. It is fine to use a local variable for a struct SEE_value, because the garbage collector can see what is being used from the stack.

void SEE_SET_UNDEFINED(struct SEE_value *val);
void SEE_SET_NULL(struct SEE_value *val);
void SEE_SET_OBJECT(struct SEE_value *val, struct SEE_object *obj);
void SEE_SET_STRING(struct SEE_value *val, struct SEE_string *str);
void SEE_SET_NUMBER(struct SEE_value *val, SEE_number_t num);
void SEE_SET_BOOLEAN(struct SEE_value *val, SEE_boolean_t bool);

Most SEE_values are passed about the SEE library functions using pointers. This is because the general contract is that the caller supplies storage for the return value, while other pointer arguments are treated as read-only. Conventionally, the result value pointer is provided as the last argument to these functions and is named res.

⚠ Note: The SEE_VALUE_COPY() macro breaks this convention by instead following the better-known idiom of memcpy(), and placing the destination first.

Avoid storing a struct SEE_value as a pointer. Instead, extract and copy values into storage using the following macro:

void SEE_VALUE_COPY(struct SEE_value *dst, struct SEE_value *src);

A simple pitfall to avoid when passing values to SEE functions is to use value storage as both a parameter to the function and as the return result storage. Do not do this. It is possible that the function will initialise its return storage before it accesses its parameters.
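
The following self-contained sketch shows how this pitfall bites. It does not use SEE: struct demo_value and demo_to_number() are hypothetical, and the function is written the way the paragraph above warns about, initialising its result storage before reading its argument.

```c
/* Hypothetical cut-down value type, standing in for struct SEE_value. */
enum demo_type { DEMO_UNDEFINED, DEMO_BOOLEAN, DEMO_NUMBER };

struct demo_value {
    enum demo_type type;
    union {
        int    boolean;
        double number;
    } u;
};

/*
 * Converts *val to a number in *res. It initialises the result storage
 * first and reads its argument afterwards, which is harmless unless val
 * and res refer to the same storage.
 */
void demo_to_number(const struct demo_value *val, struct demo_value *res)
{
    res->type = DEMO_NUMBER;             /* clobbers val->type if aliased */
    res->u.number = 0.0;                 /* clobbers val->u if aliased */
    if (val->type == DEMO_BOOLEAN)
        res->u.number = val->u.boolean ? 1.0 : 0.0;
    else if (val->type == DEMO_NUMBER)
        res->u.number = val->u.number;
}
```

With separate storage, converting a true boolean yields 1.0. Called as demo_to_number(&v, &v), the same input yields 0.0, because the boolean was overwritten before it was read. Keeping a distinct variable for each result avoids the problem.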

5.1 Value conversion

The ECMAScript language specification provides for conversion functions that the host application developer may find useful. They convert arbitrary values into values of a known type:

void SEE_ToPrimitive(struct SEE_interpreter *interp,
                struct SEE_value *val, struct SEE_value *hint,
                struct SEE_value *res);
void SEE_ToBoolean(struct SEE_interpreter *interp,
                struct SEE_value *val, struct SEE_value *res);
void SEE_ToNumber(struct SEE_interpreter *interp,
                struct SEE_value *val, struct SEE_value *res);
void SEE_ToInteger(struct SEE_interpreter *interp,
                struct SEE_value *val, struct SEE_value *res);
void SEE_ToString(struct SEE_interpreter *interp,
                struct SEE_value *val, struct SEE_value *res);
void SEE_ToObject(struct SEE_interpreter *interp,
                struct SEE_value *val, struct SEE_value *res);

5.2 Undefined, null, boolean and number values

The undefined and null types have exactly one implied value each, namely undefined and null.

⚠ Note: null is not an object type, and is not related to C's NULL constant.

Boolean types (SEE_boolean_t) have values of either true (non-zero) or false (zero).

Number values (SEE_number_t) are IEEE 754 signed floating point numbers, normally corresponding to the C compiler's built-in double type.

The following macros may be used to find information about a number value. (They assume that the type is SEE_NUMBER):

int SEE_NUMBER_ISNAN(struct SEE_value *val);
int SEE_NUMBER_ISPINF(struct SEE_value *val);
int SEE_NUMBER_ISNINF(struct SEE_value *val);
int SEE_NUMBER_ISINF(struct SEE_value *val);
int SEE_NUMBER_ISFINITE(struct SEE_value *val);

SEE also provides constants SEE_Infinity and SEE_NaN which may be stored in number values, but should not be used to compare number values with C's == operator. Use the macros mentioned previously, instead.

const SEE_number_t SEE_Infinity;
const SEE_number_t SEE_NaN;
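
The reason for avoiding == is a property of IEEE 754 itself: a NaN never compares equal to anything, not even to itself. The sketch below demonstrates this with standard C99 facilities only (NAN, INFINITY, isnan() and isinf() from <math.h>). The function names are hypothetical, and SEE's own macros are not used here.

```c
#include <math.h>

/* Plain equality: always false when either operand is a NaN. */
int equal_by_operator(double a, double b)
{
    return a == b;
}

/* Reliable: ask for the classification instead of comparing. */
int is_nan_by_classification(double d)
{
    return isnan(d) != 0;
}

/* Positive infinity can be recognised by classification plus sign. */
int is_positive_infinity(double d)
{
    return isinf(d) && d > 0;
}
```

So a test written as v.u.number == SEE_NaN can never succeed, whatever the value holds, which is exactly why the classification macros exist.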

Numbers (and other values) may be converted to integers using the functions SEE_ToInt32(), SEE_ToUint32() or SEE_ToUint16().

SEE_int32_t  SEE_ToInt32(struct SEE_interpreter *interp, struct SEE_value *val);
SEE_uint32_t SEE_ToUint32(struct SEE_interpreter *interp, struct SEE_value *val);
SEE_uint16_t SEE_ToUint16(struct SEE_interpreter *interp, struct SEE_value *val);

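SEE provides three data types for integers: SEE_int32_t (signed, 32 bits), SEE_uint32_t (unsigned, 32 bits) and SEE_uint16_t (unsigned, 16 bits).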

5.3 String values

String values are pointers to SEE_string structures that hold UTF-16 strings. The structure is defined something like this:

struct SEE_string {
        unsigned int     length;
        SEE_char_t      *data;
        ...
};

The useful members are length, the number of 16-bit characters in the string, and data, a pointer to those characters.

Be aware that other strings may come to share the string's data, such as by forming substrings. A string's content must not be modified after construction because of this risk. However, the length field of a string may be changed to a smaller value at any time without concern.

The SEE_char_t type represents each Unicode character in the string. It is equivalent to a 16-bit unsigned integer.

To manipulate a string, first create a new string using one of the following:

struct SEE_string *SEE_string_new(struct SEE_interpreter *interp,
                unsigned int space);
struct SEE_string *SEE_string_dup(struct SEE_interpreter *interp,
                struct SEE_string *s);
struct SEE_string *SEE_string_concat(struct SEE_interpreter *interp,
                struct SEE_string *s1, struct SEE_string *s2);
struct SEE_string *SEE_string_sprintf(struct SEE_interpreter *interp,
                const char *fmt, ...);
struct SEE_string *SEE_string_vsprintf(struct SEE_interpreter *interp,
                const char *fmt, va_list ap);

And then, before passing your new string to any other function, append characters to it using the following:

void SEE_string_addch(struct SEE_string *s, SEE_char_t ch);
void SEE_string_append(struct SEE_string *s, const struct SEE_string *sffx);
void SEE_string_append_int(struct SEE_string *s, int i);

Once a new string has been passed to any other SEE function, it is generally unwise to modify its contents in any way. It is OK to share a string between different interpreters if the string is guaranteed not to be modified, and the garbage collector can cope with it.

All strings in SEE use UTF-16 encoding, meaning that in some cases you may need to be aware of Unicode 'surrogate' characters. If the host application really needs UCS-4 strings (which are subtly different to UTF-16), you will need to write your own converter function. Use the implementation of SEE_input_string() (§4.2) as the basis for such a converter.
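
For readers who have not met surrogates before, here is a self-contained sketch of the decoding step such a converter needs. It is independent of SEE: demo_char_t, demo_unicode_t, DEMO_BADCHAR and demo_utf16_to_ucs4() are hypothetical names, and substituting U+FFFD for an unpaired surrogate is my assumption rather than something SEE prescribes.

```c
#include <stddef.h>

typedef unsigned short demo_char_t;     /* stand-in for a 16-bit character */
typedef unsigned long  demo_unicode_t;  /* stand-in for a UCS-4 character */

#define DEMO_BADCHAR 0xFFFDUL           /* substituted for unpaired surrogates */

/*
 * Decodes 'len' UTF-16 units from 'src' into 'dst' (which must have room
 * for 'len' entries) and returns the number of code points written.
 */
size_t demo_utf16_to_ucs4(const demo_char_t *src, size_t len,
                          demo_unicode_t *dst)
{
    size_t i = 0, n = 0;

    while (i < len) {
        demo_unicode_t c = src[i++];

        if (c >= 0xD800 && c <= 0xDBFF) {            /* high surrogate */
            if (i < len && src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
                demo_unicode_t low = src[i++];
                c = 0x10000UL + ((c - 0xD800) << 10) + (low - 0xDC00);
            } else {
                c = DEMO_BADCHAR;                    /* missing low half */
            }
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            c = DEMO_BADCHAR;                        /* stray low surrogate */
        }
        dst[n++] = c;
    }
    return n;
}
```

For example, the two units 0xD83D 0xDE00 decode to the single code point 0x1F600. This is the subtle difference mentioned above: the number of UTF-16 units in a string can exceed the number of characters it contains.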

⚠ Note: The SEE_string_sprintf() and SEE_string_vsprintf() functions only generate Unicode characters that lie in the 7-bit ASCII subset of Unicode.

Other string functions provided are:

struct SEE_string *SEE_string_substr(struct SEE_interpreter *interp,
                struct SEE_string *s, int index, int length);
struct SEE_string *SEE_string_literal(struct SEE_interpreter *interp,
                const struct SEE_string *s);
int SEE_string_fputs(const struct SEE_string *s, FILE *file);
int SEE_string_cmp(const struct SEE_string *s1,
                const struct SEE_string *s2);

5.3.1 Internalised strings

If you find yourself comparing strings a lot, you may find it easier to compare internalised strings. These are strings that are kept in a fast hash table and may be tested for equality by comparing pointers. The SEE_intern() function returns an internalised copy of the given string and is very fast on already-internalised strings. It is worth using in lieu of SEE_string_cmp() if the strings are likely to be internalised already. (For example, all property names in the standard library are.)

struct SEE_string *SEE_intern(struct SEE_interpreter *interp,
                struct SEE_string *s);
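
The idea behind internalising is easy to sketch without SEE. In the following illustration, demo_intern() and its table are hypothetical. It works on ordinary C strings rather than SEE_string structures, and it stores the caller's pointer instead of making a copy, which keeps the sketch short but is a simplification.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_TABLE_SIZE 64              /* small fixed table, enough for a demo */

static const char *demo_table[DEMO_TABLE_SIZE];

static unsigned demo_hash(const char *s)
{
    unsigned h = 5381;                  /* simple multiplicative string hash */

    while (*s != '\0')
        h = h * 33u + (unsigned char)*s++;
    return h % DEMO_TABLE_SIZE;
}

/*
 * Returns the canonical copy of 's': the first string with these contents
 * that was ever interned. Returns NULL if the table is full.
 */
const char *demo_intern(const char *s)
{
    unsigned start = demo_hash(s), i = start;

    do {
        if (demo_table[i] == NULL) {
            demo_table[i] = s;          /* first sighting becomes canonical */
            return s;
        }
        if (strcmp(demo_table[i], s) == 0)
            return demo_table[i];
        i = (i + 1) % DEMO_TABLE_SIZE;  /* linear probing */
    } while (i != start);
    return NULL;
}
```

Once two strings have both passed through demo_intern(), comparing the returned pointers with == is enough to decide whether their contents are equal. The full comparison is paid once, at interning time, instead of at every test.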

5.3.2 Statically initialised strings

SEE supports statically initialised strings. If you have a large number of strings to create and use (e.g. properties and method names) over many interpreter instances, statically initialised strings can save space, and improve performance.

A statically initialised string, 'Hello, world', would look like this:

/* Example of a statically-initialised UTF-16 string */
static SEE_char_t hello_world_chars[12] = {
    'H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd'
};
static struct SEE_string hello_world = {
    12,                                                /* length */
    hello_world_chars                                  /* data */
};

The main problem with static strings is finding an elegant way to initialise the strings' content. There is no simple way in ANSI C to have the compiler convert common ASCII strings into UTF-16 arrays. The approach taken by SEE in supporting all the standard ECMAScript objects and methods, is to generate C program text from a file of ASCII strings during the build process.

If an application wishes to internalise strings across interpreters, it must add all its global strings into the global intern table before creating any interpreters. This is done by calling SEE_intern_global() for each global string.

void SEE_intern_global(struct SEE_string *str);

When creating global strings, the application can either use the static initialisation technique described above, or create interpreter-less strings by passing a NULL interpreter pointer to the various string creation functions (§5.3). Such strings should be placed into the global intern table, immediately.

6 Objects

ECMAScript uses a prototype-inheritance object model with simple named properties. More information on the object model can be found in the ECMA-262 standard, and in other JavaScript references.

This section describes how in-memory objects can be accessed and manipulated (the 'client interface'), and also how host applications can expose their own application objects and methods (the 'implementation interface').

Object instances are implemented as in-memory structures, with an objectclass pointer to a table of operational methods. Object references are held inside values with a type field of SEE_OBJECT (see §5).

6.1 Object values, and the object client interface

All object values are pointers to object instances. The pointers are of type struct SEE_object *. No object pointer in a SEE_value should ever be NULL. When I know that I am dealing with objects, I find it convenient to work with struct SEE_object * pointers directly instead of using struct SEE_value.

To use an existing object instance, you should interact with it using only the following macros:

void SEE_OBJECT_GET(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_string *prop,
                struct SEE_value *res);
void SEE_OBJECT_PUT(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_string *prop,
                struct SEE_value *val, int flags);
int SEE_OBJECT_CANPUT(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_string *prop);
int SEE_OBJECT_HASPROPERTY(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_string *prop);
int SEE_OBJECT_DELETE(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_string *prop);
void SEE_OBJECT_DEFAULTVALUE(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_value *hint,
                struct SEE_value *res);
void SEE_OBJECT_CONSTRUCT(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_object *thisobj,
                int argc, struct SEE_value **argv,
                struct SEE_value *res);
void SEE_OBJECT_CALL(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_object *thisobj,
                int argc, struct SEE_value **argv,
                struct SEE_value *res);
int SEE_OBJECT_HASINSTANCE(struct SEE_interpreter *interp,
                struct SEE_object *obj, struct SEE_value *instance);
struct SEE_enum *SEE_OBJECT_ENUMERATOR(struct SEE_interpreter *interp,
                struct SEE_object *obj);

⚠ Note: The last four macros (SEE_OBJECT_CONSTRUCT(), SEE_OBJECT_CALL(), SEE_OBJECT_HASINSTANCE(), SEE_OBJECT_ENUMERATOR()) will not check if the object has a NULL pointer for the corresponding object method. Calling them on an unchecked object will probably result in a memory access violation (e.g. segmentation fault). The following macros return true if the object safely provides those methods:

int SEE_OBJECT_HAS_CALL(struct SEE_object *obj);
int SEE_OBJECT_HAS_CONSTRUCT(struct SEE_object *obj);
int SEE_OBJECT_HAS_HASINSTANCE(struct SEE_object *obj);
int SEE_OBJECT_HAS_ENUMERATOR(struct SEE_object *obj);
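The check-before-call pattern described in the note can be sketched in isolation. The block below does not use the SEE headers: the demo_* types and the DEMO_HAS_CALL()/DEMO_CALL() macros are hypothetical stand-ins that only mimic the shape of SEE_OBJECT_HAS_CALL() and SEE_OBJECT_CALL(), so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types, NOT the real SEE definitions. */
struct demo_object;
typedef int (*demo_call_fn)(struct demo_object *self, int arg);

struct demo_class {
    demo_call_fn Call;          /* optional: NULL when the object is not callable */
};

struct demo_object {
    const struct demo_class *objectclass;
};

/* Plays the role of SEE_OBJECT_HAS_CALL(). */
#define DEMO_HAS_CALL(o)  ((o)->objectclass->Call != NULL)
/* Plays the role of SEE_OBJECT_CALL(): performs no NULL check of its own. */
#define DEMO_CALL(o, a)   ((o)->objectclass->Call((o), (a)))

/* A trivial method used by the callable class below. */
static int twice(struct demo_object *self, int arg)
{
    (void)self;
    return 2 * arg;
}

static const struct demo_class callable_class = { twice };
static const struct demo_class plain_class = { NULL };

/* Calls the object only if it provides Call.
 * Returns 0 and stores the result in *res, or -1 if the object is not callable. */
static int safe_call(struct demo_object *o, int arg, int *res)
{
    assert(o != NULL && res != NULL);
    if (!DEMO_HAS_CALL(o))
        return -1;
    *res = DEMO_CALL(o, arg);
    return 0;
}
```

In real host code the -1 branch would more likely throw a TypeError than return a status code; that choice is left to the application.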

When storing properties in an object with SEE_OBJECT_PUT(), a flags parameter is required. In normal operation, this flag should be supplied as zero, but when populating an object with its properties for the first time, the following bit flags can be used:

Flag Meaning
SEE_ATTR_READONLY Future assignments (puts) on this property will fail
SEE_ATTR_DONTENUM Enumerators will not list this property and will hide inherited prototype properties of the same name until this property is deleted. (see §6.2)
SEE_ATTR_DONTDELETE Future deletes on this property will fail
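To make the effect of these attributes concrete, here is a small self-contained model of a single property slot. Everything in it is an assumption made for illustration: the DEMO_ATTR_* values are invented (the real SEE_ATTR_* constants are defined in the SEE headers), the demo_prop type is not part of SEE, and a failed put is modelled as being silently ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, chosen only so that they can be OR-ed together. */
#define DEMO_ATTR_READONLY   0x01
#define DEMO_ATTR_DONTENUM   0x02
#define DEMO_ATTR_DONTDELETE 0x04

/* A one-slot stand-in for an object's property storage. */
struct demo_prop {
    int present;    /* non-zero once the property exists */
    int flags;      /* attributes fixed when the property is created */
    int value;
};

/* Plays the role of CanPut: 0 if the property exists and is read-only. */
static int demo_prop_canput(const struct demo_prop *p)
{
    assert(p != NULL);
    return !(p->present && (p->flags & DEMO_ATTR_READONLY));
}

/* Plays the role of Put: the flags only matter when the property is created. */
static void demo_prop_put(struct demo_prop *p, int value, int flags)
{
    if (!demo_prop_canput(p))
        return;                 /* assumed behaviour: the put is ignored */
    if (!p->present) {
        p->present = 1;
        p->flags = flags;
    }
    p->value = value;
}

/* Plays the role of Delete: returns 0 if the property may not be deleted. */
static int demo_prop_delete(struct demo_prop *p)
{
    assert(p != NULL);
    if (p->present && (p->flags & DEMO_ATTR_DONTDELETE))
        return 0;
    p->present = 0;
    return 1;
}
```

A property created with DEMO_ATTR_READONLY | DEMO_ATTR_DONTDELETE keeps its first value and survives deletion, while a property created with flags of zero can be overwritten and deleted freely.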

6.2 Property enumerators

A property enumerator is a mechanism for discovering the properties that an object contains. The language exercises this with its for (var v in ...) construct. The results of the enumeration need not be sorted, nor even be in the same order each time.

Calling SEE_OBJECT_ENUMERATOR() returns a newly created enumerator, which is a pointer to a struct SEE_enum. Once obtained, the following macro can be used to step through the enumerator:

struct SEE_string *SEE_ENUM_NEXT(struct SEE_interpreter *interp,
                struct SEE_enum *e, int *flags_return);

Enumerators can assume that the underlying object does not change during enumeration. A suggested strategy for a caller that does need to remove or add an object's properties while enumerating them is to first create a private list of its property names, ensuring that it has exhausted the enumerator before attempting to modify the object.
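The snapshot strategy suggested above can be sketched without SEE itself. In the block below, demo_props, demo_enum and demo_enum_next() are hypothetical stand-ins for an object, a struct SEE_enum and SEE_ENUM_NEXT(); the sketch also assumes that the enumerator signals exhaustion by returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PROPS 8

/* Hypothetical stand-in for an object's property table. */
struct demo_props {
    const char *names[DEMO_MAX_PROPS];
    int count;
};

/* Hypothetical stand-in for struct SEE_enum. */
struct demo_enum {
    const struct demo_props *obj;
    int pos;
};

/* Plays the role of SEE_ENUM_NEXT(); assumed to return NULL when exhausted. */
static const char *demo_enum_next(struct demo_enum *e)
{
    return e->pos < e->obj->count ? e->obj->names[e->pos++] : NULL;
}

/* Deleting reorders the table, which is why enumerating while deleting is unsafe. */
static void demo_props_delete(struct demo_props *o, const char *name)
{
    int i;

    for (i = 0; i < o->count; i++) {
        if (strcmp(o->names[i], name) == 0) {
            o->count--;
            o->names[i] = o->names[o->count];   /* swap-remove */
            return;
        }
    }
}

/* Removes every property whose name starts with prefix; returns how many. */
static int remove_with_prefix(struct demo_props *o, const char *prefix)
{
    const char *snapshot[DEMO_MAX_PROPS];
    struct demo_enum e;
    const char *name;
    size_t plen;
    int n = 0, removed = 0, i;

    assert(o != NULL && prefix != NULL && o->count <= DEMO_MAX_PROPS);
    plen = strlen(prefix);
    e.obj = o;
    e.pos = 0;

    /* Phase 1: exhaust the enumerator into a private list. */
    while ((name = demo_enum_next(&e)) != NULL)
        snapshot[n++] = name;

    /* Phase 2: the object may now be modified safely. */
    for (i = 0; i < n; i++) {
        if (strncmp(snapshot[i], prefix, plen) == 0) {
            demo_props_delete(o, snapshot[i]);
            removed++;
        }
    }
    return removed;
}
```

Had the deletions been performed inside the first loop, the swap-remove would have moved an unvisited name into an already visited slot, and the enumerator would have skipped it.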

6.3 The object implementation interface

When a host application wishes to expose its own 'host objects' to ECMAScript programs, it must use the object implementation API described in this section.

All SEE objects are in-memory structures starting with a struct SEE_object:

struct SEE_object {
        struct SEE_objectclass *objectclass;
        struct SEE_object *     Prototype;
};

Normally, this structure is part of a larger structure that maintains the object's private state. For example, native Number objects could be implemented with the following:

struct number_object {             /* example implementation of Number */
        struct SEE_object object;
        SEE_number_t      number;
};

Keeping the object part at the top of the number_object structure means that pointers of type struct number_object * can be cast to and from pointers of type struct SEE_object *. This is a general idiom: begin all host object structures with a field member of type struct SEE_object named object.
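A minimal sketch of this idiom follows. So that it compiles without the SEE headers, it repeats the struct SEE_object layout shown above, leaves struct SEE_objectclass opaque, and uses double as an assumed stand-in for SEE_number_t; the two helper functions are hypothetical names, not part of the library.

```c
#include <assert.h>
#include <stddef.h>

struct SEE_objectclass;                 /* left opaque in this sketch */

struct SEE_object {                     /* local copy of the layout shown above */
    struct SEE_objectclass *objectclass;
    struct SEE_object      *Prototype;
};

struct number_object {
    struct SEE_object object;           /* must be the first member */
    double            number;           /* stand-in for SEE_number_t */
};

/* Upcast: always safe, and needs no cast at all. */
static struct SEE_object *number_as_object(struct number_object *n)
{
    return &n->object;
}

/* Downcast: valid only if obj really is the start of a number_object. */
static struct number_object *object_as_number(struct SEE_object *obj)
{
    /* The cast relies on the embedded object sitting at offset zero. */
    assert(offsetof(struct number_object, object) == 0);
    return (struct number_object *)obj;
}
```

The C standard guarantees that a pointer to a structure, suitably converted, points to its first member and vice versa, which is what makes the downcast well defined. Code that receives an arbitrary struct SEE_object * should still confirm the object's class before downcasting.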

Although the ECMAScript language does not use classes per se, SEE's internal object implementation does use a class 'abstraction' to speed up execution and make implementation re-use easier. Each object has a field, object.objectclass, that must be initialised to point to a struct SEE_objectclass that provides the object's behaviour. The class structure looks like this:

struct SEE_objectclass {
        struct SEE_string *     Class;          /* mandatory */
        SEE_get_fn_t            Get;            /* mandatory */
        SEE_put_fn_t            Put;            /* mandatory */
        SEE_boolean_fn_t        CanPut;         /* mandatory */
        SEE_boolean_fn_t        HasProperty;    /* mandatory */
        SEE_boolean_fn_t        Delete;         /* mandatory */
        SEE_default_fn_t        DefaultValue;   /* mandatory */
        SEE_enumerator_fn_t     enumerator;     /* optional */
        SEE_call_fn_t           Construct;      /* optional */
        SEE_call_fn_t           Call;           /* optional */
        SEE_hasinstance_fn_t    HasInstance;    /* optional */
};

The application generally provides this structure in static storage, as most of its members are function pointers or strings known at compile time. A member marked optional should be set to NULL if it is meaningless.

The object methods marked mandatory (Get, Put, etc.) are never NULL, and should provide the precise behaviours that SEE expects on native objects. These behaviours are fully described in the ECMA-262 standard, and are summarised in the following table:

Method Behaviour
Get retrieve a named property (or return undefined)
Put create/update a named property
Delete delete a property or return 0
HasProperty returns 0 if the property doesn't exist
CanPut returns 0 if the property cannot be changed
DefaultValue turns the object into a string or number value
Construct constructs a new object; as per the new keyword
Call the object has been called as a function
HasInstance returns 0 if the objects are unrelated
enumerator allow enumeration of the properties (see above)

It is up to the host application to provide storage for the properties, and so forth. The simplest strategy is to ignore property calls to Put and Get that are meaningless. To this end, if the host object does not want to expend effort supporting some of the mandatory operations, it can use the corresponding 'do-nothing' function(s) from this list:

The Prototype field of an object instance can either be set to:

If you choose to use NULL, it is recommended you provide a toString() method (to help with debugging).

Once the host application has constructed its own objects that conform to the API, they can be inserted into the 'Global object' as object-valued properties.

The 'Global object' is an unnamed, top-level object whose sole purpose is to 'hold' all the built-in objects, such as Object, Function, Math, etc., as well as all user-declared global variables. The host application can access it through the Global member of the SEE_interpreter structure.

6.4 Native objects

SEE provides support for a special kind of object class called native objects. Native objects maintain a hash table of properties, implement the mandatory methods (plus enumerator), and correctly observe the Prototype field.

struct SEE_native {
        struct SEE_object       object;
        struct SEE_property *   properties[SEE_NATIVE_HASHLEN];
};

An application can create host objects based on native objects. First, place a struct SEE_native at the beginning of a structure:

struct some_host_object {
        struct SEE_native       native;
        int                     host_specific_info;
};

Then, use the following object methods, either directly in the SEE_objectclass structure, or by calling them indirectly from method implementations:

It is very important that you initialize the native field when constructing your host object. Do this using the SEE_native_init() function.

void SEE_native_init(struct SEE_native *obj, struct SEE_interpreter *i,
                const struct SEE_objectclass *obj_class,
                struct SEE_object *prototype);

6.5 C function objects

The host application will likely want some of its C code to be callable from the runtime environment. This simply requires constructing an object whose Prototype field points to Function.prototype, and whose objectclass's Call method points to a C function that contains the desired code.

The convenience function SEE_cfunction_make() performs this construction. It takes a pointer to the C function, a name for the function, and an integer indicating the expected number of arguments. (The integer becomes the function object's length property, which is advisory only.)

struct SEE_object *SEE_cfunction_make(struct SEE_interpreter *interp,
                SEE_call_fn_t func, struct SEE_string *name, int argc);

⚠ Note: Objects returned by SEE_cfunction_make() should really only be used in the interpreter context in which they were created, but the current version of SEE does not check for this. (Because cfunction objects are essentially read-only after construction, sharing them across interpreters will work provided that memory allocation operates independently of the interpreters, but this is not recommended for future portability.)

The C function must conform to the SEE_call_fn_t signature. This is demonstrated below with math_sqrt(), adapted from the code behind the Math.sqrt object:

/* Implementation of Math.sqrt() method */
static void
math_sqrt(struct SEE_interpreter *interp, struct SEE_object *self,
        struct SEE_object *thisobj, int argc, struct SEE_value **argv,
        struct SEE_value *res)
{
        struct SEE_value v;

        if (argc == 0) {
                /* a missing argument counts as undefined, which converts to NaN */
                SEE_SET_NUMBER(res, SEE_NaN);
        } else {
                SEE_ToNumber(interp, argv[0], &v);
                SEE_SET_NUMBER(res, sqrt(v.u.number));
        }
}

The arguments to this function are described in the following table:

Argument Purpose
interp the current interpreter context
self a pointer to the object called (Math.sqrt here)
thisobj the this object (the Math object in this case)
argc number of arguments
argv array of value pointers, of length argc
res uninitialised value location in which to store the result

A common convention in all ECMAScript functions is that unspecified arguments should be treated as undefined, and extraneous arguments should just be ignored. If the function uses thisobj, it should check any assumptions made about it, especially if it is expected to be a host object. This is because method functions can easily be attached to other objects by user code.
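The argument convention can be captured in one small helper. The sketch below is self-contained and does not use the SEE value types: demo_value, arg_or_undefined() and demo_add() are hypothetical names, and only the undefined and number types are modelled.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Simplified stand-in for struct SEE_value. */
enum demo_type { DEMO_UNDEFINED, DEMO_NUMBER };

struct demo_value {
    enum demo_type type;
    double number;              /* meaningful only when type == DEMO_NUMBER */
};

static const struct demo_value demo_undefined = { DEMO_UNDEFINED, 0.0 };

/* Returns argument i, or the undefined value if the caller did not supply it. */
static const struct demo_value *
arg_or_undefined(int argc, struct demo_value **argv, int i)
{
    assert(i >= 0 && (argc == 0 || argv != NULL));
    return i < argc ? argv[i] : &demo_undefined;
}

/* Simplified number conversion: undefined becomes NaN. */
static double demo_to_number(const struct demo_value *v)
{
    return v->type == DEMO_NUMBER ? v->number : NAN;
}

/* A two-argument function: missing arguments are undefined, extras are ignored. */
static double demo_add(int argc, struct demo_value **argv)
{
    return demo_to_number(arg_or_undefined(argc, argv, 0))
         + demo_to_number(arg_or_undefined(argc, argv, 1));
}
```

Calling demo_add() with three arguments uses only the first two, and calling it with fewer than two yields NaN, mirroring what a script would observe.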

6.6 User function objects

Occasionally, a host application will wish to take some user text and create a callable function object from it. An example of this problem is in attaching the JavaScript code from HTML attributes onto form elements of a web page. One way to achieve this is to invoke the Function constructor object with the SEE_OBJECT_CONSTRUCT() macro, passing it the formal arguments text and body text as arguments. (See the ECMAScript standard for details on the Function constructor.)

Another way, which is more convenient if the user text is available as an input stream, is to use the SEE_Function_new() function:

struct SEE_object *SEE_Function_new(struct SEE_interpreter *interp, 
                struct SEE_string *name, struct SEE_input *param_input, 
                struct SEE_input *body_input);

where any of the name, param_input and body_input parameters may be NULL (indicating to use the empty string).

The returned function object may be called with the SEE_OBJECT_CALL() macro.

6.7 Errors and Error objects

Host applications sometimes need to convey errors to ECMAScript programs. Errors in ECMAScript are typically indicated by throwing an exception with an object value. The thrown objects conventionally have Error.prototype somewhere in their prototype chain, and provide message and name properties, which Error.prototype reads to generate a human-readable error message.

Host applications can conveniently construct and throw error exceptions using the following macros:

void SEE_error_throw(struct SEE_interpreter *interp,
                struct SEE_object *error_constructor,
                const char *fmt, ...);
void SEE_error_throw_string(struct SEE_interpreter *interp, 
                struct SEE_object *error_constructor,
                struct SEE_string *string);
void SEE_error_throw_sys(struct SEE_interpreter *interp,
                struct SEE_object *error_constructor,
                const char *fmt, ...);

These convenience macros construct a new error object, and throw it as an exception using SEE_THROW(). The object thrown is given a message string property that reflects the rest of the arguments provided to the called macro. The SEE_error_throw_sys() macro works like SEE_error_throw() but appends a textual description of errno using strerror().

The error_constructor argument should be one of the error constructor objects found in the SEE_interpreter structure:

Member Meaning
Error runtime error
EvalError error in eval()
RangeError numeric argument has exceeded allowable range
ReferenceError invalid reference was detected
SyntaxError parsing error
TypeError actual type of an operand different to that expected
URIError error in a global URI handling function

A simple example:

if (something_is_wrong)
        SEE_error_throw(interp, interp->Error, "something is wrong!");

Although Error is usually sufficient for most errors, host applications can create their own error constructor object with the SEE_Error_make() convenience function. Only one constructor of the same name should be created per interpreter.

struct SEE_object *SEE_Error_make(struct SEE_interpreter *interp,
                struct SEE_string *name);

7 Compatibility features

7.1 Compatibility with other JavaScript implementations

SEE provides backward compatibility with earlier versions of JavaScript and JScript. These features ought never to be used, since JavaScript program authors should be mindful of standards. Nevertheless, this section documents the compatibility modes that SEE supplies.

The behaviour of the SEE library is modified on a per-interpreter basis, by passing special flags to a variant of the interpreter's initialisation routine, SEE_interpreter_init_compat(). This function otherwise behaves just like SEE_interpreter_init() (see §2).

void SEE_interpreter_init_compat(struct SEE_interpreter *interp,
                int flags);

The flags parameter is a bitwise OR of the constants described in the following table.

⚠ Note: The following compatibility flag names may change in the future.

Flag Behaviour
SEE_COMPAT_STRICT This is not really a flag. It is defined as zero, and can be used when no compatibility flags are wanted.
SEE_COMPAT_UTF_UNSAFE Treat 'overlong' UTF-8 encodings as valid Unicode characters.
SEE_COMPAT_UNDEFDEF Don't throw a ReferenceError when an undefined global property is used. Instead, return the undefined value. This violates step 3 of s8.7.1 of ECMA-262, but it seems that many interpreters are flexible on this point. It was originally a JavaScript 1.5 thing, I believe.
SEE_COMPAT_262_3B Enable optional features in ECMA-262 ed3 Appendix B:
  • Date.toGMTString() is defined, and made equivalent to toUTCString()
  • Date.getYear() and Date.setYear() are defined.
  • Global object has escape() and unescape() functions defined.
  • String.substr() is defined
SEE_COMPAT_SGMLCOM The lexical analyser will treat the 4-character sequence '<!--' much like the '//' comment introducer. This is useful in HTML SCRIPT elements.
SEE_COMPAT_EXT1 Random, unsorted extensions, mainly relating to behaviour of older JavaScript:
  • Enumerating over properties is done in a sorted fashion. During sort, property names are ordered arithmetically if they are suitable as array indices, otherwise they are ordered lexicographically.
  • Invalid \u or \x escapes will treat the leading \u or \x as a single-letter escape.
  • SEE's lexical analyser will recognise octal integers (i.e. integers starting with '0') and will fall back to decimal if the token contains a non-octal digit.
  • Coercing native values that do not have a [[DefaultValue]] internal property will return an object-unique string, instead of throwing a TypeError.
  • the string representation of a bad date (i.e. 'new Date(NaN)') will return the string 'Invalid Date', instead of 'NaN'.
  • Calling Date as a constructor will recognise Netscape-style date strings of the form '1/1/1999 12:30 AM'.
  • Function.prototype will not have a prototype property of its own.
  • Function.prototype.toString() applied to built-in functions and constructors (which are not function instances) will return a bogus do-nothing FunctionDeclaration instead of throwing a TypeError.
  • The global object has its [[Prototype]] property set to Object.prototype, effectively making all of Object.prototype's properties available in the global scope, but having the good side effect of allowing toString() to work anywhere.
  • Calling eval() with a this different to the global object executes its contents with the scope and variable object always set to this (instead of inheriting the caller's context as per s10.2.2 of the standard).
  • Native objects synthesize a property called __proto__ with the same value as the internal [[Prototype]] property (or null). Assignments to __proto__ are accepted if they don't cause a cycle.
  • Native functions assign themselves an arguments property when called, so that the old idiom of using f.arguments inside the function f will work.
  • The system-generated arguments object created inside a function has a default-value (a comma-separated string representing the arguments), instead of raising a TypeError. The upshot of this is that arguments can be coerced into a string.
  • Calling an empty function will involve the expensive process of extending the scope chain, creating an arguments property, etc. In other words, a simple optimisation is disabled.
  • Reserved words can be used as identifiers (with a warning message)
  • Invalid quantifiers in regular expressions (e.g. /a{12x}/) are treated as literals instead of raising a SyntaxError.
SEE_COMPAT_ARRAYJOIN1 Array.join(undefined) uses the string 'undefined' as the join string instead of ','. However when called without arguments will still use ','. (How bizarre.)

7.2 Compatibility with previous versions of SEE

As distributed, SEE has two different version numbers:

  1. the package, release and shared library version number (e.g. 2.0)
  2. the API version number (e.g. 1.0)

The library version is available to programs to query through the SEE_version() function. This function returns a pointer to a static C string containing identifiers separated by a space character (0x20). The first identifier is the name of the library (e.g. "see") and the second identifier is the package version number (e.g. 2.0). Further identifiers indicate the features used when compiling the library. This string is useful for end users to determine what capabilities their library implementation has.

const char *SEE_version(void);
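A host application that wants to inspect this string can split it on spaces. The helpers below are a sketch only: version_field() and version_has_feature() are hypothetical names, they operate on any string of the documented shape rather than calling SEE_version() themselves, and identifiers longer than the supplied buffer are treated as missing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the idx-th space-separated identifier of ver into buf.
 * Returns 1 on success, 0 if there is no such identifier or it does not fit. */
static int version_field(const char *ver, int idx, char *buf, size_t buflen)
{
    const char *p = ver;
    size_t len;

    assert(ver != NULL && buf != NULL && buflen > 0 && idx >= 0);

    while (idx-- > 0) {
        p = strchr(p, ' ');
        if (p == NULL)
            return 0;
        p++;                    /* step over the separator */
    }
    len = strcspn(p, " ");
    if (len == 0 || len >= buflen)
        return 0;
    memcpy(buf, p, len);
    buf[len] = '\0';
    return 1;
}

/* Returns 1 if feature appears among the identifiers after the first two. */
static int version_has_feature(const char *ver, const char *feature)
{
    char buf[64];
    int i;

    for (i = 2; version_field(ver, i, buf, sizeof buf); i++) {
        if (strcmp(buf, feature) == 0)
            return 1;
    }
    return 0;
}
```

For example, given a made-up string such as "see 2.0 gc threads", field 1 is the package version "2.0" and version_has_feature() reports that "threads" is present. The feature names here are invented for the example; the real identifiers depend on how the library was built.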

The major and minor API version numbers indicate backward-incompatible and backward-compatible changes, respectively, to the API, i.e. the interface described in this documentation and the header files. The API version number is independent of the package and library version number.

Practically, developers should use the following code to signal the rare case of major API changes when compiling:

#if SEE_VERSION_API_MAJOR > 1
 #warning "SEE API major version mismatch"
#endif

This warning indicates that code written against the older API is incompatible with the newer API.

The API versioning rules are:

This document will indicate at what API version new API elements are added, defaulting to 1.0.

const int SEE_VERSION_API_MAJOR;
const int SEE_VERSION_API_MINOR;

8 Debugging facilities

The SEE library contains various debugging facilities that are omitted if it is compiled with the NDEBUG preprocessor define.

These functions are intended for the developer to use while debugging an application, and not for general use.

void SEE_PrintValue(struct SEE_interpreter *interp, 
                struct SEE_value *val, FILE *file);
void SEE_PrintObject(struct SEE_interpreter *interp, 
                struct SEE_object *obj, FILE *file);
void SEE_PrintString(struct SEE_interpreter *interp, 
                struct SEE_string *str, FILE *file);
void SEE_PrintTraceback(struct SEE_interpreter *interp, 
                FILE *file);

If debugging the library itself, it is worth reading the source code to find the debug flag variables that can be turned on by the host application to enable verbose traces during execution.

Defining the NDEBUG preprocessor symbol when building the library also disables (slow) internal assertions that would otherwise help show up application misuse of the API.

The interpreter structure provides a trace callback field, which is called on every transition through the executable AST. This callback is passed a handle to the current execution context (a struct SEE_context *), which an external debugger may examine directly or with the SEE_context_eval() utility function. That function is otherwise functionally identical to SEE_Global_eval(), and is intended only for use by external debuggers attached to the trace callback. The trace callback should be disabled (by setting it to NULL) when calling SEE_Global_eval(); otherwise re-entrant tracing will occur.

void SEE_context_eval(struct SEE_context *context,
                struct SEE_string *expr, struct SEE_value *res);

References

Name index

SEE_ABORT
SEE_ALLOCA
SEE_CAUGHT
SEE_cfunction_make
SEE_context_eval
SEE_DEFAULT_CATCH
SEE_ENUM_NEXT
SEE_Error_make
SEE_error_throw
SEE_error_throw_string
SEE_error_throw_sys
SEE_Function_new
SEE_Global_eval
SEE_Infinity
SEE_input struct
SEE_inputclass struct
SEE_INPUT_CLOSE
SEE_input_file
SEE_INPUT_NEXT
SEE_input_string
SEE_input_utf8
SEE_intern
SEE_intern_global
SEE_interpreter_init
SEE_interpreter_init_compat
SEE_mem_exhausted_hook
SEE_mem_free_hook
SEE_mem_malloc_hook
SEE_mem_malloc_string_hook
SEE_NaN
SEE_native struct
SEE_native_init
SEE_NEW
SEE_NEW_ARRAY
SEE_NEW_STRING_ARRAY
SEE_NUMBER_ISFINITE
SEE_NUMBER_ISINF
SEE_NUMBER_ISNAN
SEE_NUMBER_ISNINF
SEE_NUMBER_ISPINF
SEE_object struct
SEE_objectclass struct
SEE_OBJECT_CALL
SEE_OBJECT_CANPUT
SEE_OBJECT_CONSTRUCT
SEE_OBJECT_DEFAULTVALUE
SEE_OBJECT_DELETE
SEE_OBJECT_ENUMERATOR
SEE_OBJECT_GET
SEE_OBJECT_HASINSTANCE
SEE_OBJECT_HASPROPERTY
SEE_OBJECT_HAS_CALL
SEE_OBJECT_HAS_CONSTRUCT
SEE_OBJECT_HAS_ENUMERATOR
SEE_OBJECT_HAS_HASINSTANCE
SEE_OBJECT_PUT
SEE_PrintObject
SEE_PrintString
SEE_PrintTraceback
SEE_PrintValue
SEE_SET_BOOLEAN
SEE_SET_NULL
SEE_SET_NUMBER
SEE_SET_OBJECT
SEE_SET_STRING
SEE_SET_UNDEFINED
SEE_string struct
SEE_string_addch
SEE_string_append
SEE_string_append_int
SEE_string_cmp
SEE_string_concat
SEE_string_dup
SEE_string_fputs
SEE_string_literal
SEE_string_new
SEE_string_sprintf
SEE_string_substr
SEE_string_vsprintf
SEE_THROW
SEE_ToBoolean
SEE_ToInt32
SEE_ToInteger
SEE_ToNumber
SEE_ToObject
SEE_ToPrimitive
SEE_ToString
SEE_ToUint16
SEE_ToUint32
SEE_TRY
SEE_value struct
SEE_VALUE_COPY
SEE_VALUE_GET_TYPE
SEE_version
SEE_VERSION_API_MAJOR
SEE_VERSION_API_MINOR

© David Leonard, 2004. This documentation may be entirely reproduced and freely distributed, as long as this copyright notice remains intact, and either the distributed reproduction or translation is a complete and bona fide copy, or the modified reproduction is subtantially the same and includes a brief summary of the modifications made.

$Id: USAGE.html 899 2005-12-22 13:47:29Z d $