Reference: TT (token trees)
General
TTs (token trees) are a generic tree structure for data storage and
manipulation. They are a fundamental building block in Flux.
Properties overview
- Can have one parent, or none, in which case it is a root node.
- Can have up to (2^32)-1 direct children.
- Is ordered relative to its siblings, and knows about the siblings appearing directly before and after it, if any.
- Can have any form of data, up to (2^32)-1 bytes, associated with it.
- Homogeneous: any node in a tree represents a tree in itself, and can be referenced as such.
- Memory requirements: without data, 32 bytes per node on 32-bit architectures and 56 bytes per node on 64-bit architectures.
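The memory figures above are consistent with a node that stores six pointers plus two 32-bit fields. The struct below is a hypothetical layout chosen only to show how 32 and 56 bytes can arise; the field names, and the assumption that the status flags share one 32-bit word, are illustrative and not the actual Flux definition.

```c
#include <stdint.h>

/* Hypothetical node layout, NOT the real Flux struct. Six pointers plus two
   32-bit fields give 6*4 + 8 = 32 bytes on 32-bit targets and
   6*8 + 8 = 56 bytes on 64-bit targets (typical ABIs, no extra padding). */
typedef struct tt_layout_sketch {
    struct tt_layout_sketch *parent;       /* NULL for a root node */
    struct tt_layout_sketch *prev, *next;  /* adjacent siblings, if any */
    struct tt_layout_sketch *first_child, *last_child;
    void *data;                            /* associated data, if any */
    uint32_t size;                         /* data length, up to (2^32)-1 bytes */
    uint32_t flags;                        /* assumed home for ready/internal/local/fake-root bits */
} tt_layout_sketch;
```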
Casts
- TT(x)
Casts any kind of tree node to a TT.
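A common way to implement a cast like this in C is to have every specialized node type embed a TT as its first member, which makes a plain pointer cast valid. The sketch below assumes that convention; the struct contents and the source_node type are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, minimal TT; the real structure holds more fields. */
typedef struct tt {
    struct tt *parent, *prev, *next, *first_child, *last_child;
} TT;

/* Assumed form of the cast: valid because each derived node type begins
   with an embedded TT, and a pointer to a struct may be converted to a
   pointer to its first member. */
#define TT(x) ((TT *)(x))

/* Hypothetical derived node type built on top of TT. */
typedef struct {
    TT tt;            /* must be the first member for TT(x) to be valid */
    int line_number;  /* extra payload specific to this node type */
} source_node;
```

The function-like macro and the type name can share the name TT because the preprocessor only expands TT when it is immediately followed by a parenthesis.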
Iterators
- TT_FOR_EACH(TT *root, TT *child) statements
Iterates over the direct children of the given root node, storing the
per-iteration pointer in child. Used like a for statement.
- TT_FOR_ALL(TT *root, TT *child) statements
Iterates over the entire subtree of root in infix order (the same order
as tt_get_next_infix), storing the per-iteration pointer in child. Used
like a for statement.
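Both iterators are described as being used like a for statement. The sketch below shows one way TT_FOR_EACH can be built as a for-loop header, so that the statement following it becomes the loop body. The node fields and the append_child helper are hypothetical stand-ins for the real TT internals and tt_add_as_last_child.

```c
#include <stddef.h>

/* Hypothetical, minimal node; not the real TT definition. */
typedef struct tt {
    struct tt *parent, *prev, *next, *first_child, *last_child;
} TT;

/* Sketch: expands to a for-loop header walking root's direct children. */
#define TT_FOR_EACH(root, child) \
    for ((child) = (root)->first_child; (child) != NULL; (child) = (child)->next)

/* Hypothetical helper standing in for tt_add_as_last_child(). */
void append_child(TT *parent, TT *child)
{
    child->parent = parent;
    child->prev = parent->last_child;
    child->next = NULL;
    if (parent->last_child)
        parent->last_child->next = child;
    else
        parent->first_child = child;
    parent->last_child = child;
}

/* Counts direct children, using the macro like a for statement. */
size_t count_direct_children(TT *root)
{
    TT *child;
    size_t n = 0;

    TT_FOR_EACH(root, child)
        n++;
    return n;
}
```

Because this sketch reads child->next after the body has run, a body that detaches or deletes child would break the iteration. The reference does not say whether the real macro shares that limitation, so treat it as a caution rather than a documented rule.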
Allocation
- TT *tt_new();
Allocates an empty, unconnected node and returns a pointer to it.
- TT *tt_new_with_data(void *data, int len);
Allocates an unconnected node, copies the given data to it and returns
a pointer to the node.
- TT *tt_new_with_parent_and_data(TT *parent, void *data, int len);
Allocates a node, connects it as the last child of parent, copies the
given data to it and returns a pointer to the node.
- void tt_del(TT *tt);
Detaches and frees the node pointed to by tt, and all of its children,
recursively.
- TT *tt_dup(TT *tt);
Makes an unconnected duplicate of the given node and its data.
- TT *tt_dup_all(TT *tt);
Makes an internally connected duplicate of the tree defined by the given
node. That is, its children are also duplicated and connected to form
a tree with the duplicate of the given node as root.
- TT *tt_split(TT *tt, u32 pos);
Splits the data of tt before the byte indicated by pos, moving the
remaining bytes (from pos onwards) into a new node. The new node is
connected as the sibling immediately following tt, and can be retrieved
with tt_get_next(tt). If pos is zero, tt_size(tt) will be zero
afterwards. If pos equals tt_size(tt) before the operation,
tt_size(tt_get_next(tt)) will be zero afterwards.
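The byte-level effect of tt_split can be sketched on a simplified node that only has a data buffer and a next-sibling link. This illustrates the documented semantics, not the Flux implementation; node and split_sketch are hypothetical names, and the real function also maintains parent and previous-sibling links.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified node: data plus next-sibling link only. */
typedef struct node {
    struct node *next;
    unsigned char *data;
    uint32_t size;
} node;

/* Sketch of tt_split(tt, pos): bytes [pos, size) move into a new node that
   is linked in directly after tt. Returns NULL if pos is out of range or
   allocation fails. */
node *split_sketch(node *tt, uint32_t pos)
{
    if (pos > tt->size)
        return NULL;

    node *tail = calloc(1, sizeof *tail);
    if (tail == NULL)
        return NULL;

    tail->size = tt->size - pos;
    if (tail->size > 0) {
        tail->data = malloc(tail->size);
        if (tail->data == NULL) {
            free(tail);
            return NULL;
        }
        memcpy(tail->data, tt->data + pos, tail->size);
    }
    tt->size = pos;            /* tt keeps only the first pos bytes */

    tail->next = tt->next;     /* new node becomes tt's next sibling */
    tt->next = tail;
    return tail;
}
```

Splitting "hello" at position 2 leaves "he" in the original node and "llo" in its new next sibling; splitting at the current size produces an empty sibling, matching the edge case described above.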
Connectivity
These functions must be used to
link trees (nodes) to form larger trees.
- void tt_add_as_first_child(TT *parent_tt, TT *tt);
Adds tt at the beginning of parent_tt's child list.
- void tt_add_as_last_child(TT *parent_tt, TT *tt);
Adds tt to the end of parent_tt's child list.
- void tt_add_as_first_sibling(TT *sibling_tt, TT *tt);
Adds tt at the beginning of sibling_tt's parent's child list. Note: if
sibling_tt is a root, and thus has no parent to hold the child list, a
new root is created implicitly, holding sibling_tt and tt. See
tt_is_fake_root on dealing with implicitly created root nodes.
- void tt_add_as_last_sibling(TT *sibling_tt, TT *tt);
Adds tt to the end of sibling_tt's parent's child list. Note: if
sibling_tt is a root, and thus has no parent to hold the child list, a
new root is created implicitly, holding sibling_tt and tt. See
tt_is_fake_root on dealing with implicitly created root nodes.
- void tt_add_before(TT *next_tt, TT *tt);
Adds tt immediately before next_tt in next_tt's parent's child list.
Note: if next_tt is a root, and thus has no parent to hold the child
list, a new root is created implicitly, holding next_tt and tt. See
tt_is_fake_root on dealing with implicitly created root nodes.
- void tt_add_after(TT *prev_tt, TT *tt);
Adds tt immediately after prev_tt in prev_tt's parent's child list.
Note: if prev_tt is a root, and thus has no parent to hold the child
list, a new root is created implicitly, holding prev_tt and tt. See
tt_is_fake_root on dealing with implicitly created root nodes.
- int tt_add(TT *parent_tt, TT *tt);
Shortcut for tt_add_as_last_child.
- void tt_swap(TT *tt0, TT *tt1);
Swaps the connectivity contexts of tt0 and tt1, meaning their positions
relative to other nodes are exchanged. Nothing else is touched, and each
node takes its own data with it.
- void tt_detach(TT *tt);
Disconnects the subtree denoted by tt from its parent and siblings.
After the operation, tt is a root node.
- int tt_is_in_path(TT *tt0, TT *tt1);
Returns TRUE if tt0 is in the path of tt1, that is, if tt0 is a direct
or indirect parent of tt1.
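The pointer bookkeeping behind tt_add_as_last_child, tt_detach and tt_is_in_path can be sketched on a hypothetical node carrying the five links implied by the Properties section. The names are illustrative, and the sketch assumes that adding a node first detaches it from any previous position; the reference above does not say whether the real functions do this. Implicitly created fake roots are not modelled.

```c
#include <stddef.h>

/* Hypothetical node with the five links implied by the Properties section. */
typedef struct node {
    struct node *parent, *prev, *next, *first_child, *last_child;
} node;

/* Sketch of tt_detach(): unlink tt from its parent and siblings. Its child
   links are untouched, so the subtree stays attached to tt. */
void detach_sketch(node *tt)
{
    if (tt->prev)
        tt->prev->next = tt->next;
    else if (tt->parent)
        tt->parent->first_child = tt->next;

    if (tt->next)
        tt->next->prev = tt->prev;
    else if (tt->parent)
        tt->parent->last_child = tt->prev;

    tt->parent = tt->prev = tt->next = NULL;   /* tt is now a root */
}

/* Sketch of tt_add_as_last_child(); detaching first is an assumption. */
void add_as_last_child_sketch(node *parent, node *tt)
{
    detach_sketch(tt);
    tt->parent = parent;
    tt->prev = parent->last_child;
    if (parent->last_child)
        parent->last_child->next = tt;
    else
        parent->first_child = tt;
    parent->last_child = tt;
}

/* Sketch of tt_is_in_path(): is tt0 a direct or indirect parent of tt1? */
int is_in_path_sketch(const node *tt0, const node *tt1)
{
    for (const node *p = tt1->parent; p != NULL; p = p->parent)
        if (p == tt0)
            return 1;
    return 0;
}
```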
Navigation
- int tt_is_root(TT *tt);
Returns TRUE if tt is a root node (has no parent).
- int tt_is_leaf(TT *tt);
Returns TRUE if tt is a leaf node (has no children).
- int tt_is_first(TT *tt);
Returns TRUE if tt comes before all of its siblings (has no previous node).
- int tt_is_last(TT *tt);
Returns TRUE if tt comes after all of its siblings (has no next node).
- int tt_is_sibling(TT *tt0, TT *tt1);
Returns TRUE if tt0 and tt1 are siblings (have the same parent).
- int tt_has_child(TT *parent, TT *child);
Returns TRUE if child is a direct child of parent.
- TT *tt_get_root(TT *tt);
Returns the root node of tt's tree.
- TT *tt_get_first_sibling(TT *tt);
Returns the first sibling of tt.
- TT *tt_get_last_sibling(TT *tt);
Returns the last sibling of tt.
- TT *tt_get_prev(TT *tt);
Returns the previous sibling of tt, or NULL if tt is the first node.
- TT *tt_get_next(TT *tt);
Returns the next sibling of tt, or NULL if tt is the last node.
- TT *tt_get_parent(TT *tt);
Returns the parent of tt, or NULL if tt is a root node.
- TT *tt_get_first_child(TT *tt);
Returns the first child of tt, or NULL if tt is a leaf node.
- TT *tt_get_last_child(TT *tt);
Returns the last child of tt, or NULL if tt is a leaf node.
- TT *tt_get_next_infix(TT *tt, TT *top);
Returns the next node in a depth-first, infix traversal of the tree
defined by top, where tt is the last node traversed. Returns NULL if all
nodes have been visited. To traverse all nodes under top, visit top
itself first; then pass top as tt on the first call, and pass the
returned node back as tt on each successive call. See also the iterator
TT_FOR_ALL, which implements the same method but does more of the work
for you.
- TT *tt_get_common_parent(TT *tt0, TT *tt1);
Returns the first common parent found going upwards (towards the root)
from tt0 and tt1, or NULL if they belong to separate trees.
- TT *tt_get_next_in_breadth_with_level(TT *tt, int depth, int level);
Made for breadth-first traversal when you already know all the
parameters. There are higher-level, more intuitive (though marginally
slower) calls you can use. Three parameters are required to determine
the next node in breadth: tt is the node to start looking from. depth is
the depth of this node (see tt_depth); passing a wrong value for this
parameter is an error, and the results are undefined. level is the tree
depth you want to traverse breadth-first. To determine which node gets
returned, the function considers 1) the children of tt, 2) any siblings
following tt and 3) the parents of tt. Steps 1) and 2) are repeated for
each node. Note that the passed node itself is not considered, so this
function can be used iteratively to get all nodes on a given level.
Returns NULL when no more nodes are found.
- TT *tt_get_next_in_breadth(TT *tt, int depth);
Breadth-first traversal. Returns the next node following tt at the same
level, or NULL if there are no more nodes at this level. depth is the
depth shared by tt and the returned node. See
tt_get_next_in_breadth_with_level for details.
- TT *tt_get_next_in_same_depth(TT *tt);
Breadth-first traversal. This is the highest-level function for this
purpose. tt is the node to start looking from. Returns the first node
found at the same level as tt, or NULL if there are no more.
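The calling pattern described for tt_get_next_infix can be sketched as follows. Since top is visited before its children, the resulting order is what is usually called pre-order depth-first traversal: a node's first child comes next, otherwise its next sibling, otherwise the next sibling of the nearest parent that has one, without ever leaving the subtree under top. The node type, append_child and visit_all are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical, minimal node; name is a label for checking visit order. */
typedef struct node {
    struct node *parent, *next, *first_child, *last_child;
    char name;
} node;

/* Hypothetical helper standing in for tt_add_as_last_child(). */
void append_child(node *parent, node *child)
{
    child->parent = parent;
    child->next = NULL;
    if (parent->last_child)
        parent->last_child->next = child;
    else
        parent->first_child = child;
    parent->last_child = child;
}

/* Sketch of tt_get_next_infix(tt, top). */
node *next_infix_sketch(node *tt, node *top)
{
    if (tt->first_child)
        return tt->first_child;          /* descend first */
    while (tt != NULL && tt != top) {
        if (tt->next)
            return tt->next;             /* then try the next sibling */
        tt = tt->parent;                 /* otherwise climb one level */
    }
    return NULL;                         /* subtree under top exhausted */
}

/* Writes node names in visiting order: top first, then each returned node. */
size_t visit_all(node *top, char *out, size_t cap)
{
    size_t n = 0;

    for (node *tt = top; tt != NULL && n + 1 < cap; tt = next_infix_sketch(tt, top))
        out[n++] = tt->name;
    out[n] = '\0';
    return n;
}
```

For a root r with children a and b, where a has children 1 and 2, visit_all(&r, ...) yields "ra12b", while visit_all(&a, ...) stops at "a12" because the walk never climbs above top.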
Data
- int tt_get_external_handle(TT *tt);
Returns a numeric file handle to the data in tt. Only meaningful if
tt_data_is_internal is FALSE for this node.
- void tt_data_swap(TT *tt0, TT *tt1);
Swaps the data content of tt0 and tt1, without otherwise touching the
nodes.
- void tt_data_del(TT *tt);
Frees all data associated with tt.
- void tt_data_set_internal(TT *tt, void *src, u32 len, unsigned int copy);
Lowest-level data assignment function. Removes any previously set data
for tt, and sets it to src, with length len. If copy is TRUE, a local
duplicate is made and assigned. Otherwise, the node just references the
data, and will not free it when the node is deleted (see
tt_data_is_local).
- int tt_data_set_file(TT *tt, char *path, int local);
Assigns the data found in the file defined by path to node tt. Returns
TRUE only if the file exists and can be opened. After a successful
assignment, the file transparently acts as the node's data holder. If
the local argument is TRUE, the file is deleted whenever the node is.
- void tt_data_set_bytes(TT *tt, void *src, u32 start, u32 len);
Overwrites len bytes of tt's data, starting at offset start, with len
bytes from src. If the range extends past the end of the existing data,
the data is enlarged to fit, so this can also be used to set data in
previously empty nodes.
- void tt_data_append_bytes(TT *tt, void *src, u32 len);
Appends len bytes of data from src to the end of tt's data.
- void tt_data_prepend_bytes(TT *tt, void *src, u32 len);
Inserts len bytes of data from src at the beginning of tt's data.
- void tt_data_set_int(TT *tt, int val);
Shortcut for setting the data of tt to an integer value, val.
- void tt_data_set_ptr(TT *tt, void *ptr);
Shortcut for setting the data of tt to a pointer value, ptr.
- void tt_data_set_str(TT *tt, char *str);
Shortcut for setting the data of tt to a string, str.
- void *tt_data_get(TT *tt);
Returns a pointer to tt's data (if it is internal), or a pointer to its
null-terminated path (if it is external).
- u32 tt_data_get_bytes(TT *tt, void *dest, u32 start, u32 len);
Copies len bytes of data from tt, starting at start, to dest.
- int tt_data_get_int(TT *tt);
Shortcut. Takes the initial bytes of tt's data and returns them as an
int.
- void *tt_data_get_ptr(TT *tt);
Shortcut. Takes the initial bytes of tt's data and returns them as a
generic pointer.
- char *tt_data_get_str(TT *tt);
Returns a null-terminated copy of the data in tt. Yes, you have to free
this yourself when you're done with it. If tt is empty, it still returns
a valid string: zero-length, but null-terminated.
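The growth behaviour of tt_data_set_bytes, and how tt_data_append_bytes and tt_data_get_str relate to it, can be sketched on a plain byte buffer. The blob type and the *_sketch functions are hypothetical. Zero-filling a gap when start lies beyond the current end is an assumption, since the reference above does not say what happens in that case.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a node's internal data buffer. */
typedef struct {
    unsigned char *data;
    uint32_t size;
} blob;

/* Sketch of tt_data_set_bytes(): overwrite [start, start+len), growing the
   buffer when the range crosses the current end. Returns 0 on success,
   -1 on overflow or allocation failure. */
int set_bytes_sketch(blob *b, const void *src, uint32_t start, uint32_t len)
{
    uint64_t end = (uint64_t)start + len;

    if (end > UINT32_MAX)
        return -1;                      /* data size is limited to (2^32)-1 */
    if (end > b->size) {
        unsigned char *p = realloc(b->data, (size_t)end);
        if (p == NULL)
            return -1;
        if (start > b->size)            /* assumption: zero-fill any gap */
            memset(p + b->size, 0, start - b->size);
        b->data = p;
        b->size = (uint32_t)end;
    }
    if (len > 0)
        memcpy(b->data + start, src, len);
    return 0;
}

/* Sketch of tt_data_append_bytes(): a write starting at the current end. */
int append_bytes_sketch(blob *b, const void *src, uint32_t len)
{
    return set_bytes_sketch(b, src, b->size, len);
}

/* Sketch of tt_data_get_str(): a NUL-terminated copy the caller must free. */
char *get_str_sketch(const blob *b)
{
    char *s = malloc((size_t)b->size + 1);

    if (s == NULL)
        return NULL;
    if (b->size > 0)
        memcpy(s, b->data, b->size);
    s[b->size] = '\0';
    return s;
}
```

Writing "abc" at 0, then "XY" at 2, then appending "!" leaves the five bytes "abXY!": the second write overlaps one existing byte and grows the buffer by one.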
Matching
- int tt_cmp(TT *tt0, TT *tt1);
Returns TRUE if the data of tt0 and tt1 are an exact match.
- int tt_casecmp(TT *tt0, TT *tt1);
Returns TRUE if the data of tt0 and tt1 are an exact match, disregarding
case.
- int tt_memcmp(TT *tt, void *p, u32 len);
Returns TRUE if the data of tt and the data defined by p and len are an
exact match.
- int tt_memcasecmp(TT *tt, void *p, u32 len);
Returns TRUE if the data of tt and the data defined by p and len are an
exact match, disregarding case.
- int tt_strcmp(TT *tt, char *s);
Returns TRUE if the data of tt and the string passed as s are an exact
match.
- int tt_strcasecmp(TT *tt, char *s);
Returns TRUE if the data of tt and the string passed as s are an exact
match, disregarding case.
- int tt_chr(TT *tt, int c);
Returns the position of the first byte matching c in tt's data.
- int tt_rchr(TT *tt, int c);
Returns the position of the last byte matching c in tt's data.
- int tt_regcmp_precomp(TT *tt, regex_t *preg);
Returns TRUE if the data of tt matches the precompiled regexp preg.
- int tt_regcmp(TT *tt, char *regex);
Returns TRUE if the data of tt matches the regexp regex. This function
does the regexp compilation for you, which has a cost; if performance
matters and the same regexp is used repeatedly, compile it once and use
tt_regcmp_precomp instead.
- int tt_regcasecmp(TT *tt, char *regex);
Returns TRUE if the data of tt matches the regexp regex, disregarding
case. This function also does the regexp compilation for you, so keep
that cost in mind if performance is a concern.
- size_t tt_spn(TT *tt, const char *accept);
Calculates the length of the initial segment of tt's data that consists
entirely of characters in accept.
- size_t tt_cspn(TT *tt, const char *reject);
Calculates the length of the initial segment of tt's data that consists
entirely of characters not in reject.
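Unlike the C library's memcmp and strcmp, which return zero on equality, the tt_*cmp functions above return TRUE on a match. The sketch below shows that convention together with the spn/cspn semantics on a raw data buffer. The function names are hypothetical, and stopping the scan at a NUL byte (mirroring strspn and strcspn) is an assumption about how embedded zero bytes are treated.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* TRUE (1) when both buffers have the same length and identical bytes. */
int data_equal(const void *data, size_t size, const void *p, size_t len)
{
    return size == len && (len == 0 || memcmp(data, p, len) == 0);
}

/* As data_equal, but comparing bytes after tolower(). */
int data_equal_nocase(const void *data, size_t size, const void *p, size_t len)
{
    const unsigned char *a = data, *b = p;

    if (size != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower(a[i]) != tolower(b[i]))
            return 0;
    return 1;
}

/* Length of the initial run of bytes that all occur in accept. */
size_t data_spn(const void *data, size_t size, const char *accept)
{
    const unsigned char *d = data;
    size_t i = 0;

    while (i < size && d[i] != '\0' && strchr(accept, d[i]) != NULL)
        i++;
    return i;
}

/* Length of the initial run of bytes that do not occur in reject. */
size_t data_cspn(const void *data, size_t size, const char *reject)
{
    const unsigned char *d = data;
    size_t i = 0;

    while (i < size && d[i] != '\0' && strchr(reject, d[i]) == NULL)
        i++;
    return i;
}
```

The explicit NUL check matters: strchr(set, '\0') returns a pointer to the set's terminator, so without it a zero byte in the data would be counted as a member of accept.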
Searching
- TT *tt_find_first_sibling(TT *tt, void *data, u32 len);
Works like tt_get_first_sibling, except it returns the first sibling
matching the exact data defined by data and len, or NULL if there are no
matches.
- TT *tt_find_last_sibling(TT *tt, void *data, u32 len);
Works like tt_get_last_sibling, except it returns the last sibling
matching the exact data defined by data and len, or NULL if there are no
matches.
- TT *tt_find_next(TT *tt, void *data, u32 len);
Works like tt_get_next, except it returns the next sibling matching the
exact data defined by data and len, or NULL if there are no matches.
- TT *tt_find_prev(TT *tt, void *data, u32 len);
Works like tt_get_prev, except it returns the previous sibling matching
the exact data defined by data and len, or NULL if there are no matches.
- TT *tt_find_first_child(TT *tt, void *data, u32 len);
Works like tt_get_first_child, except it returns the first child
matching the exact data defined by data and len, or NULL if there are no
matches.
- TT *tt_find_last_child(TT *tt, void *data, u32 len);
Works like tt_get_last_child, except it returns the last child matching
the exact data defined by data and len, or NULL if there are no matches.
- TT *tt_match_first_sibling(TT *tt, char *regexp);
Works like tt_get_first_sibling, except it returns the first sibling
matching the regexp regexp, or NULL if there are no matches.
- TT *tt_match_last_sibling(TT *tt, char *regexp);
Works like tt_get_last_sibling, except it returns the last sibling
matching the regexp regexp, or NULL if there are no matches.
- TT *tt_match_next(TT *tt, char *regexp);
Works like tt_get_next, except it returns the next sibling matching the
regexp regexp, or NULL if there are no matches.
- TT *tt_match_prev(TT *tt, char *regexp);
Works like tt_get_prev, except it returns the previous sibling matching
the regexp regexp, or NULL if there are no matches.
- TT *tt_match_first_child(TT *tt, char *regexp);
Works like tt_get_first_child, except it returns the first child
matching the regexp regexp, or NULL if there are no matches.
- TT *tt_match_last_child(TT *tt, char *regexp);
Works like tt_get_last_child, except it returns the last child matching
the regexp regexp, or NULL if there are no matches.
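The find_* and match_* families differ only in how a candidate sibling is tested. The sketch below walks the following siblings the way tt_find_next and tt_match_next are described, using POSIX regcomp/regexec for the regexp case. The node type and function names are hypothetical, and the use of POSIX extended syntax with an unanchored search is an assumption; the reference does not state which regexp dialect Flux uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node: next-sibling link and raw data only. */
typedef struct node {
    struct node *next;
    const char *data;
    uint32_t size;
} node;

/* Like tt_find_next: first following sibling whose data equals data/len. */
node *find_next_sketch(node *tt, const void *data, uint32_t len)
{
    for (node *n = tt->next; n != NULL; n = n->next)
        if (n->size == len && (len == 0 || memcmp(n->data, data, len) == 0))
            return n;
    return NULL;
}

/* Like tt_match_next: first following sibling whose data matches regexp.
   The pattern is compiled once per call, then tested against each sibling. */
node *match_next_sketch(node *tt, const char *regexp)
{
    regex_t re;
    node *hit = NULL;

    if (regcomp(&re, regexp, REG_EXTENDED | REG_NOSUB) != 0)
        return NULL;
    for (node *n = tt->next; n != NULL && hit == NULL; n = n->next) {
        char *s = malloc((size_t)n->size + 1);  /* regexec needs a C string */
        if (s == NULL)
            break;
        if (n->size > 0)
            memcpy(s, n->data, n->size);
        s[n->size] = '\0';
        if (regexec(&re, s, 0, NULL, 0) == 0)
            hit = n;
        free(s);
    }
    regfree(&re);
    return hit;
}
```

Both functions start at tt->next, so tt itself is never returned, which matches "works like tt_get_next".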
Statistics
- u32 tt_depth(TT *tt);
Reports the depth at which tt is connected, i.e. its distance from the
root node. For a root node itself, this value is zero.
- void tt_stat_children_all(TT *root, u32 *count, u32 *size);
Recursively counts the children of root and totals the size of their
associated data, storing the results in count and size. Note that the
values for root itself are not included.
- int tt_size(TT *tt);
Returns the size of tt's associated data. A size of zero means there is
none.
- u32 tt_size_children(TT *root);
Returns the total size of data associated with root's direct (i.e.
non-recursive) children. The size of root itself is not included.
- u32 tt_size_children_all(TT *root);
Returns the total size of data associated with root's direct and
indirect (i.e. recursive) children. The size of root itself is not
included.
- u32 tt_count_children(TT *tt);
Returns the number of direct children under tt. tt itself is not
included.
- u32 tt_count_children_all(TT *tt);
Returns the number of direct and indirect children under tt. tt itself
is not included.
- u32 tt_count_siblings(TT *tt);
Returns the number of nodes having the same direct parent as tt. tt
itself is included.
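tt_depth and tt_stat_children_all can be sketched as a walk up the parent links and a recursive walk over the children, respectively, with the root excluded from the totals as documented. The node type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: only the links and data size these sketches need. */
typedef struct node {
    struct node *parent, *first_child, *next;
    uint32_t size;   /* size of this node's own data */
} node;

/* Sketch of tt_depth(): number of parent links up to the root (root = 0). */
uint32_t depth_sketch(const node *tt)
{
    uint32_t depth = 0;

    for (const node *p = tt->parent; p != NULL; p = p->parent)
        depth++;
    return depth;
}

/* Sketch of tt_stat_children_all(): counts and sizes every descendant of
   root, excluding root itself. */
void stat_children_all_sketch(const node *root, uint32_t *count, uint32_t *size)
{
    *count = 0;
    *size = 0;
    for (const node *c = root->first_child; c != NULL; c = c->next) {
        uint32_t sub_count, sub_size;

        stat_children_all_sketch(c, &sub_count, &sub_size);
        *count += 1 + sub_count;     /* the child plus its descendants */
        *size += c->size + sub_size;
    }
}
```

For a root holding 10 bytes with children a (3 bytes, one 4-byte child) and b (5 bytes), this reports 3 descendants totalling 12 bytes; the root's own 10 bytes are not counted.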
Status indicators; getting
- int tt_has_data(TT *tt);
Returns TRUE if tt has any data associated with it.
- int tt_is_ready(TT *tt);
Used internally by some constructors to determine if the node has been
fully constructed. Outside these constructors, the flag has no meaning
except for whatever you assign to it. You can safely use this flag for
your own purposes, as long as you don't depend on its initial value. It
is not touched by any elementary operations, and its value is preserved
in duplicates.
- int tt_data_is_internal(TT *tt);
Returns TRUE if the node's data resides in main memory (fast access).
- int tt_data_is_local(TT *tt);
Returns TRUE if tt owns the data it is associated with, meaning the data
will be deleted/freed along with the node. This is true unless you
explicitly assigned non-local data to the node; normal operations just
make local copies.
- int tt_is_fake_root(TT *tt);
Returns TRUE if the node was implicitly created because the tree's upper
branch expanded (e.g. you added a sibling to the old root node), and
this new root was required to maintain a valid tree structure. You only
need this if your code adds siblings to potential root nodes, which in
practice should be avoided.
Status indicators; setting
- int tt_set_ready(TT *tt, int ready);
Used internally. Sets the node's state of readiness to TRUE or FALSE,
depending on the ready argument. Some constructors use it to determine
if the node has been fully constructed. You can safely use this flag for
your own purposes, as long as you don't depend on its initial value. It
is not touched by any elementary operations, and its value is preserved
in duplicates.
- int tt_set_internal(TT *tt, int internal);
Used internally to indicate whether the node's data resides in main
memory (fast access) or on an external medium. You shouldn't have to use
this.
- int tt_set_fake_root(TT *tt, int fake_root);
Used internally to indicate whether the node was implicitly created
because the tree's upper branch expanded (e.g. you added a sibling to
the old root node), and this new root was required to maintain a valid
tree structure. You shouldn't have to use this.
Hashing
- u32 tt_hash(TT *tt);
Returns a RIPEMD-160 hash of tt's data, XOR-collapsed to 32 bits.
- u32 tt_hash_all(TT *tt);
Returns a running RIPEMD-160 hash of the tree defined by tt,
XOR-collapsed to 32 bits.
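A RIPEMD-160 digest is 160 bits, that is, five 32-bit words, so XOR-collapsing it to 32 bits can be done by XORing those words together. The sketch below shows only that collapsing step on an already computed 20-byte digest; computing RIPEMD-160 itself requires a hash implementation and is out of scope here. The function name and the big-endian word order are assumptions, since the reference does not specify how Flux groups the bytes.

```c
#include <stdint.h>

/* Collapse a 20-byte (160-bit) digest to 32 bits by XORing its five
   32-bit words. Assumption: each word is read big-endian. */
uint32_t xor_collapse_160(const unsigned char digest[20])
{
    uint32_t h = 0;

    for (int i = 0; i < 20; i += 4)
        h ^= ((uint32_t)digest[i] << 24) | ((uint32_t)digest[i + 1] << 16) |
             ((uint32_t)digest[i + 2] << 8) | (uint32_t)digest[i + 3];
    return h;
}
```

A consequence of XOR-collapsing is that two identical words in the digest cancel each other out, so the 32-bit result is only a convenience key, not a substitute for the full digest.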
Printable I/O
- TT *tt_scan_from_file(FILE *in);
Reads a printable token tree from in and, if it is well-formed, returns
the root of the resulting token tree. Otherwise, it returns NULL.
- void tt_print_to_file(TT *tt, FILE *out, TT_PRINT_MODE mode, int honour_meta);
Prints the token tree defined by tt in human-readable, 7-bit ASCII to
out, using the indenting style specified by mode. honour_meta is not
used for anything yet; set it to 0.
Supported values for mode:
TT_PRINT_COMPACT: Uses as little spacing as possible. Output will look
like a solid block of text, almost unreadable.
TT_PRINT_KNR: Uses K&R-style indenting.
TT_PRINT_ALLMAN: Uses Allman-style (also called BSD-style) indenting.