Flux is divided into subsystems. To get a quick grasp of it, start by reading about TTs (token trees). They are used in other parts of the library as well as in finished applications. You'll notice a resemblance to LISP lists, in that they're n-ary trees where each node can hold anything - an identifier, text, binary data, etc. They're used to hold large data sets as well as small-time bookkeeping information. For the TT API, completeness of functionality is the most important goal, and after that, efficiency. Note that TT data can only be manipulated indirectly, through API calls. This is because a TT node can store its data externally, e.g. as a file on disk (read on for planned extensions).

TTs have their own 7-bit ASCII print/scan interface, which operates on files with an extremely simple format, good for human-readable configuration files and whatnot (it is also excellent for debugging):

    servers { a.styx.net b.styx.net c.styx.net }
    example-of-encoded-8bit-data { "\ff\ff\ff\ff <- four 8bit bytes, all bits set" }

After TTs, take a quick look at SOCKs. They should be simple to figure out, as this is just a layer masking streams over pipes and the network.

The COMM subsystem combines TTs and SOCKs, serializing the former over the latter. Serialization can happen depth-first or breadth-first. When TTs are received, the receiver can inspect the partially constructed data structure as each node is added. Combined with breadth-first serialization, this gives you things like file browsers where directory hierarchies are reconstructed progressively, level by level, without blocking user interaction.

Now for MTs (markup trees). The most important thing to keep in mind here is that MT is not XML. It's designed to be compliant with XML semantics, but it could in theory be extended to embody more advanced semantics, like SGML. With a proper datatype, HTML and similar SGML applications can be covered too.

Structure: Based on TTs.
Each MT consists of several TTs forming a small tree describing the element. I haven't gotten around to documenting this structure in detail, and it is likely to change due to optimizations and semantic extensions. The fact that we use TTs here means that any MT can be cast to TT, allowing you to operate directly on the underlying structure, print it as TT text, or serialize it with COMM.

Import/export: There are XML scanning/printing facilities, based on the RXP engine, which is GPL. This functionality is a little weak, since it discards embedded processing information and is unable to represent DTDs internally. This is not a fault of RXP, but rather a temporary design decision. Read on.

Validation: Done by RXP in the loading stage. This is clearly insufficient, as we want in-memory (late) validation, as well as reverse validation (generating a list of legal tags in a specific context). This calls for keeping datatypes in memory as a TT structure that can be used in conjunction with MTs. Such a structure might for instance be typedef'd "MTD" (markup tree definition). I have some ideas for TT validation too; this could be done with a similar "TTD" structure.

This is where things get interesting. The most pressing task at hand is XML and DTD parsers/loaders, which can render serial text files in these standard formats to corresponding MTs and MTDs in memory, without semantic validation. This will replace the RXP code completely.

I think a good way to make the parser would be to start out making a generic "markup file" parser that can load any kind of file with <tag> </tag> <tag/> <? pi ?>, etc., and pass it on to the higher level, where the hierarchical structure is actually put together. It is important for this lexer/parser not to demand that tags are matched by closing tags (i.e. it should accept sloppy tag closing), as this will let us parse HTML later. On top of this, the parsing to MTs and MTD fragments from the document and the DTDs it refers to can take place.
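
To make the idea of a generic "markup file" lexer a bit more concrete, here is a rough sketch in Python. It is only an illustration, not Flux code: the names `Token` and `lex_markup`, the token kinds, and the choice to skip attributes entirely are all assumptions made for this example. The point it tries to show is that the lexer emits a flat stream of open/close/empty/pi/text tokens and never checks that tags are matched, leaving all structure to a higher level.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # "open", "close", "empty", "pi" or "text" (hypothetical kinds)
    value: str  # tag name, PI body, or raw character data


# One alternative per token kind. PIs and closing tags are tried before
# ordinary tags. Attributes are matched but thrown away in this sketch.
_TOKEN_RE = re.compile(
    r"<\?(?P<pi>.*?)\?>"
    r"|</\s*(?P<close>[^\s<>/]+)\s*>"
    r"|<\s*(?P<name>[^\s<>/?]+)(?P<attrs>[^<>]*?)(?P<slash>/?)>"
    r"|(?P<text>[^<]+)",
    re.DOTALL,
)


def lex_markup(source: str) -> Iterator[Token]:
    """Yield tokens in document order without checking tag balance."""
    pos = 0
    while pos < len(source):
        m = _TOKEN_RE.match(source, pos)
        if m is None:
            # A stray "<" that starts no recognizable tag: pass it on as text.
            yield Token("text", source[pos])
            pos += 1
            continue
        pos = m.end()
        if m.group("pi") is not None:
            yield Token("pi", m.group("pi").strip())
        elif m.group("close") is not None:
            yield Token("close", m.group("close"))
        elif m.group("name") is not None:
            kind = "empty" if m.group("slash") else "open"
            yield Token(kind, m.group("name"))
        else:
            yield Token("text", m.group("text"))
```

Feeding it something like `<p>Hello<br/>world<li>item</p>` yields an "open" token for `li` with no matching "close", and the lexer does not complain. Deciding where such a tag ends is left to whatever builds the tree from the token stream.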
For sloppy tag matching, tag close points are guessed at, so that a valid MT can be represented in memory (you cannot represent unmatched tags with a tree structure).

That was one task. It is currently open, as I have some other things to finish. If no one claims it before January 5th, I'm likely to take it on. Of course, splitting it up for cooperation would be nice.

Other tasks I can think of, off the top of my head:
Other subsystems of Flux you might take a look at:
Flux, these web pages, and all related material are Copyright © 1999-2000 Simplemente and the respective authors, and are licensed under the GNU GPL. Please see the About page for more details. Web design by Joakim Ziegler <joakim@simplemente.net>, illustrations by Belinda Laws <boysdontcry@zombieworld.com>.