
FAQ
2.4.2
I'm using the XPath '//foo' on an XML branch node X, and
keep getting all of the 'foo' elements in the entire document.
Why? Shouldn't it return only the 'foo' element descendants of
X?
No. XPath specifies that a leading '/' selects the document root,
regardless of the context node. '//' also starts at the document
root. If you want to limit your search to a branch, you need to use the
self:: axis, e.g. 'self::node()//foo', or the shorthand
'.//foo'.
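A minimal sketch of the difference, assuming a made-up two-branch document (the element names and id attributes are only for illustration):

```ruby
require 'rexml/document'

# Hypothetical document: one 'foo' under <x>, another under <y>.
doc = REXML::Document.new('<root><x><foo id="1"/></x><y><foo id="2"/></y></root>')
x = doc.root.elements['x']

# '//foo' starts at the document root, even though the context node is x.
everywhere = REXML::XPath.match(x, '//foo')

# './/foo' starts at the context node, so only descendants of x match.
under_x = REXML::XPath.match(x, './/foo')

puts everywhere.size   # 2
puts under_x.size      # 1
```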
I want to parse a document both as a tree, and as a stream. Can I do
this?
Yes and no. There is no mechanism that directly supports this in
REXML. However, short of writing your own traversal layer, there is a
way of doing it. To turn a tree into a stream, just turn the branch you
want to process as a stream back into a string, and re-parse it with your
preferred API, e.g. pp = PullParser.new( some_element.to_s ). The other
direction is more difficult; you basically have to build a tree from the
events. REXML will have one of these builders eventually, but it
doesn't exist yet.
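The tree-to-stream direction might look like the following sketch. It assumes the pull parser lives at REXML::Parsers::PullParser, as it does in current REXML releases, and the sample document is invented for the example:

```ruby
require 'rexml/document'
require 'rexml/parsers/pullparser'

# Hypothetical document; only the <branch> subtree is streamed.
doc = REXML::Document.new('<root><branch><a>1</a><b>2</b></branch><other/></root>')
branch = doc.root.elements['branch']

# Serialize the branch and re-parse that string as a stream of events.
parser = REXML::Parsers::PullParser.new(branch.to_s)

started = []
while parser.has_next?
  event = parser.pull
  # For a start_element event, index 0 holds the tag name.
  started << event[0] if event.start_element?
end

puts started.inspect
```

Note that this re-parses a copy: changes made while streaming are not reflected in the original tree.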
Why is Element.elements indexed off of '1' instead of
'0'?
Because of XPath. The XPath specification states that the index of the
first child node is '1'. Although it may be counter-intuitive to
index elements from 1, it would be even less desirable to have
element.elements[0] == element.elements[ 'node()[1]' ]. Since I can't
change the XPath specification, the result is that Element.elements[1] is
the first child element.
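A short sketch of the 1-based indexing, using an invented document. It uses '*[1]' rather than 'node()[1]' so that only element children are counted:

```ruby
require 'rexml/document'

# Hypothetical document with two child elements and no whitespace between them.
doc = REXML::Document.new('<root><a/><b/></root>')
children = doc.root.elements

first  = children[1]        # integer index, counted from 1
second = children[2]
by_xpath = children['*[1]'] # XPath position predicate, also counted from 1

puts first.name    # a
puts second.name   # b
puts first.equal?(by_xpath)
```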
Why isn't REXML a validating parser?
Because validating parsers must include code that parses and interprets
DTDs. I hate DTDs. REXML supports the barest minimum of DTD parsing, and
even that isn't complete. There is DTD parsing code in the works, but
I only work on it when I'm really, really bored. Rumor has it that a
contributor is working on a DTD parser for REXML; rest assured that any
such contribution will be included with REXML as soon as it is available.
I'm trying to create an ISO-8859-1 document, but when I add text to
the document it isn't being properly encoded.
Regardless of what the encoding of your document is, when you add text
programmatically to a REXML document you must ensure that you are
only adding UTF-8 to the tree. In particular, you can't add ISO-8859-1
encoded text that contains characters above 0x7F to REXML trees -- you
must convert it to UTF-8 before doing so. Luckily, this is easy:
text.unpack('C*').pack('U*') will do the
trick. 7-bit ASCII is byte-for-byte identical in UTF-8, so if your text is
plain ASCII you won't need to worry about this.
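As a sketch of the conversion, assuming a Ruby new enough to have String#force_encoding and String#encode (1.9 or later); the sample string and element name are invented:

```ruby
require 'rexml/document'

# "caf" followed by byte 0xE9, which is e-acute in ISO-8859-1.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)

# The FAQ's approach: read each byte as a code point, re-emit as UTF-8.
utf8 = latin1.unpack('C*').pack('U*')

# On a modern Ruby the same conversion can be written with String#encode.
utf8_modern = latin1.encode(Encoding::UTF_8)

# Only the UTF-8 version goes into the tree.
doc = REXML::Document.new('<greeting/>')
doc.root.add_text(utf8)

puts utf8 == utf8_modern
puts doc.root.text
```

The byte-as-code-point trick works only because ISO-8859-1 maps each byte to the Unicode code point of the same value; other legacy encodings need a real transcoding step.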
How do I get the tag name of an Element?
You take a look at the APIs, and notice that Element
includes Namespace. Then you click on the Namespace
link and look at the methods that Element includes from
Namespace. One of these is name(). Another is
expanded_name(). Yet another is prefix(). Then,
you email the author of rdoc and ask him to extend rdoc so that it lists
methods in the API that are included from other files, so that you
don't have to do all of that looking around for your method.
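For the impatient, a small sketch of the three Namespace methods mentioned above. The 'ns' prefix and the namespace URI are placeholders chosen for the example:

```ruby
require 'rexml/document'

# Hypothetical namespaced element; the URI is a placeholder.
doc = REXML::Document.new('<ns:item xmlns:ns="http://example.com/ns"/>')
el = doc.root

puts el.name           # item
puts el.prefix         # ns
puts el.expanded_name  # ns:item
```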