Class Bio::PhyloXML::Parser
In: lib/bio/db/phyloxml/phyloxml_parser.rb
Parent: Object

Description

Bio::PhyloXML::Parser is for parsing phyloXML format files.

Requirements

The libxml2 XML parser is required. Install the libxml-ruby bindings from libxml.rubyforge.org, or with RubyGems:

  gem install -r libxml-ruby

Usage

  require 'bio'

  # Create new phyloxml parser
  phyloxml = Bio::PhyloXML::Parser.open('example.xml')

  # Print the names of all trees in the file
  phyloxml.each do |tree|
    puts tree.name
  end

References

www.phyloxml.org/documentation/version_100/phyloxml.xsd.html

Methods

[]   close   each   for_io   new   next_tree   open   open_uri  

Included Modules

LibXML

Attributes

other  [R]  Array of PhyloXML::Other objects. Once all trees have been parsed, any remaining elements that are not phylogeny elements are stored here.

Public Class methods

for_io(io, validate=true)

Creates a new Bio::PhyloXML::Parser object that reads from an IO object. Initializes a LibXML::XML::Reader and reads from the IO until it reaches the first phylogeny element.

  p = Bio::PhyloXML::Parser.for_io($stdin)

Arguments:

  • (required) io: IO object
  • (optional) validate: Ignored for the IO reader; no validation is performed.
Returns: Bio::PhyloXML::Parser object

[Source]

     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 171
171:     def self.for_io(io, validate=true)
172:       obj = new(nil, validate)
173:       obj.instance_eval {
174:         @reader = XML::Reader.io(io,
175:                                  { :options =>
176:                                    LibXML::XML::Parser::Options::NONET })
177:         _skip_leader
178:       }
179:       obj
180:     end

new(str, validate=true)

Creates a new Bio::PhyloXML::Parser object from a PhyloXML-formatted string. Initializes a LibXML::XML::Reader and reads the string until it reaches the first phylogeny element.

  str = File.read("./phyloxml_examples.xml")
  p = Bio::PhyloXML::Parser.new(str)

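Deprecated usage: if str contains none of the characters <, >, CR, or LF and names an existing file, it is treated as a filename and the data is read from that file.

  p = Bio::PhyloXML::Parser.new("./phyloxml_examples.xml")

Passing a filename is deprecated and prints a warning. Use Bio::PhyloXML::Parser.open(filename) instead.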

Arguments:

  • (required) str: PhyloXML-formatted string
  • (optional) validate: Whether to validate against the schema. Default value is true. In the source below, validation runs only in the deprecated filename branch; string input is not validated.
Returns: Bio::PhyloXML::Parser object

[Source]

     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 271
271:     def initialize(str, validate=true)
272: 
273:       @other = []
274: 
275:       return unless str
276: 
277:       # For compatibility, if filename-like string is given,
278:       # treat it as a filename.
279:       if /[\<\>\r\n]/ !~ str and File.exist?(str) then
280:         # assume that str is filename
281:         warn "Bio::PhyloXML::Parser.new(filename) is deprecated. Use Bio::PhyloXML::Parser.open(filename)."
282:         filename = _secure_filename(str)
283:         _validate(:file, filename) if validate
284:         @reader = XML::Reader.file(filename)
285:         _skip_leader
286:         return
287:       end
288: 
289:       # initialize for string
290:       @reader = XML::Reader.string(str,
291:                                    { :options =>
292:                                      LibXML::XML::Parser::Options::NONET })
293:       _skip_leader
294:     end
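
The compatibility check in the listing above can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named filename_like? (it is not part of BioRuby) that repeats the same two conditions: the string contains no <, >, CR, or LF, and a file with that name exists. It does not reproduce _secure_filename or the validation step.

```ruby
require 'tempfile'

# Hypothetical helper (not part of BioRuby) mirroring the condition used in
# Parser#initialize to decide whether a string should be read as a filename.
def filename_like?(str)
  !!(/[<>\r\n]/ !~ str && File.exist?(str))
end

xml = '<phyloxml><phylogeny rooted="true"/></phyloxml>'
puts filename_like?(xml)        # XML text contains '<', so it is parsed as a string

Tempfile.create('example') do |f|
  puts filename_like?(f.path)   # an existing path with no markup characters
end
```

Because the check depends on the file system, the same string can be handled differently depending on the current working directory, which is one reason to prefer Bio::PhyloXML::Parser.open for files.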

open(filename, validate=true)

Creates a new Bio::PhyloXML::Parser object that reads from a file. Initializes a LibXML::XML::Reader and reads the file until it reaches the first phylogeny element.

  p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")

Arguments:

  • (required) filename: Path to the file to parse.
  • (optional) validate: Whether to validate the file against the schema. Default value is true.
Returns: Bio::PhyloXML::Parser object

[Source]

     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 90
 90:     def self.open(filename, validate=true)
 91:       obj = new(nil, validate)
 92:       obj.instance_eval {
 93:         filename = _secure_filename(filename)
 94:         _validate(:file, filename) if validate
 95:         # XML::Parser::Options::NONET for security reason
 96:         @reader = XML::Reader.file(filename,
 97:                                    { :options =>
 98:                                      LibXML::XML::Parser::Options::NONET })
 99:         _skip_leader
100:       }
101:       obj
102:     end

open_uri(uri, validate=true)

Creates a new Bio::PhyloXML::Parser object that reads from a URI. Initializes a LibXML::XML::Reader and reads the data at the URI until it reaches the first phylogeny element.

  p = Bio::PhyloXML::Parser.open_uri("http://www.phyloxml.org/examples/apaf.xml")

Arguments:

  • (required) uri: (URI or String) URI to the data to parse
  • (optional) validate: Ignored for the URI reader; no validation is performed.
Returns: Bio::PhyloXML::Parser object

[Source]

     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 116
116:     def self.open_uri(uri, validate=true)
117:       case uri
118:       when URI
119:         uri = uri.to_s
120:       else
121:         # raises error if not a String
122:         uri = uri.to_str
123:         # raises error if invalid URI
124:         URI.parse(uri)
125:       end
126: 
127:       obj = new(nil, validate)
128:       obj.instance_eval {
129:         @reader = XML::Reader.file(uri)
130:         _skip_leader
131:       }
132:       obj
133:     end
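
The argument handling at the top of open_uri can be exercised without a network connection. This is a sketch only, assuming a hypothetical helper named normalize_uri (not part of BioRuby) that repeats the case statement from the listing above using Ruby's standard uri library.

```ruby
require 'uri'

# Hypothetical helper (not part of BioRuby) mirroring the argument check in
# Parser.open_uri: URI objects are converted to strings, anything else must be
# String-like and must parse as a URI.
def normalize_uri(uri)
  case uri
  when URI
    uri.to_s
  else
    str = uri.to_str   # NoMethodError unless the object is String-like
    URI.parse(str)     # URI::InvalidURIError if the string is not a valid URI
    str
  end
end

puts normalize_uri(URI.parse('http://www.phyloxml.org/examples/apaf.xml'))
puts normalize_uri('http://www.phyloxml.org/examples/apaf.xml')

begin
  normalize_uri(42)
rescue NoMethodError => e
  puts "rejected: #{e.class}"
end
```

Note that the parsed URI is used only as a validity check; the original string is what gets passed on to the reader.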

Public Instance methods

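[](i)

Accesses the specified tree in the file by calling next_tree i+1 times and returning the last result. Because the reader only moves forward, the index counts from the parser's current position, so it matches the position in the file only on a freshly opened parser.

 # Get 3rd tree in the file (starts counting from 0).
 parser = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
 tree = parser[2]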

[Source]

     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 317
317:     def [](i)
318:       tree = nil
319:       (i+1).times do
320:        tree =  self.next_tree
321:       end
322:       return tree
323:     end
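
Since [] is built on next_tree, each call consumes trees from the stream. The sketch below illustrates that with a stand-in class (ForwardOnlyTrees is hypothetical and not part of BioRuby) in which trees are plain strings and next_tree hands them out one at a time; the [] body is the same as in the listing above.

```ruby
# Stand-in for the parser (hypothetical, not BioRuby): trees are plain strings
# and next_tree consumes them one at a time, as a forward-only reader would.
class ForwardOnlyTrees
  def initialize(trees)
    @queue = trees.dup
  end

  def next_tree
    @queue.shift   # nil once everything has been consumed
  end

  # Same body as Parser#[] in the listing above.
  def [](i)
    tree = nil
    (i + 1).times { tree = next_tree }
    tree
  end
end

trees = ForwardOnlyTrees.new(%w[t0 t1 t2 t3 t4])
p trees[2]   # third tree from the start
p trees[0]   # the next unread tree, not the first tree in the file
p trees[5]   # past the end of the stand-in's data
```

With this stand-in, a second call does not return to the beginning. To reach an earlier tree with the real parser, the straightforward approach is to open the file again.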

close()

Closes the LibXML::XML::Reader inside the object. It also closes the underlying file if the parser was created with Bio::PhyloXML::Parser.open.

Closing an already closed parser, or using a closed parser, raises LibXML::XML::Error.

Returns: nil

[Source]

     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 153
153:     def close
154:       @reader.close
155:       @reader = ClosedPhyloXMLParser.new
156:       nil
157:     end

each()

Iterates through all trees in the file, yielding each tree to the block.

 phyloxml = Bio::PhyloXML::Parser.open('example.xml')
 phyloxml.each do |tree|
   puts tree.name
 end

[Source]

     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 304
304:     def each
305:       while tree = next_tree
306:         yield tree
307:       end
308:     end
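
The class lists only LibXML under Included Modules, so methods such as map are not available on the parser directly. One way to collect results is Ruby's built-in Object#enum_for, sketched here on a stand-in (TreeSource and FakeTree are hypothetical names, not BioRuby classes) whose each and next_tree have the same shape as the listing above.

```ruby
# Stand-in tree with only a name (hypothetical, not Bio::PhyloXML::Tree).
FakeTree = Struct.new(:name)

# Stand-in source (hypothetical) with the same each/next_tree shape as the
# parser: each keeps yielding until next_tree returns nil.
class TreeSource
  def initialize(names)
    @queue = names.map { |n| FakeTree.new(n) }
  end

  def next_tree
    @queue.shift
  end

  def each
    while tree = next_tree
      yield tree
    end
  end
end

source = TreeSource.new(%w[alpha beta])
names = source.enum_for(:each).map(&:name)
p names

# Iteration consumes the source, so a second pass finds nothing left.
p source.enum_for(:each).to_a
```

The loop stops at the first nil from next_tree, so in the real parser iteration also ends when next_tree returns nil for an element that is not a phylogeny.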

next_tree()

Parses and returns the next phylogeny tree. If there are no more phylogeny elements, nil is returned. Any element other than a phylogeny element is saved in PhyloXML::Parser#other.

 p = Bio::PhyloXML::Parser.open("./phyloxml_examples.xml")
 tree = p.next_tree

Returns: Bio::PhyloXML::Tree, or nil when no phylogeny element remains

[Source]

     # File lib/bio/db/phyloxml/phyloxml_parser.rb, line 334
334:     def next_tree()
335: 
336:       if not is_element?('phylogeny')
337:         if @reader.node_type == XML::Reader::TYPE_END_ELEMENT
338:           if is_end_element?('phyloxml')
339:             return nil
340:           else
341:             @reader.read
342:             @reader.read
343:             if is_end_element?('phyloxml')
344:               return nil
345:             end
346:           end
347:         end        
348:         # phyloxml can hold only phylogeny and "other" elements. If this is not
349:         # phylogeny element then it is other. Also, "other" always comes after
350:         # all phylogenies        
351:         @other << parse_other        
352:         #return nil for tree, since this is not valid phyloxml tree.
353:         return nil
354:       end
355: 
356:       tree = Bio::PhyloXML::Tree.new
357: 
358:       # keep track of current node in clades array/stack. Current node is the
359:       # last element in the clades array
360:       clades = []
361:       clades.push tree
362:       
363:       #keep track of current edge to be able to parse branch_length tag
364:       current_edge = nil
365: 
366:       # we are going to parse clade iteratively by pointing (and changing) to
367:       # the current node in the tree. Since the property element is both in
368:       # clade and in the phylogeny, we need some boolean to know if we are
369:       # parsing the clade (there can be only max 1 clade in phylogeny) or
370:       # parsing phylogeny
371:       parsing_clade = false
372: 
373:       while not is_end_element?('phylogeny') do
374:         break if is_end_element?('phyloxml')
375:         
376:         # parse phylogeny elements, except clade
377:         if not parsing_clade
378: 
379:           if is_element?('phylogeny')
380:             @reader["rooted"] == "true" ? tree.rooted = true : tree.rooted = false
381:             @reader["rerootable"] == "true" ? tree.rerootable = true : tree.rerootable = false
382:             parse_attributes(tree, ["branch_length_unit", 'type'])
383:           end
384: 
385:           parse_simple_elements(tree, [ "name", 'description', "date"])
386: 
387:           if is_element?('confidence')
388:             tree.confidences << parse_confidence
389:           end
390: 
391:         end
392: 
393:         if @reader.node_type == XML::Reader::TYPE_ELEMENT
394:           case @reader.name
395:           when 'clade'
396:             #parse clade element
397: 
398:             parsing_clade = true
399: 
400:             node= Bio::PhyloXML::Node.new
401: 
402:             branch_length = @reader['branch_length']
403: 
404:             parse_attributes(node, ["id_source"])
405: 
406:             #add new node to the tree
407:             tree.add_node(node)
408:             # The first clade will always be root since by xsd schema phyloxml can
409:             # have 0 to 1 clades in it.
410:             if tree.root == nil
411:               tree.root = node
412:             else
413:               current_edge = tree.add_edge(clades[-1], node,
414:                                            Bio::Tree::Edge.new(branch_length))
415:             end
416:             clades.push node
417:             #end if clade element
418:           else
419:            parse_clade_elements(clades[-1], current_edge) if parsing_clade
420:           end
421:         end
422: 
423:         #end clade element, go one parent up
424:         if is_end_element?('clade')
425: 
426:           # if we have reached the closing tag of the top-most clade, then our
427:           # current node should point to the root. If that's the case, we are done
428:           # parsing the clade element
429:           if clades[-1] == tree.root
430:             parsing_clade = false
431:           else
432:             # set current node (clades[-1]) to the previous clade in the array
433:             clades.pop
434:           end
435:         end          
436: 
437:         #parsing phylogeny elements
438:         if not parsing_clade
439: 
440:           if @reader.node_type == XML::Reader::TYPE_ELEMENT
441:             case @reader.name
442:             when 'property'
443:               tree.properties << parse_property
444: 
445:             when 'clade_relation'
446:               clade_relation = CladeRelation.new
447:               parse_attributes(clade_relation, ["id_ref_0", "id_ref_1", "distance", "type"])
448: 
449:               #@ add unit test for this
450:               if not @reader.empty_element?
451:                 @reader.read
452:                 if is_element?('confidence')
453:                   clade_relation.confidence = parse_confidence
454:                 end
455:               end
456:               tree.clade_relations << clade_relation
457: 
458:             when 'sequence_relation'
459:               sequence_relation = SequenceRelation.new
460:               parse_attributes(sequence_relation, ["id_ref_0", "id_ref_1", "distance", "type"])
461:               if not @reader.empty_element?
462:                 @reader.read
463:                 if is_element?('confidence')
464:                   sequence_relation.confidence = parse_confidence
465:                 end
466:               end
467:               tree.sequence_relations << sequence_relation
468:             when 'phylogeny'
469:               #do nothing
470:             else
471:               tree.other << parse_other
472:               #puts "Not recognized element. #{@reader.name}"
473:             end
474:           end
475:         end
476:         # go to next element        
477:         @reader.read    
478:       end #end while not </phylogeny>
479:       #move on to the next tag after /phylogeny which is text, since phylogeny
480:       #end tag is empty element, which value is nil, therefore need to move to
481:       #the next meaningful element (therefore @reader.read twice)
482:       @reader.read 
483:       @reader.read
484: 
485:       return tree
486:     end
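
The clades array in the listing above works as a stack: a clade start tag pushes a node and links it to the node currently on top, and a clade end tag pops back to the parent. The following is a simplified sketch of that idea, assuming a hypothetical event list in place of the XML reader and plain labels in place of Bio::PhyloXML::Node objects. Unlike the real method, it starts with an empty stack and also pops the root.

```ruby
# Hypothetical helper (not part of BioRuby): turns a flat list of clade
# open/close events into [parent, child] pairs using a stack.
def clade_edges(events)
  edges = []   # [parent, child] pairs, in the order they are discovered
  stack = []   # stack.last is the clade currently being parsed

  events.each do |type, label|
    case type
    when :open
      # A nested clade becomes a child of the clade on top of the stack.
      edges << [stack.last, label] unless stack.empty?
      stack.push(label)
    when :close
      # End of a clade: go one parent up.
      stack.pop
    end
  end

  edges
end

# Event stream standing in for nested <clade> elements:
# root contains A and B, and B contains B1.
events = [
  [:open, 'root'],
  [:open, 'A'], [:close],
  [:open, 'B'],
  [:open, 'B1'], [:close],
  [:close],
  [:close]
]

p clade_edges(events)
```

Walking the document this way needs only one forward pass and no recursion, which fits a streaming reader that cannot look back at earlier elements.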
