Path: lib/bio/db.rb
Last Update: Sun Dec 04 02:16:41 +0000 2011
Copyright: Copyright (C) 2001, 2002, 2005 Toshiaki Katayama <k@bioruby.org>
License: The Ruby License
$Id: db.rb,v 0.38 2007/05/08 17:02:13 nakao Exp $
The flatfile parsers (subclasses of Bio::DB) split the original entry into a Hash and store it in the @orig instance variable. Detailed parsing is deferred until a method that needs it is called, and that method then parses the relevant content of the @orig hash. Fully parsed data is cached separately in another hash, @data.
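The on-demand parsing and caching described above can be sketched in plain Ruby. `LazyEntry`, its tag layout (tag in column 0, indented continuation lines) and the `parse_text` helper are hypothetical, and the class does not inherit from Bio::DB. It only illustrates the pattern: `initialize` does a cheap split into @orig, and each accessor parses its field on the first call and keeps the result in @data.

```ruby
# Hypothetical flatfile class showing the @orig / @data split.
# Not part of BioRuby; the tag layout is an assumption for illustration.
class LazyEntry
  def initialize(entry)
    @orig = {}   # tag => raw text of that field (split, not parsed)
    @data = {}   # tag => parsed value, filled only when first requested
    tag = nil
    entry.each_line do |line|
      next if line.start_with?('//')   # entry terminator
      tag = line[/\A\S+/] || tag       # indented lines continue the previous tag
      next unless tag
      @orig[tag] = @orig.fetch(tag, '') + line
    end
  end

  def entry_id
    @data['ENTRY_ID'] ||= parse_text(@orig['ENTRY_ID'])
  end

  def definition
    @data['DEFINITION'] ||= parse_text(@orig['DEFINITION'])
  end

  private

  # The "detailed" parsing step: drop the tag or indentation from each
  # line and join the lines with single spaces.
  def parse_text(raw)
    return nil unless raw
    raw.each_line.map { |l| l.sub(/\A\S*\s+/, '').strip }.join(' ')
  end
end
```

Because the cache uses `||=`, a field that parses to nil is parsed again on every call; a real class could test `@data.key?(tag)` instead.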
— Bio::DB.new(entry)
The ‘new’ method should accept the entire entry in one String and return the parsed database object.
— Bio::DB#entry_id
Database classes should implement the following methods if appropriate:

* entry_id
* definition
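Every subclass should define the following constants if appropriate:

* DELIMITER (RS): the entry separator of the database flatfile; RS (record separator) is a short alias for DELIMITER.
* TAGSIZE: the width of the tag field in the FORTRAN-like fixed-column format, for example:

```
|<- tag  ->||<- data ---->|
ENTRY_ID    A12345
DEFINITION  Hoge gene of the Pokemonia pikachuae
```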
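A minimal sketch of the fixed-width layout above, assuming TAGSIZE = 12 as in the template: the first TAGSIZE columns hold the tag and the rest of the line holds the data. The helper names `split_tag_line` and `format_tag_line` are hypothetical, not BioRuby methods.

```ruby
# Width of the tag column (assumed 12, matching the template's TAGSIZE).
TAGSIZE = 12

# Split one line into [tag, data]; an empty tag marks a continuation line.
def split_tag_line(line, tagsize = TAGSIZE)
  tag  = line[0, tagsize].to_s.strip
  data = line[tagsize..-1].to_s.chomp
  [tag, data]
end

# Build a line in the same layout by padding the tag to tagsize columns.
# Tags longer than tagsize are not handled in this sketch.
def format_tag_line(tag, data, tagsize = TAGSIZE)
  tag.ljust(tagsize) + data
end
```

For instance, `split_tag_line("ENTRY_ID    A12345\n")` yields `["ENTRY_ID", "A12345"]`, and a line starting with twelve spaces yields an empty tag.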
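```ruby
require 'bio/db'   # loads Bio::DB (requires BioRuby)

module Bio
  class Hoge < DB

    # Entry separator of the flatfile; RS (record separator) is a short alias.
    DELIMITER = RS = "\n//\n"
    TAGSIZE   = 12   # You can omit this line if not needed

    # Accept the entire entry as one String and parse it.
    def initialize(entry)
    end

    # Return the entry identifier as a String.
    def entry_id
    end

  end # class Hoge
end # module Bio
```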
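The template leaves parsing empty. As a hedged illustration of how DELIMITER separates entries, the stand-alone class below is named `Hoge` like the template but does not inherit from Bio::DB, so it runs without BioRuby. It splits a multi-entry flatfile on DELIMITER with `String#each_line` and passes each chunk (one whole entry as a String) to `new`. The `parse_all` class method and the ENTRY_ID lookup are assumptions for illustration.

```ruby
# Stand-alone stand-in for the template class (no Bio::DB superclass).
class Hoge
  DELIMITER = RS = "\n//\n"   # entry separator; RS is a short alias
  TAGSIZE   = 12              # width of the tag column

  attr_reader :entry_id

  # Receives one complete entry as a String.
  def initialize(entry)
    # Use the data column of the first ENTRY_ID line, if present.
    line = entry.each_line.find { |l| l.start_with?('ENTRY_ID') }
    @entry_id = line && line[TAGSIZE..-1].to_s.strip
  end

  # Hypothetical helper: split a whole flatfile on DELIMITER and
  # build one object per non-empty entry.
  def self.parse_all(text)
    text.each_line(DELIMITER)
        .reject { |chunk| chunk.strip.empty? }
        .map { |chunk| new(chunk) }
  end
end
```

Keeping the separator in a constant lets generic reading code split a file without knowing anything else about the database format.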
In general, a method name should be singular when the method returns a single object (including a String), and plural when it returns an Array of such objects. Which form is appropriate depends on the database class.
For example, a GenBank entry can have several REFERENCE fields, so Bio::GenBank defines #references, which returns an Array of Reference objects. A MEDLINE entry, on the other hand, has only one reference, so Bio::MEDLINE defines #reference, which returns a single Reference object.
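The singular/plural rule can be illustrated with hypothetical classes: `Reference` is a Struct standing in for Bio::Reference, and `MultiRefEntry` and `SingleRefEntry` stand in for GenBank-like and MEDLINE-like entries. None of these are BioRuby classes.

```ruby
# Hypothetical stand-in for Bio::Reference so the sketch runs alone.
Reference = Struct.new(:title)

# GenBank-like: several references per entry, so a plural name and an Array.
class MultiRefEntry
  def initialize(titles)
    @titles = titles
  end

  def references
    @titles.map { |t| Reference.new(t) }
  end
end

# MEDLINE-like: exactly one reference per entry, so a singular name and one object.
class SingleRefEntry
  def initialize(title)
    @title = title
  end

  def reference
    Reference.new(@title)
  end
end
```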
Method names in subclasses should be taken from the following list if appropriate:
— entry_id #=> String
The entry identifier.
— definition #=> String
The description of the entry.
— reference #=> Bio::Reference
— references #=> Array of Bio::Reference
The reference field(s) of the entry.
— dblink #=> String
— dblinks #=> Array of String
The link(s) to entries in other databases.
— naseq #=> Bio::Sequence::NA
The DNA/RNA sequence of the entry.
— nalen #=> Integer
The length of the DNA/RNA sequence of the entry.
— aaseq #=> Bio::Sequence::AA
The amino acid sequence of the entry.
— aalen #=> Integer
The length of the amino acid sequence of the entry.
— seq #=> Bio::Sequence::NA or Bio::Sequence::AA
Returns an appropriate sequence object.
— position #=> String
The position of the sequence in the entry or in the genome (depends on the database).
— locations #=> Bio::Locations
Returns Bio::Locations.new(position).
— division #=> String
The name of the database subdivision.
— date #=> String
The date of the entry. Should we use Date (by ParseDate) instead of String?
— gene #=> String
— genes #=> Array of String
The name(s) of the gene.
— organism #=> String
The name of the organism.
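The `date` entry in the list above leaves open whether to return a Date instead of a String. One possible approach, sketched under assumptions: keep #date returning the String, and offer a separately named accessor that converts it with Ruby's `date` library. ParseDate belonged to Ruby 1.8's standard library and is not available in later versions, so the sketch uses `Date.strptime`. `DatedEntry`, `parsed_date` and the DD-MON-YYYY format are hypothetical.

```ruby
require 'date'

# Hypothetical entry whose date field looks like "08-MAY-2007".
class DatedEntry
  def initialize(date_string)
    @date = date_string
  end

  # Keeps the String contract from the recommended method list.
  def date
    @date
  end

  # Illustrative extra accessor (not part of Bio::DB): the same value as a Date.
  # %b matches abbreviated month names regardless of case.
  def parsed_date
    Date.strptime(@date, '%d-%b-%Y')
  end
end
```

Adding a second accessor leaves existing callers of #date unaffected while still offering a typed value.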