Class Bio::FlatFile
In: lib/bio/io/flatfile.rb
lib/bio/io/flatfile/autodetection.rb
lib/bio/io/flatfile/buffer.rb
lib/bio/io/flatfile/splitter.rb
Parent: Object

Bio::FlatFile is a helper and wrapper class for reading biological data files. It behaves like an IO object and can detect the data format automatically, so users do not need to tell the class what kind of data the file contains.

Included Modules

Enumerable

Classes and Modules

Module Bio::FlatFile::Splitter
Class Bio::FlatFile::AutoDetect
Class Bio::FlatFile::BufferedInputStream
Class Bio::FlatFile::UnknownDataFormatError

Attributes

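dbclass  [R]  Returns the database class, either detected automatically or given to FlatFile#initialize.
entry  [R]  Returns the last parsed entry read by next_entry (reads in raw mode do not update it).
raw  [R]  If true, the object is in raw mode.
skip_leader_mode  [RW]  How the leader of the data (text in front of an entry) is skipped:
:firsttime (default): skip only at the head of the file (the first time an entry is read)
:everytime: skip every time an entry is read
nil: never skip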
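
The choice next_entry makes on each call can be modelled as a small predicate. The sketch below mirrors the condition in the next_entry source listed further down; `skip_leader?` is a hypothetical helper name and is not part of BioRuby.

```ruby
# Hypothetical helper mirroring the leader-skipping condition in
# Bio::FlatFile#next_entry. `firsttime` is true until the first entry
# has been read (and becomes true again after #rewind).
def skip_leader?(mode, firsttime)
  return false unless mode            # nil: never skip
  (firsttime && mode == :firsttime) || mode == :everytime
end

[:firsttime, :everytime, nil].each do |mode|
  puts "#{mode.inspect}: first read=#{skip_leader?(mode, true)}, " \
       "later reads=#{skip_leader?(mode, false)}"
end
```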

Public Class methods

auto(*arg, &block)

Same as Bio::FlatFile.open(nil, filename_or_stream, mode, perm, options).

  • Example 1
     Bio::FlatFile.auto(ARGF)
    
  • Example 2
     Bio::FlatFile.auto("embl/est_hum17.dat")
    
  • Example 3
     Bio::FlatFile.auto(IO.popen("gzip -dc nc1101.flat.gz"))
    

[Source]

     # File lib/bio/io/flatfile.rb, line 122
122:     def self.auto(*arg, &block)
123:       self.open(nil, *arg, &block)
124:     end

autodetect(text)

Detects the database class (i.e. the file format) of the given string. Returns false or nil if the format cannot be determined.

[Source]

     # File lib/bio/io/flatfile.rb, line 460
460:     def self.autodetect(text)
461:       AutoDetect.default.autodetect(text)
462:     end
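
Conceptually, string-based detection inspects the first lines of the text and maps recognisable header patterns to a format. The sketch below is a deliberately simplified, hypothetical detector (`toy_autodetect`); its three rules are illustrative assumptions, not BioRuby's actual rule set, and it returns symbols where the real method returns database classes such as Bio::FastaFormat.

```ruby
# Hypothetical, simplified format detector: NOT BioRuby's real rules.
# It looks only at the first non-blank line and returns a symbol or nil.
def toy_autodetect(text)
  first = text.each_line.map(&:chomp).find { |l| !l.strip.empty? }
  case first
  when nil         then nil        # empty input
  when /\A>/       then :fasta
  when /\ALOCUS\s/ then :genbank
  when /\AID   /   then :embl
  end                              # no match: nil
end

p toy_autodetect(">seq1 test\nACGT\n")       # → :fasta
p toy_autodetect("LOCUS       AB000001\n")   # → :genbank
p toy_autodetect("no recognisable header\n") # → nil
```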

autodetect_file(filename)

Detects the database class (file format) of the given file. Returns nil if the format cannot be determined.

[Source]

     # File lib/bio/io/flatfile.rb, line 440
440:     def self.autodetect_file(filename)
441:       self.open_file(filename).dbclass
442:     end

autodetect_io(io)

Detects the database class (file format) of the given input stream. Returns nil if the format cannot be determined. Caution: the method reads some data from the input stream, and that data is lost.

[Source]

     # File lib/bio/io/flatfile.rb, line 448
448:     def self.autodetect_io(io)
449:       self.new(nil, io).dbclass
450:     end
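
The caution above follows from how streams work: whatever a detector reads is consumed. The sketch below uses a hypothetical helper, `sniff_format`, which reads lines through `gets` the way a detector would, and shows that the next read starts after the lines it inspected.

```ruby
require 'stringio'

# Hypothetical helper: reads up to `max_lines` lines from any object that
# responds to #gets and guesses a format. Every line it reads is consumed.
def sniff_format(io, max_lines = 31)
  max_lines.times do
    line = io.gets
    return nil unless line               # stream exhausted
    return :fasta if line.start_with?('>')
  end
  nil
end

io = StringIO.new(">seq1\nACGT\n>seq2\nGGCC\n")
p sniff_format(io)  # → :fasta
p io.gets           # → "ACGT\n" (the ">seq1" header line is gone)
```

A StringIO or regular file could be rewound afterwards, but a pipe such as IO.popen(...) or $stdin cannot, which is why passing such a stream directly to FlatFile.auto is the safer choice.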

autodetect_stream(io)

This method is obsolete. Please use autodetect_io(io) instead.

[Source]

     # File lib/bio/io/flatfile.rb, line 453
453:     def self.autodetect_stream(io)
454:       $stderr.print "Bio::FlatFile.autodetect_stream will be deprecated." if $VERBOSE
455:       self.autodetect_io(io)
456:     end

foreach(*arg)

Executes the block for every entry in the stream. Same as FlatFile.open(*arg) { |ff| ff.each { |entry| … }}.

  • Example
      Bio::FlatFile.foreach('test.fst') { |e| puts e.definition }
    

[Source]

     # File lib/bio/io/flatfile.rb, line 194
194:     def self.foreach(*arg)
195:       self.open(*arg) do |flatfileobj|
196:         flatfileobj.each do |entry|
197:           yield entry
198:         end
199:       end
200:     end

new(dbclass, stream)

Same as FlatFile.open, except that stream must be an already opened stream object (IO, File, or any other object that has a gets method).

  • Example 1
     Bio::FlatFile.new(Bio::GenBank, ARGF)
    
  • Example 2
     Bio::FlatFile.new(Bio::GenBank, IO.popen("gzip -dc nc1101.flat.gz"))
    

Compatibility Note: You can no longer specify ":raw => true" or ":raw => false". The styles below are DEPRECATED.

  • Example 3 (deprecated)
     # Bio::FlatFile.new(nil, $stdin, :raw=>true) # => ERROR
     # Please rewrite as below.
     ff = Bio::FlatFile.new(nil, $stdin)
     ff.raw = true
    
  • Example 3 in old style (deprecated)
     # Bio::FlatFile.new(nil, $stdin, true) # => ERROR
     # Please rewrite as below.
     ff = Bio::FlatFile.new(nil, $stdin)
     ff.raw = true
    

[Source]

     # File lib/bio/io/flatfile.rb, line 225
225:     def initialize(dbclass, stream)
226:       # 2nd arg: IO object
227:       if stream.kind_of?(BufferedInputStream)
228:         @stream = stream
229:       else
230:         @stream = BufferedInputStream.for_io(stream)
231:       end
232:       # 1st arg: database class (or file format autodetection)
233:       if dbclass then
234:         self.dbclass = dbclass
235:       else
236:         autodetect
237:       end
238:       #
239:       @skip_leader_mode = :firsttime
240:       @firsttime_flag = true
241:       # default raw mode is false
242:       self.raw = false
243:     end

  Bio::FlatFile.open(file, *arg)
  Bio::FlatFile.open(dbclass, file, *arg)

Creates a new Bio::FlatFile object to read a file or a stream which contains dbclass data.

dbclass should be a class (or module) or nil, for example Bio::GenBank or Bio::FastaFormat.

If file is a filename (an object without a gets method), the method opens the local file with File.open(filename, *arg).

When dbclass is omitted or nil, the method tries to determine the database class (file format) automatically. If that fails, dbclass is set to nil and FlatFile#next_entry will raise Bio::FlatFile::UnknownDataFormatError. You can still set dbclass with the FlatFile#dbclass= method.

  • Example 1
      Bio::FlatFile.open(Bio::GenBank, "genbank/gbest40.seq")
    
  • Example 2
      Bio::FlatFile.open(nil, "embl/est_hum17.dat")
    
  • Example 3
      Bio::FlatFile.open("genbank/gbest40.seq")
    
  • Example 4
      Bio::FlatFile.open(Bio::GenBank, $stdin)
    

If it is called with a block, the block is executed with the new Bio::FlatFile object. If a filename is given, the file is closed automatically when the block exits.

  • Example 5
      Bio::FlatFile.open(nil, 'test4.fst') do |ff|
          ff.each { |e| print e.definition, "\n" }
      end
    
  • Example 6
      Bio::FlatFile.open('test4.fst') do |ff|
          ff.each { |e| print e.definition, "\n" }
      end
    

Compatibility Note: *arg is passed unchanged to File.open, so you cannot specify ":raw => true" or ":raw => false".

[Source]

     # File lib/bio/io/flatfile.rb, line 80
 80:     def self.open(*arg, &block)
 81:       # FlatFile.open(dbclass, file, mode, perm)
 82:       # FlatFile.open(file, mode, perm)
 83:       if arg.size <= 0
 84:         raise ArgumentError, 'wrong number of arguments (0 for 1)'
 85:       end
 86:       x = arg.shift
 87:       if x.is_a?(Module) then
 88:         # FlatFile.open(dbclass, filename_or_io, ...)
 89:         dbclass = x
 90:       elsif x.nil? then
 91:         # FlatFile.open(nil, filename_or_io, ...)
 92:         dbclass = nil
 93:       else
 94:         # FlatFile.open(filename, ...)
 95:         dbclass = nil
 96:         arg.unshift(x)
 97:       end
 98:       if arg.size <= 0
 99:         raise ArgumentError, 'wrong number of arguments (1 for 2)'
100:       end
101:       file = arg.shift
102:       # check if file is filename or IO object
103:       unless file.respond_to?(:gets)
104:         # 'file' is a filename
105:         _open_file(dbclass, file, *arg, &block)
106:       else
107:         # 'file' is a IO object
108:         ff = self.new(dbclass, file)
109:         block_given? ? (yield ff) : ff
110:       end
111:     end
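
The argument handling in the source above can be restated as a pure function, which makes the accepted call forms easy to check. `classify_open_args` is a hypothetical name; it reproduces the dispatch logic but returns a description instead of opening anything. `Comparable` merely stands in for a database class such as Bio::GenBank, since any Module is accepted in that position.

```ruby
require 'stringio'

# Hypothetical re-statement of how Bio::FlatFile.open interprets its
# arguments. Returns [dbclass, file, remaining_args, kind].
def classify_open_args(*arg)
  raise ArgumentError, 'wrong number of arguments (0 for 1)' if arg.empty?
  x = arg.shift
  if x.is_a?(Module)
    dbclass = x            # open(dbclass, file, ...)
  elsif x.nil?
    dbclass = nil          # open(nil, file, ...)
  else
    dbclass = nil          # open(file, ...): x is the file itself
    arg.unshift(x)
  end
  raise ArgumentError, 'wrong number of arguments (1 for 2)' if arg.empty?
  file = arg.shift
  # Anything with #gets is treated as a stream, everything else as a filename.
  kind = file.respond_to?(:gets) ? :stream : :filename
  [dbclass, file, arg, kind]
end

p classify_open_args(Comparable, "a.gb")        # dbclass given explicitly
p classify_open_args("a.gb", "r")                # filename first, mode kept
p classify_open_args(nil, StringIO.new("x"))[3]  # → :stream
```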

open_file(filename, *arg)

Same as FlatFile.auto(filename, *arg), except that it accepts only a filename, not an IO object. The file format is determined automatically.

It can accept a block. If a block is given, it returns the block's return value. Otherwise, it returns a new FlatFile object.

[Source]

     # File lib/bio/io/flatfile.rb, line 144
144:     def self.open_file(filename, *arg)
145:       _open_file(nil, filename, *arg)
146:     end

open_uri(uri, *arg)

Opens the URI specified as uri, which must be a String or URI object. *arg is passed to OpenURI.open_uri or URI#open.

Like FlatFile.open, it can accept a block.

Note that you MUST explicitly require 'open-uri'. Because open-uri.rb modifies existing classes, it is not required by default.

[Source]

     # File lib/bio/io/flatfile.rb, line 177
177:     def self.open_uri(uri, *arg)
178:       if block_given? then
179:         BufferedInputStream.open_uri(uri, *arg) do |stream|
180:           yield self.new(nil, stream)
181:         end
182:       else
183:         stream = BufferedInputStream.open_uri(uri, *arg)
184:         self.new(nil, stream)
185:       end
186:     end

to_a(*arg)

Same as FlatFile.auto(filename_or_stream, *arg).to_a, except that it raises an error if the file format cannot be determined.

(This method may become obsolete in the future.)

[Source]

     # File lib/bio/io/flatfile.rb, line 129
129:     def self.to_a(*arg)
130:       self.auto(*arg) do |ff|
131:         raise 'cannot determine file format' unless ff.dbclass
132:         ff.to_a
133:       end
134:     end

Public Instance methods

autodetect(lines = 31, ad = AutoDetect.default)

Determines the database class (file format). Pre-reads the number of lines given by lines (default 31) for format determination. Returns the database class on success, or nil or false on failure; on failure, a previously set dbclass is kept.

The method can be called at any time, although this is not recommended. It might be useful if the input file mixes data in multiple formats.

[Source]

     # File lib/bio/io/flatfile.rb, line 429
429:     def autodetect(lines = 31, ad = AutoDetect.default)
430:       if r = ad.autodetect_flatfile(self, lines)
431:         self.dbclass = r
432:       else
433:         self.dbclass = nil unless self.dbclass
434:       end
435:       r
436:     end
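
Two properties of this method are worth seeing in isolation: pre-read lines stay available for later reads (FlatFile keeps them in its own buffer), and a failed detection leaves an existing dbclass untouched. The class below, `PreReadingReader`, is a hypothetical model of both behaviours; its one-rule detector is an assumption for illustration only.

```ruby
require 'stringio'

# Hypothetical model of two behaviours of Bio::FlatFile#autodetect:
# 1. lines are pre-read into a buffer, so later reads still see them;
# 2. when detection fails, a previously set dbclass is kept.
class PreReadingReader
  attr_accessor :dbclass

  def initialize(io)
    @io = io
    @buffer = []    # lines pre-read but not yet consumed
    @dbclass = nil
  end

  def gets
    @buffer.empty? ? @io.gets : @buffer.shift
  end

  def autodetect(lines = 31)
    peeked = []
    lines.times do
      line = gets
      break unless line
      peeked << line
    end
    @buffer = peeked + @buffer          # push the pre-read lines back
    r = peeked.any? { |l| l.start_with?('>') } ? :fasta : nil
    @dbclass = r if r                   # keep the old value on failure
    r
  end
end

ff = PreReadingReader.new(StringIO.new(">s1\nACGT\n"))
p ff.autodetect   # → :fasta
p ff.gets         # → ">s1\n" (still available)
```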

close()

Closes the input stream (similar to IO#close).

[Source]

     # File lib/bio/io/flatfile.rb, line 351
351:     def close
352:       @stream.close
353:     end

dbclass=(klass)

Sets the database class. Please use this only if autodetection fails.

[Source]

     # File lib/bio/io/flatfile.rb, line 400
400:     def dbclass=(klass)
401:       if klass then
402:         @dbclass = klass
403:         begin
404:           @splitter = @dbclass.flatfile_splitter(@dbclass, @stream)
405:         rescue NameError, NoMethodError
406:           begin
407:             splitter_class = @dbclass::FLATFILE_SPLITTER
408:           rescue NameError
409:             splitter_class = Splitter::Default
410:           end
411:           @splitter = splitter_class.new(klass, @stream)
412:         end
413:       else
414:         @dbclass = nil
415:         @splitter = nil
416:       end
417:     end
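
The source above chooses a splitter in three steps: ask the database class for one via flatfile_splitter, otherwise use its FLATFILE_SPLITTER constant, otherwise fall back to Splitter::Default. The sketch below replays that lookup order with hypothetical stand-in classes (`DefaultSplitter`, `CustomSplitter`, `PlainFormat`, `ConstFormat`, `MethodFormat`); none of them exist in BioRuby.

```ruby
# Hypothetical stand-ins: none of these classes exist in BioRuby.
DefaultSplitter = Struct.new(:dbclass, :stream)
CustomSplitter  = Struct.new(:dbclass, :stream)

class PlainFormat; end                 # provides nothing special

class ConstFormat                      # names its splitter via a constant
  FLATFILE_SPLITTER = CustomSplitter
end

class MethodFormat                     # builds its own splitter object
  def self.flatfile_splitter(dbclass, stream)
    [:from_method, dbclass, stream]
  end
end

# Mirrors the lookup order in Bio::FlatFile#dbclass=:
# 1. dbclass.flatfile_splitter, 2. dbclass::FLATFILE_SPLITTER, 3. default.
def choose_splitter(dbclass, stream)
  dbclass.flatfile_splitter(dbclass, stream)
rescue NameError, NoMethodError
  begin
    splitter_class = dbclass::FLATFILE_SPLITTER
  rescue NameError
    splitter_class = DefaultSplitter
  end
  splitter_class.new(dbclass, stream)
end

p choose_splitter(PlainFormat, :io).class   # → DefaultSplitter
p choose_splitter(ConstFormat, :io).class   # → CustomSplitter
p choose_splitter(MethodFormat, :io).first  # → :from_method
```

Because NoMethodError is a subclass of NameError, a missing flatfile_splitter method and a missing constant both fall through cleanly to the next step.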
each()

Alias for each_entry

each_entry()

Iterates over each entry in the flatfile.

  • Example
     include Bio
     ff = FlatFile.open(GenBank, "genbank/gbhtg14.seq")
     ff.each_entry do |x|
       puts x.definition
     end
    

[Source]

     # File lib/bio/io/flatfile.rb, line 334
334:     def each_entry
335:       while e = self.next_entry
336:         yield e
337:       end
338:     end
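
Since each is an alias for each_entry and the class includes Enumerable, methods such as map, select and to_a come for free. The hypothetical `EntryReader` below reproduces that pattern over an in-memory list of entries, so the mechanics can be seen without a real file.

```ruby
# Hypothetical reader showing the pattern FlatFile uses: #next_entry
# returns nil at the end, #each_entry loops over it, #each is an alias,
# and Enumerable builds map/select/to_a on top of #each.
class EntryReader
  include Enumerable

  def initialize(entries)
    @entries = entries.dup
  end

  def next_entry
    @entries.shift          # nil once exhausted, which ends the loop
  end

  def each_entry
    while (e = next_entry)
      yield e
    end
  end
  alias each each_entry
end

reader = EntryReader.new(%w[seq1 seq2 seq3])
p reader.map(&:upcase)   # → ["SEQ1", "SEQ2", "SEQ3"]
```

Because iteration consumes entries through next_entry, a second pass over the same object yields nothing unless the input is rewound first.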

entry_ended_pos()

Returns (end position of the last entry) + 1, i.e. the position just after the last entry.

[Source]

     # File lib/bio/io/flatfile.rb, line 322
322:     def entry_ended_pos
323:       @splitter.entry_ended_pos
324:     end

entry_pos_flag()

Returns the flag that controls whether the start and end positions of entries are recorded.

[Source]

     # File lib/bio/io/flatfile.rb, line 307
307:     def entry_pos_flag
308:       @splitter.entry_pos_flag
309:     end

entry_pos_flag=(x)

Sets the flag that controls whether the start and end positions of entries are recorded.

[Source]

     # File lib/bio/io/flatfile.rb, line 312
312:     def entry_pos_flag=(x)
313:       @splitter.entry_pos_flag = x
314:     end

entry_raw()

Returns the last raw entry as a string.

[Source]

     # File lib/bio/io/flatfile.rb, line 302
302:     def entry_raw
303:       @splitter.entry
304:     end

entry_start_pos()

Returns the start position of the last entry.

[Source]

     # File lib/bio/io/flatfile.rb, line 317
317:     def entry_start_pos
318:       @splitter.entry_start_pos
319:     end
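
The position accessors above report offsets for the entry most recently read, with the end reported as one past the entry's last byte. The hypothetical `PositionRecordingSplitter` below models that bookkeeping for entries terminated by "//" lines; it records positions only while its flag is set, analogous to entry_pos_flag, and reads offsets straight from the underlying IO rather than from a buffered stream.

```ruby
require 'stringio'

# Hypothetical splitter for entries terminated by "//\n". When pos_flag is
# true it records each entry's start offset and (end offset + 1), the values
# FlatFile exposes as entry_start_pos and entry_ended_pos.
class PositionRecordingSplitter
  attr_accessor :pos_flag
  attr_reader :entry_start_pos, :entry_ended_pos

  def initialize(io)
    @io = io
    @pos_flag = false
  end

  def get_entry
    start = @io.pos
    entry = +''
    while (line = @io.gets)
      entry << line
      break if line == "//\n"
    end
    return nil if entry.empty?
    if @pos_flag
      @entry_start_pos = start
      @entry_ended_pos = @io.pos   # one past the last byte of the entry
    end
    entry
  end
end

sp = PositionRecordingSplitter.new(StringIO.new("AAA\n//\nBB\n//\n"))
sp.pos_flag = true
sp.get_entry
sp.get_entry
p [sp.entry_start_pos, sp.entry_ended_pos]   # → [7, 13]
```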

eof?()

Returns true if the input stream is at end-of-file; otherwise, returns false. (Similar to IO#eof?, but the result may differ from io.eof? because FlatFile has its own internal buffer.)

[Source]

     # File lib/bio/io/flatfile.rb, line 380
380:     def eof?
381:       @stream.eof?
382:     end
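
Why eof? (and pos) can disagree with the underlying IO becomes clear with any read-ahead buffer. The hypothetical `ReadAheadStream` below is one plausible way to structure such a buffer; it is not BioRuby's BufferedInputStream. After peeking, the wrapped IO is already at EOF while the wrapper still holds unread lines.

```ruby
require 'stringio'

# Hypothetical read-ahead wrapper: after #peek_lines the underlying IO may
# already be at EOF while the wrapper still holds unread lines, so the
# wrapper's eof?/pos and the IO's eof?/pos can disagree.
class ReadAheadStream
  def initialize(io)
    @io = io
    @buffer = []
  end

  def peek_lines(n)
    n.times do
      line = @io.gets
      break unless line
      @buffer << line
    end
    @buffer.dup
  end

  def gets
    @buffer.empty? ? @io.gets : @buffer.shift
  end

  def eof?
    @buffer.empty? && @io.eof?
  end

  # Logical position: the IO's position minus the bytes still buffered.
  def pos
    @io.pos - @buffer.sum(&:bytesize)
  end
end

io = StringIO.new("line1\nline2\n")
s = ReadAheadStream.new(io)
s.peek_lines(2)
p [io.eof?, s.eof?]   # → [true, false]
p [io.pos, s.pos]     # → [12, 0]
```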

gets(*arg)

Similar to IO#gets. For internal use only; users should not call it directly.

[Source]

     # File lib/bio/io/flatfile.rb, line 395
395:     def gets(*arg)
396:       @stream.gets(*arg)
397:     end

io()

(DEPRECATED) Returns the IO object wrapped by the flatfile object.

Compatibility Note: Bio::FlatFile#io is deprecated. Please use Bio::FlatFile#to_io instead.

[Source]

     # File lib/bio/io/flatfile.rb, line 255
255:     def io
256:       warn "Bio::FlatFile#io is deprecated."
257:       @stream.to_io
258:     end

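next_entry()

Returns the next entry, or nil when no more entries remain. In raw mode the entry is returned as text; otherwise it is returned as a parsed object. Raises Bio::FlatFile::UnknownDataFormatError if the database class is not set.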

[Source]

     # File lib/bio/io/flatfile.rb, line 277
277:     def next_entry
278:       raise UnknownDataFormatError, 
279:       'file format auto-detection failed?' unless @dbclass
280:       if @skip_leader_mode and
281:           ((@firsttime_flag and @skip_leader_mode == :firsttime) or
282:              @skip_leader_mode == :everytime)
283:         @splitter.skip_leader
284:       end
285:       if raw then
286:         r = @splitter.get_entry
287:       else
288:         r = @splitter.get_parsed_entry
289:       end
290:       @firsttime_flag = false
291:       return nil unless r
292:       if raw then
293:         r
294:       else
295:         @entry = r
296:         @entry
297:       end
298:     end
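
The control flow of next_entry (refuse to run without a database class, then return raw text or a parsed object) can be exercised with a small model. `MiniFlatFile`, `UnknownFormat` and `Record` are hypothetical names: UnknownFormat stands in for Bio::FlatFile::UnknownDataFormatError, Record for a database class, and leader skipping is omitted.

```ruby
# Hypothetical model of Bio::FlatFile#next_entry's control flow.
class UnknownFormat < StandardError; end   # stand-in for UnknownDataFormatError

class MiniFlatFile
  attr_accessor :dbclass, :raw
  attr_reader :entry

  def initialize(texts, dbclass)
    @texts = texts.dup
    @dbclass = dbclass
    @raw = false
  end

  def next_entry
    raise UnknownFormat, 'file format auto-detection failed?' unless @dbclass
    text = @texts.shift
    return nil unless text
    return text if raw                 # raw mode: text only, #entry unchanged
    @entry = @dbclass.new(text)        # parsed object, remembered as #entry
  end
end

# Hypothetical parsed-entry class standing in for e.g. Bio::FastaFormat.
Record = Struct.new(:text) do
  def definition
    text.lines.first.chomp.sub(/\A>/, '')
  end
end

ff = MiniFlatFile.new([">a desc\nAC\n", ">b\nGG\n"], Record)
p ff.next_entry.definition   # → "a desc"
ff.raw = true
p ff.next_entry              # → ">b\nGG\n"
```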

path()

Returns the pathname, filename, or URI of the input (or nil).

[Source]

     # File lib/bio/io/flatfile.rb, line 268
268:     def path
269:       @stream.path
270:     end

pos()

Returns the current position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos, but it will not necessarily be equal to io.pos because FlatFile has its own internal buffer.

[Source]

     # File lib/bio/io/flatfile.rb, line 361
361:     def pos
362:       @stream.pos
363:     end

pos=(p)

(Not recommended.) Sets the position of the input stream. If the input stream is not a normal file, the result is not guaranteed. It is similar to IO#pos=, but its effect is not necessarily the same as io.pos= because FlatFile has its own internal buffer.

[Source]

     # File lib/bio/io/flatfile.rb, line 372
372:     def pos=(p)
373:       @stream.pos=(p)
374:     end

raw=(bool)

If true is given, next_entry returns each entry as text; if false, it returns each entry as a parsed object.

[Source]

     # File lib/bio/io/flatfile.rb, line 386
386:     def raw=(bool)
387:       @raw = (bool ? true : false)
388:     end

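rewind()

Resets the file pointer to the start of the flatfile (similar to IO#rewind). It also resets the first-read state, so with skip_leader_mode :firsttime the leader is skipped again on the next read.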

[Source]

     # File lib/bio/io/flatfile.rb, line 343
343:     def rewind
344:       r = (@splitter || @stream).rewind
345:       @firsttime_flag = true
346:       r
347:     end
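
Resetting the first-read state matters because otherwise a rewound stream would hand the leader back as if it were data. The hypothetical `LeaderSkippingReader` below treats leading "#" lines as the leader, skips them only on the first read (like :firsttime), and re-arms that behaviour in rewind. The "#" convention is an assumption for illustration; real leaders depend on the format.

```ruby
require 'stringio'

# Hypothetical reader: skips leading "#" lines (the "leader") only on the
# first read and re-arms that behaviour on #rewind, as FlatFile#rewind
# does by resetting its first-time flag.
class LeaderSkippingReader
  def initialize(io)
    @io = io
    @firsttime = true
  end

  def next_line
    line = @io.gets
    if @firsttime
      line = @io.gets while line && line.start_with?('#')
      @firsttime = false
    end
    line
  end

  def rewind
    r = @io.rewind
    @firsttime = true     # without this, the leader would be returned as data
    r
  end
end

r = LeaderSkippingReader.new(StringIO.new("# header\nA\nB\n"))
p r.next_line   # → "A\n"
p r.next_line   # → "B\n"
r.rewind
p r.next_line   # → "A\n" (leader skipped again)
```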

to_io()

Returns the IO object wrapped by the flatfile object.

Compatibility Note: Bio::FlatFile#io is deprecated; use this method instead.

[Source]

     # File lib/bio/io/flatfile.rb, line 263
263:     def to_io
264:       @stream.to_io
265:     end
