Class Bio::Locations
In: lib/bio/location.rb
Parent: Object

Description

The Bio::Locations class is a container for Bio::Location objects: creating a Bio::Locations object (based on a GenBank style position string) will spawn an array of Bio::Location objects.

Usage

  locations = Bio::Locations.new('join(complement(500..550), 600..625)')
  locations.each do |loc|
    puts "class = " + loc.class.to_s
    puts "range = #{loc.from}..#{loc.to} (strand = #{loc.strand})"
  end
  # Output would be:
  #   class = Bio::Location
  #   range = 500..550 (strand = -1)
  #   class = Bio::Location
  #   range = 600..625 (strand = 1)

  # For the following three location strings, print the span and range
  ['one-of(898,900)..983',
   'one-of(5971..6308,5971..6309)',
   '8050..one-of(10731,10758,10905,11242)'].each do |loc|
    location = Bio::Locations.new(loc)
    puts location.span
    puts location.range
  end

GenBank location descriptor classification

Definition of the position notation of the GenBank location format

According to the GenBank manual ‘gbrel.txt’, position notations were classified into 10 patterns - (A) to (J).

  3.4.12.2 Feature Location

    The second column of the feature descriptor line designates the
  location of the feature in the sequence. The location descriptor
  begins at position 22. Several conventions are used to indicate
  sequence location.

    Base numbers in location descriptors refer to numbering in the entry,
  which is not necessarily the same as the numbering scheme used in the
  published report. The first base in the presented sequence is numbered
  base 1. Sequences are presented in the 5' to 3' direction.

  Location descriptors can be one of the following:

  (A) 1. A single base;

  (B) 2. A contiguous span of bases;

  (C) 3. A site between two bases;

  (D) 4. A single base chosen from a range of bases;

  (E) 5. A single base chosen from among two or more specified bases;

  (F) 6. A joining of sequence spans;

  (G) 7. A reference to an entry other than the one to which the feature
       belongs (i.e., a remote entry), followed by a location descriptor
       referring to the remote sequence;

  (H) 8. A literal sequence (a string of bases enclosed in quotation marks).

Description commented with pattern IDs.

  (C)   A site between two residues, such as an endonuclease cleavage site, is
      indicated by listing the two bases separated by a carat (e.g., 23^24).

  (D)   A single residue chosen from a range of residues is indicated by the
      number of the first and last bases in the range separated by a single
      period (e.g., 23.79). The symbols < and > indicate that the end point
  (I) of the range is beyond the specified base number.

  (B)   A contiguous span of bases is indicated by the number of the first and
      last bases in the range separated by two periods (e.g., 23..79). The
  (I) symbols < and > indicate that the end point of the range is beyond the
      specified base number. Starting and ending positions can be indicated
      by base number or by one of the operators described below.

        Operators are prefixes that specify what must be done to the indicated
      sequence to locate the feature. The following are the operators
      available, along with their most common format and a description.

  (J) complement (location): The feature is complementary to the location
      indicated. Complementary strands are read 5' to 3'.

  (F) join (location, location, .. location): The indicated elements should
      be placed end to end to form one contiguous sequence.

  (F) order (location, location, .. location): The elements are found in the
      specified order in the 5' to 3' direction, but nothing is implied about
      the rationality of joining them.

  (F) group (location, location, .. location): The elements are related and
      should be grouped together, but no order is implied.

  (E) one-of (location, location, .. location): The element can be any one,
    but only one, of the items listed.
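
The simplest of these patterns can be told apart by their separator alone. The following is a minimal sketch, not BioRuby's parser: `classify_position` is a hypothetical helper that only recognises bare patterns (A) to (D) and treats operators, `<` and `>`, and remote references as `:other`.

```ruby
# Hypothetical helper, not part of BioRuby: labels a bare position
# string with the pattern ID used in the description above.
def classify_position(str)
  case str
  when /\A\d+\z/        then :A  # single base, e.g. 23
  when /\A\d+\.\.\d+\z/ then :B  # contiguous span, e.g. 23..79
  when /\A\d+\^\d+\z/   then :C  # site between two bases, e.g. 23^24
  when /\A\d+\.\d+\z/   then :D  # single base chosen from a range, e.g. 23.79
  else :other                    # operators, < and >, remote entries, ...
  end
end

%w[23 23..79 23^24 23.79 join(1..5,8..9)].each do |s|
  puts "#{s} => #{classify_position(s)}"
end
```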

Reduction strategy of the position notations

Methods

==   []   absolute   each   equals?   first   last   length   new   range   relative   size   span   to_s  

Included Modules

Enumerable

Attributes

locations  [RW]  (Array) An Array of Bio::Location objects
operator  [RW]  (Symbol or nil) Operator. nil (means :join), :order, or :group (obsolete).

Public Class methods

Parses a GenBank style position string and returns a Bio::Locations object, which contains a list of Bio::Location objects.

  locations = Bio::Locations.new('join(complement(500..550), 600..625)')

Arguments:

  • (required) str: GenBank style position string
Returns: Bio::Locations object

[Source]

     # File lib/bio/location.rb, line 346
     def initialize(position)
       @operator = nil
       if position.is_a? Array
         @locations = position
       else
         position   = gbl_cleanup(position)        # preprocessing
         @locations = gbl_pos2loc(position)        # create an Array of Bio::Location objects
       end
     end

Public Instance methods

Returns true if other is equal to self; otherwise returns false.

Arguments:

  • (required) other: any object
Returns: true or false

[Source]

     # File lib/bio/location.rb, line 381
     def ==(other)
       return true if super(other)
       return false unless other.instance_of?(self.class)
       if self.locations == other.locations and
           self.operator == other.operator then
         true
       else
         false
       end
     end

Returns the nth Bio::Location object.

[Source]

     # File lib/bio/location.rb, line 400
     def [](n)
       @locations[n]
     end

Converts a relative position in the locus to a position in the whole of the DNA sequence.

This method can, for example, be used to relate positions in a DNA sequence to those in RNA. With the optional :aa flag, the given position is interpreted as an amino acid position rather than a nucleotide position, and the first nucleotide of the corresponding codon is returned.

  loc = Bio::Locations.new('complement(12838..13533)')
  puts loc.absolute(10)          # => 13524
  puts loc.absolute(10, :aa)     # => 13506

Arguments:

  • (required) position: nucleotide position within locus
  • :aa: flag to be used if position is an amino acid position rather than a nucleotide position
Returns: position within the whole of the sequence

[Source]

     # File lib/bio/location.rb, line 490
     def absolute(n, type = nil)
       case type
       when :location
         ;
       when :aa
         n = (n - 1) * 3 + 1
         rel2abs(n)
       else
         rel2abs(n)
       end
     end
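
The arithmetic behind the example above can be reproduced by hand. This is a minimal sketch for a single complement span only, not BioRuby's `rel2abs`; the helper names `rel_to_abs_on_complement` and `codon_start` are hypothetical.

```ruby
# Hypothetical stand-in for one complement span; not BioRuby's rel2abs.
# span_from and span_to are the 1-based ends of the span on the whole sequence.
def rel_to_abs_on_complement(n, span_from, span_to)
  abs = span_to - n + 1   # on the complement strand, position 1 is span_to
  abs.between?(span_from, span_to) ? abs : nil
end

# Amino acid position -> first nucleotide of its codon (the :aa branch above).
def codon_start(aa_pos)
  (aa_pos - 1) * 3 + 1
end

puts rel_to_abs_on_complement(10, 12838, 13533)               # 13524
puts rel_to_abs_on_complement(codon_start(10), 12838, 13533)  # 13506
```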

Iterates over each Bio::Location object.

[Source]

     # File lib/bio/location.rb, line 393
     def each
       @locations.each do |x|
         yield(x)
       end
     end

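Evaluates equality with another Bio::Locations object by comparing the sorted locations, so the order of the locations does not matter. Returns nil if other is not a Bio::Locations object.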

[Source]

     # File lib/bio/location.rb, line 364
     def equals?(other)
       if ! other.kind_of?(Bio::Locations)
         return nil
       end
       if self.sort == other.sort
         return true
       else
         return false
       end
     end

Returns the first Bio::Location object.

[Source]

     # File lib/bio/location.rb, line 405
     def first
       @locations.first
     end

Returns the last Bio::Location object.

[Source]

     # File lib/bio/location.rb, line 410
     def last
       @locations.last
     end

Returns the length of the spliced RNA.

[Source]

     # File lib/bio/location.rb, line 429
     def length
       len = 0
       @locations.each do |x|
         if x.sequence
           len += x.sequence.size
         else
           len += (x.to - x.from + 1)
         end
       end
       len
     end
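
The same summation can be tried without BioRuby. This is a minimal sketch in which the `Exon` struct and `spliced_length` are hypothetical stand-ins for Bio::Location and this method.

```ruby
# Hypothetical stand-in for Bio::Location; sequence is nil unless the
# location carries a literal replacement sequence.
Exon = Struct.new(:from, :to, :sequence)

# Sum of the segment lengths; a literal sequence counts by its own size.
def spliced_length(exons)
  exons.sum do |exon|
    exon.sequence ? exon.sequence.size : exon.to - exon.from + 1
  end
end

# Segments of 'join(complement(500..550), 600..625)': 51 + 26 bases
puts spliced_length([Exon.new(500, 550), Exon.new(600, 625)])   # 77
```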

Similar to span, but returns a Range object (min..max) instead of an Array.

[Source]

     # File lib/bio/location.rb, line 423
     def range
       min, max = span
       min..max
     end

Converts an absolute position in the whole of the DNA sequence to a relative position in the locus.

This method can, for example, be used to relate positions in a DNA sequence to those in RNA. With the optional :aa flag, the method returns the position of the associated amino acid rather than the nucleotide.

  loc = Bio::Locations.new('complement(12838..13533)')
  puts loc.relative(13524)        # => 10
  puts loc.relative(13506, :aa)   # => 10

Arguments:

  • (required) position: nucleotide position within the whole of the sequence
  • :aa: flag that makes the method return the position in amino acid coordinates
Returns: position within the location

[Source]

     # File lib/bio/location.rb, line 458
     def relative(n, type = nil)
       case type
       when :location
         ;
       when :aa
         if n = abs2rel(n)
           (n - 1) / 3 + 1
         else
           nil
         end
       else
         abs2rel(n)
       end
     end
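
The :aa branch relies on integer division, so all three nucleotides of a codon map to the same amino acid position. This is a minimal sketch for a single complement span, not BioRuby's `abs2rel`; `abs_to_rel_on_complement` and `codon_index` are hypothetical names.

```ruby
# Hypothetical stand-in for one complement span; not BioRuby's abs2rel.
def abs_to_rel_on_complement(pos, span_from, span_to)
  return nil unless pos.between?(span_from, span_to)
  span_to - pos + 1   # on the complement strand, span_to is position 1
end

# Relative nucleotide position -> amino acid position (the :aa branch above).
def codon_index(rel)
  rel && (rel - 1) / 3 + 1
end

[13506, 13505, 13504, 13503].each do |pos|
  rel = abs_to_rel_on_complement(pos, 12838, 13533)
  puts "#{pos}: nucleotide #{rel}, amino acid #{codon_index(rel)}"
end
```

The first three positions fall in one codon (nucleotides 28 to 30, amino acid 10); the fourth starts the next codon.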
size()

Alias for length

Returns an Array [min, max] containing the overall minimum and maximum positions of this Bio::Locations object.

[Source]

     # File lib/bio/location.rb, line 416
     def span
       span_min = @locations.min { |a,b| a.from <=> b.from }
       span_max = @locations.max { |a,b| a.to   <=> b.to   }
       return span_min.from, span_max.to
     end
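
Because the minimum and maximum are taken over all locations, the result does not depend on the order in which they are listed. This is a minimal sketch in which the `Segment` struct and `overall_span` are hypothetical stand-ins for Bio::Location and this method.

```ruby
# Hypothetical stand-in for Bio::Location with only the fields span needs.
Segment = Struct.new(:from, :to)

# Smallest from and largest to across all segments.
def overall_span(segments)
  [segments.map(&:from).min, segments.map(&:to).max]
end

# Deliberately listed out of order.
segments = [Segment.new(600, 625), Segment.new(500, 550)]
min, max = overall_span(segments)
puts "span  = [#{min}, #{max}]"   # span  = [500, 625]
puts "range = #{min..max}"        # range = 500..625
```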

String representation.

Note: In some cases, it cannot tell "complement(join(...))" from "join(complement(...))", or "complement(order(...))" from "order(complement(...))".

Returns: String

[Source]

     # File lib/bio/location.rb, line 511
     def to_s
       return '' if @locations.empty?
       complement_join = false
       locs = @locations
       if locs.size >= 2 and locs.inject(true) do |flag, loc|
           # check if each location is complement
           (flag && (loc.strand == -1) && !loc.xref_id)
         end and locs.inject(locs[0].from) do |pos, loc|
           if pos then
             (pos >= loc.from) ? loc.from : false
           else
             false
           end
         end then
         locs = locs.reverse
         complement_join = true
       end
       locs = locs.collect do |loc|
         lt = loc.lt ? '<' : ''
         gt = loc.gt ? '>' : ''
         str = if loc.from == loc.to then
                 "#{lt}#{gt}#{loc.from.to_i}"
               elsif loc.carat then
                 "#{lt}#{loc.from.to_i}^#{gt}#{loc.to.to_i}"
               else
                 "#{lt}#{loc.from.to_i}..#{gt}#{loc.to.to_i}"
               end
         if loc.xref_id and !loc.xref_id.empty? then
           str = "#{loc.xref_id}:#{str}"
         end
         if loc.strand == -1 and !complement_join then
           str = "complement(#{str})"
         end
         if loc.sequence then
           str = "replace(#{str},\"#{loc.sequence}\")"
         end
         str
       end
       if locs.size >= 2 then
         op = (self.operator || 'join').to_s
         result = "#{op}(#{locs.join(',')})"
       else
         result = locs[0]
       end
       if complement_join then
         result = "complement(#{result})"
       end
       result
     end
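
The key decision in the source above is whether to emit one outer complement(...) or wrap each location separately. This is a simplified sketch of that decision only, assuming plain from..to spans and the join operator, with no `<`, `>`, caret, xref, or literal sequence handling; the `Piece` struct and `format_pieces` are hypothetical.

```ruby
# Hypothetical stand-in for Bio::Location with only the fields used here.
Piece = Struct.new(:from, :to, :strand)

def format_pieces(pieces)
  all_minus  = pieces.all? { |x| x.strand == -1 }
  descending = pieces.each_cons(2).all? { |a, b| a.from >= b.from }
  if pieces.size >= 2 && all_minus && descending
    # Every piece is on the minus strand, listed from high to low:
    # reverse the order and wrap the whole join in a single complement.
    inner = pieces.reverse.map { |x| "#{x.from}..#{x.to}" }
    "complement(join(#{inner.join(',')}))"
  else
    # Otherwise wrap each minus-strand piece on its own.
    strs = pieces.map do |x|
      s = "#{x.from}..#{x.to}"
      x.strand == -1 ? "complement(#{s})" : s
    end
    strs.size >= 2 ? "join(#{strs.join(',')})" : strs.first
  end
end

puts format_pieces([Piece.new(600, 625, -1), Piece.new(500, 550, -1)])
puts format_pieces([Piece.new(500, 550, -1), Piece.new(600, 625, 1)])
```

The first call prints complement(join(500..550,600..625)) and the second prints join(complement(500..550),600..625).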
