Usage


WARNING!!!

This library includes both a Ruby version and a C version of StringScanner, but the two are really completely different classes. Read this whole page before using it.

Purpose of this extension

StringScanner is a Ruby extension for fast string scanning.

Since Ruby's Regexp class cannot match against a substring in place, you must create a new String in order to scan a string. For example:

p " I_want_to_match_this_word but can't".index( /\A\w+/, 1 )
This code displays "nil". Another way to match is like this:
str = " word word word"
while str.size > 0 do
  if /\A[ \t]+/ === str then
    str = $'
  elsif /\A\w+/ === str then
    str = $'
  end
end
But this method has a serious speed problem: $' creates a new string EVERY time. In this example, all of these strings are created:
" word word word"
"word word word"
" word word"
"word word"
" word"
"word"
""
This is a heavy load. If 'str' is 50KB long, roughly 50KB ** 2 / 5 = 500MB of string data is allocated in total over the course of the scan.
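
The quadratic growth described above can be checked on the small sample string. The following is only a sketch: the helper name rest_sizes is hypothetical and not part of this library. It runs the same $' loop and records the size of every remainder string it creates.

```ruby
# Hypothetical helper (not part of strscan): runs the $'-based loop and
# returns the size of each new remainder string that the loop creates.
def rest_sizes(str)
  sizes = []
  until str.empty?
    # Try whitespace first, then a word; stop if neither matches.
    break unless /\A[ \t]+/ =~ str || /\A\w+/ =~ str
    str = $'            # allocates a new String holding the remainder
    sizes << str.size
  end
  sizes
end

sizes = rest_sizes(" word word word")
p sizes       # sizes of the remainder strings, one per match
p sizes.sum   # total characters copied while scanning 15 characters
```

Each match copies everything that is left, so the total copied grows with the square of the input length.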

StringScanner resolves this.
StringScanner holds a C string and a pointer into it. When scanning, StringScanner only increments the pointer and does not create a new string. As a result, both running time and application memory usage decrease.
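
As a rough illustration of the pointer moving, the sketch below records the scan position after every match. It assumes the strscan library bundled with current Ruby (its 'pos', 'skip' and 'eos?' methods), which may differ from the version described on this page; the helper name scan_positions is hypothetical.

```ruby
require 'strscan'

# Hypothetical helper: returns the scan pointer position after each match.
# No remainder strings are built; only the position changes.
def scan_positions(src)
  s = StringScanner.new(src)
  positions = []
  until s.eos?
    # skip returns the matched length, or nil when nothing matches here.
    break unless s.skip(/[ \t]+/) || s.skip(/\w+/)
    positions << s.pos
  end
  positions
end

p scan_positions(" word word word")
```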

Simple examples and methods

Here are two short examples of a scanning routine.
The first is easy to write, but it is slow. The second is just as easy to write, but it is FAST because it uses the StringScanner class.

First example:

ATOM = /\A\w+/
SPACE = /\A[ \t]+/

while str.size > 0 do
  if ATOM === str then
    str = $'
    return $&
  elsif SPACE === str then
    str = $'
    return $&
  end
end

Second example:

ATOM = /\A\w+/
SPACE = /\A[ \t]+/

s = StringScanner.new( str )
while s.rest? do
  if tmp = s.scan( ATOM ) then
    return tmp
  elsif tmp = s.scan( SPACE ) then
    return tmp
  end
end
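
The two fragments above use 'return', so they are meant to sit inside a method. A self-contained variant might look like the sketch below. It assumes the strscan library bundled with current Ruby; the method name tokenize and the token labels are hypothetical. Unlike the fragments, it raises when no pattern matches, so the loop cannot spin forever on unexpected input.

```ruby
require 'strscan'

# Hypothetical tokenizer: collects every token instead of returning the first.
def tokenize(str)
  atom  = /\w+/       # scan only matches at the scan pointer
  space = /[ \t]+/
  s = StringScanner.new(str)
  tokens = []
  until s.eos?
    if (tok = s.scan(atom))
      tokens << [:atom, tok]
    elsif (tok = s.scan(space))
      tokens << [:space, tok]
    else
      raise ArgumentError, "unexpected character at index #{s.pos}"
    end
  end
  tokens
end

p tokenize(" word word")
```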

Usage of StringScanner is simple.
First, create a StringScanner object; next, call the 'scan' method. It returns the matched string and at the same time advances its internally maintained "scan pointer", which is implemented simply as a pointer to char (char*).
The 'skip' method is similar to 'scan', but it returns the length of the matched string.

s = StringScanner.new( "abcdefg" )   # scan pointer is on 'a', index 0
puts s.scan( /a/ )        # return 'a'. scan pointer is on 'b', index 1
puts s.skip( /bc/ )       # return 2. scan pointer is on 'd', index 3
At that time the previous "scan pointer" is preserved in the StringScanner object. So str[ prev pointer...current pointer ] (end exclusive) is the string returned from 'scan', the "matched string". We can get it with the 'matched' method.

puts s.matched            # return 'bc'. scan pointer doesn't move
puts s.scan( /a/ )        # return nil. scan pointer doesn't move, either.
puts s.matched            # return 'bc'.
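
The relation between the two pointers and the matched string can be checked with a small sketch. It assumes the strscan library bundled with current Ruby, where 'pos' is a byte offset; the helper name matched_slice is hypothetical.

```ruby
require 'strscan'

# Hypothetical helper: returns the matched string together with the slice
# of the source between the previous and the current scan pointer.
def matched_slice(text, pattern)
  s = StringScanner.new(text)
  prev = s.pos                       # pointer before scanning
  return nil unless s.scan(pattern)
  # pos counts bytes, so slice by bytes; the end is exclusive.
  [s.matched, text.byteslice(prev...s.pos)]
end

p matched_slice("abcdefg", /abc/)
```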

Moving the scan pointer back is also permitted; the 'unscan' method does that. But 'unscan' can be done only once per 'scan', because a StringScanner object cannot preserve more than one previous pointer.

puts s.scan( /de/ )       # return 'de'. scan pointer is on 'f', index 5
s.unscan                  # scan pointer is on 'd', index 3
puts s.scan( /def/ )      # return 'def'. scan pointer is on 'g', index 6
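
The "only once" limit can be observed directly. The sketch below assumes the strscan library bundled with current Ruby, where a second 'unscan' without an intervening match raises StringScanner::Error; the version described on this page may behave differently. The helper name unscan_demo is hypothetical.

```ruby
require 'strscan'

# Hypothetical helper: returns the position after one unscan and whether
# a second, consecutive unscan was rejected.
def unscan_demo
  s = StringScanner.new("abcdefg")
  s.scan(/abc/)                 # scan pointer moves to index 3
  s.scan(/de/)                  # scan pointer moves to index 5
  s.unscan                      # back to index 3, the start of 'de'
  pos_after_unscan = s.pos
  second_failed =
    begin
      s.unscan                  # no previous match is recorded any more
      false
    rescue StringScanner::Error
      true
    end
  [pos_after_unscan, second_failed]
end

p unscan_demo
```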
For more details, see the reference manual. And of course the source code is the most important documentation, I think :-)

Ruby version strscan

The Ruby version of StringScanner, the StringScanner_R class, resembles the C version, but it requires

This is troublesome, but there is no solution to this problem.

If you want to use only the C version, simply put this in your code:

StringScanner.must_C_version

Copyright (c) 1999-2001 Minero Aoki <aamine@cd.xdsl.ne.jp>