This library includes a Ruby version and a C version of StringScanner, but the two classes are really completely different. Read this whole page before using it.
StringScanner is a Ruby extension for fast scanning.
Since the Regexp class of Ruby cannot match against a substring in place, you must make a new String to scan a string. For example:

    p " I_want_to_match_this_word but can't".index( /\A\w+/, 1 )

This code displays "nil". Another way to match is like this:

    str = " word word word"
    while str.size > 0 do
      if /\A[ \t]+/ === str then
        str = $'
      elsif /\A\w+/ === str then
        str = $'
      end
    end

But this method has a big speed problem: $' makes a new string EVERY time. In this example, all of these strings are created:

    " word word word"
    "word word word"
    " word word"
    "word word"
    " word"
    "word"
    ""

This puts a heavy load on the program. If the length of 'str' is 50KB, nearly 50KB ** 2 / 5 = 500MB of string data is created in total.
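The n ** 2 / 5 estimate can be checked with a small experiment. The sketch below is not part of the library: `bytes_copied` is a hypothetical helper that runs the same $' loop and adds up the size of every intermediate string, and the `else` guard is an added assumption so the loop stops on input that neither pattern matches.

```ruby
# Hypothetical helper: total size of all strings created by the $' loop.
def bytes_copied(str)
  total = 0
  while str.size > 0
    if /\A[ \t]+/ === str
      str = $'              # new string: everything after the spaces
    elsif /\A\w+/ === str
      str = $'              # new string: everything after the word
    else
      break                 # added guard, not in the original loop
    end
    total += str.size
  end
  total
end

input    = " word" * 1000   # 5000 characters
copied   = bytes_copied(input)
estimate = input.size ** 2 / 5
puts "copied #{copied} bytes, estimate #{estimate} bytes"
```

For this 5000-character input the loop copies just under 5 million characters, which agrees with the estimate and shows that the cost grows with the square of the input length.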
StringScanner solves this problem.
StringScanner holds a C string and a pointer into it. When scanning, StringScanner
only advances the pointer and does not create a new string for the remaining input.
As a result, scanning becomes faster and the application uses less memory.
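The pointer movement can be observed from Ruby. This is a minimal sketch that assumes the strscan bundled with current Ruby, where `pos` reports the scan pointer as an index and `eos?` reports the end of input; the patterns are written without \A because `scan` only matches at the scan pointer in that version.

```ruby
require 'strscan'

text = " word word word"
s = StringScanner.new(text)
positions = []
until s.eos?
  # Consume spaces or a word; stop if neither matches (added guard).
  break unless s.scan(/[ \t]+/) || s.scan(/\w+/)
  positions << s.pos        # where the scan pointer is after this token
end
p positions
p text                      # the scanned string itself is unchanged
```

Each token only moves the index forward; the original string is never shortened or replaced.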
Here are two short examples of a scanning routine.
The first is easy to write but slow. The second is just as easy to write,
but FAST, because it uses the StringScanner class.
First example:
    ATOM  = /\A\w+/
    SPACE = /\A[ \t]+/
    while str.size > 0 do
      if ATOM === str then
        str = $'
        return $&
      elsif SPACE === str then
        str = $'
        return $&
      end
    end
Second example:
    ATOM  = /\A\w+/
    SPACE = /\A[ \t]+/
    s = StringScanner.new( str )
    while s.rest? do
      if tmp = s.scan( ATOM ) then
        return tmp
      elsif tmp = s.scan( SPACE ) then
        return tmp
      end
    end
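To confirm that the two routines recognize the same tokens, both can be wrapped in methods that collect every token instead of returning the first one. The method names `tokens_by_copying` and `tokens_by_scanner` and the `else` guards are hypothetical additions for this sketch. It assumes the strscan bundled with current Ruby, where \A matches at the scan pointer by default.

```ruby
require 'strscan'

ATOM  = /\A\w+/
SPACE = /\A[ \t]+/

# First approach: match, then replace str with a fresh copy of the rest.
def tokens_by_copying(str)
  tokens = []
  while str.size > 0
    if ATOM === str
      tokens << $&
      str = $'
    elsif SPACE === str
      tokens << $&
      str = $'
    else
      break                 # added guard: unknown character
    end
  end
  tokens
end

# Second approach: StringScanner advances its scan pointer instead.
def tokens_by_scanner(str)
  tokens = []
  s = StringScanner.new(str)
  while s.rest?
    if tmp = s.scan(ATOM)
      tokens << tmp
    elsif tmp = s.scan(SPACE)
      tokens << tmp
    else
      break                 # added guard: unknown character
    end
  end
  tokens
end

input = " word word word"
p tokens_by_copying(input)
p tokens_by_scanner(input)
```

Both calls print the same list of six tokens; only the amount of string copying behind them differs.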
Usage of StringScanner is simple.
First, create a StringScanner object, then call the 'scan' method. It returns the matched
string and at the same time advances its internally maintained "scan pointer".
The scan pointer is implemented simply as a pointer to char (char*).
The 'skip' method is similar to 'scan', but it returns the length of the matched string.

    s = StringScanner.new( "abcdefg" )  # scan pointer is on 'a', index 0
    puts s.scan( /a/ )     # returns 'a'. scan pointer is on 'b', index 1
    puts s.skip( /bc/ )    # returns 2. scan pointer is on 'd', index 3

At that time, the previous "scan pointer" is preserved in the StringScanner object. So str[ prev pointer...current pointer ] is the string that 'scan' returned, the "matched string". We can get it with the 'matched' method.

    puts s.matched         # returns 'bc'. scan pointer doesn't move
    puts s.scan( /a/ )     # returns nil. scan pointer doesn't move, either
    puts s.matched         # returns 'bc'.

Moving the scan pointer back is also permitted; the 'unscan' method does that. But 'unscan' can be done only once per 'scan', because a StringScanner object can't preserve more than one previous pointer.

    puts s.scan( /de/ )    # returns 'de'. scan pointer is on 'f', index 5
    s.unscan               # scan pointer is on 'd', index 3
    puts s.scan( /def/ )   # returns 'def'. scan pointer is on 'g', index 6

For more details, see the reference manual. And of course the source code is the most important documentation, I think :-)

Ruby version strscan
The Ruby version of StringScanner, the StringScanner_R class, resembles the C version, but it requires
This is troublesome, but there is no solution for this problem.
If you want to use only the C version, simply put this in your code:

    StringScanner.must_C_version
Copyright (c) 1999-2001 Minero Aoki <aamine@cd.xdsl.ne.jp>