Last Update: Mon Nov 29 07:51:35 +0000 2010
                   Unicode Library for Ruby
                        Version 0.3.0

                       Yoshida Masato
  • Introduction

    A Unicode string manipulation library for Ruby, based on UTR 15, Unicode Normalization Forms (*1).

      *1 <URL:http://www.unicode.org/unicode/reports/tr15/>
    
  • Install

    This library works with ruby-1.8 or later; ruby-1.8.1 or later is recommended.

    Build and install in the usual way. For example, when Ruby supports dynamic linking on your OS:

      ruby extconf.rb
      make
      make install
    
  • Usage

    Unless this module is linked with Ruby statically, load it before use:

      require "unicode"

  • Module Functions

    All function parameters must be UTF-8 strings.

    Unicode::strcmp(str1, str2)
    Unicode::strcmp_compat(str1, str2)

      Compare Unicode strings after normalizing them.
      strcmp uses Normalization Form D; strcmp_compat uses
      Normalization Form KD.
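
      As a rough illustration of what comparing after normalization
      means, the sketch below uses Ruby's built-in
      String#unicode_normalize (assumed available, Ruby 2.2 or later)
      as a stand-in; it does not call this library, and nfd_compare
      is a hypothetical helper name. With this library the
      corresponding calls would be Unicode::strcmp and
      Unicode::strcmp_compat.

```ruby
# Stand-in sketch: normalize both strings, then compare them.
# Uses Ruby's built-in normalization, not this library.
# nfd_compare is a hypothetical helper name.
def nfd_compare(str1, str2, form = :nfd)
  str1.unicode_normalize(form) <=> str2.unicode_normalize(form)
end

composed   = "\u00E9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
ligature   = "\uFB01"   # U+FB01 LATIN SMALL LIGATURE FI

# Canonically equivalent strings compare equal under form D.
puts nfd_compare(composed, decomposed) == 0      # true

# Compatibility characters only compare equal under form KD.
puts nfd_compare(ligature, "fi") == 0            # false
puts nfd_compare(ligature, "fi", :nfkd) == 0     # true
```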
    

    Unicode::decompose(str)
    Unicode::decompose_compat(str)

      Decompose a Unicode string, then sort the trailing
      (combining) characters into canonical order.
      decompose uses the canonical decomposition;
      decompose_compat uses the compatibility decomposition.
      The decomposition is based on the character decomposition
      mapping in UnicodeData.txt and the Hangul decomposition
      algorithm.
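
      The effect of the two decompositions can be sketched with
      Ruby's built-in String#unicode_normalize (assumed available,
      Ruby 2.2 or later), used here as a stand-in for
      decompose/decompose_compat rather than this library's own
      code. codepoints_hex is a hypothetical helper for printing
      code points.

```ruby
# Stand-in sketch using Ruby's built-in normalization, not this library.
# codepoints_hex is a hypothetical helper name.
def codepoints_hex(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end

# Canonical decomposition: a precomposed letter splits into base + mark.
puts codepoints_hex("\u00C5".unicode_normalize(:nfd))         # U+0041 U+030A

# Canonical ordering: U+0323 (class 220) sorts before U+0301 (class 230).
puts codepoints_hex("a\u0301\u0323".unicode_normalize(:nfd))  # U+0061 U+0323 U+0301

# Hangul syllables are decomposed algorithmically into jamo.
puts codepoints_hex("\uAC00".unicode_normalize(:nfd))         # U+1100 U+1161

# Only the compatibility decomposition maps SUPERSCRIPT TWO to "2".
puts codepoints_hex("\u00B2".unicode_normalize(:nfd))         # U+00B2
puts codepoints_hex("\u00B2".unicode_normalize(:nfkd))        # U+0032
```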
    

    Unicode::compose(str)

      Compose a Unicode string. Before composing, the trailing
      (combining) characters are sorted into canonical order.
      The parameter must already be decomposed.
      The composition is based on the reverse of the
      character decomposition mapping in UnicodeData.txt,
      on CompositionExclusions.txt, and on the Hangul
      composition algorithm.
    

    Unicode::normalize_D(str)
    Unicode::normalize_KD(str)

      Normalize a Unicode string to form D or form KD.
      These are aliases of decompose and decompose_compat.
    

    Unicode::normalize_C(str)
    Unicode::normalize_KC(str)

      Normalize a Unicode string to form C or form KC.
        normalize_C  = decompose + compose
        normalize_KC = decompose_compat + compose
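
      What forms C and KC produce can be sketched with Ruby's
      built-in String#unicode_normalize (assumed available, Ruby 2.2
      or later), used here as a stand-in for normalize_C and
      normalize_KC rather than this library's own code.
      show_codepoints is a hypothetical helper name.

```ruby
# Stand-in sketch using Ruby's built-in normalization, not this library.
# show_codepoints is a hypothetical helper name.
def show_codepoints(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
end

# Form C: base + combining mark is recomposed into one character.
puts show_codepoints("e\u0301".unicode_normalize(:nfc))       # U+00E9

# Hangul jamo are recomposed algorithmically into a syllable.
puts show_codepoints("\u1100\u1161".unicode_normalize(:nfc))  # U+AC00

# U+0958 is a composition exclusion, so form C leaves it decomposed.
puts show_codepoints("\u0958".unicode_normalize(:nfc))        # U+0915 U+093C

# Form C keeps the "fi" ligature; form KC replaces it with "f" + "i".
puts show_codepoints("\uFB01".unicode_normalize(:nfc))        # U+FB01
puts show_codepoints("\uFB01".unicode_normalize(:nfkc))       # U+0066 U+0069
```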
    

    Unicode::upcase(str)
    Unicode::downcase(str)
    Unicode::capitalize(str)

      Case conversion functions.
      The mappings used by these functions are not normative
      in UnicodeData.txt.
    
  • Bugs

    UTR 15 suggests that, for better performance, the lookup for Normalization Form C should not be implemented with a hash of strings.

    Case conversion functions should reflect UTR 21.

  • Copying

    This extension module is copyrighted free software by Yoshida Masato.

    You can redistribute it and/or modify it under the same terms as Ruby.

  • Author

    Yoshida Masato <yoshidam@yoshidam.net>

  • History

      Feb 26, 2010 version 0.3.0 fix a capitalize bug and support SpecialCasing
      Dec 29, 2009 version 0.2.0 update for Ruby 1.9.1 and Unicode 5.2
      Sep 10, 2005 version 0.1.2 update unidata.map for Unicode 4.1.0
      Aug 26, 2004 version 0.1.1 update unidata.map for Unicode 4.0.1
      Nov 23, 1999 version 0.1
