Tesseract  3.02
cjkpitch.h

// File: cjkpitch.h
// Description: Code to determine fixed pitchness and the pitch if fixed,
// for CJK text.
// Copyright 2011 Google Inc. All Rights Reserved.
// Author: takenaka@google.com (Hiroshi Takenaka)
// Created: Mon Jun 27 12:48:35 JST 2011
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
#ifndef CJKPITCH_H_
#define CJKPITCH_H_

#include "blobbox.h"
#include "notdll.h"

// Function to test the "fixed-pitchness" of the input text and to
// estimate character pitch parameters for it, based on a CJK
// fixed-pitch layout model.
//
// This function assumes that fixed-pitch CJK text has the following
// characteristics:
//
// - Most glyphs are designed to fit within the same sized square
// (imaginary body). Also they are aligned to the center of their
// imaginary bodies.
// - The imaginary body is always a regular rectangle.
// - There may be some extra space between character bodies
// (tracking).
// - There may be some extra space after punctuation marks.
// - The text is *not* space-delimited. Thus spaces are rare.
// - A character may consist of multiple unconnected blobs.
//
// The function works in two passes. In pass 1, it looks for "good"
// blobs that have the same pitch on both sides and look like complete
// CJK characters. It then estimates the character pitch for every
// row, based on those good blobs. If it cannot find enough good blobs
// for a row, the pitch is estimated from other rows with similar
// character height instead.
//
// Pass 2 is an iterative process to fit the blobs into fixed-pitch
// character cells. Once we have estimated the character pitch, blobs
// that are almost as large as the pitch can be considered to be
// complete characters. And once we know that some blobs are complete
// characters, we can estimate the region occupied by their
// neighbors. And so on.
//
// We repeat the process until all ambiguities are resolved. Then we
// make the final decision about the fixed-pitchness of each row and
// compute pitch and spacing parameters.
//
// (If a row is considered to be proportional, pitch_decision for the
// row is set to PITCH_CORR_PROP and the later phase
// (i.e. Textord::to_spacing()) should determine its spacing
// parameters.)
//
// This function doesn't provide all the information required by
// fixed_pitch_words(), so the rows need to be processed with
// make_prop_words() even if they are fixed pitch.
void compute_fixed_pitch_cjk(ICOORD page_tr,              // top right
                             TO_BLOCK_LIST *port_blocks);  // input list

#endif  // CJKPITCH_H_
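
The pass 1 idea described in the header comment can be illustrated with a small standalone sketch. This is not Tesseract's implementation: `Box`, `LooksComplete`, `EstimateRowPitch`, the 0.8 size ratio, and the `tolerance` parameter are hypothetical names and values chosen for illustration, and taking the median of the candidate pitches is an assumption. The sketch also assumes the blobs of a row are already sorted from left to right.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a blob bounding box; not a Tesseract type.
struct Box {
  int left, right, bottom, top;
  int width() const { return right - left; }
  int height() const { return top - bottom; }
  double center_x() const { return (left + right) / 2.0; }
};

// Treats a blob as a "complete character" if it nearly fills a square of
// side row_height. The 0.8 ratio is an illustrative assumption.
bool LooksComplete(const Box &b, int row_height) {
  return b.width() >= 0.8 * row_height && b.height() >= 0.8 * row_height;
}

// Returns the median centre-to-centre pitch over "good" blobs: blobs that
// look complete and whose distances to the left and right neighbours agree
// within `tolerance` pixels. Returns 0 if no blob qualifies.
// Assumes `row` is sorted from left to right.
double EstimateRowPitch(const std::vector<Box> &row, int row_height,
                        double tolerance) {
  assert(tolerance >= 0.0);
  std::vector<double> pitches;
  for (std::size_t i = 1; i + 1 < row.size(); ++i) {
    if (!LooksComplete(row[i], row_height)) continue;
    const double left_pitch = row[i].center_x() - row[i - 1].center_x();
    const double right_pitch = row[i + 1].center_x() - row[i].center_x();
    if (std::fabs(left_pitch - right_pitch) <= tolerance) {
      pitches.push_back((left_pitch + right_pitch) / 2.0);
    }
  }
  if (pitches.empty()) return 0.0;
  // Partial sort so that the middle element is the (upper) median.
  std::nth_element(pitches.begin(), pitches.begin() + pitches.size() / 2,
                   pitches.end());
  return pitches[pitches.size() / 2];
}
```

A caller would pass the boxes of one text row and fall back to another row's estimate when the function returns 0, mirroring the "other rows with similar character height" fallback that the comment describes. The median is used here only because it is robust to a few mis-segmented blobs; the real code may combine the candidates differently.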