Uyghur, Chinese and English Multilingual Document Recognition
Abstract: Uyghur, Chinese and English scripts have entirely different characteristics. Following a multilingual OCR design principle that combines multi-layer language estimation with suitable intervention, this paper implements, for the first time, a recognition system for documents that mix Uyghur, Chinese and English text. The system first makes a preliminary estimate of the language of each text block from the characteristics of the three scripts, then applies a character segmentation algorithm designed for that script. Character recognition confidence is used to judge whether the estimated language and the segmentation result of a block are correct. Experimental results show that recognition accuracy on a variety of mixed Uyghur, Chinese and English documents is 96.4% or higher.
Keywords: multilingual document recognition; character segmentation; character recognition; Uyghur script
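The confidence check described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how confidence is computed, what threshold is used, or how the system falls back when a language guess is rejected. Everything below is an assumption made for illustration, including the names `recognize_block`, `BlockResult` and `pipelines`, the 0.8 threshold, the use of the mean per-character confidence, and the rule of trying the remaining languages in turn.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: a pipeline segments and recognizes one text block
# for one language and returns (character, confidence) pairs.
Pipeline = Callable[[str], List[Tuple[str, float]]]


@dataclass
class BlockResult:
    language: str      # language finally assigned to the block
    text: str          # recognized characters joined together
    confidence: float  # mean per-character confidence (assumed measure)


def mean_confidence(chars: List[Tuple[str, float]]) -> float:
    """Average the per-character confidences; an empty result scores 0."""
    if not chars:
        return 0.0
    return sum(conf for _, conf in chars) / len(chars)


def recognize_block(
    block: str,
    initial_language: str,
    pipelines: Dict[str, Pipeline],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> Optional[BlockResult]:
    """Try the initial language guess first, then the other languages.

    The first result whose confidence reaches the threshold is accepted.
    If none does, the highest-confidence result is returned. Returns None
    only when no pipelines are supplied.
    """
    order = [lang for lang in [initial_language] if lang in pipelines]
    order += [lang for lang in pipelines if lang != initial_language]

    best: Optional[BlockResult] = None
    for language in order:
        chars = pipelines[language](block)
        result = BlockResult(
            language=language,
            text="".join(ch for ch, _ in chars),
            confidence=mean_confidence(chars),
        )
        if result.confidence >= threshold:
            return result  # guess and segmentation judged correct
        if best is None or result.confidence > best.confidence:
            best = result  # remember the best rejected attempt
    return best
```

In this sketch a low confidence score is treated as evidence that either the language guess or the segmentation was wrong, so the block is reprocessed with a different language-specific pipeline. A real system would plug actual segmenters and classifiers into `pipelines`.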