Uyghur Character Models with Shared Structure Information for Segmentation-free Recognition under Low Data Resource Conditions
doi: 10.11999/JEIT150019
Funding:
National Natural Science Foundation of China (61032008) and the National 973 Program of China (2013CB329403)
Abstract: Segmentation-free Uyghur document recognition effectively avoids character segmentation errors, but existing models often perform poorly on new sample types for which little data is available. This paper proposes sharing the character structure information that remains relatively stable across commonly used Uyghur fonts, and uses the Bootstrap method to make more efficient use of the available samples. In experiments on real book samples, using new-type samples amounting to only about 1/5 of the original training set, the proposed method reaches an average character recognition accuracy of 95.05% on the test sets. Compared with the commonly used Maximum A Posteriori (MAP) estimation method, it lowers the recognition error rate by 55.76%~63.84% in relative terms. The method can therefore effectively address Uyghur character modeling under low data resource conditions and achieve high-performance recognition of new sample types.
Keywords:
- Character recognition /
- Hidden Markov model /
- Statistical learning /
- Uyghur
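The abstract reports its gain over MAP as a relative decrease in recognition error rate, which is easy to confuse with an absolute difference in accuracy. The following is a minimal sketch of how such a figure is conventionally computed; the function name and the baseline accuracy of 88.0% are hypothetical and chosen only for illustration, since the abstract does not state the MAP baseline accuracy.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Return the relative error rate reduction, in percent.

    Both arguments are accuracies in percent (0 to 100).
    Error rate is taken as 100 minus accuracy.
    """
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    if baseline_err <= 0.0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_err - new_err) / baseline_err


# 95.05 is the average accuracy quoted in the abstract.
# 88.0 is a HYPOTHETICAL baseline accuracy, not a value from the paper.
reduction = relative_error_reduction(baseline_acc=88.0, new_acc=95.05)
print(f"relative error reduction: {reduction:.2f}%")
```

With these illustrative inputs, an absolute accuracy gain of about 7 points corresponds to a much larger relative drop in error, which is why the two measures should not be compared directly.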