A Method for Automatic Recognition of Chinese Organization Names Based on SVM/RS
Abstract: This paper proposes a method for recognizing Chinese organization names that combines Support Vector Machines (SVM) with Rough Sets (RS). The formation rules of organization-name phrases are expressed through the basic semantic collocation relations between words, and a non-redundant set of these rules is learned automatically by rough-set attribute reduction. At recognition time, word strings that match the rules are first selected as candidate organization names; an SVM classifier then decides whether each candidate is a genuine organization name, using the semantic features of the candidate and of its context words. In an open test on a 16.17-million-character People's Daily corpus, the method achieves an F-measure of 82.06%.
Keywords:
- pattern recognition; SVM; feature selection; semantics; rough set; semantic collocation
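The attribute-reduction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not specify; it is a generic greedy rough-set reduction that drops any attribute whose removal leaves the dependency degree unchanged. The attribute names (`first`, `last`, `length`), their values, and the toy rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Share of rows whose indiscernibility class carries a single label."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def greedy_reduct(rows, labels, attrs):
    """Drop attributes one by one while the dependency degree is preserved."""
    target = dependency(rows, labels, attrs)
    kept = list(attrs)
    for a in list(attrs):
        trial = [x for x in kept if x != a]
        if dependency(rows, labels, trial) == target:
            kept = trial
    return kept

# Hypothetical decision table: coarse semantic classes of a candidate's
# first and last word, plus a length bucket. Label: is it an organization name?
ATTRS = ["first", "last", "length"]
rows = [
    {"first": "place",  "last": "org_suffix", "length": "long"},
    {"first": "place",  "last": "other",      "length": "long"},
    {"first": "person", "last": "org_suffix", "length": "short"},
    {"first": "other",  "last": "other",      "length": "short"},
    {"first": "place",  "last": "org_suffix", "length": "short"},
]
labels = [True, False, True, False, True]

reduct = greedy_reduct(rows, labels, ATTRS)
print(reduct)
```

On this toy table only the class of the last word is needed to separate the labels, so the other two attributes are discarded as redundant. A greedy pass like this returns one reduct that depends on the order in which attributes are tried; it is not guaranteed to find a minimal one.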