k-nearest Neighbor Classification Based on Influence Function
doi: 10.11999/JEIT141433
Funding:
Supported by the National Natural Science Foundation of China (61170223) and the Key Science and Technology Research Project of the Education Department of Henan Province (14A520016)
Abstract: Classification is a supervised learning task: the class label of an unlabeled instance is determined by a model learned from a training dataset. Unlike traditional approaches to classification, this paper views the problem from the perspective of influence functions, that is, the class label of an unlabeled instance is determined by the influence that the training set exerts on it. First, the idea of influence-function-based classification is introduced. Second, the influence function is defined and three concrete influence functions are designed. Finally, a k-nearest neighbor (kNN) classification method based on these three influence functions is proposed and applied to the classification of imbalanced datasets. Experimental results on 18 UCI datasets show that the proposed method outperforms the traditional kNN classifier and is effective for imbalanced classification.
Keywords:
- data mining /
- supervised learning /
- imbalanced data classification /
- influence function /
- k-nearest neighbor
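To make the scheme concrete, below is a minimal sketch of influence-based kNN prediction: each of the k nearest training samples contributes influence to its own class, and the unlabeled instance receives the class with the largest total influence. The paper's three influence functions are not reproduced in this excerpt, so the Gaussian-decay `influence` below, along with the function names and parameters, are illustrative assumptions rather than the authors' definitions.

```python
import numpy as np

def influence(dist, sigma=1.0):
    # Hypothetical influence function: Gaussian decay with distance.
    # The paper designs three specific influence functions, which are
    # not given in this excerpt; this kernel is only a stand-in.
    return np.exp(-(dist ** 2) / (2 * sigma ** 2))

def influence_knn_predict(X_train, y_train, x, k=5):
    """Label x with the class whose k nearest training samples
    exert the largest total influence on x."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nn_idx = np.argsort(dists)[:k]               # k nearest neighbors
    scores = {}
    for i in nn_idx:
        label = y_train[i]
        scores[label] = scores.get(label, 0.0) + influence(dists[i])
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Toy usage example with two well-separated classes.
    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])
    print(influence_knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> 0
```

Because minority-class neighbors can be weighted by their influence rather than merely counted, a scheme of this shape plausibly extends to imbalanced data, as the abstract describes; the weighting details would follow the paper's actual influence functions.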