An Improved K-Nearest Neighbor Algorithm
Abstract: This paper presents an improved K-nearest neighbor (K-NN) classification algorithm. The algorithm first applies CURE clustering to the samples of each class in the training set and uses the result to select a subset of the training set. This reduces the volume of training data and removes outliers. As a result, the algorithm becomes both faster and more accurate in prediction, which makes it applicable to massive data sets. In addition, a neural network learns a weight for each feature according to that feature's contribution to classification. These weights are used in the nearest-neighbor distance computation so that important features contribute more, which further improves classification accuracy. Experiments on several standard UCI databases and practical data sets show that the algorithm is well suited to classifying complex, relatively large databases.
Keywords:
- K-nearest neighbor; clustering; weight adjustment; classification
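The abstract combines two ideas: shrinking each class of the training set by clustering, and using per-feature weights in the K-NN distance. Both can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Plain k-means with farthest-point seeding stands in for CURE. The rule "a cluster with fewer than `min_cluster_size` members is an outlier" is an assumption. The weight vector `w` is fixed by hand rather than learned by a neural network. All function names (`kmeans`, `reduce_training_set`, `weighted_distance`, `knn_predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point seeding.

    Used here only as a stand-in for CURE; returns (centroids, labels).
    """
    k = min(k, len(points))
    # Seed with the first point, then repeatedly add the point farthest
    # from every seed chosen so far (isolated points get picked early).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def reduce_training_set(X, y, clusters_per_class=2, min_cluster_size=2):
    """Cluster each class separately and keep one centroid per cluster.

    Clusters with fewer than min_cluster_size members are treated as
    outliers (an assumed rule) and dropped from the reduced set.
    """
    reduced_X, reduced_y = [], []
    for cls in sorted(set(y)):
        pts = [x for x, lab in zip(X, y) if lab == cls]
        centroids, labels = kmeans(pts, clusters_per_class)
        sizes = Counter(labels)
        for c, centroid in enumerate(centroids):
            if sizes[c] >= min_cluster_size:
                reduced_X.append(centroid)
                reduced_y.append(cls)
    return reduced_X, reduced_y


def weighted_distance(a, b, w):
    """Feature-weighted Euclidean distance: a larger w[i] makes feature i count more."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def knn_predict(X, y, query, w, k=3):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(range(len(X)), key=lambda i: weighted_distance(X[i], query, w))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


# Toy data: each class has a tight group plus one isolated point (listed last)
# that sits inside the other class's region.
X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.0, 1.4), (8.5, 1.0),
     (9.0, 1.0), (9.2, 1.2), (8.8, 0.8), (9.0, 0.6), (1.5, 1.0)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
w = [1.0, 0.5]  # hypothetical weights; the paper learns them with a neural network

Xr, yr = reduce_training_set(X, y)
print(len(X), "->", len(Xr))                    # 10 -> 2
print(knn_predict(X, y, (8.4, 1.0), w, k=1))    # 0: the isolated class-0 point wins
print(knn_predict(Xr, yr, (8.4, 1.0), w, k=1))  # 1: that point was dropped
```

On this toy data, the isolated class-0 point at (8.5, 1.0) is the single nearest neighbor of the query (8.4, 1.0), so 1-NN on the full set returns class 0. After per-class clustering, that point ends up in a one-member cluster and is dropped, and the query is assigned to class 1. Keeping one centroid per cluster, instead of CURE's several well-scattered representative points, is a simplification that assumes roughly convex clusters.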