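New Fuzzy-Clustering Algorithm for Data Stream
doi: 10.11999/JEIT141415
①(College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China) ②(Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Funding:
Supported by the National Natural Science Foundation of China (61171053, 61300239), the Doctoral Program Fund of the Ministry of Education of China (20113223110002), the China Postdoctoral Science Foundation (2014M551635), and the Jiangsu Postdoctoral Research Funding Program (1302085B)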
Abstract: Clustering over data streams is challenging because both processing time and memory are limited. To address this problem, this paper proposes a fuzzy micro-clustering algorithm for data streams based on weight decay, called Weight Decay Streaming Micro Clustering (WDSMC). The algorithm processes the stream with an improved weighted Fuzzy C-Means (FCM) algorithm and uses a micro-cluster structure together with time-based weight decay to improve clustering quality. Experimental results show that WDSMC achieves better clustering accuracy than the existing Stream Weight Fuzzy C-Means (SWFCM) and StreamKM++ algorithms.
Key words:
- Data mining /
- Data stream /
- Fuzzy C-Means (FCM) /
- Weight decay /
- Micro-clustering
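
The abstract names two ingredients, a weighted FCM update and time-based weight decay, without giving their exact form. The following is a minimal Python sketch of how such pieces could fit together, not the paper's WDSMC implementation. The exponential decay `w * 2^(-lam * elapsed)`, the function names, and the parameters `lam` and `m` are all assumptions made for illustration; the weighted points could stand in for micro-cluster summaries.

```python
import math

def decay_weights(weights, elapsed, lam=0.1):
    """Fade old weights over time. The form w * 2^(-lam * elapsed) is an
    assumed decay law, chosen only for illustration."""
    return [w * 2.0 ** (-lam * elapsed) for w in weights]

def memberships(points, centers, m=2.0, eps=1e-9):
    """Standard FCM membership of every point in every cluster."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        d = [max(math.dist(x, v), eps) for v in centers]
        u.append([1.0 / sum((di / dk) ** power for dk in d) for di in d])
    return u

def weighted_fcm(points, weights, centers, m=2.0, iters=50):
    """Weighted FCM: each point contributes weight * membership^m to a center,
    so heavier (more recent) points pull the centers harder."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            coef = [w * (row[i] ** m) for w, row in zip(weights, u)]
            total = sum(coef)
            new_centers.append(tuple(
                sum(cf * x[t] for cf, x in zip(coef, points)) / total
                for t in range(dim)
            ))
        centers = new_centers
    # Recompute memberships so they match the returned centers.
    return centers, memberships(points, centers, m)

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    # Hypothetical scenario: the first two points are 10 time units old.
    w = decay_weights([1.0, 1.0], elapsed=10) + [1.0, 1.0]
    centers, u = weighted_fcm(pts, w, [(0.0, 0.0), (10.0, 10.0)])
    print(centers)
```

In a streaming setting one would presumably call `decay_weights` on the stored summaries each time a new chunk arrives and then rerun `weighted_fcm` on the old summaries plus the new points; how WDSMC actually schedules these steps is not stated in the abstract.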