Robust Fuzzy C-means Clustering Algorithm Integrating Between-cluster Information
doi: 10.11999/JEIT180604
1. School of Aerospace Engineering, Xiamen University, Xiamen 361102, China
2. Information Engineering College, Jimei University, Xiamen 361021, China
Abstract: Compared with the classical K-means algorithm, Fuzzy C-Means (FCM) introduces a fuzzy factor that accounts for the relationships between clusters, which yields more separable clustering results. However, the fuzzy factor also lets every sample point belong to more than one cluster, so FCM is highly sensitive to noise and outliers and generalizes poorly. This paper therefore proposes a Robust Fuzzy C-Means clustering algorithm integrating Between-cluster Information (RBI-FCM). RBI-FCM exploits the sparse fuzzy memberships characteristic of K-means to reduce the interaction between clusters and to improve the separability of sample points that lie in the regions where clusters adjoin. In addition to minimizing the within-cluster scatter, RBI-FCM takes the separability between clusters into account, which can improve the generalization performance of the clustering model. An effective iterative algorithm is designed to solve the model. Experimental results show that RBI-FCM improves the robustness of FCM and effectively reduces its sensitivity to size imbalance and to differences in cluster distribution, yielding the desired clustering results.
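The sensitivity to outliers described in the abstract can be seen in the update rules of standard FCM. The sketch below implements only the textbook FCM membership and center updates, not RBI-FCM, whose model is not given on this page. The function names and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Textbook FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: split full
            # membership among them to avoid dividing by zero.
            hits = sum(d == 0.0 for d in dists)
            row = [1.0 / hits if d == 0.0 else 0.0 for d in dists]
        else:
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Textbook FCM center update: mean of the points weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)))
    return centers


if __name__ == "__main__":
    centers = [(5.0, 5.0), (15.0, 15.0)]
    # (40, 40) is far from both centers, yet its memberships still sum to 1,
    # so it keeps pulling on the centers in the next update.
    for row in fcm_memberships([(10.0, 10.0), (40.0, 40.0)], centers):
        print([round(u, 4) for u in row])
```

Because each row of memberships is constrained to sum to 1, a distant outlier still receives substantial membership in its nearest cluster. That is the behavior the abstract identifies as the source of FCM's sensitivity to noise.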
Key words: Clustering / Fuzzy C-Means (FCM) / Sample distribution / Between-cluster information
Table 1 Experiment 1: main parameters of the synthetic datasets

| Dataset | Class centers | Covariance matrices | Samples per class |
|---|---|---|---|
| 1 | (5, 5), (15, 15) | [1 0; 0 1], [1 0; 0 1] | 50, 50 |
| 2 | (5, 5), (15, 15) | [1 0; 0 1], [2 0; 0 2] | 50, 50 |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| 10 | (5, 5), (15, 15) | [1 0; 0 1], [10 0; 0 10] | 50, 50 |
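The datasets of Table 1 could be generated along the following lines. This is a sketch under two assumptions: the elided rows continue the pattern of the rows shown (the second class has covariance k·I for dataset k), and both classes are Gaussian. The function name and the `seed` parameter are hypothetical.

```python
import math
import random

CENTERS = [(5.0, 5.0), (15.0, 15.0)]


def make_experiment1_dataset(k, n_per_class=50, seed=0):
    """Dataset k (1..10): class 1 ~ N((5, 5), I), class 2 ~ N((15, 15), k*I).

    The k*I pattern for the rows not printed in Table 1 is an assumption.
    """
    if not 1 <= k <= 10:
        raise ValueError("Table 1 lists datasets 1 to 10")
    rng = random.Random(seed)
    variances = [1.0, float(k)]
    points, labels = [], []
    for label, (center, var) in enumerate(zip(CENTERS, variances)):
        # A diagonal covariance var*I means the two coordinates are
        # independent, each with standard deviation sqrt(var).
        sigma = math.sqrt(var)
        for _ in range(n_per_class):
            points.append((rng.gauss(center[0], sigma),
                           rng.gauss(center[1], sigma)))
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    pts, lbl = make_experiment1_dataset(10)
    print(len(pts), lbl.count(0), lbl.count(1))
```

Only the spread of the second class changes from one dataset to the next, so the series isolates the effect of distribution differences between clusters.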
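Table 2 Experiment 2: main parameters of the synthetic datasets

| Dataset | Centers of the circles in which samples are randomly distributed | Samples per class |
|---|---|---|
| 1 | (5, 5), (15, 15) | 50, 50 |
| 2 | (5, 5), (15, 15) | 50, 51 |
| $\vdots$ | $\vdots$ | $\vdots$ |
| 151 | (5, 5), (15, 15) | 50, 200 |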
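A size-imbalanced series like the one in Table 2 might be produced as sketched below. Table 2 does not state the circle radius or the sampling distribution, so the uniform-in-disk sampling and the default `radius=3.0` are assumptions. The rule that dataset i has 49 + i samples in the second class is inferred from the rows shown.

```python
import math
import random

CENTERS = [(5.0, 5.0), (15.0, 15.0)]


def sample_in_disk(rng, center, radius):
    """Draw one point uniformly from the disk of the given radius."""
    r = radius * math.sqrt(rng.random())  # sqrt keeps the density uniform in area
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def make_experiment2_dataset(index, radius=3.0, seed=0):
    """Dataset `index` (1..151): 50 points in class 1, 49 + index in class 2.

    The radius is a hypothetical value; the source does not give one.
    """
    if not 1 <= index <= 151:
        raise ValueError("Table 2 lists datasets 1 to 151")
    rng = random.Random(seed)
    sizes = [50, 49 + index]
    points, labels = [], []
    for label, (center, size) in enumerate(zip(CENTERS, sizes)):
        for _ in range(size):
            points.append(sample_in_disk(rng, center, radius))
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    pts, lbl = make_experiment2_dataset(151)
    print(lbl.count(0), lbl.count(1))
```

Keeping the geometry fixed and varying only the size of the second class isolates the effect of sampling imbalance.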
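Table 3 NMI and RI scores of the clustering experiments on UCI datasets

| UCI dataset | Metric | FCM | PFCM | GIFP-FCM | RBI-FCM |
|---|---|---|---|---|---|
| Auto-mpg | NMI | 0.5190 | 0.5167 | 0.5008 | 0.5443 |
| Auto-mpg | RI | 0.7534 | 0.7537 | 0.7505 | 0.7895 |
| Zoo | NMI | 0.6760 | 0.6824 | 0.6284 | 0.6873 |
| Zoo | RI | 0.8381 | 0.8400 | 0.8236 | 0.8464 |
| Parkinsons | NMI | 0.0926 | 0.0936 | 0.0526 | 0.1071 |
| Parkinsons | RI | 0.5934 | 0.5934 | 0.5693 | 0.6266 |
| Credit Approval | NMI | 0.0304 | 0.0304 | 0.0365 | 0.1020 |
| Credit Approval | RI | 0.5048 | 0.5048 | 0.5207 | 0.5448 |
| Banknote Authentication | NMI | 0.0292 | 0.0292 | 0.1145 | 0.5249 |
| Banknote Authentication | RI | 0.5236 | 0.5236 | 0.5555 | 0.8053 |
| Wine | NMI | 0.4169 | 0.4168 | 0.3946 | 0.4911 |
| Wine | RI | 0.7104 | 0.7105 | 0.6700 | 0.7287 |
| Balance Scale | NMI | 0.1223 | 0.1232 | 0.1293 | 0.1326 |
| Balance Scale | RI | 0.5887 | 0.5900 | 0.5806 | 0.5947 |
| House Votes | NMI | 0.4743 | 0.4743 | 0.2917 | 0.4948 |
| House Votes | RI | 0.7752 | 0.7752 | 0.6688 | 0.7890 |
| Vowel | NMI | 0.3019 | 0.3127 | 0.3357 | 0.3737 |
| Vowel | RI | 0.7755 | 0.7988 | 0.8275 | 0.8153 |
| Mammographic Masses | NMI | 0.1054 | 0.1065 | 0.1020 | 0.1130 |
| Mammographic Masses | RI | 0.5676 | 0.5683 | 0.5524 | 0.5746 |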
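For readers unfamiliar with the two scores in Table 3, the sketch below computes the Rand index (RI) and normalized mutual information (NMI) from two label lists. NMI has several normalization conventions and the source does not say which one was used. The geometric-mean normalization here is an assumption, and the function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two partitions agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs)
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information divided by sqrt(H(true) * H(pred)).

    The geometric-mean normalization is an assumed convention.
    """
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true, count_pred = Counter(labels_true), Counter(labels_pred)
    mi = sum((c / n) * math.log(c * n / (count_true[a] * count_pred[b]))
             for (a, b), c in joint.items())
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true == 0.0 or h_pred == 0.0:
        # At least one partition is a single cluster; treat two trivial
        # partitions as identical and anything else as unrelated.
        return 1.0 if h_true == h_pred else 0.0
    return mi / math.sqrt(h_true * h_pred)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(rand_index(truth, [1, 1, 0, 0]), nmi(truth, [1, 1, 0, 0]))
    print(rand_index(truth, [0, 1, 0, 1]), nmi(truth, [0, 1, 0, 1]))
```

Both scores are invariant to how the cluster labels are numbered, which makes them suitable for comparing a clustering against ground-truth classes.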