A Novel Fuzzy Clustering Algorithm Based on Similarity of Attribute Space
doi: 10.11999/JEIT180974
Shanghai Maritime University, Shanghai 201306, China
Abstract: The Fuzzy C-Means (FCM) clustering algorithm and its improved variants assign cluster labels according to the principle of maximum fuzzy membership. They do not fully exploit the attribute feature information carried by the converged fuzzy membership matrix and cluster centers, which limits clustering accuracy. To address this problem, a new line of improvement is proposed: revising the labeling rule applied to the FCM output. A definition of attribute similarity in binary attribute topological subspaces is given, leading to an improved FCM algorithm based on the Similarity of Attribute Space (FCM-SAS). First, samples whose fuzzy membership after FCM clustering falls below the clustering confidence are selected as suspicious samples. Next, the attribute similarity between each suspicious sample and the converged cluster centers is computed. Finally, the cluster labels of the suspicious samples are updated according to the principle of maximum attribute similarity. Experiments on UCI data sets show that the algorithm is not only effective but also achieves better clustering evaluation indices than several improved algorithms that assign labels by the maximum-fuzzy-membership principle.

Keywords:
- Fuzzy C-Means (FCM) clustering
- Attribute topological subspace
- Topological similarity
- Clustering confidence
- Principle of maximum attribute similarity
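The first stage of the method is the standard FCM iteration. A minimal NumPy sketch of that stage is given below for orientation; the function and variable names are illustrative and do not come from the paper.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-3, max_iter=100, seed=0):
    """Standard FCM iteration (illustrative sketch, not the authors' code).

    X: (n, d) sample matrix, c: number of clusters, m: weighting exponent.
    Returns the fuzzy membership matrix U (c, n), cluster centers V (c, d),
    and the maximum-membership labels.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)                    # each column sums to 1
    for _ in range(max_iter):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)       # update cluster centers
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)     # update memberships
        if np.abs(U_new - U).max() < eps:                # iteration threshold reached
            U = U_new
            break
        U = U_new
    labels = U.argmax(axis=0)                            # maximum-fuzzy-membership labels
    return U, V, labels
```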
Table 1 Steps of the FCM-SAS algorithm

Input: sample set ${\text{X}}$, number of samples $n$, number of clusters $c$, weighting exponent $m$, iteration threshold $\varepsilon$, maximum number of iterations $T$, clustering suspicion rate $\xi$, attribute proportion rate $\kappa$.
Output: sample label set ${\text{X}}_l'$.
1 Run the standard FCM algorithm to obtain the converged fuzzy membership matrix ${\text{U}}$, cluster centers ${\text{V}}$, and sample label set ${\text{X}}_l$; set $j = 0$.
2 Compute the maximum fuzzy membership of every sample ${\text{x}}$, sort these maxima in ascending order into an array, and take the value of the $[\xi \times n]$-th element of this array as the clustering confidence $\eta$.
3 Set $j = j + 1$. If the fuzzy membership $\max \left( \left\{ u_{ij} \mid i = 1, 2, \cdots, c \right\} \right)$ of the $j$-th sample is not greater than the clustering confidence $\eta$, the sample is a suspicious sample; go to step 4. Otherwise go to step 8.
4 Compute, by Eq. (8), the topological similarity sets $\gamma \left( {\text{x}}_j, {\text{v}}_i \right)$ between the $j$-th sample and every cluster center in the binary attribute topological subspaces; take the absolute values of the elements of all sets, sort them in ascending order into an array, and take the absolute difference between the $[n \times \gamma_{dis} \times \kappa]$-th element of this array and the value 1 as the neighborhood radius $\delta$.
5 With $\delta$ as the neighborhood radius, compute by Eq. (10) the attribute similarity $\psi \left( {\text{x}}_j, {\text{v}}_i \right)$ between the $j$-th sample and every cluster center.
6 If the maximum attribute similarity $\max \left( \psi \left( {\text{x}}_j, {\text{v}}_i \right) \right)$ is attained by a single cluster center, take the class of that cluster center as the updated label $x_{lj}'$ of the sample and go to step 8; otherwise go to step 7.
7 If the maximum attribute similarity is attained by more than one cluster center, take as $x_{lj}'$ the class of the cluster center among them with the largest sum of its topological similarity set, $\widehat S = \max \left( \Sigma \gamma \left( {\text{x}}_j, {\text{v}}_i \right) \right)$.
8 If $j < n$, go to step 3; otherwise output the updated sample label set ${\text{X}}_l'$.
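The relabeling stage of Table 1 can be sketched as follows. Because Eq. (8) and Eq. (10) are not reproduced on this page, the topological similarity, the neighborhood-radius rule of step 4, and the attribute similarity are represented by placeholder callables (`topological_similarity`, `neighbourhood_radius`, `attribute_similarity`); the sketch therefore only illustrates the control flow of steps 2 to 8, not the similarity computations themselves.

```python
import numpy as np

def fcm_sas_relabel(X, U, V, labels, xi=0.3, kappa=0.5,
                    topological_similarity=None,
                    neighbourhood_radius=None,
                    attribute_similarity=None):
    """Control-flow sketch of Table 1, steps 2-8 (not the authors' code).

    The three callables stand in for Eq. (8), the radius rule of step 4 and
    Eq. (10), which are defined in the full paper but not on this page.
    """
    n, c = X.shape[0], V.shape[0]
    max_u = U.max(axis=0)                                   # max membership per sample
    # Step 2: clustering confidence = value at position [xi * n] of the sorted maxima.
    eta = np.sort(max_u)[int(np.ceil(xi * n)) - 1]
    new_labels = labels.copy()
    for j in np.where(max_u <= eta)[0]:                     # step 3: suspicious samples
        # Step 4: topological similarity sets and the neighbourhood radius delta.
        gamma = [topological_similarity(X[j], V[i]) for i in range(c)]
        delta = neighbourhood_radius(gamma, kappa)
        # Step 5: attribute similarity to every cluster center with radius delta.
        psi = np.array([attribute_similarity(X[j], V[i], delta) for i in range(c)])
        best = np.flatnonzero(psi == psi.max())
        if best.size == 1:                                  # step 6: unique maximum
            new_labels[j] = best[0]
        else:                                               # step 7: tie-break by summed topological similarity
            new_labels[j] = best[int(np.argmax([np.sum(gamma[i]) for i in best]))]
    return new_labels                                       # step 8: updated label set
```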
Table 2 Input parameter settings of the algorithm

Parameter                            Value
Weighting exponent $m$               2
Iteration threshold $\varepsilon$    $10^{-3}$
Maximum number of iterations $T$     100
Clustering suspicion rate $\xi$      0.3
Attribute proportion rate $\kappa$   0.5
表 3 UCI數(shù)據(jù)集的統(tǒng)計(jì)描述
數(shù)據(jù)集 樣本數(shù) 維數(shù) 簇類 各類占比 Iris 150 4 3 50:50:50 Wine 178 13 3 59:71:48 Seeds 210 7 3 70:70:70 Breast 683 9 2 444:239 Glass 214 9 6 70:17:76:13:9:29 下載: 導(dǎo)出CSV
Table 4 Comparison of clustering evaluation indices on the UCI data sets (1)

Data set  Index  FCM    RL-FCM  RCA    WFCM   FRCM   FCM-SAS (standardized)  FCM-SAS (non-standardized)
Iris      AR     0.893  0.907   0.967  0.957  0.960  0.987                   0.953
Iris      RI     0.880  0.892   0.958  0.934  0.952  0.983                   0.942
Iris      NMI    0.750  –       –      0.831  0.873  0.949                   0.8498
Seeds     AR     0.895  0.895   0.903  0.895  0.895  0.919                   0.900
Seeds     RI     0.874  0.874   0.884  0.873  0.876  0.899                   0.877
Seeds     NMI    0.695  –       –      0.677  0.697  0.717                   0.671
Breast    AR     0.937  0.953   0.655  0.938  0.947  0.965                   0.946
Breast    RI     0.876  0.910   0.548  0.884  0.911  0.932                   0.897
Breast    NMI    0.730  –       –      0.736  0.755  0.782                   0.688
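The indices in Tables 4 and 5 can be reproduced from true and predicted labels along the following lines. This sketch assumes that AR is clustering accuracy after optimal matching of cluster labels to classes, with RI and NMI taken from scikit-learn; `y` and `labels` refer to the variables of the earlier sketches.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import rand_score, normalized_mutual_info_score

def accuracy_rate(y_true, y_pred):
    """Clustering accuracy after matching predicted clusters to true classes."""
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                              # contingency table
    rows, cols = linear_sum_assignment(-cost)        # maximize matched counts
    return cost[rows, cols].sum() / len(y_true)

ar = accuracy_rate(y, labels)                        # AR
ri = rand_score(y, labels)                           # RI
nmi = normalized_mutual_info_score(y, labels)        # NMI
```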
Table 5 Comparison of clustering evaluation indices on the UCI data sets (2)

Data set  Index  FCM    PSO-IFCM  GA-IFCM  ABC-IFCM  KFCM   WGFCM  FCM-SAS (standardized)  FCM-SAS (non-standardized)
Iris      AR     0.893  0.807     0.849    0.787     0.895  0.973  0.987                   0.953
Wine      AR     0.949  0.655     0.652    0.642     0.942  0.966  0.955                   0.781
Glass     AR     0.421  0.419     0.393    0.467     0.460  0.733  0.533                   0.472
Table 6 Statistics of the FCM-SAS clustering process

Data set  1st-pass errors  Suspicious correct (1st)  Suspicious wrong (1st)  Suspicious total  2nd-pass correct  2nd-pass errors
Iris      16               29                        16                      45                43                2
Seeds     21               44                        19                      63                48                15
Breast    43               173                       27                      200               191               9
Wine      9                45                        8                       53                47                6
Glass     126              21                        42                      63                45                18

Here "1st pass" refers to labeling by maximum fuzzy membership after FCM, and "2nd pass" to the relabeling of the suspicious samples by maximum attribute similarity.
計(jì)量
- 文章訪問數(shù): 3122
- HTML全文瀏覽量: 1229
- PDF下載量: 81
- 被引次數(shù): 0