A Novel Fuzzy Clustering Algorithm Based on Similarity of Attribute Space
doi: 10.11999/JEIT180974
Shanghai Maritime University, Shanghai 201306, China
Abstract: The Fuzzy C-Means (FCM) clustering algorithm and its improved variants assign cluster labels according to the principle of maximum fuzzy membership. They do not fully exploit the attribute feature information carried by the converged fuzzy membership matrix and cluster centers, which limits clustering accuracy. To address this problem, a new line of improvement is proposed: revising the labeling rule applied to the FCM output. A definition of attribute similarity in binary attribute topological subspaces is given, leading to an improved FCM algorithm based on the Similarity of Attribute Space (FCM-SAS). First, samples whose fuzzy membership after FCM clustering falls below the clustering confidence are selected as suspicious samples. Next, the attribute similarity between each suspicious sample and the converged cluster centers is computed. Finally, the cluster labels of the suspicious samples are updated according to the principle of maximum attribute similarity. Experiments on UCI data sets show that the algorithm is not only effective but also achieves better clustering evaluation indices than several improved algorithms that assign labels by the maximum-fuzzy-membership principle.

Keywords:
- Fuzzy C-Means (FCM) clustering
- Attribute topological subspace
- Topological similarity
- Clustering confidence
- Principle of maximum attribute similarity
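The first stage of the method is the standard FCM iteration. A minimal NumPy sketch of that stage is given below for orientation; the function and variable names are illustrative and do not come from the paper.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-3, max_iter=100, seed=0):
    """Standard FCM iteration (illustrative sketch, not the authors' code).

    X: (n, d) sample matrix, c: number of clusters, m: weighting exponent.
    Returns the fuzzy membership matrix U (c, n), cluster centers V (c, d),
    and the maximum-membership labels.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)                    # each column sums to 1
    for _ in range(max_iter):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)       # update cluster centers
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)     # update memberships
        if np.abs(U_new - U).max() < eps:                # iteration threshold reached
            U = U_new
            break
        U = U_new
    labels = U.argmax(axis=0)                            # maximum-fuzzy-membership labels
    return U, V, labels
```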
Table 1 Steps of the FCM-SAS algorithm

Input: sample set ${\text{X}}$, number of samples $n$, number of clusters $c$, weighting exponent $m$, iteration threshold $\varepsilon$, maximum number of iterations $T$, clustering suspicion rate $\xi$, attribute proportion rate $\kappa$.
Output: sample label set ${\text{X}}_l'$.
1 Run the standard FCM algorithm to obtain the converged fuzzy membership matrix ${\text{U}}$, cluster centers ${\text{V}}$, and sample label set ${\text{X}}_l$; set $j = 0$.
2 Compute the maximum fuzzy membership of every sample ${\text{x}}$, sort these maxima in ascending order into an array, and take the value of the $[\xi \times n]$-th element of this array as the clustering confidence $\eta$.
3 Set $j = j + 1$. If the fuzzy membership $\max \left( \left\{ u_{ij} \mid i = 1, 2, \cdots, c \right\} \right)$ of the $j$-th sample is not greater than the clustering confidence $\eta$, the sample is a suspicious sample; go to step 4. Otherwise go to step 8.
4 Compute, by Eq. (8), the topological similarity sets $\gamma \left( {\text{x}}_j, {\text{v}}_i \right)$ between the $j$-th sample and every cluster center in the binary attribute topological subspaces; take the absolute values of the elements of all sets, sort them in ascending order into an array, and take the absolute difference between the $[n \times \gamma_{dis} \times \kappa]$-th element of this array and the value 1 as the neighborhood radius $\delta$.
5 With $\delta$ as the neighborhood radius, compute by Eq. (10) the attribute similarity $\psi \left( {\text{x}}_j, {\text{v}}_i \right)$ between the $j$-th sample and every cluster center.
6 If the maximum attribute similarity $\max \left( \psi \left( {\text{x}}_j, {\text{v}}_i \right) \right)$ is attained by a single cluster center, take the class of that cluster center as the updated label $x_{lj}'$ of the sample and go to step 8; otherwise go to step 7.
7 If the maximum attribute similarity is attained by more than one cluster center, take as $x_{lj}'$ the class of the cluster center among them with the largest sum of its topological similarity set, $\widehat S = \max \left( \Sigma \gamma \left( {\text{x}}_j, {\text{v}}_i \right) \right)$.
8 If $j < n$, go to step 3; otherwise output the updated sample label set ${\text{X}}_l'$.
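The relabeling stage of Table 1 can be sketched as follows. Because Eq. (8) and Eq. (10) are not reproduced on this page, the topological similarity, the neighborhood-radius rule of step 4, and the attribute similarity are represented by placeholder callables (`topological_similarity`, `neighbourhood_radius`, `attribute_similarity`); the sketch therefore only illustrates the control flow of steps 2 to 8, not the similarity computations themselves.

```python
import numpy as np

def fcm_sas_relabel(X, U, V, labels, xi=0.3, kappa=0.5,
                    topological_similarity=None,
                    neighbourhood_radius=None,
                    attribute_similarity=None):
    """Control-flow sketch of Table 1, steps 2-8 (not the authors' code).

    The three callables stand in for Eq. (8), the radius rule of step 4 and
    Eq. (10), which are defined in the full paper but not on this page.
    """
    n, c = X.shape[0], V.shape[0]
    max_u = U.max(axis=0)                                   # max membership per sample
    # Step 2: clustering confidence = value at position [xi * n] of the sorted maxima.
    eta = np.sort(max_u)[int(np.ceil(xi * n)) - 1]
    new_labels = labels.copy()
    for j in np.where(max_u <= eta)[0]:                     # step 3: suspicious samples
        # Step 4: topological similarity sets and the neighbourhood radius delta.
        gamma = [topological_similarity(X[j], V[i]) for i in range(c)]
        delta = neighbourhood_radius(gamma, kappa)
        # Step 5: attribute similarity to every cluster center with radius delta.
        psi = np.array([attribute_similarity(X[j], V[i], delta) for i in range(c)])
        best = np.flatnonzero(psi == psi.max())
        if best.size == 1:                                  # step 6: unique maximum
            new_labels[j] = best[0]
        else:                                               # step 7: tie-break by summed topological similarity
            new_labels[j] = best[int(np.argmax([np.sum(gamma[i]) for i in best]))]
    return new_labels                                       # step 8: updated label set
```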
Table 2 Input parameter settings of the algorithm

Parameter                            Value
Weighting exponent $m$               2
Iteration threshold $\varepsilon$    $10^{-3}$
Maximum number of iterations $T$     100
Clustering suspicion rate $\xi$      0.3
Attribute proportion rate $\kappa$   0.5
表 3 UCI數(shù)據(jù)集的統(tǒng)計(jì)描述
數(shù)據(jù)集 樣本數(shù) 維數(shù) 簇類 各類占比 Iris 150 4 3 50:50:50 Wine 178 13 3 59:71:48 Seeds 210 7 3 70:70:70 Breast 683 9 2 444:239 Glass 214 9 6 70:17:76:13:9:29 下載: 導(dǎo)出CSV
Table 4 Comparison of clustering evaluation indices on the UCI data sets (1)

Data set  Index  FCM    RL-FCM  RCA    WFCM   FRCM   FCM-SAS (standardized)  FCM-SAS (non-standardized)
Iris      AR     0.893  0.907   0.967  0.957  0.960  0.987                   0.953
Iris      RI     0.880  0.892   0.958  0.934  0.952  0.983                   0.942
Iris      NMI    0.750  –       –      0.831  0.873  0.949                   0.8498
Seeds     AR     0.895  0.895   0.903  0.895  0.895  0.919                   0.900
Seeds     RI     0.874  0.874   0.884  0.873  0.876  0.899                   0.877
Seeds     NMI    0.695  –       –      0.677  0.697  0.717                   0.671
Breast    AR     0.937  0.953   0.655  0.938  0.947  0.965                   0.946
Breast    RI     0.876  0.910   0.548  0.884  0.911  0.932                   0.897
Breast    NMI    0.730  –       –      0.736  0.755  0.782                   0.688
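The indices in Tables 4 and 5 can be reproduced from true and predicted labels along the following lines. This sketch assumes that AR is clustering accuracy after optimal matching of cluster labels to classes, with RI and NMI taken from scikit-learn; `y` and `labels` refer to the variables of the earlier sketches.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import rand_score, normalized_mutual_info_score

def accuracy_rate(y_true, y_pred):
    """Clustering accuracy after matching predicted clusters to true classes."""
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                              # contingency table
    rows, cols = linear_sum_assignment(-cost)        # maximize matched counts
    return cost[rows, cols].sum() / len(y_true)

ar = accuracy_rate(y, labels)                        # AR
ri = rand_score(y, labels)                           # RI
nmi = normalized_mutual_info_score(y, labels)        # NMI
```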
Table 5 Comparison of clustering evaluation indices on the UCI data sets (2)

Data set  Index  FCM    PSO-IFCM  GA-IFCM  ABC-IFCM  KFCM   WGFCM  FCM-SAS (standardized)  FCM-SAS (non-standardized)
Iris      AR     0.893  0.807     0.849    0.787     0.895  0.973  0.987                   0.953
Wine      AR     0.949  0.655     0.652    0.642     0.942  0.966  0.955                   0.781
Glass     AR     0.421  0.419     0.393    0.467     0.460  0.733  0.533                   0.472
Table 6 Statistics of the FCM-SAS clustering process

Data set  1st-pass errors  Suspicious correct (1st)  Suspicious wrong (1st)  Suspicious total  2nd-pass correct  2nd-pass errors
Iris      16               29                        16                      45                43                2
Seeds     21               44                        19                      63                48                15
Breast    43               173                       27                      200               191               9
Wine      9                45                        8                       53                47                6
Glass     126              21                        42                      63                45                18

Here "1st pass" refers to labeling by maximum fuzzy membership after FCM, and "2nd pass" to the relabeling of the suspicious samples by maximum attribute similarity.
計(jì)量
- 文章訪問數(shù): 3122
- HTML全文瀏覽量: 1229
- PDF下載量: 81
- 被引次數(shù): 0