距離決策下的模糊聚類集成模型

費(fèi)博雯; 邱云飛; 劉萬(wàn)軍; 劉大千

doi:10.11999/JEIT171065

距離決策下的模糊聚類集成模型

doi: 10.11999/JEIT171065

1.
遼寧工程技術(shù)大學(xué)工商管理學(xué)院 ??葫蘆島 ??125105
2.
遼寧工程技術(shù)大學(xué)軟件學(xué)院 ??葫蘆島 ??125105
3.
遼寧工程技術(shù)大學(xué)電子與信息工程學(xué)院 ??葫蘆島 ??125105

基金項(xiàng)目: 國(guó)家自然科學(xué)基金青年科學(xué)基金(61401185)

詳細(xì)信息

作者簡(jiǎn)介:
費(fèi)博雯：女，1991年生，博士生，研究方向?yàn)閿?shù)據(jù)挖掘與智能數(shù)據(jù)處理

邱云飛：男，1976年生，博士，教授，主要研究方向?yàn)閿?shù)據(jù)挖掘與智能數(shù)據(jù)處理

劉萬(wàn)軍：男，1959年生，碩士，教授，主要研究方向?yàn)閳D像與視覺(jué)信息計(jì)算、運(yùn)動(dòng)目標(biāo)檢測(cè)與跟蹤

劉大千：男，1992年生，博士生，研究方向?yàn)閳D像與視覺(jué)信息計(jì)算、運(yùn)動(dòng)目標(biāo)檢測(cè)與跟蹤

通訊作者:
費(fèi)博雯　 feibowen2098@163.com

中圖分類號(hào): TP391
計(jì)量
- 文章訪問(wèn)數(shù): 2306
- HTML全文瀏覽量: 771
- PDF下載量: 65
- 被引次數(shù): 0
出版歷程
- 收稿日期: 2017-11-15
- 修回日期: 2018-05-09
- 網(wǎng)絡(luò)出版日期: 2018-06-07
- 刊出日期: 2018-08-01

Fuzzy Clustering Ensemble Model Based on Distance Decision

1.
School of Business Administration, Liaoning Technical University, Huludao 125105, China
2.
School of Software, Liaoning Technical University, Huludao 125105, China
3.
School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China

Funds: The Young Scientists Fund of the National Natural Science Foundation of China (61401185)

摘要

摘要: 模糊聚類是近年來(lái)使用的一類性能較為優(yōu)越的聚類算法，但該類算法對(duì)初始聚類中心敏感且對(duì)邊界樣本的聚類結(jié)果不夠準(zhǔn)確。為了提高聚類準(zhǔn)確性、穩(wěn)定性，該文通過(guò)聯(lián)合多個(gè)模糊聚類結(jié)果，提出一種距離決策下的模糊聚類集成模型。首先，利用模糊C均值(FCM)算法對(duì)數(shù)據(jù)樣本進(jìn)行多次聚類，得到相應(yīng)的隸屬度矩陣。然后，提出一種新的距離決策方法，充分利用得到的隸屬度關(guān)系構(gòu)建一個(gè)累積距離矩陣。最后，將距離矩陣引入密度峰值(DP)算法中，利用改進(jìn)的DP算法進(jìn)行聚類集成以獲取最終聚類結(jié)果。在UCI機(jī)器學(xué)習(xí)庫(kù)中選擇9個(gè)數(shù)據(jù)集進(jìn)行測(cè)試，實(shí)驗(yàn)結(jié)果表明，相比經(jīng)典的聚類集成模型，該文提出的聚類集成模型效果更佳。
- 模糊聚類 /
- 集成模型 /
- 距離決策 /
- 隸屬度矩陣 /
- 密度峰值算法
Abstract: Fuzzy clustering is a kind of clustering algorithm which shows superior performance in recent years, however, the algorithm is sensitive to the initial cluster center and can not obtain accurate results of clustering for the boundary samples. In order to improve the accuracy and stability of clustering, this paper proposes a novel approach of fuzzy clustering ensemble model based on distance decision by combining multiple fuzzy clustering results. First of all, performing several times clustering for data samples by using FCM (Fuzzy C-Means), and corresponding membership matrices are obtained. Then, a new method of distance decision is proposed, a cumulative distance matrix is constructed by the membership matrices. Finally, the distance matrix is introduced into the Density Peaks (DP) algorithm, and the final results of clustering are obtained by using the improved DP algorithm for clustering ensemble. The results of the experiment show that the clustering ensemble model proposed in this paper is more effective than other classical clustering ensemble model on the 9 data sets in UCI machine learning database.
- Fuzzy clustering /
- Ensemble model /
- Distance decision /
- Membership matrices /
- Density Peaks (DP) algorithm

HTML全文

圖 1 聚類集成模型整體框架圖

下載: 全尺寸圖片幻燈片

圖 2 不同m值下的平均RI、NMI值的比較結(jié)果

下載: 全尺寸圖片幻燈片

圖 3 基于不同距離度量的聚類模型聚類效果

下載: 全尺寸圖片幻燈片

圖 4 基于距離的DP集成決策圖

下載: 全尺寸圖片幻燈片

表 1 算法實(shí)現(xiàn)流程

輸入：實(shí)驗(yàn)數(shù)據(jù)集 $\small {{X}}{\rm{ = \{ }}{{x}_{i}}{\rm{|}}{i}{\rm{ = }}{1,2,}·\!·\!·\!{,}{n}{\rm{\} }}$，基聚類器運(yùn)算次數(shù) $\small {m}$，實(shí)驗(yàn)數(shù)據(jù)集的總類簇?cái)?shù) $\small C$

輸出：數(shù)據(jù)集 $\small {{X}}{\rm{ = \{ }}{{x}_{i}}{\rm{|}}{i}{\rm{ = }}{1,2,}·\!·\!·{,}{n}{\rm{\} }}$的聚類集成標(biāo)簽Q

步驟 1　獲取基聚類FCM的聚類結(jié)果：

(1) 判斷基聚類器運(yùn)算次數(shù)是否小于 $\small {m}$；

(2) 利用范圍在(0,1)之間的隨機(jī)數(shù)初始化模糊隸屬度 $\small {{{U}}_{j}},\;{j} = {1,2,}·\!·\!·\!{,}{m}$，滿足式(1)的約束條件；

(3) 通過(guò)式(2)計(jì)算 $\small C$個(gè)類簇的中心 $\small {c}_{k}^{j}{(}{k} = {1,2,}·\!·\!·\!{,}{C})$；

(4) 通過(guò)式(1)計(jì)算FCM的目標(biāo)函數(shù)，若 $\small {J_{hj}}(u,c)$小于設(shè)定閾值，或相對(duì)上次計(jì)算的目標(biāo)函數(shù)的變化量 $\small \Delta {J_{hj}}$小于閾值，則迭代終止；

(5) 利用式(3)重新計(jì)算新的模糊隸屬度 $\small {{{U}}_j}$，返回(3)；

(6) 保存模糊隸屬度 $\small {{U}_j}$, $\small j = j + 1$，返回(1)；

步驟 2　構(gòu)建累積距離矩陣：

(1) 利用式(8)計(jì)算每次得到的隸屬度矩陣 $\small {{{U}}_j},\ j = 1,2,·\!·\!·\!,m$對(duì)應(yīng)的最大隸屬類信息矩陣 $\small {{{L}}_j}$；

(2) 利用式(9)計(jì)算單個(gè)隸屬度矩陣 $\small {{{U}}_{j}}$與信息矩陣 $\small {{{L}}_j}$構(gòu)造出的隸屬相似矩陣 $\small {{{U}}\!_j}\!^\prime $為例進(jìn)行距離矩陣的構(gòu)建；

(3) 重復(fù)執(zhí)行 $\small {m}$次(2)，得到累積隸屬相似矩陣 $\small {{{U}}^\prime }$；

(4) 利用式(10)構(gòu)建累積距離矩陣 $\small {{D}}$；

步驟 3　基于DP的聚類集成：

(1) 利用步驟2得到的累積距離矩陣 $\small {{D}}$計(jì)算數(shù)據(jù)樣本間的兩兩距離 $\small {d_{ij}}$，并確定截?cái)嗑嚯x $\small {d_c}$；

(2) 按照式(4)和式(5)分別計(jì)算數(shù)據(jù)樣本 $\small {x_i}$的局部密度 $\small {\rho _i}$和與更高局部密度的點(diǎn)的距離 $\small {\delta _i}$；

(3) 利用式(11)中的 $\small {\gamma _i}$選擇前K個(gè)密度峰值點(diǎn)作為集成聚類中心 $\small \{ {{c}_{k}}{,}{k} = {1,2,}·\!·\!·\!{,}{C}\} $，對(duì)非數(shù)據(jù)中心的數(shù)據(jù)樣本進(jìn)行歸類；

(4) 計(jì)算聚類的邊界區(qū)域，排除光暈點(diǎn)的干擾。

下載: 導(dǎo)出CSV

表 2 實(shí)驗(yàn)數(shù)據(jù)集信息

序號(hào)	1	2	3	4	5	6	7	8	9
數(shù)據(jù)集	parkinsons	wdbc	ionosphere	german	iris	wine	waveform	vehicle	x8d5k
數(shù)據(jù)樣本數(shù)	195	569	351	1000	150	178	5000	846	1000
屬性	22	30	34	24	4	13	21	18	8
類別數(shù)	2	2	2	2	3	3	3	4	5

下載: 導(dǎo)出CSV

表 3 5種聚類方法的RI比較結(jié)果

序號(hào)	FCM		CSP		HGP		MCL		本文
序號(hào)	均值	標(biāo)準(zhǔn)差	均值	標(biāo)準(zhǔn)差	均值	標(biāo)準(zhǔn)差	均值	標(biāo)準(zhǔn)差	均值	標(biāo)準(zhǔn)差
1	0.579	0.068	0.638	0.012	0.607	0.007	0.613	0.016	0.646	0.003
2	0.723	0.047	0.833	0.013	0.812	0.019	0.841	0.021	0.852	0.010
3	0.563	0.057	0.628	0.015	0.617	0.017	0.613	0.007	0.662	0.014
4	0.625	0.039	0.729	0.016	0.746	0.006	0.731	0.004	0.758	0.009
5	0.786	0.034	0.832	0.023	0.822	0.015	0.852	0.016	0.887	0.006
6	0.732	0.041	0.856	0.011	0.859	0.017	0.826	0.008	0.863	0.013
7	0.628	0.037	0.646	0.007	0.638	0.014	0.637	0.011	0.641	0.015
8	0.547	0.022	0.627	0.021	0.593	0.014	0.616	0.005	0.635	0.007
9	0.715	0.033	0.816	0.004	0.849	0.005	0.834	0.010	0.866	0.008
平均值	0.655	0.042	0.734	0.014	0.727	0.013	0.729	0.011	0.757	0.009

下載: 導(dǎo)出CSV

表 4 5種聚類方法的NMI比較結(jié)果

序號(hào)	FCM		CSP		HGP		MCL		本文
序號(hào)	均值	標(biāo)準(zhǔn)差	均值	標(biāo)準(zhǔn)差	均值	標(biāo)準(zhǔn)差	均值	標(biāo)準(zhǔn)差	均值	標(biāo)準(zhǔn)差
1	0.412	0.043	0.483	0.004	0.488	0.007	0.474	0.006	0.516	0.002
2	0.536	0.028	0.576	0.008	0.619	0.007	0.577	0.012	0.634	0.016
3	0.512	0.063	0.568	0.006	0.567	0.003	0.544	0.008	0.574	0.002
4	0.499	0.052	0.582	0.011	0.612	0.005	0.594	0.008	0.625	0.004
5	0.692	0.025	0.783	0.016	0.808	0.015	0.747	0.012	0.813	0.014
6	0.645	0.026	0.721	0.007	0.736	0.004	0.712	0.009	0.742	0.003
7	0.525	0.020	0.585	0.011	0.564	0.007	0.579	0.002	0.591	0.004
8	0.118	0.016	0.135	0.004	0.136	0.002	0.133	0.005	0.142	0.006
9	0.632	0.034	0.724	0.012	0.716	0.011	0.712	0.008	0.742	0.010
平均值	0.508	0.034	0.573	0.009	0.583	0.007	0.564	0.008	0.598	0.007

下載: 導(dǎo)出CSV

參考文獻(xiàn)(15)

MEI Jianping, WANG Yangtao, CHEN Lihui, et al.. Large scale document categorization with fuzzy clustering[J]. IEEE Transactions on Fuzzy Systems, 2017, 25(5): 1239–1251. DOI: 10.1109/TFUZZ.2016.2604009.

張潔玉, 李佐勇. 基于核空間的加權(quán)鄰域約束直覺(jué)模糊聚類算法[J]. 電子與信息學(xué)報(bào), 2017, 39(9): 2162–2168. DOI: 10.11999/JEIT161317.

ZHANG Jieyu and LI Zuoyong. Kernel-based algorithm with weighted spatial information intuitionistic fuzzy c-means[J]. Journal of Electronics & Information Technology, 2017, 39(9): 2162–2168. DOI: 10.11999/JEIT161317.

葉茂, 劉文芬. 基于快速地標(biāo)采樣的大規(guī)模譜聚類算法[J]. 電子與信息學(xué)報(bào), 2017, 39(2): 278–284. DOI: 10.11999/JEIT160260.

YE Mao and LIU Wenfen. Large scale spectral clustering based on fast landmark sampling[J]. Journal of Electronics & Information Technology, 2017, 39(2): 278–284. DOI: 10.11999/JEIT160260.

周林, 平西建, 徐森, 等. 基于譜聚類的聚類集成算法[J]. 自動(dòng)化學(xué)報(bào), 2012, 38(8): 1335–1342. DOI: 10.3724/SP.J.1004.2012.01335.

ZHOU Lin, PING Xijian, XU Sen, et al.. Cluster ensemble based on spectral clustering[J]. Acta Automatica Sinica, 2012, 38(8): 1335–1342. DOI: 10.3724/SP.J.1004.2012.01335.

張敏, 于劍. 基于劃分的模糊聚類算法[J]. 軟件學(xué)報(bào), 2004, 15(6): 858–868. DOI: 10.13328/j.cnki.jos.2004.06.008.

ZHANG Min and YU Jian. Fuzzy partitional clustering algorithms[J]. Journal of Software, 2004, 15(6): 858–868. DOI: 10.13328/j.cnki.jos.2004.06.008.

BEZDEK J C, HATHAWAY R J, SABIN M J, et al.. Convergence theory for fuzzy c-means: Counter-examples and repairs[J]. IEEE Transaction on Systems, Man, and Cybernetics, 1987, 17(5): 873–877. DOI: 10.1109/TSMC.1987.6499296.

STREHL A and GHOSH J. Cluster ensembles a knowledge reuse framework for combining multiple partitions[J]. The Journal of Machine Learning Research, 2003, 3(3): 583–617. DOI: 10.1162/153244303321897735.

GOSWAMI J P and MAHANTA A K. A genetic algorithm based ensemble approach for categorical data clustering[C]. Proceedings of the 2015 Annual IEEE India Conference (INDICON), New Delhi, India, 2015: 1–6.

BANERJEE B, BOVOLO F, BHATTACHARYA A, et al.. A new self-training-based unsupervised satellite image classification technique using cluster ensemble strategy[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(4): 741–745. DOI: 10.1109/LGRS.2014.2360833.

HAO Zhifeng, WANG Lijuan, CAI Ruichu, et al.. An improved clustering ensemble method based link analysis[J]. World Wide Web, 2015, 18(2): 185–195. DOI: 10.1007/s11280-013-0208-6.

ZHONG Caiming, YUE Xiaodong, ZHANG Zehua, et al.. A clustering ensemble: two-level-refined co-association matrix with path-based transformation[J]. Pattern Recognition, 2015, 48(8): 2699–2709. DOI: 10.1016/j.patcog.2015.02.014.

褚睿鴻, 王紅軍, 楊燕, 等. 基于密度峰值的聚類集成[J]. 自動(dòng)化學(xué)報(bào), 2016, 42(9): 1401–1412. DOI: 10.16383/j.aas.2016.c150864.

CHU Ruihong, WANG Hongjun, YANG Yan, et al.. Clustering ensemble based on density peaks[J]. Acta Automatica Sinica, 2016, 42(9): 1401–1412. DOI: 10.16383/j.aas.2016.c150864.

RODRIGUEZ A and LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492–1496. DOI: 10.1126/science.1242072.

ZHOU Zhihua and TANG Wei. Clusterer ensemble[J]. Knowledge-Based Systems, 2006, 19(1): 77–83. DOI: 10.1016/j.knosys.2005.11.003.

GAN Guojun, YANG Zijiang, and WU Jianhong. A genetic K-modes algorithm for clustering categorical data[J]. Springer Berlin Heidelberg, 2005, 36(2): 728–728. DOI: 10.1007/11527503_23.

相關(guān)文章

施引文獻(xiàn)

資源附件(0)

訪問(wèn)統(tǒng)計(jì)