Cluster Ensemble Approach Based on T-distributed Stochastic Neighbor Embedding
doi: 10.11999/JEIT170937
Funds:
The National Natural Science Foundation of China (61105057, 61375001), The Natural Science Foundation of Jiangsu Province (BK20151299), The Industry-Education-Research Prospective Project of Jiangsu Province (BY2016065-01)
Abstract: T-distributed Stochastic Neighbor Embedding (TSNE) is introduced into the cluster ensemble problem, and a TSNE-based cluster ensemble approach is proposed. First, TSNE minimizes the Kullback-Leibler (KL) divergence between the distribution of the high-dimensional data points, which correspond to the rows of the hypergraph adjacency matrix, and the distribution of their low-dimensional map points, so that the structure of the high-dimensional space is preserved in the low-dimensional space. Then, a hierarchical clustering algorithm is run in the low-dimensional space to obtain the final clustering result. Experimental results on several benchmark datasets indicate that TSNE improves the clustering quality of the hierarchical clustering algorithm and that the proposed cluster ensemble method outperforms state-of-the-art cluster ensemble methods.
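To make the first step more concrete, the following is a minimal sketch of how a hypergraph adjacency matrix can be built from a set of base clusterings, assuming the common construction in which every cluster of every base clustering becomes one binary column. The function name `hypergraph_adjacency` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def hypergraph_adjacency(base_clusterings: Sequence[Sequence[int]]) -> List[List[int]]:
    """Build a binary hypergraph adjacency (indicator) matrix.

    Each base clustering is a sequence of cluster labels, one per data point.
    Every cluster of every base clustering becomes one column (hyperedge);
    entry [i][j] is 1 when point i belongs to the cluster behind column j.
    """
    if not base_clusterings:
        return []
    n_points = len(base_clusterings[0])
    rows: List[List[int]] = [[] for _ in range(n_points)]
    for labels in base_clusterings:
        if len(labels) != n_points:
            raise ValueError("all base clusterings must label the same points")
        # Sort the distinct labels so the column order is deterministic.
        clusters = sorted(set(labels))
        for i, label in enumerate(labels):
            rows[i].extend(1 if label == c else 0 for c in clusters)
    return rows


# Toy ensemble (hypothetical labels): two base clusterings of five points.
ensemble = [
    [0, 0, 1, 1, 1],
    [0, 1, 1, 2, 2],
]
H = hypergraph_adjacency(ensemble)
for row in H:
    print(row)
```

Each row of `H` is the high-dimensional representation of one data point. In the approach described in the abstract, these rows would be passed to a TSNE implementation and the resulting low-dimensional points to a hierarchical clustering algorithm; both steps are omitted from this sketch.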
Key words:
- Machine learning
- Clustering analysis
- Cluster ensemble
- Hierarchical clustering