Extracting Dimension Hierarchy of Tweeters Interests for On-line Analytical Processing
doi: 10.11999/JEIT170030
Funds:
The National Natural Science Foundation of China (61100043, 61472112), The Natural Science Foundation of Zhejiang Province (LY12F02003), The Key Science and Technology Project of Zhejiang Province (2017C01010, 2016F50014)
Abstract: Exploring the distribution and correlation of user interests in massive Twitter data helps achieve accurate personalized recommendation. On-Line Analytical Processing (OLAP) provides an intuitive form that is well suited to exploring such data. The key to applying OLAP to Twitter data is how to mine and build the dimension hierarchy of Twitter users' interests. Existing approaches can extract interests at only a single level. To address this limitation, this paper proposes an approach to extracting the dimension hierarchy of Twitter users' interests that supports OLAP. The approach first retrieves Twitter data through the REST API, then mines users' interests and sub-interests with an improved Latent Dirichlet Allocation (LDA) model, and finally constructs the interest dimension hierarchy on that basis. Experiments evaluate the effectiveness and scalability of the model and show that, compared with LDA and hLDA, the approach extracts the dimension hierarchy of Twitter users' interests more effectively and can be applied to OLAP.
Keywords:
- On-Line Analytical Processing (OLAP)
- Twitter
- Dimension hierarchy
- Interest
- Latent Dirichlet Allocation (LDA) model
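To make the idea of an interest dimension hierarchy concrete, the following is a minimal sketch of how a two-level hierarchy (interest, then sub-interest) supports an OLAP roll-up. It is not the paper's improved LDA model: the user names, the interest labels, and the helper functions `build_hierarchy`, `count_by_sub_interest`, and `roll_up` are all hypothetical, and the (interest, sub-interest) assignments are assumed to have been produced already by some two-level topic model.

```python
from collections import Counter

# Hypothetical output of a two-level topic model: each user is tagged with
# one (interest, sub_interest) pair. All labels are invented for illustration.
user_interests = {
    "user_a": ("sports", "football"),
    "user_b": ("sports", "tennis"),
    "user_c": ("technology", "smartphones"),
    "user_d": ("sports", "football"),
}


def build_hierarchy(assignments):
    """Map each top-level interest to the sorted list of its sub-interests."""
    hierarchy = {}
    for interest, sub_interest in assignments.values():
        hierarchy.setdefault(interest, set()).add(sub_interest)
    return {interest: sorted(subs) for interest, subs in hierarchy.items()}


def count_by_sub_interest(assignments):
    """Finest level of the dimension: users per (interest, sub_interest)."""
    return Counter(assignments.values())


def roll_up(sub_counts):
    """OLAP roll-up: aggregate sub-interest counts to the interest level."""
    totals = Counter()
    for (interest, _sub_interest), n in sub_counts.items():
        totals[interest] += n
    return totals


hierarchy = build_hierarchy(user_interests)
sub_counts = count_by_sub_interest(user_interests)
interest_counts = roll_up(sub_counts)
```

With a single-level interest model only the coarse counts would be available; the second level is what makes drill-down from an interest to its sub-interests possible.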