Microblog Short Text Sentiment Analysis Based on Multi-dimensional Extended Features and Deep Learning
doi: 10.11999/JEIT160975
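① (Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei 230009, China) ② (Faculty of Engineering, Tokushima University, Tokushima 770-8509, Japan)
National Natural Science Foundation of China (61432004), Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (201407345), Natural Science Foundation of Anhui Province (1508085QF119), China Postdoctoral Science Foundation (2015M580532)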
Extended Multi-modality Features and Deep Learning Based Microblog Short Text Sentiment Analysis
(Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei 230009, China)
The National Natural Science Foundation of China (61432004), The Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (201407345), The Natural Science Foundation of Anhui Province (1508085QF119), The China Postdoctoral Science Foundation (2015M580532)
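Abstract: This paper proposes a model based on a Deep Belief Network (DBN) and multi-dimensional extended features to classify the sentiment of Chinese microblog short texts. To reduce the impact of feature sparsity that traditional text classification methods suffer from on short microblog posts, the social relationship network is introduced as an extended feature: related comments are extracted according to the social relationship between commenters and the poster and are used to extend the original post, and the extended multi-dimensional features serve as the input of the DBN. The lower layers of the DBN are built by stacking multiple Restricted Boltzmann Machine (RBM) layers, which abstract the raw input and capture deep semantic features of the data. A Classification RBM (ClassRBM) layer is stacked on top of the RBM layers to perform the final sentiment classification. Experimental results show that, after tuning the model parameters and network structure, the resulting deep learning model achieves better sentiment classification results than shallow classifiers such as SVM and NB. The experiments also show that the extended multi-dimensional features improve the performance of short-text sentiment classification.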
Keywords:
- Social network
- Deep belief network
- Extended multi-dimensional features
- Restricted Boltzmann machine
- Classification restricted Boltzmann machine
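The feature-extension step described in the abstract can be illustrated with a minimal sketch. It assumes a bag-of-words representation and a precomputed set of commenters who are socially related to the poster; the function name `extend_post_features`, the toy vocabulary, and the way `related_users` is obtained are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter


def extend_post_features(post_tokens, comments, related_users, vocabulary):
    """Return a bag-of-words vector for a post extended with related comments.

    comments: list of (commenter_id, tokens) pairs attached to the post.
    related_users: ids of commenters judged socially related to the poster
    (assumption: this set is built elsewhere from the social network).
    """
    counts = Counter(post_tokens)
    for commenter_id, tokens in comments:
        # Only comments from socially related users extend the original post.
        if commenter_id in related_users:
            counts.update(tokens)
    # Counter returns 0 for words that never occurred.
    return [counts[word] for word in vocabulary]


# Toy data, purely illustrative.
vocabulary = ["good", "bad", "love", "hate"]
post = ["good"]
comments = [("u1", ["love", "good"]), ("u2", ["hate"])]
vector = extend_post_features(post, comments, {"u1"}, vocabulary)
```

In this toy case the short post contributes a single non-zero entry, while the extended vector also carries the words of the related comment, which is the sparsity reduction the abstract aims for.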
Abstract: This paper presents a Deep Belief Network (DBN) model and a multi-modality feature extraction method that extends the feature dimensionality of short texts for Chinese microblog sentiment classification. In addition to the traditional feature sets used for document classification, comments on a post are extracted as part of its features according to the relationship between commenters and posters, which is obtained by constructing a microblog social network. The multi-modality features are combined and used as the input vector of the DBN. A DBN model, built by stacking several Restricted Boltzmann Machine (RBM) layers, is implemented to initialize the structure of the neural network. The RBM layers sample from the probability distribution of the input data to learn hidden syntactic structures for a better feature representation. A Classification RBM (ClassRBM) layer, stacked on top of the RBM layers, is adopted to perform the final sentiment classification. The results demonstrate that, with a proper structure and parameters, the proposed deep learning method outperforms state-of-the-art shallow learning models such as SVM and NB on sentiment classification, which indicates that the DBN is suitable for short-text classification when combined with the proposed feature dimensionality extension method.
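The stacking of RBM layers described in the abstract can be sketched as greedy layer-wise pretraining. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses binary inputs, one-step contrastive divergence (CD-1) with mean-field reconstruction, and hypothetical layer sizes, learning rate, and epoch count. The ClassRBM top layer is not implemented; the returned features would be passed to whatever classification layer is chosen.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def train(self, data, epochs=20, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)                # positive phase
                v1 = self.visible_probs(self.sample(h0))  # reconstruction
                h1 = self.hidden_probs(v1)                # negative phase
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b_vis[i] += lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_stack(data, layer_sizes, seed=0):
    """Train each RBM on the hidden activations of the layer below it."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        rbm.train(layer_input)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
        rbms.append(rbm)
    return rbms, layer_input


# Toy binary feature vectors, purely illustrative.
toy_data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]] * 4
rbms, deep_features = pretrain_stack(toy_data, layer_sizes=[3, 2])
```

Each layer is trained without labels, so the stack can be pretrained on unlabeled posts before a supervised top layer is attached.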