基于信令數(shù)據(jù)的軌跡駐留點(diǎn)識(shí)別算法研究
doi: 10.11999/JEIT190914
-
1.
重慶郵電大學(xué)通信與信息工程學(xué)院 重慶 400065
-
2.
重慶郵電大學(xué)電子信息與網(wǎng)絡(luò)工程研究院 重慶 400065
Research of Track Resident Point Identification Algorithm Based on Signaling Data
-
1.
Institute of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
-
2.
Electronic Information and Networking Research Institute, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
-
摘要: 針對(duì)密度聚類(lèi)算法只能識(shí)別密度相近的簇類(lèi)且計(jì)算復(fù)雜度高等問(wèn)題,該文提出一種基于信令數(shù)據(jù)中時(shí)空軌跡信息的密度峰值快速聚類(lèi)(ST-CFSFDP)算法。首先對(duì)低采樣密度的信令數(shù)據(jù)進(jìn)行預(yù)處理,消除軌跡震蕩現(xiàn)象;然后基于密度峰值快速聚類(lèi)(CFSFDP)算法顯式地增加時(shí)間維度限制,將局部密度由2維擴(kuò)展到3維,并提出高密度時(shí)間間隔以表征簇中心在時(shí)間維度上的數(shù)據(jù)特征;接著設(shè)計(jì)篩選策略以選取聚類(lèi)中心;最后識(shí)別用戶出行軌跡中的駐留點(diǎn),完成出行鏈的劃分。實(shí)驗(yàn)結(jié)果表明,所提算法適用于采樣密度低且定位精度差的信令數(shù)據(jù),相比CFSFDP算法更適用于時(shí)空數(shù)據(jù),相比基于密度的時(shí)空聚類(lèi)算法(ST-DBSCAN)召回率提升14%,準(zhǔn)確率提升8%,同時(shí)降低計(jì)算復(fù)雜度。
-
關(guān)鍵詞:
- 信令數(shù)據(jù) /
- 時(shí)空聚類(lèi) /
- 密度峰值快速聚類(lèi)算法 /
- 駐留點(diǎn)識(shí)別 /
- 出行鏈
Abstract: For the problem that the density-based clustering algorithm can only identify clusters with similar density and high computational complexity, a Clustering by Fast Search and Find of Density Peaks based on Spatio-Temporal trajectory information in mobile phone signaling data, namely ST-CFSFDP, is proposed. Firstly, the low sampling density signaling data are pre-processed to eliminate the trajectory oscillation phenomenon in the data. Then, based on the Clustering by Fast Search and Find of Density Peaks(CFSFDP) algorithm, the time dimension limitation is explicitly increased, and the local density is extended from two-dimension to three-dimension. Moreover, in order to characterize the cluster center point in the time dimension, the concept of high-density time interval is defined. Secondly, the suitable cluster center screening strategy is developed to select automatically the appropriate cluster center. Finally, the resident points are identified in the travel trajectory of individual users over a period of time and the division of the travel chains is completed. The experimental results show that the algorithm is suitable for signaling data with low sampling density and poor positioning accuracy. It is more suitable for spatio-temporal data than CFSFDP algorithm. Compared with Density-Based Spatial Clustering of Applications with Noise based on Spatio-Temporal data (ST-DBSCAN) algorithm, the recall rate is improved by 14%, the accuracy rate is increased by 8%, and the computational complexity is also reduced. -
表 1 信令數(shù)據(jù)主要字段
字段名稱(chēng) 字段解釋 字段內(nèi)容(示例) user ID 用戶身份 0001A LAC_CID 基站位置區(qū)域碼與小區(qū)識(shí)別碼 13119-2056 TimeStamp 用戶接入時(shí)間戳 2019-10-23 17:42:09 CoverScene 當(dāng)前基站的覆蓋場(chǎng)景 道路/學(xué)校/火車(chē)站等 Lon_Lat 當(dāng)前基站經(jīng)度、緯度 (106.59767, 29.40709) 下載: 導(dǎo)出CSV
表 2 震蕩軌跡數(shù)據(jù)示例
軌跡 位置 時(shí)間 距離 (km) 切換速度 (km/h) ${D_0}$ ${L_0}$(106.607617, 29.530807) 08:19:35 / / ${D_1}$ ${L_1}$(106.602659, 29.545336) 08:20:14 1.6 147.6923 ${D_2}$ ${L_2}$(106.607617, 29.530807) 08:20:39 1.6 230.4000 下載: 導(dǎo)出CSV
表 3 基于時(shí)間窗的震蕩軌跡檢測(cè)方法
輸入:原始軌跡數(shù)據(jù)${{L} } = \left\{ { {L_1}{\rm{ } }···{\rm{ } }{L_i}{\rm{ } }{L_{i + 1} }{\rm{ } }···{\rm{ } }{L_{i + {N_w} } }{\rm{ } }···} \right\}$,軌跡序列切片中基站位置個(gè)數(shù)${N_w}$,震蕩數(shù)據(jù)最大時(shí)間閾值${T_{w\_\max }}$; 輸出:檢測(cè)到的震蕩軌跡數(shù)據(jù)${L_{\rm{osc}}}$; (1) 按順序截取原始數(shù)據(jù)${{L}}$中的前${N_w}$個(gè)位置組成序列${L_w}$; (2) 檢測(cè)${L_w}$中是否出現(xiàn)循環(huán)模式,如果出現(xiàn)則執(zhí)行步驟(3),否則序列點(diǎn)向前移1位,重新執(zhí)行步驟(1),截取后續(xù)${N_w}$個(gè)位置的序列片段; (3) 對(duì)檢測(cè)到的震蕩部分記為(${L_{\rm{beg} } }{\rm{ } }···{\rm{ } }{L_{\rm{end} } }$),判斷該部分序列的總時(shí)間是否小于${T_{w\_\max }}$,如果滿足,那么將該震蕩序列記為${L_{\rm{osc}}}$,同時(shí)序
列點(diǎn)向前移1位,返回步驟(1);如果不滿足,直接返回步驟(1),直至遍歷完${{L}}$內(nèi)所有軌跡點(diǎn)。算法結(jié)束 下載: 導(dǎo)出CSV
表 4 ST-CFSFDP聚類(lèi)算法
輸入:原始空間數(shù)據(jù)$P\left\langle {x\;y\;t\;d} \right\rangle $;截?cái)嗑嚯x${d_{\rm c}}$;截?cái)鄷r(shí)間${t_{\rm c}}$;數(shù)據(jù)點(diǎn)覆蓋場(chǎng)景的描述$d$ 輸出:該原始數(shù)據(jù)的聚類(lèi)集合${C_k}$, $k = 1,2, ··· ,n$; (1) 計(jì)算每一個(gè)數(shù)據(jù)點(diǎn)的局部時(shí)空密度${\rho _i}$; (2) 依照定義4與定義5計(jì)算每個(gè)數(shù)據(jù)點(diǎn)的高密度空間距離${\delta _i}$、高密度時(shí)間間隔${\tau _i}$; (3) 計(jì)算各個(gè)數(shù)據(jù)點(diǎn)的聚類(lèi)中心權(quán)值,將聚類(lèi)中心權(quán)值的平均值作為閾值,將大于該閾值的數(shù)據(jù)點(diǎn)放入聚類(lèi)中心候選點(diǎn)集合${C_{\rm c}}$中; (4) 合并候選點(diǎn)中覆蓋場(chǎng)景相同且空間距離小于${d_{\rm c}}$或時(shí)間間隔小于${t_{\rm c}}$的“近鄰數(shù)據(jù)點(diǎn)”,保留聚類(lèi)中心權(quán)重較高的點(diǎn); (5) 將剩余的數(shù)據(jù)點(diǎn),按照最近鄰思想分配到各個(gè)聚類(lèi)中心所代表的簇中。 算法結(jié)束 下載: 導(dǎo)出CSV
表 5 算法距離誤差對(duì)比(m)
編號(hào) 駐留點(diǎn)坐標(biāo)(Lon, Lat) CFSFDP算法的距離誤差 ST-DBSCAN算法的距離誤差 ST-CFSFDP算法的距離誤差 1 106.601230, 29.5339600 44.8 42.2 34.6 2 106.602061, 29.5343564 \ 43.5 48.3 3 106.496737, 29.6166844 48.8 35.3 37.4 4 106.496729, 29.6166840 \ \ 50.6 5 106.546322, 29.6203120 52.6 \ 46.4 下載: 導(dǎo)出CSV
-
陳鴻昶, 徐乾, 黃瑞陽(yáng), 等. 一種基于用戶軌跡的跨社交網(wǎng)絡(luò)用戶身份識(shí)別算法[J]. 電子與信息學(xué)報(bào), 2018, 40(11): 2758–2764. doi: 10.11999/JEIT180130CHEN Hongchang, XU Qian, HUANG Ruiyang, et al. User identification across social networks based on user trajectory[J]. Journal of Electronics &Information Technology, 2018, 40(11): 2758–2764. doi: 10.11999/JEIT180130 彭大芹, 羅裕楓, 江德潮, 等. 基于移動(dòng)信令數(shù)據(jù)的城市熱點(diǎn)識(shí)別方法[J]. 重慶郵電大學(xué)學(xué)報(bào): 自然科學(xué)版, 2019, 31(1): 95–102. doi: 10.3979/j.issn.1673-825X.2019.01.013PENG Daqin, LUO Yufeng, JIANG Dechao, et al. Urban hotspots identification method based on mobile signaling data[J]. Journal of Chongqing University of Posts and Telecommunications:Natural Science Edition, 2019, 31(1): 95–102. doi: 10.3979/j.issn.1673-825X.2019.01.013 羅孝羚, 蔣陽(yáng)升. 基于出租車(chē)運(yùn)營(yíng)數(shù)據(jù)和POI數(shù)據(jù)的出行目的識(shí)別[J]. 交通運(yùn)輸系統(tǒng)工程與信息, 2018, 18(5): 60–66. doi: 10.16097/j.cnki.1009-6744.2018.05.010LUO Xiaoling and JIANG Yangsheng. Trip-purpose-identification based on taxi operating data and POI data[J]. Journal of Transportation Systems Engineering and Information Technology, 2018, 18(5): 60–66. doi: 10.16097/j.cnki.1009-6744.2018.05.010 鮑冠文, 劉小明, 蔣源, 等. 基于改進(jìn)DBSCAN算法的出租車(chē)載客熱點(diǎn)區(qū)域挖掘研究[J]. 交通工程, 2019, 19(4): 62–69. doi: 10.13986/j.cnki.jote.2019.04.010BAO Guanwen, LIU Xiaoming, JIANG Yuan, et al. Research on mining taxi pick-up hotspots area[J]. Journal of Transportation Engineering, 2019, 19(4): 62–69. doi: 10.13986/j.cnki.jote.2019.04.010 李巖, 陳紅, 孫曉科, 等. 基于熱點(diǎn)探測(cè)模型的城市居民出行特征分析[J]. 交通信息與安全, 2019, 37(1): 128–136. doi: 10.3963/j.issn.1674-4861.2019.01.017LI Yan, CHEN Hong, SUN Xiaoke, et al. An analysis of travel characteristics of urban residents based on hot spot detection model[J]. Journal of Transport Information and Safety, 2019, 37(1): 128–136. doi: 10.3963/j.issn.1674-4861.2019.01.017 張海霞, 李腆腆, 李東陽(yáng), 等. 基于車(chē)輛行為分析的智能車(chē)聯(lián)網(wǎng)關(guān)鍵技術(shù)研究[J]. 電子與信息學(xué)報(bào), 2020, 42(1): 36–49. doi: 10.11999/JEIT190820ZHANG Haixia, LI Tiantian, LI Dongyang, et al. Research on vehicle behavior analysis based technologies for intelligent vehicular networks[J]. Journal of Electronics &Information Technology, 2020, 42(1): 36–49. doi: 10.11999/JEIT190820 李浩, 王旭智, 萬(wàn)旺根. 基于位置數(shù)據(jù)的居民出行時(shí)空特征研究——以上海市為例[J]. 電子測(cè)量技術(shù), 2019, 42(19): 25–30. doi: 10.19651/j.cnki.emt.1902923LI Hao, WANG Xuzhi, and WAN Wanggen. Research on temporal and spatial characteristics of residents’ travel based on location data—A case of Shanghai[J]. Electronic Measurement Technology, 2019, 42(19): 25–30. doi: 10.19651/j.cnki.emt.1902923 周洋, 楊超. 基于時(shí)空聚類(lèi)算法的軌跡停駐點(diǎn)識(shí)別研究[J]. 交通運(yùn)輸系統(tǒng)工程與信息, 2018, 18(4): 88–95. doi: 10.16097/j.cnki.1009-6744.2018.04.014ZHOU Yang and YANG Chao. Anchors identification in trajectory based on temporospatial clustering algorithm[J]. Journal of Transportation Systems Engineering and Information Technology, 2018, 18(4): 88–95. doi: 10.16097/j.cnki.1009-6744.2018.04.014 方琪, 王山東, 于大超, 等. 基于出租車(chē)軌跡的居民出行特征分析[J]. 地理空間信息, 2019, 17(5): 128–130. doi: 10.3969/j.issn.1672-4623.2019.05.034FANG Qi, WANG Shandong, YU Dachao, et al. Analysis of resident trip characteristics based on taxi trajectory[J]. Geospatial Information, 2019, 17(5): 128–130. doi: 10.3969/j.issn.1672-4623.2019.05.034 BIRANT D and KUT A. ST-DBSCAN: An algorithm for clustering spatial–temporal data[J]. Data & Knowledge Engineering, 2007, 60(1): 208–221. doi: 10.1016/j.datak.2006.01.013 RODRIGUEZ A and LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492–1496. doi: 10.1126/science.1242072 WANG Feilong and CHEN C. On data processing required to derive mobility patterns from passively-generated mobile phone data[J]. Transportation Research Part C: Emerging Technologies, 2018, 87: 58–74. doi: 10.1016/j.trc.2017.12.003 CHEN C, BIAN Ling, and MA Jingtao. From traces to trajectories: How well can we guess activity locations from mobile phone traces?[J]. Transportation Research Part C: Emerging Technologies, 2014, 46: 326–337. doi: 10.1016/j.trc.2014.07.001 HARD E, CHIGOY B, SONGCHITRUKSA P, et al. Synopsis of new methods and technologies to collect Origin-Destination (O-D) data[R]. FHWA-HEP-16-083, 2016. LEE J K and HOU J C. Modeling steady-state and transient behaviors of user mobility: Formulation, analysis, and application[C]. The 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Florence, Italy, 2006: 85–96. -