Fault-Tolerant Algorithm for the Large-Scale Parallel Higher-Order Method of Moments
doi: 10.11999/JEIT161308
Funding: The National Natural Science Foundation of China (61301069), The Program for New Century Excellent Talents in University of China (NCET-13-0949), The Fundamental Research Funds for the Central Universities (JB160218), The National 863 Program of China (2012AA01A308)
Abstract: Large-scale parallel electromagnetic computation on supercomputers is of great significance for solving complex electromagnetic problems in practical engineering. However, process crashes caused by node failures are far more likely on a supercomputer than on an ordinary computer. Because traditional electromagnetic computation cannot cope effectively with process crashes, this paper proposes an efficient fault-tolerant algorithm for the large-scale parallel higher-order Method of Moments (MoM). Building on the existing parallel higher-order MoM, an efficient and highly reliable scene protection algorithm is designed on the basis of disk caching and direct memory access, together with an efficient breakpoint recovery algorithm. The effectiveness of the algorithm stems mainly from its fixed scene protection points, which allow the computation to proceed in an orderly manner even when failures occur, whereas the original algorithm must restart from the beginning after every failure. Numerical simulations verify the effectiveness of the fault-tolerant algorithm in handling process crashes and show that it greatly improves the reliability of the large-scale parallel higher-order MoM.

Keywords:
- Supercomputer
- Parallel method of moments
- Fault-tolerant algorithm
- Scene protection
- Reliability
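The abstract describes saving the computation state at fixed protection points and resuming from the last saved point after a crash, a pattern usually called checkpoint/restart. The sketch below illustrates that general pattern only. It is a serial toy, not the paper's MPI implementation, and it does not model disk caching or direct memory access. The names `save_checkpoint`, `load_checkpoint`, `run`, `interval`, and `crash_at` are hypothetical, and the sum of squares merely stands in for one block of real work.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to disk atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # replaces the old checkpoint in a single step


def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, n_steps, interval, crash_at=None):
    """Accumulate a toy sum, checkpointing after every `interval` steps.

    `crash_at` (hypothetical) simulates a process crash just before
    that step executes.
    """
    # Resume from the last protection point if one exists.
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    for step in range(state["step"], n_steps):
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated process crash")
        state["total"] += step * step  # stand-in for one block of real work
        state["step"] = step + 1
        if state["step"] % interval == 0:  # fixed protection point
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same `path` after a crash repeats only the steps since the last protection point, whereas a version without checkpoints would restart from step 0.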