A High-Throughput Hardware Design for AV1 Rough Mode Decision

SHENG Qinghua, TAO Zehao, HUANG Xiaofang, LAI Changcai, HUANG Xiaofeng, YIN Haibin, DONG Zhekang

Citation: SHENG Qinghua, TAO Zehao, HUANG Xiaofang, LAI Changcai, HUANG Xiaofeng, YIN Haibin, DONG Zhekang. A High-Throughput Hardware Design for AV1 Rough Mode Decision[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240823

doi: 10.11999/JEIT240823
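Funding: The National Key R&D Program of China (2023YFB4502804)
Details
    About the authors:

    SHENG Qinghua: male, associate professor. Research interests: video coding, FPGA hardware acceleration, and electronic system integration.

    TAO Zehao: male, master's student. Research interests: video encoding and decoding, and FPGA hardware acceleration.

    HUANG Xiaofang: female, lecturer. Research interests: video coding and embedded applications.

    LAI Changcai: male, senior engineer. Research interests: image and video compression, intelligent processing, and their software and hardware acceleration.

    HUANG Xiaofeng: male, associate professor. Research interests: video encoding and decoding, and chip architecture design.

    YIN Haibin: male, professor. Research interests: digital video encoding and decoding, multimedia signal processing, and chip architecture design and verification.

    DONG Zhekang: male, associate professor. Research interests: memristors and memristive systems, and artificial neural networks.

    Corresponding author:

    HUANG Xiaofang 20221016@hdu.edu.cn

  • CLC number: TN919.8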

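  • Abstract: As video coding standards continue to evolve, the Alliance for Open Media (AOM) has released its latest video coding standard, AOMedia Video 1 (AV1). Its intra coding adopts a richer set of prediction modes to improve prediction efficiency, expanding the number of prediction modes from 10 in VP9 to 61. To cope with this increase and to raise hardware processing throughput, this paper proposes a fully pipelined hardware architecture for AV1 Rough Mode Decision (RMD). At the algorithm level, the 4×4 block is taken as the minimum processing unit, and rough mode decision is performed in Z-order over the Prediction Units (PUs) of different sizes in a 64×64 Coding Tree Unit (CTU). A cost accumulation approximation based on 1:1 PUs is used to compute the costs of 1:2, 1:4, 2:1, and 4:1 PUs, which reduces computational complexity. At the hardware level, a single rough mode decision circuit compatible with PU sizes from 4×4 to 32×32 replaces separate circuits for each PU size, effectively reducing idle logic resources. Experimental results show that, under the All-Intra (AI) configuration, the proposed algorithm saves 45.78% of the time on average compared with the standard AV1 algorithm, at the cost of a 1.94% increase in BD-Rate. The proposed hardware architecture completes the rough mode decision of a 64×64 CTU within 1057 clock cycles. Synthesized with Synopsys Design Compiler 2016 and the UMC 28 nm process library, the design can process 8K@50.6 fps video in real time at an operating frequency of 432.7 MHz.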
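The cost accumulation idea in the abstract can be illustrated with a small software sketch. The abstract only states that the costs of 1:2, 1:4, 2:1, and 4:1 PUs are approximated from 1:1 PU costs; the sketch assumes the simplest reading, namely that the per-mode cost of a rectangular PU is the sum of the per-mode costs of the square PUs that tile it. The function names, mode labels, and cost values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical type: per-mode rough cost of one square (1:1) PU.
ModeCosts = Dict[str, int]


def accumulate_costs(square_pu_costs: List[ModeCosts]) -> ModeCosts:
    """Approximate a rectangular PU's per-mode cost by summing the
    costs of the square PUs that tile it (assumed accumulation rule)."""
    total: ModeCosts = {}
    for costs in square_pu_costs:
        for mode, cost in costs.items():
            total[mode] = total.get(mode, 0) + cost
    return total


def best_mode(costs: ModeCosts) -> str:
    """Return the mode with the smallest accumulated cost."""
    return min(costs, key=lambda mode: costs[mode])


# Illustrative values only: an 8x4 (2:1) PU tiled by two 4x4 PUs.
left_4x4 = {"DC": 120, "PAETH": 95, "V": 140}
right_4x4 = {"DC": 80, "PAETH": 110, "V": 50}

approx_8x4 = accumulate_costs([left_4x4, right_4x4])
print(approx_8x4, best_mode(approx_8x4))
```

Under this assumption no new prediction or cost computation is needed for the rectangular PU; only additions over already computed square-PU costs are required, which is consistent with the complexity reduction the abstract describes.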
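  • Figure 1  Overall RMD hardware architecture
    Figure 2  Flowchart of the hardware RMD implementation
    Figure 3  Space-time diagram of the overall architecture
    Figure 4  Reference pixel padding for a 4×4 PU
    Figure 5  Input order
    Figure 6  Hardware design for the directional modes
    Figure 7  Hardware design for the DC mode
    Figure 8  Hardware design for the smooth modes
    Figure 9  PMCM hardware design for the smooth-mode weights
    Figure 10  Hardware design for the Paeth mode
    Figure 11  Hardware design of the SATD cost calculation for a 4×4 PU
    Figure 12  Example of bitonic sorting of an unordered sequence of length 8
    Figure 13  Bitonic sorting hardware design for an input sequence of length 8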
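Figures 12 and 13 refer to bitonic sorting of a length-8 sequence. As a rough software illustration of that general technique (not the paper's circuit, whose details are not given on this page), the sketch below runs the standard bitonic sorting network on a list whose length is a power of two. In an RMD context the values would presumably be mode costs; the sample numbers here are made up.

```python
from typing import List


def bitonic_sort(values: List[int]) -> List[int]:
    """Sort ascending with a bitonic sorting network.

    The input length must be a power of two. Each pass over `i` for a
    fixed (k, j) pair corresponds to one layer of independent
    compare-and-swap units, which is what makes the network attractive
    for parallel hardware.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    data = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data


# Illustrative costs only.
costs = [5, 2, 7, 1, 8, 3, 6, 4]
print(bitonic_sort(costs))
```

For 8 inputs the network has 6 compare layers (1 + 2 + 3) with 4 comparators each, so the depth is fixed regardless of the input order.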

    Table 1  Performance comparison between the improved algorithm and the standard AV1 algorithm (%)

    Test sequence    BD-Rate    TS
    A1 (UHD 4K)      2.21       49.2
    A2 (UHD 4K)      1.77       46.4
    B (1080P)        1.93       48.1
    C (480P)         2.23       38.4
    E (720P)         1.56       46.8
    Average          1.94       45.78

    Table 2  Comparison of the improved algorithm with existing work (%)

    Reference    BD-Rate    TS
    [33]         1.28       29.80
    [34]         7.41       50.19
    [35]         0.60       15.36
    This work    1.94       45.78

    Table 3  Comparison of ASIC-based RMD-related hardware designs

    Metric                        Ref. [36]     Ref. [37]     Ref. [38]     Ref. [39]     This work
    Process                       TSMC 40 nm    TSMC 40 nm    TSMC 40 nm    TSMC 40 nm    UMC 28 nm
    Gate count (Kgates)           455.8         821.8         584.8         128.5         1011.3
    Operating frequency (MHz)     1,296         1,902         1,296         648           432.7
    Clock cycles (cycles)         7104          7104          7104          7104          1057
    Power (mW)                    40.9          1613.3        4110.0        65.5          1891.6
    Throughput                    4K@60fps      4K@60fps      4K@60fps      4K@30fps      8K@50.6fps
    Throughput/area (px/gate)     1091.85       605.55        850.93        1936.44       1660.03
    Non-directional prediction    ×             ×             ×             ✓             ✓
    Directional prediction        ✓             ✓             ✓             ×             ✓
    Mode decision                 ×             ×             ×             ×             ✓
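    The throughput figures in Table 3 can be sanity-checked with a short calculation. The sketch below assumes that 8K means 7680×4320 pixels and 4K means 3840×2160 pixels (neither resolution is spelled out on this page) and that a frame is covered by 64×64 CTUs counted fractionally. The function names are illustrative. Under these assumptions the frame rate comes out at about 50.5 fps, close to the reported 50.6 fps, and the throughput/area entries for [36], [39], and this work are reproduced when the metric is read as pixels per second divided by gate count.

```python
def frames_per_second(freq_hz: float, cycles_per_ctu: int,
                      width: int, height: int, ctu_size: int = 64) -> float:
    """Frame rate if every CTU takes `cycles_per_ctu` clock cycles.

    Assumption: the number of CTUs per frame is width*height / ctu_size^2,
    counted fractionally rather than rounded up per row.
    """
    ctus_per_frame = (width * height) / (ctu_size * ctu_size)
    return freq_hz / (ctus_per_frame * cycles_per_ctu)


def pixels_per_gate(width: int, height: int, fps: float, kgates: float) -> float:
    """Pixels processed per second divided by the gate count."""
    return width * height * fps / (kgates * 1e3)


# This work: 432.7 MHz, 1057 cycles per 64x64 CTU, assumed 7680x4320 frame.
print(frames_per_second(432.7e6, 1057, 7680, 4320))

# Throughput/area column, using the throughput and gate counts in Table 3.
print(round(pixels_per_gate(7680, 4320, 50.6, 1011.3), 2))  # this work
print(round(pixels_per_gate(3840, 2160, 60.0, 455.8), 2))   # Ref. [36]
print(round(pixels_per_gate(3840, 2160, 30.0, 128.5), 2))   # Ref. [39]
```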
  • [1] BENDER I, BORGES A, AGOSTINI L, et al. Complexity and compression efficiency analysis of libaom AV1 video codec[J]. Journal of Real-Time Image Processing, 2023, 20(3): 50. doi: 10.1007/s11554-023-01308-5.
    [2] REN Huiwen, WANG Shanshe, MA Siwei, et al. SVT-AVS3: An open-source high-performance AVS3 encoder with scalable video technology[J]. IEEE Transactions on Multimedia, 2024, 26: 3291–3301. doi: 10.1109/TMM.2023.3309549.
    [3] LEE M, SONG H J, PARK J, et al. Overview of versatile video coding (H.266/VVC) and its coding performance analysis[J]. IEIE Transactions on Smart Processing & Computing, 2023, 12(2): 122–154. doi: 10.5573/IEIESPC.2023.12.2.122.
    [4] MUKHERJEE D, HAN Jingning, BANKOSKI J, et al. A technical overview of VP9—the latest open-source video codec[J]. SMPTE Motion Imaging Journal, 2015, 124(1): 44–54. doi: 10.5594/j18499.
    [5] LIN Hao and RAO Feng. Analysis on the development trend of AV1 video coding standard in China[J]. Radio & Television Information, 2023, 30(2): 62–64. doi: 10.16045/j.cnki.rti.2023.02.022.
    [6] DU Hongqing. Research on high efficiency intra prediction algorithm for next generation video coding[D]. [Master dissertation], Xidian University, 2023. doi: 10.27389/d.cnki.gxadu.2023.001917.
    [7] GROIS D, GILADI A, CHOI K, et al. Performance comparison of emerging EVC and VVC video coding standards with HEVC and AV1[J]. SMPTE Motion Imaging Journal, 2021, 130(4): 1–12. doi: 10.5594/JMI.2021.3065442.
    [8] UHRINA M, SEVCIK L, BIENIK J, et al. Performance comparison of VVC, AV1, HEVC, and AVC for high resolutions[J]. Electronics, 2024, 13(5): 953. doi: 10.3390/electronics13050953.
    [9] LIU Chang, JIA Kebin, and LIU Pengyu. Fast partition algorithm in depth map intra-frame coding unit based on multi-branch network[J]. Journal of Electronics & Information Technology, 2022, 44(12): 4357–4366. doi: 10.11999/JEIT211010.
    [10] WANG Yizhao, ZHANG Chaobo, and SUN Songlin. Intra prediction fast algorithm in AVS3 based on image texture characteristics[C]. 2021 20th International Symposium on Communications and Information Technologies, Tottori, Japan, 2021: 6–10. doi: 10.1109/ISCIT52804.2021.9590620.
    [11] ZHANG Yongfei, LI Zhe, and LI Bo. Gradient-based fast decision for intra prediction in HEVC[C]. 2012 Visual Communications and Image Processing, San Diego, USA, 2012: 1–6. doi: 10.1109/VCIP.2012.6410739.
    [12] ZHU Linwei, ZHANG Yun, LI Na, et al. Deep learning-based intra mode derivation for versatile video coding[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19(2s): 96. doi: 10.1145/356369.
    [13] DUARTE A, ZATT B, CORREA G, et al. Fast intra mode decision using machine learning for the versatile video coding standard[C]. 2023 IEEE International Symposium on Circuits and Systems, Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10181769.
    [14] STORCH I, ROMA N, PALOMINO D, et al. GPU acceleration of MIP intra prediction in VVC[C]. 2023 31st European Signal Processing Conference, Helsinki, Finland, 2023: 600–604. doi: 10.23919/EUSIPCO58844.2023.10290037.
    [15] HAN Xu, WANG Shanshe, MA Siwei, et al. Optimization of motion compensation based on GPU and CPU for VVC decoding[C]. 2020 IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 1196–1200. doi: 10.1109/ICIP40778.2020.9190708.
    [16] CORRÊA M, WASKOW B, ZATT B, et al. High throughput hardware design for AV1 Paeth and smooth intra modes[C]. 2019 IEEE International Symposium on Circuits and Systems, Sapporo, Japan, 2019: 1–5. doi: 10.1109/ISCAS.2019.8702258.
    [17] CAI Zhanyuan and GAO Wei. Efficient fast algorithm and parallel hardware architecture for intra prediction of AVS3[C]. 2021 IEEE International Symposium on Circuits and Systems, Daegu, South Korea, 2021: 1–5. doi: 10.1109/ISCAS51556.2021.9401121.
    [18] HUANG Xiaofeng, JIA Huizhu, CAI Binbin, et al. Fast algorithms and VLSI architecture design for HEVC intra-mode decision[J]. Journal of Real-Time Image Processing, 2016, 12(2): 285–302. doi: 10.1007/s11554-015-0549-8.
    [19] CORRÊA M, WASKOW B, GOEBEL J, et al. A high throughput hardware architecture targeting the AV1 Paeth intra predictor[C]. 2019 IEEE 10th Latin American Symposium on Circuits & Systems, Armenia, Colombia, 2019: 93–96. doi: 10.1109/LASCAS.2019.8667544.
    [20] LIU Pengyu, ZHANG Yue, JIA Kebin, et al. Adaptive video frame type decision algorithm based on local luminance histogram[J]. Journal of Electronics & Information Technology, 2023, 45(1): 300–307. doi: 10.11999/JEIT211199.
    [21] SU Weitong, XIANG Guoqing, HUANG Xiaofeng, et al. Fast algorithm and VLSI architecture design of rough mode decision for AVS3[C]. 2023 IEEE International Conference on Consumer Electronics, Las Vegas, USA, 2023: 1–4. doi: 10.1109/ICCE56470.2023.10043565.
    [22] QI Meibin, CHEN Xiuli, and YANG Yanfang. Fast coding unit splitting algorithm for high efficiency video coding intra prediction[J]. Journal of Electronics & Information Technology, 2014, 36(7): 1699–1705. doi: 10.3724/SP.J.1146.2013.01148.
    [23] CHEN Yue, MUKHERJEE D, HAN Jingning, et al. An overview of coding tools in AV1: The first video codec from the alliance for open media[J]. APSIPA Transactions on Signal and Information Processing, 2020, 9(1): e6. doi: 10.1017/ATSIP.2020.2.
    [24] HAKKENNES E A and VASSILIADIS S. Hardwired Paeth codec for portable network graphics (PNG)[C]. Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium, Milan, Italy, 1999: 318–325. doi: 10.1109/EURMIC.1999.794796.
    [25] PAETH A W. Image file compression made easy[M]. ARVO J. Graphics Gems II. Amsterdam: Elsevier, 1991: 93–100. doi: 10.1016/B978-0-08-050754-5.50029-3.
    [26] STORCH I, ROMA N, PALOMINO D, et al. Alternative reference samples to improve coding efficiency for parallel intra prediction solutions[C]. 2024 IEEE 15th Latin America Symposium on Circuits and Systems, Punta del Este, Uruguay, 2024: 1–5. doi: 10.1109/LASCAS60203.2024.10506142.
    [27] KUMM M. Multiple Constant Multiplication Optimizations for Field Programmable Gate Arrays[M]. Wiesbaden: Springer, 2016. doi: 10.1007/978-3-658-13323-8.
    [28] LIACHA A, OUDJIDA A K, BAKIRI M, et al. Radix-2^r recoding with common subexpression elimination for multiple constant multiplication[J]. IET Circuits, Devices & Systems, 2020, 14(7): 990–994. doi: 10.1049/iet-cds.2020.0213.
    [29] MOHAMED H, ELLIETHY A, ABDELAZIZ A, et al. Real-time motion estimation based video steganography with preserved consistency and local optimality[J]. Multimedia Tools and Applications, 2024: 1–24. doi: 10.1007/s11042-024-18651-9.
    [30] CHEN Shushi, HUANG Leilei, LIU Jiahao, et al. An error-surface-based fractional motion estimation algorithm and hardware implementation for VVC[C]. 2023 IEEE International Symposium on Circuits and Systems, Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10182170.
    [31] YANG Mouzhi, ZHANG Peng, FANG Jianbin, et al. thSORT: An efficient parallel sorting algorithm on multi-core DSPs[J]. CCF Transactions on High Performance Computing, 2024, 6(5): 503–518. doi: 10.1007/s42514-023-00175-7.
    [32] ESMAILI-DOKHT P, GUIOT M, RADOJKOVIĆ P, et al. O(n) key–value sort with active compute memory[J]. IEEE Transactions on Computers, 2024, 73(5): 1341–1356. doi: 10.1109/TC.2024.3371773.
    [33] CORRÊA M M. Heuristic-based algorithms and hardware designs for fast intra-picture prediction in AV1 video coding[D]. [Ph. D. dissertation], Universidade Federal de Pelotas, 2023.
    [34] ROSA P, PALOMINO D, PORTO M, et al. GM-RF: An AV1 intra-frame fast decision based on random forest[C]. 2022 IEEE International Conference on Image Processing, Bordeaux, France, 2022: 3556–3560. doi: 10.1109/ICIP46576.2022.9897488.
    [35] CORRÊA M, ROMA N, PALOMINO D, et al. Mode-adaptive subsampling of SAD/SSE operations for intra prediction cost reduction[C]. 2022 IEEE International Symposium on Circuits and Systems, Austin, USA, 2022: 1808–1812. doi: 10.1109/ISCAS48785.2022.9937507.
    [36] CORRÊA M, NETO L, PALOMINO D, et al. ASIC solution for the directional intra prediction of the AV1 encoder targeting UHD 4K videos[C]. 2020 IEEE International Symposium on Circuits and Systems, Seville, Spain, 2020: 1–5. doi: 10.1109/ISCAS45731.2020.9180526.
    [37] NETO L, CORRÊA M, PALOMINO D, et al. Directional intra frame prediction architecture with edge filter and upsampling for AV1 video coding[C]. 2020 33rd Symposium on Integrated Circuits and Systems Design, Campinas, Brazil, 2020: 1–6. doi: 10.1109/SBCCI50935.2020.9189902.
    [38] NETO L, CORRÊA M, PALOMINO D, et al. Exploring operation sharing in directional intra frame prediction of AV1 video coding[C]. 2021 IEEE 12th Latin America Symposium on Circuits and Systems, Arequipa, Peru, 2021: 1–4. doi: 10.1109/LASCAS51355.2021.9459136.
    [39] CORRÊA M M, WASKOW B H, GOEBEL J W, et al. A high-throughput hardware architecture for AV1 non-directional intra modes[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, 67(5): 1481–1494. doi: 10.1109/TCSI.2020.2973031.
Publication history
  • Received: 2024-09-27
  • Revised: 2025-01-02
  • Published online: 2025-01-09
