Fast motion estimation algorithm and design for real time QFHD high efficiency video coding

Title:	Fast motion estimation algorithm and design for real time QFHD high efficiency video coding (適用於即時高畫質HEVC之快速畫面間移動估測演算法與設計)
Authors:	Jou, Shiaw-Yu (周效瑜); Chang, Tian-Sheuan (張添烜); Department of Electronics Engineering and Institute of Electronics
Keywords:	Video coding; Motion estimation; HEVC
Date of Issue:	2013
Abstract:	In HEVC, the latest video coding standard, inter prediction adopts a recursive coding structure, larger prediction units (PUs), and highly dependent prediction methods. These changes improve coding efficiency but also greatly increase data dependency, complexity, and memory bandwidth, especially for real-time video at HD resolution or above, and they make many hardware design methods developed for earlier coding standards unsuitable. To address these problems, this thesis proposes a hardware-friendly fast inter motion estimation algorithm and a corresponding architecture. For integer motion estimation (IME), a modified PEPZS algorithm uses 54.9% fewer search points than the previous design. For fractional motion estimation (FME), a PU-size-dependent hybrid scheme trades complexity against performance: it applies interpolation-free FME to PU64x64 and PU32x32, which are treated as the low-motion part, applies full-search interpolated FME to PU16x16, which is treated as the texture part, and skips FME for PU8x8, which has less impact at larger video sizes. On the hardware side, the conventional two-stage pipelined motion estimation design does not fit the highly recursive dependency structure of HEVC, so the design uses a cascade architecture that combines IME, FME, and rate-distortion optimization (RDO) in a single pipeline stage with recursive scheduling. The remaining dependency introduced by motion vector prediction (MVP) is resolved by an early MVP candidate approach for selected blocks. The associated bandwidth problem is reduced by a 16-bank, double z-scan indexed, cache-based buffer that simplifies addressing and maximizes data reuse.
Simulation results against the HEVC reference software HM 9.0rc-1 show a BD-rate performance loss of 4.6%, 4.5%, and 4.5% for the Y, U, and V components, respectively. Synthesized in a TSMC 90nm CMOS process, the proposed design costs 1090.6K logic gates and 24.64 Kbytes of on-chip memory, and supports 4Kx2K video at 60 fps at an operating frequency of 270MHz.
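As a rough illustration of two points in the abstract, the Python sketch below encodes the PU-size-dependent FME policy as a lookup table and estimates the clock-cycle budget per coding tree unit implied by the stated throughput. It is a minimal sketch, not the thesis implementation. The strategy labels and function names are hypothetical. The cycle estimate assumes that 4Kx2K means 3840x2160 (QFHD, as in the title), that the frame is processed in 64x64 units (the largest PU size mentioned), and that there is no pipeline or memory stall overhead.

```python
import math

# Hypothetical labels for the FME strategy per square PU size, following
# the abstract: interpolation-free for 64x64 and 32x32, full-search
# interpolated FME for 16x16, and no FME for 8x8.
FME_POLICY = {
    64: "interpolation_free",
    32: "interpolation_free",
    16: "full_search",
    8: "skip",
}


def fme_mode(pu_size: int) -> str:
    """Return the FME strategy the abstract assigns to a square PU size."""
    return FME_POLICY[pu_size]


def cycles_per_ctu(width=3840, height=2160, fps=60, clock_hz=270e6, ctu=64):
    """Idealized clock cycles available per 64x64 unit (assumed sizes)."""
    # Partial units at the frame edge still cost a full processing slot.
    ctus_per_frame = math.ceil(width / ctu) * math.ceil(height / ctu)
    return clock_hz / (ctus_per_frame * fps)


if __name__ == "__main__":
    for size in (64, 32, 16, 8):
        print(f"PU{size}x{size}: {fme_mode(size)}")
    print(f"cycles per 64x64 unit: {cycles_per_ctu():.1f}")
```

Under these assumptions a frame holds 60 x 34 = 2040 units, so the budget is about 2,200 cycles per 64x64 unit. This gives a sense of why IME, FME, and RDO for all PU sizes have to be simplified and shared within a single pipeline stage.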
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT070050203 http://hdl.handle.net/11536/73315
Appears in Collections:	Thesis