Title: Exploiting X86 Front-end Parallelism with Program Trace Support
Authors: 邱日清
Jih-ching Chiu
鍾崇斌
Chung-Ping Chung
Institute of Computer Science and Engineering
Keywords: Front-end; X86 Superscalar Processor; Instruction Fetch; Instruction Prefetching; Variable-Length Instruction Format; Instruction Address Queue; Instruction Buffer; Instruction Identifier
Issue Date: 2001
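Abstract: For a superscalar processor, the front-end units, comprising the instruction stream buffer and the instruction fetch unit, are the key components for achieving high instruction bandwidth. However, the variable-length instruction format and the complex addressing system make it difficult for an x86 superscalar processor to fetch multiple instructions in one cycle. To achieve high instruction fetch bandwidth, we examine in depth how the program execution trace affects keeping the instruction stream smooth and expanding the instruction fetch degree. To build a front-end with high instruction fetch bandwidth, this dissertation addresses four research topics: 1. increasing the instruction cache hit rate; 2. identifying and fetching multiple instructions in one cycle; 3. fetching instructions across basic blocks; 4. storing the addresses of multiple instructions so that machine state is preserved for handling exceptions. For the first topic, we develop a new method that uses branch prediction to guide cache prefetching, called BIB prefetching. Simulation results show that BIB prefetching outperforms traditional prefetching by 7% and other prediction-table-based prefetching methods by 17%. As BTB design techniques mature and their accuracy increases, BIB prefetching will become increasingly effective. For the second topic, we propose an Instruction Identifier that predicts instruction lengths and stores the instruction pointers as superscalar group indicators. This method overcomes the difficulty of fetching at high degrees (more than 3 instructions). According to the simulation results, a 64-entry Instruction Identifier offers the best balance between performance and cost. For the third topic, we couple the instruction stream buffer with the branch prediction unit to supply program execution trace information and thereby improve the buffer's performance. According to the simulation results, fetching from an instruction buffer that can span two basic blocks reaches an average maximum fetch degree of 8.42 x86 instructions. Balancing performance and cost, we recommend building this buffer from two 64-byte instruction entries. Compared with current instruction buffer designs, it outperforms them by a factor of 1.9. For the fourth topic, we design an instruction address queue and determine its size through evaluation, so that it holds enough instruction addresses without degrading back-end execution efficiency. The design accounts for two troublesome factors in CISC, the variable-length instruction format and the complex addressing system. In an x86 superscalar processor with a fetch degree of 5, it saves one third of the storage hardware while incurring nearly the same latency. With these critical topics addressed, an efficient front-end for an x86 superscalar processor is realized.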
The front-end units, namely the instruction stream buffer and the fetcher, are the key elements for achieving high instruction bandwidth. However, in x86 superscalar processors, variable-length instructions and the complex addressing system make it difficult to fetch multiple instructions in one cycle. To approach high instruction fetch bandwidth, this dissertation examines how the program execution trace can be used to keep the instruction stream smooth and to expand the x86 instruction fetch degree. To build a front-end with a high superscalar degree, four topics are studied: 1. Increasing fetch bandwidth at the front-end entry; 2. Identifying multiple instructions in one clock cycle; 3. Fetching super basic block instructions; 4. Storing each instruction address to keep processor state in x86 processors with a high instruction fetch degree. For the first topic, we develop a new instruction prefetching method in which prefetching is directed by branch prediction, called branch instruction based (BIB) prefetching. Simulation results show that this design outperforms traditional sequential prefetching by 7% and other prediction-table-based prefetching methods by 17% on average with the same BTB size. For the second topic, we propose an Instruction Identifier that predicts instruction lengths and stores the instruction pointers as superscalar instruction group indicators. Simulation results suggest that an Instruction Identifier with a 64-entry table is a good performance/cost choice. For the third topic, we propose a design that improves instruction stream buffer performance by coupling the buffer with the Branch Target Buffer (BTB) to support trace prediction.
Compared with other existing designs, this instruction stream buffer improves on the instruction fetch rate of current x86 processors by 90% on average. For the fourth topic, we propose an instruction PC Offset Queue. The design accounts for two CISC hazards in the x86 architecture and reduces storage space by one third for a degree-5 superscalar x86 processor, with even smaller access latency. With the critical topics of this dissertation addressed, an efficient front-end for an x86 micro-architecture with a high superscalar degree becomes practical.
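The abstract names the PC Offset Queue without describing its layout. The sketch below is only one plausible reading, not the dissertation's design: it assumes each fetch group stores a single full base address and a small byte offset per instruction, so that any instruction address can be rebuilt when an exception occurs. The class name `PCOffsetQueue` and its methods are hypothetical.

```python
from collections import deque


class PCOffsetQueue:
    """Hypothetical sketch: one base address per fetch group, one offset per instruction."""

    def __init__(self):
        # Each entry is (base_pc, [byte offset of each instruction in the group]).
        self._groups = deque()

    def push_group(self, base_pc, lengths):
        """Record a fetch group given the byte length of each instruction.

        Returns the sequential address that follows the group.
        """
        offsets = []
        offset = 0
        for length in lengths:
            offsets.append(offset)  # offset of this instruction from base_pc
            offset += length        # variable-length x86 instructions
        self._groups.append((base_pc, offsets))
        return base_pc + offset

    def address_of(self, group_index, slot):
        """Rebuild the full address of one in-flight instruction."""
        base_pc, offsets = self._groups[group_index]
        return base_pc + offsets[slot]

    def retire_group(self):
        """Drop the oldest group once all of its instructions have completed."""
        self._groups.popleft()


queue = PCOffsetQueue()
next_pc = queue.push_group(0x1000, [1, 3, 6, 2, 5])  # a degree-5 fetch group
faulting_pc = queue.address_of(0, 2)                 # address of the third instruction
```

Under this assumption, the storage saving would come from keeping a few offset bits per instruction instead of a full address, although the abstract does not say how the reported one-third reduction is obtained.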
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900392007
http://hdl.handle.net/11536/68420
Appears in Collections: Thesis