[[三浦/Paper Reading]]
last update:May 11, 2009
A comparison of three FPGA optimized NoC architectures
Date:April 2009
#contents
*Summary [#t407777a]
**Abstract [#qb11ee95]
-NoC has been seen as a possible solution to the on-chip interconnection problem.
-In this paper, three different network types optimized for FPGA architectures are described: a packet switched network, a circuit switched network, and a minimalistic network without arbitration.
**I. Introduction [#d97dfcfe]
-The problem is how to connect on-chip components efficiently.
-The traditional approach, which uses time multiplexed buses, doesn't scale very well.
-The solution has been to add extra buses to increase the level of parallelism.
--If even more performance is required, crossbars can be used to further increase the number of simultaneous transactions.
--A crossbar on the other hand will get prohibitively expensive as the number of ports increases.
-It is desirable to find a reusable alternative to a traditional bus or crossbar based solution with high performance and low development and verification cost.
#br
-The NoC approach has been seen as such a solution.
-A typical NoC design consists of many small switches containing a small crossbar, control logic and varying amounts of data buffers.
-The underlying idea is to build a network of small switches with a limited number of ports so that the size of the crossbar in these switches can be kept small.
**II. Background [#ca3ea5c9]
-There are a number of different approaches to NoCs.
-The most important design decision is to decide how connections are handled.
--In a circuit switched network, a connection has to be set up between the sender and receiver before any data can be transferred reliably.
--In a packet switched network, a packet can be buffered in the network itself, and the sender may be able to inject a complete packet into the network before the receiver has received the first flit.
-Another important design decision is how the routing decisions are made.
--If source routing is used, the sender specifies the entire path the packet will take through the network.
--If distributed routing is used, the switches will decide which path the packet should take.
--The advantage of source routing is that there is no need for a routing table in the switches.
--The advantage of distributed routing is that there is no need to reserve space in the packet for source routing data.
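The two routing styles above can be contrasted with a small sketch. It assumes a 2D mesh and dimension-ordered (XY) routing, neither of which is stated in these notes; the names xy_next_hop and source_route and the port labels E/W/N/S/LOCAL are hypothetical.

```python
# Illustrative sketch only: the mesh coordinates, port names, and the XY
# routing rule are assumptions, not details taken from the paper.
from typing import List, Tuple

Coord = Tuple[int, int]

# Movement on the mesh for each output port label.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_next_hop(current: Coord, dest: Coord) -> str:
    """Distributed routing: a switch picks the output port locally.

    Only the destination address has to travel with the packet.
    """
    cx, cy = current
    dx, dy = dest
    if cx < dx:
        return "E"
    if cx > dx:
        return "W"
    if cy < dy:
        return "N"
    if cy > dy:
        return "S"
    return "LOCAL"


def source_route(src: Coord, dest: Coord) -> List[str]:
    """Source routing: the sender precomputes the whole path.

    The returned list is what would be stored in the packet header,
    so the switches need no routing table.
    """
    path: List[str] = []
    pos = src
    while True:
        hop = xy_next_hop(pos, dest)
        path.append(hop)
        if hop == "LOCAL":
            return path
        pos = (pos[0] + STEP[hop][0], pos[1] + STEP[hop][1])


route = source_route((0, 0), (2, 1))
```

The header produced by source_route grows with the path length, which is the packet space that distributed routing avoids reserving.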
#br
-The topology is also important.
-The most used topology in NoC publications is a 2D mesh network.
--Other topologies include rings, 2D tori, and 3D tori.
#br
-Another important decision is how to avoid deadlocks.
--Deadlocks may occur if dependencies exist in the network such that two transactions are each waiting for resources held by the other.
--If there are no circular dependencies in the network, a deadlock will not occur.
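The "no circular dependencies" condition can be checked mechanically. The sketch below assumes the dependencies are given as a directed graph (resource A waits for resource B); the function name has_cycle and the channel names are hypothetical.

```python
# Illustrative sketch: depth-first search for a cycle in a dependency graph.
# The graph encoding (dict of node -> list of nodes it waits for) is an
# assumption made for this example.
from typing import Dict, List

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def has_cycle(deps: Dict[str, List[str]]) -> bool:
    """Return True if the dependency graph contains a circular dependency."""
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in deps.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge: nxt is already on the current path
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(deps))


# Four channels waiting on each other in a ring: a deadlock is possible.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
# A simple chain has no circular dependency.
chain = {"c0": ["c1"], "c1": ["c2"]}
```

If has_cycle returns False for the dependency graph induced by a routing function, the condition in the last bullet holds for that graph.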
**III. Implementation [#r5361d80]
A. Packet Switched NoC
-The packet switched NoC is based on the fire and forget principle.
--As soon as a flit has left the sending endpoint and entered the network, the sender does not have to take any further action for this flit.
--The network guarantees by design that the flit will eventually be delivered to its destination endpoint.
-The result of the route look-up indicates, using one-hot coding, which output node this packet should be sent to.
-To reduce the critical path of the switch, the route look-up is actually performed for the path to take in the next switch, while the path to take in the current switch is one-hot coded on the SEL_* signals.
-The empty check block makes sure that no SEL_* signal is sent to any arbiter if the shift register does not contain any data.
--By doing this, the arbiter will be simplified as compared to having separate direction and data available signals.
--It is cheaper to identify the case where only one input port wants to send a packet to the output port and send the packet immediately without any arbitration delay.
-The block which generates the read enable signal to the input FIFO has to consider a large number of signals.
--It is crucial to implement that block efficiently and place it so the routing delay is minimized.
-The READY signal is adjusted for pipeline latency so that the sender has a chance to stop sending before the FIFO is full.
-Each output port in a four port switch can only select from one of three input ports since there should be no need to route a packet back to where it came from in most topologies.
--If more than one input port needs to send a packet to the same output port, an arbiter in the output port uses round robin to select the port that may send.
--In our four port NoC switch, the output port is essentially a 3-to-1 mux controlled by the arbiter (or a 4-to-1 mux in the case of a five port switch).
-The latency of the switch when the input FIFO is empty and the output port is available is 3 clock cycles.
--If more than one input port has data for a certain output port, the latency is increased to 4 clock cycles due to an arbiter delay of one clock cycle for the packet that wins the arbitration.
-There are two critical paths in this switch.
--One path is caused by the read enable signal that is sent to the input FIFO.
--The other is from the FIFO to the route look-up due to the slow output of the SRL16 elements.
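The round-robin selection at an output port can be sketched behaviorally. This is only a software model under assumptions: it takes three request flags (one per candidate input port of a four port switch) and does not model clock cycles, the SEL_* wiring, or the single-requester fast path; the class name RoundRobinArbiter is hypothetical.

```python
# Illustrative behavioral model of a round-robin arbiter. It is not the
# paper's RTL; timing and signal names are deliberately left out.
from typing import List, Optional


class RoundRobinArbiter:
    def __init__(self, num_inputs: int = 3) -> None:
        self.num_inputs = num_inputs
        # Start so that input 0 has the highest priority on the first grant.
        self.last = num_inputs - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted input, or None if nobody requests.

        The search starts just after the most recently granted input, so
        every requesting input is served within num_inputs grants.
        """
        for offset in range(1, self.num_inputs + 1):
            idx = (self.last + offset) % self.num_inputs
            if requests[idx]:
                self.last = idx
                return idx
        return None


arb = RoundRobinArbiter(3)
grants = [arb.grant([True, True, False]) for _ in range(3)]
```

With inputs 0 and 1 both requesting continuously, the grants alternate between them, which is the fairness property round robin is chosen for.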
#br
B. Circuit Switched NoC
-The main difference between the circuit switched and the packet switched NoC is that there are no FIFOs in the input nodes.
--If the output is occupied, a negative acknowledgment is instead sent back to the transmitter.
--In this case, the transmitter has to reissue the connection request at a later time.
--Correspondingly, an acknowledgment is sent once the packet has reached the final destination.
-The disadvantage is that this network is not well suited to small transactions.
--The sender has to keep the connection alive until an acknowledgment has been received, which leads to allocated but otherwise unused network links.
-The overall design is similar to the packet switched version, with the exception of the input module.
-The arbiter is also different from the arbiter in the packet switched version.
--It has to arbitrate immediately if two or more connection requests arrive simultaneously at one output port.
-The critical path in the FPGA implementation is the arbiter, which has to decide immediately whether a circuit setup request should be accepted or rejected.
#br
C. Minimal NoC
-The main reason for including this architecture is to provide an upper bound on the achievable performance of an FPGA based NoC.
--Due to its low complexity, it should be hard to create a NoC with distributed routing that can run at a higher clock frequency without making significant compromises on area and latency.
-This NoC architecture does not use arbitration at all.
--If two flits arrive at one output port at the same time, one of them is simply discarded.
--This means that such a network would have to be statically scheduled or use some other means of guaranteeing that flits do not collide, for example an operating system that allocates network channels dynamically.
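A static schedule for such a network can be validated offline by checking that no two flits use the same output port in the same cycle. The tuple layout (cycle, switch, output port, flow) and the function name find_collisions below are assumptions made for illustration.

```python
# Illustrative sketch: detect collisions in a static flit schedule.
# The schedule format is hypothetical.
from collections import Counter
from typing import Iterable, List, Tuple

Slot = Tuple[int, str, str]          # (cycle, switch, output port)
Entry = Tuple[int, str, str, str]    # (cycle, switch, output port, flow id)


def find_collisions(schedule: Iterable[Entry]) -> List[Slot]:
    """Return the (cycle, switch, port) slots claimed by more than one flit."""
    counts = Counter((cycle, switch, port) for cycle, switch, port, _ in schedule)
    return sorted(slot for slot, n in counts.items() if n > 1)


bad_schedule = [
    (0, "sw0", "E", "f0"),
    (0, "sw0", "E", "f1"),  # same port in the same cycle: one flit is dropped
    (1, "sw0", "E", "f1"),
]
good_schedule = [
    (0, "sw0", "E", "f0"),
    (1, "sw0", "E", "f1"),
]
```

An empty result means that, for this schedule, no flit would be discarded by the arbitration-free switch.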
**IV. Results [#cdd6990c]
-We have verified that we can retain the same timing when synthesizing a 2×2 mesh with NoCs.
-In this case, we have used floorplanning constraints to make sure that the NoC switches are not located directly adjacent to each other.
&br;
LEFT:&ref(TableII.JPG);
#br
Table II: The performance of the different NoC architectures in different FPGAs.
#br
-Table II shows the clock frequency and resource usage of the NoC switches.
--The clock frequencies of the packet switched and circuit switched networks are not too far from the upper limit provided by the minimalistic NoC switch, reaching about 75% to 80% of its clock frequency.
--In terms of area, the packet switched and circuit switched nodes use about the same number of slices, whereas the minimalistic NoC switch uses about 75% of the area used by the former.
A. ASIC comparison
-A 4-port version of each of the three switches was synthesized to a 90 nm process to get a rough gate count estimate.
-In order to synthesize the FPGA specific components, these were reimplemented using only behavioral RTL code.
-This means that these figures should mainly be seen as a comparison of the relative merits of each NoC when considered for an application that might be used in both an FPGA and an ASIC.
--Packet switched 4-port switch: 29,000 gates.
--Circuit switched 4-port switch: 4,700 gates.
--Minimalistic 4 ports switch: 4,500 gates.
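To make the relative comparison concrete, the gate counts listed above can be normalized to the minimalistic switch. The numbers are the ones from these notes; the helper name relative_to is hypothetical.

```python
# Normalize the gate counts from the notes to a chosen baseline switch.
from typing import Dict

gate_counts: Dict[str, int] = {
    "packet switched": 29000,
    "circuit switched": 4700,
    "minimalistic": 4500,
}


def relative_to(baseline: str, counts: Dict[str, int]) -> Dict[str, float]:
    """Return each gate count divided by the baseline's gate count."""
    base = counts[baseline]
    return {name: n / base for name, n in counts.items()}


ratios = relative_to("minimalistic", gate_counts)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

By these figures the packet switched switch needs roughly 6.4 times the gates of the minimalistic one, while the circuit switched switch is within about 5% of it.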
*References & keywords [#y70ad802]
-crossbar
--Probably refers to a crossbar switch. See [[here>http://e-words.jp/w/E382AFE383ADE382B9E38390E383BCE382B9E382A4E38383E38381.html]] for details.
-RLOC constraint
--Possibly a constraint for Xilinx FPGA boards?
--http://www.xilinx.com/itp/xilinx10j/books/docs/cgd/cgd.pdf (P263)
-source routing
--http://e-words.jp/w/E382BDE383BCE382B9E383ABE383BCE38386E382A3E383B3E382B0.html
-Virtex-4
--The FPGA board used in the paper.
--http://japan.xilinx.com/products/virtex4/index.htm
--http://focus.tij.co.jp/jp/analog/docs/refdesignovw.tsp?familyId=64&contentType=2&genContentId=34819
*P.S [#d03ff55e]
-Performance = size + frequency + throughput (By Ben-sensi)
--Although the Minimal NoC is smaller than the packet switched and circuit switched NoCs and runs at a higher clock frequency, its overall performance is worse than theirs because it discards input flits when more than one flit arrives at an output port at the same time.