[[森/Benz/BANC]]
*Summary [#o446c8cc]
For each chapter and each paragraph, I have excerpted the sentence that seems to carry the main point. (Not yet well organized.)
----
#contents
**I. Introduction [#ta2c62c0]
-Deep sub-micron processing technologies have enabled the
implementation of new application-specific embedded architectures
that integrate multiple software programmable processors
and dedicated hardware components together onto a
single chip.
-These application-specific architectures
are emerging as a key design solution for today's nanoelectronics
design problems, in areas such as:
+wireless communication
+broadband/distributed networking
+distributed computing
+multimedia computing
-At the level of interconnection links, wires are increasingly becoming
the bottleneck, relegating transistors to a secondary role.
As a result, wires dominate performance figures, power consumption
and area utilization.
-NoC is a scalable architectural platform with
huge potential to handle growing complexity and can provide
easy re-configurability.
-Packet switching supports asynchronous transfer of information.
-The NoC offers several promising features:
+It transmits packets instead of words.
+Transmission can be conducted in parallel.
-This paper's major contribution is investigating the design
issues of a generic network on chip, taking into account both
computation and communication parts and placing them on the
same footing.
**II. ON CHIP INTERCONNECTION DESIGN ISSUES [#f5d7fcfa]
-On-chip interconnection networks use layered approaches.
-Each layer may include one or more closely related protocol functions, such as
data fragmentation, encoding and synchronization.
-The NoC interconnection paradigm is also characterized by its
topology, protocol, and flow control.
-Several decisions must be made to design such
a system: the communication
protocol, switching style, network topology, clock synchronization
method, signaling scheme, etc.
-Various types of interconnect architectures for MCSoC architectures
have been proposed so far. Most of them borrow
ideas from the area of parallel computing, while considering
a different set of constraints, such as power, complexity,
etc.
-The common goals are low latency and high throughput.
(1) Node interconnection - Topology
-The topology of a typical NoC system simply defines how the nodes are interconnected by links.
-Topologies are broadly classified as direct or indirect.
(2) Packet movement - Protocol
-There is a large protocol space to select from for NoCs.
-Circuit switching, packet switching, and wormhole switching are possible choices for NoC protocols.
-In circuit switching
--A physical path from the source to the
destination is reserved prior to the transmission of data.
--Once a transmission starts, it is not disturbed by other transmissions, since packets are not stored in buffers as they are in packet switching.
--The advantage of the circuit switching approach is that the network bandwidth is statically reserved for the whole duration of the data transfer.
--The overhead of this approach is that setting up an end-to-end path causes unnecessary delay.
--In summary, circuit switching can provide high performance but little flexibility.
-In packet switching
--Every packet is composed of a control part, the header, and a data part (also called the payload).
--Network switches inspect the headers of incoming packets to switch each packet to the appropriate output port. In this scheme, the need to store entire packets in a switch makes the buffer requirement very high.
-In wormhole packet switching
--Each packet is further divided into flits (flow control units), and the input and output buffers are expected to store only a few flits.
--The header flit reserves the routing
channel of each switch, the body flits then follow the
reserved channel, and the tail flit later releases the channel
reservation.
--The advantage of wormhole routing is that
it does not require the complete packet to be stored in the
switch's buffer while waiting for the header flit to route to
the next stages.
--It requires much less buffer space.
--One packet may occupy several intermediate switches at the
same time.
--The disadvantage is that, by allowing a message to occupy
the buffers and channels, wormhole routing increases the
possibility of deadlock.
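The reserve / follow / release behaviour of the header, body and tail flits can be sketched as follows. This is only an illustrative model: the class names, the `kind` and `out_port` fields and the port name "east" are assumptions made for the example, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Flit:
    kind: str                        # "head", "body" or "tail" (hypothetical encoding)
    packet_id: int
    out_port: Optional[str] = None   # only meaningful on the head flit


@dataclass
class WormholeSwitch:
    # Output port -> id of the packet currently holding it.
    reserved: Dict[str, int] = field(default_factory=dict)
    # Packet id -> output port reserved by that packet's head flit.
    route: Dict[int, str] = field(default_factory=dict)

    def accept(self, flit: Flit) -> Optional[str]:
        """Return the output port used by the flit, or None if it is blocked."""
        if flit.kind == "head":
            if flit.out_port in self.reserved:
                return None                      # channel busy: the head flit waits
            self.reserved[flit.out_port] = flit.packet_id
            self.route[flit.packet_id] = flit.out_port
            return flit.out_port
        port = self.route[flit.packet_id]        # body and tail follow the reservation
        if flit.kind == "tail":
            del self.reserved[port]              # the tail flit releases the channel
            del self.route[flit.packet_id]
        return port


switch = WormholeSwitch()
packet = [Flit("head", 1, "east"), Flit("body", 1), Flit("tail", 1)]
ports = [switch.accept(f) for f in packet]
```

While packet 1 holds the "east" channel, a head flit of another packet asking for the same port is refused, which is the blocking that can lead to deadlock when several switches wait on each other.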
(3) Flow control
-This is used only for dynamic routing.
--Flow control determines how resources, such as buffers and
channel bandwidth, are allocated and how packet collisions are
resolved.
(4) Packet Size Selection
-The packet size depends heavily
on the characteristics of the application being used.
--If a message has to be split into too many
small packets, which have to be re-assembled at the destination to
obtain the original message, the resulting overhead will be too
high.
--The correct packet size is also crucial to
make optimum use of the network resources.
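The overhead argument can be made concrete with a small calculation. The sketch below assumes that every packet carries a fixed-size header in addition to its payload; the function name and the bit counts in the example are illustrative and do not come from the paper.

```python
import math


def packetization_overhead(message_bits: int, payload_bits: int, header_bits: int) -> float:
    """Fraction of all transmitted bits that are header bits.

    Assumes (for illustration) one fixed-size header per packet.
    """
    if message_bits <= 0 or payload_bits <= 0 or header_bits < 0:
        raise ValueError("sizes must be positive (header may be zero)")
    packets = math.ceil(message_bits / payload_bits)   # packets needed for the message
    header_total = packets * header_bits
    return header_total / (message_bits + header_total)


# A 1024-bit message with a 32-bit header per packet:
small_packets = packetization_overhead(1024, 32, 32)    # 32 packets
large_packets = packetization_overhead(1024, 256, 32)   # 4 packets
```

With 32-bit payloads half of the transmitted bits are headers, while with 256-bit payloads the share falls to one ninth, which is why splitting a message into too many small packets is costly.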
(5) Virtual Channels Allocation
-The allocation issue is
considered when virtual channels are used.
-When a given flit arrives at an input port, its type is first
decoded. If it is a header flit, then, according to its virtual
channel number (VCID) field, it is stored in the corresponding
virtual channel buffer.
-When the body flits arrive,
they are queued into the buffer of the input virtual channel
and inherit the particular output virtual channel reserved by
the header.
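A minimal sketch of this allocation step is shown below. It assumes that every flit carries its VCID, that free output virtual channels are handed out in list order, and that the tail flit returns the output channel to the free list; these details, and all names used, are assumptions for illustration rather than the paper's design.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class Flit:
    kind: str    # "head", "body" or "tail" (hypothetical encoding)
    vcid: int    # input virtual channel number carried by the flit


@dataclass
class InputPort:
    num_vcs: int
    buffers: Dict[int, Deque[Flit]] = field(init=False)
    out_vc: Dict[int, Optional[int]] = field(init=False)

    def __post_init__(self) -> None:
        self.buffers = {vc: deque() for vc in range(self.num_vcs)}
        self.out_vc = {vc: None for vc in range(self.num_vcs)}

    def receive(self, flit: Flit, free_out_vcs: List[int]) -> int:
        """Queue the flit in its input VC and return the output VC it will use."""
        if flit.kind == "head":
            if not free_out_vcs:
                raise RuntimeError("no free output virtual channel")
            # The header reserves an output virtual channel.
            self.out_vc[flit.vcid] = free_out_vcs.pop(0)
        assigned = self.out_vc[flit.vcid]     # body and tail inherit the reservation
        self.buffers[flit.vcid].append(flit)
        if flit.kind == "tail":
            free_out_vcs.append(assigned)     # assumed: the tail frees the output VC
            self.out_vc[flit.vcid] = None
        return assigned


port = InputPort(num_vcs=2)
free = [0, 1]
used = [port.receive(Flit(kind, 1), free) for kind in ("head", "body", "tail")]
```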
A. Interconnection Complexities
-Many parameters that affect performance
should be determined:
+The sizes of the FIFO memories in each network interface.
+The packet size -> each packet needs a header with
routing information.
+Function mapping -> when function
partitioning is done, the functional units must
be positioned carefully in order to minimize routing path lengths.
+Timing performance in a NoC is not easily predictable -> even if the bandwidth per link is high, traffic congestion in a node can create long latencies, which slow down the system.
**III. BANC SPECIFICATION AND BUILDING BLOCKS [#lb27439f]
-The BANC platform is based on an S-array, d-dimensional mesh architecture, characterized by its dimension d and array S.
-Where b is the one-directional bandwidth. We have to note
that since this architecture achieves a simple connection
scheme with complexity order O(S^d), the shortest-path
routing algorithm is mostly applied to it.
-The switch architecture has a great impact on the cost and
on the performance of the whole network.
-The BANC architecture uses wormhole packet switching, and messages
are sent by means of packets (several flits). Therefore,
the switching has low latency and saves memory buffers, and, with
an appropriate routing algorithm, communication deadlock can be
avoided.
-When a packet is sent by a core attached to a switch port,
the master (sender) must include, in the packet header, the
coordinates of the switch to which the destination core (slave)
is attached.
-These coordinates are used by each switch on the packet's
path: the switch compares the destination address with its own
address to determine the output channel of the
packet.
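The address comparison performed at each switch can be sketched as follows. The summary does not name the routing algorithm, so the example assumes dimension-order (XY) routing on a 2D mesh, a common shortest-path and deadlock-free choice; the port names and coordinate convention are also assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

# Assumed convention: x grows to the east, y grows to the north.
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def output_port(current: Coord, dest: Coord) -> str:
    """Compare the destination address with the switch's own address."""
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "east"
    if dx < cx:
        return "west"
    if dy > cy:
        return "north"
    if dy < cy:
        return "south"
    return "local"          # arrived: deliver to the attached core


def path(src: Coord, dest: Coord) -> List[Coord]:
    """List the switches visited when every hop applies output_port()."""
    hops = [src]
    current = src
    port = output_port(current, dest)
    while port != "local":
        sx, sy = STEP[port]
        current = (current[0] + sx, current[1] + sy)
        hops.append(current)
        port = output_port(current, dest)
    return hops


route = path((0, 0), (2, 1))
```

Here the packet first travels along x and then along y, so `route` visits (0, 0), (1, 0), (2, 0) and (2, 1), which is a shortest path in the mesh.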
A. Communication
-BANC is based on the message passing communication
model.
--Cores communicate by sending and receiving request
and response signals.
--Every core (resource) has a unique
address and is connected to the BANC via a switch. It
communicates with the switch through its network adapter.
-The BANC's network interface specifies four protocol layers:
+Physical layer
+Data link layer
+Network layer
+Transport layer
B. BANC Packet format
-PBEG
--The flit in which PBEG equals 1 is
always the packet header. This header is composed of
routing bits and RES (reserved) fields.
---The routing bits are the
information used by each SmartRoute (switch) to perform
the packet routing, that is, the coordinates of the destination
core.
-RES
--RES is reserved for future expansion, i.e., for the
implementation of protocols above the network layer (e.g.,
information for packet reordering), and it is not processed by
the routers.
-Data
--Data (payload), which has unlimited length, comes immediately after the header flit.
-PEND
--The last payload flit has PEND equal to 1, which marks this flit as the packet
trailer.
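The framing described above can be sketched as a pair of helper functions. Field widths are not given in this summary, so the sketch stores fields as plain Python values; the function names, the dictionary layout of the header and the requirement of at least one payload flit are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Flit:
    pbeg: int    # 1 only on the header flit
    pend: int    # 1 only on the last payload flit (packet trailer)
    data: Any


def build_packet(dest: Tuple[int, int], res: int, payload: List[int]) -> List[Flit]:
    """Frame a payload as a header flit followed by payload flits."""
    if not payload:
        raise ValueError("this sketch assumes at least one payload flit")
    flits = [Flit(pbeg=1, pend=0, data={"routing": dest, "res": res})]
    last = len(payload) - 1
    for i, word in enumerate(payload):
        flits.append(Flit(pbeg=0, pend=int(i == last), data=word))
    return flits


def parse_packet(flits: List[Flit]) -> Tuple[Tuple[int, int], List[int]]:
    """Check the framing bits and return (destination, payload)."""
    if flits[0].pbeg != 1 or flits[-1].pend != 1:
        raise ValueError("malformed packet framing")
    # RES is carried in the header but ignored, as routers do not process it.
    return flits[0].data["routing"], [f.data for f in flits[1:]]


packet = build_packet(dest=(2, 1), res=0, payload=[10, 20, 30])
destination, words = parse_packet(packet)
```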
C. Switching Policy
-Each switch has five inputs and five outputs. One input/output
pair is for communication with the resource.
-When several packets contend
for the same direction, the simplest method is to let one input
choose first and then let the remaining packets
choose in some predefined order.
1) Switch Architecture
-The BANC's switch hardware is
a simple crossbar circuit. It has five special FIFOs of 8x32 bits
each and performs all the control flow and routing functions.
-The crossbar allows five different data items to be routed at the
same time. Every output port has an associated arbiter using
a simple round-robin scheduling scheme, with no priority.
-BANC switch ports include two unidirectional
channels, each one with its data, framing and flow control
signals.
-The flow control bits are used to validate data at the channel
and to acknowledge (ACK) the received data.
-The switch performs one or more tasks
depending on the format of the flit. If the flit contains a header,
the processing sequence is as follows:
+input arbitration
+routing
+output arbitration
-The routing operation consists of four major tasks:
+receiving flits from a neighboring node
+transmitting flits to a neighboring node
+deciding the channel through which a flit
must be forwarded (routing)
+host resource NI, which
involves assembling flits into messages (whole packets) and
disassembling a message into flits.
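The per-output round-robin arbitration mentioned above can be sketched as follows. The class and method names are hypothetical; the sketch only assumes that each output port remembers which input it granted last and starts its next search from the following input.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants one requesting input per call, with no fixed priority."""

    def __init__(self, num_inputs: int = 5) -> None:
        self.num_inputs = num_inputs
        self.last = num_inputs - 1      # so that input 0 is examined first

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted input, or None if nobody requests."""
        for offset in range(1, self.num_inputs + 1):
            candidate = (self.last + offset) % self.num_inputs
            if requests[candidate]:
                self.last = candidate   # the search resumes after the winner
                return candidate
        return None


arbiter = RoundRobinArbiter(num_inputs=5)
requests = [True, False, True, False, True]     # inputs 0, 2 and 4 want this output
grants = [arbiter.grant(requests) for _ in range(4)]
```

With inputs 0, 2 and 4 requesting continuously, the grants rotate 0, 2, 4, 0, so no input is starved.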
2) Network Adapter
-The network adapter converts
the packet-based communication of the BANC
to the higher-level protocol that cores use.
-It implements, via connections, high-level services such as
--transaction ordering
--throughput and latency guarantees
--end-to-end flow control
-The network adapter also implements adapters to existing
on-chip protocols, such as AXI, OCP and DTL, to seamlessly
connect existing IP modules to the NoC.
-Decoupling computation from communication is key to managing
the complexity of designing chips with billions of transistors,
because it allows the IP modules and the interconnect
to be designed independently.
**IV. DISCUSSION [#q0c67676]
-Before the real implementation of the BANC architecture,
we need estimates of several design parameters of the communication and computation
components.
-In our earlier work [26], we
proposed and designed a novel architecture (mainly related to
the computation component part) with several promising
features needed in future MCSoC systems:
--low power
--low hardware complexity
-In this work, we turn our focus to the communication part, and particularly to the buffer design, which is an important part of the
switch.
-We simulated a BANC system with 25 cores and 25 switches
organized as a 5x5 mesh grid. The connections between nodes
consist of duplex links (transferring simultaneously in both directions)
with adjustable bandwidth and delay.
-Figure 10
--Packet drop probability is the probability of
a packet being lost in the network due to heavy traffic and
limited buffer capacity in the switches.
--The experiment shows that packet drop probability decreases
when the buffer size increases.
-Figure 11
--The result shows that the drop probability increases with the
communication load once the load exceeds a certain
value.
--We conclude that the drop probability is
more sensitive to the communication load than to the buffer
size.
-Figure 12
--Packet delays for five different buffer sizes are nearly 1 μs
when the communication load equals 0.3019. This means that
the buffer is little utilized when the communication load is
fairly low.
--When the communication load increases, a
number of packets are dropped and the packet delay increases
from 2.9 μs to 9.8 μs as the buffer size increases from 4
packets to 64 packets.
-We have to note that the packet delay
is almost constant if the buffer size is constant. That is, packet
delay is not sensitive to the communication load when there
are some packets dropped, but is affected by the queue delay.
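The qualitative trends reported for the drop probability can be illustrated with a textbook single-queue model. The sketch below uses the blocking probability of an M/M/1/K queue, which is an assumption made purely for illustration: it is not the simulator used in the paper and does not reproduce the figures' numbers.

```python
def drop_probability(load: float, buffer_size: int) -> float:
    """Blocking probability of an M/M/1/K queue (assumed model).

    load        -- offered load (arrival rate divided by service rate)
    buffer_size -- K, the number of packets the queue can hold,
                   counting the packet in service
    """
    if load <= 0 or buffer_size < 1:
        raise ValueError("load must be positive and buffer_size at least 1")
    if abs(load - 1.0) < 1e-12:
        return 1.0 / (buffer_size + 1)          # limit of the formula at load = 1
    return (1 - load) * load ** buffer_size / (1 - load ** (buffer_size + 1))


# Larger buffers drop fewer packets at a fixed load ...
by_buffer = [drop_probability(0.9, k) for k in (4, 8, 16, 32, 64)]
# ... and higher loads drop more packets at a fixed buffer size.
by_load = [drop_probability(rho, 8) for rho in (0.3, 0.6, 0.9, 1.2)]
```

In this model `by_buffer` is strictly decreasing and `by_load` is strictly increasing, matching the direction of the trends described for Figures 10 and 11.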
#comment