Theses
CENTER:[[ASL &ref(download.png,,30%); Templates for Technical Reports>http://webfs-int.u-aizu.ac.jp/~benab/publications/ASL-LATEX-TEMPLATE.zip]]
----
CENTER:[[&ref(ghat.jpg,,60%);>http://aslweb.u-aizu.ac.jp/benlab/index.php?Theses]]
#CONTENTS
***AY2020 Design of Hand Gesture Recognition based on Deep Neural Network [#j9344d7b]
-Author: AGEISHI Naoto
-Degree: BS
-[[Thesis>]]
-[[slides>]]
-[[Source Code>]]
***AY2020 Design of Interactive Software Interface for AI-Enabled Real-time Biomedical System [#j1fd1c28]
-Author: AOYAMA Naoki
-Degree: BS
-[[Thesis>]]
-[[slides>]]
-[[Source Code>]]
***AY2020 Hardware Acceleration of Convolution Neural Network on FPGA for Real-time Biomedical System [#c4e1bd46]
-Author: OKADA Yuki
-Degree: BS
-[[Thesis>]]
-[[slides>]]
-[[Source Code>]]
***AY2020 Research on Collaborative Learning Algorithm for AI-Enabled Real-time Biomedical System [#z9be34df]
-Author: PHEA Sinchhean
-Degree: BS
-[[Thesis>]]
-[[slides>]]
-[[Source Code>]]
***"Performance Study of Character Recognition with Feed-Forward Neural Network", Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2018. [#i947aeb8]
-Author: Masaki Yamada
-Degree: BS
-[[Thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/37.s1220042_Yamada_Masaki-BS-18/s1220042_thesis.pdf]]
-[[slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/37.s1220042_Yamada_Masaki-BS-18/s1220042_slides.pptx]]
-[[Source Code>https://drive.google.com/drive/folders/1OKikrf5IU8kyokuvnxWFatTbCjH1NoCy?usp=sharing]]
-''Abstract'': Neural networks (NNs) in embedded systems are usually implemented on microcontrollers. An NN implementation on a microcontroller lacks the
performance benefits of a parallel design, whereas implementing an NN architecture on an FPGA benefits from parallelization and
configurability. A feed-forward neural network (FFNN) with floating-point (FP) precision performs a large number of multiplications and additions. In addition, for
each neuron in the hidden layers of an FFNN, a non-linear function must be computed to determine the neuron's activation value. Without
specialized FP hardware, such computations can reduce the performance of the system. In this thesis, I present a performance study of character recognition
with a feed-forward neural network, towards an efficient hardware implementation on FPGA (Fig. 3).
***"Design of a Leaky, Integrate and Fire (LIF) Neuron Core for NASH System", Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2018. [#x191b349]
-Author: Kanta Suzuki
-Degree: BS
-[[Thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/38-s1220215_Suzuki_Kanta-BS-18/s1220215.pdf]]
-[[slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/38-s1220215_Suzuki_Kanta-BS-18/s1220215_slides.pptx]]
-[[Source Code>https://drive.google.com/drive/folders/1vrKp6e_SlpVLZCXpXjd8W0gCg-j7vBF7?usp=sharing]]
-''Abstract'': Current interest in neuromorphic computer architectures is enormous, due to their conceptual attractiveness
and their potential applications to sensor networks, robotics, computer vision, and other fields. Certain types of neural networks (e.g., CNNs) are so resource-intensive
that they require server-class computers to model, train, and implement them, so their efficiency is not even close to that of the biological brain. Therefore,
we are developing the Neuro-inspired ArchitectureS in Hardware (NASH) system, which uses a spiking model. The goal of my research is to design a Leaky Integrate-and-Fire (LIF) neuron
core for the NASH system.
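For readers unfamiliar with the leaky integrate-and-fire model named in this abstract, the following is a minimal software sketch of the textbook discrete-time LIF update. It is not taken from the thesis and does not describe the NASH hardware core; the function name `simulate_lif` and all parameter names and default values are assumptions chosen for illustration.

```python
from typing import List, Sequence


def simulate_lif(
    input_current: Sequence[float],
    v_rest: float = 0.0,       # resting potential (assumed value)
    v_threshold: float = 1.0,  # firing threshold (assumed value)
    v_reset: float = 0.0,      # potential after a spike (assumed value)
    leak: float = 0.5,         # fraction of potential kept per step (assumed value)
) -> List[int]:
    """Return the time steps at which a textbook LIF neuron fires."""
    v = v_rest
    spikes = []
    for t, i_in in enumerate(input_current):
        # Leak: the potential decays toward rest. Integrate: add the input.
        v = v_rest + leak * (v - v_rest) + i_in
        # Fire: emit a spike and reset once the threshold is reached.
        if v >= v_threshold:
            spikes.append(t)
            v = v_reset
    return spikes
```

With a constant input the neuron fires periodically, while sparse inputs leak away before the threshold is reached, which is the behavior the "leaky" term refers to.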
***"Study of a Neuro-inspired Architecture in Hardware", Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2018. [#c9d08113]
-Author: Kosuke Takakuwa
-Degree: BS
-[[Thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/39-s1220236_Takakuwa_Kosuke-BS-18/thesis.pdf]]
-[[slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/39-s1220236_Takakuwa_Kosuke-BS-18/takakuwa_slides.pptx]]
-''Abstract'': Our brain is a low-power, fault-tolerant, and high-performance
machine. The spiking neural network (SNN) is considered to be the third generation of neural network
models. It can mimic the key functions of the human brain. SNN simulations are a flexible and powerful
method for investigating the behavior of neuronal systems. However, simulating spiking neural
networks in software is slow. This thesis studies our proposed neuro-inspired architecture for implementing SNNs in hardware.
***Micro-ring Fault-resilient Photonic On-chip Network for Reliable High-performance Many-core Systems-on-Chip [#e3734b91]
-Author: Michael Conrad Meyer
-Degree: PhD Thesis, Graduate School of Computer Science and Engineering, University of Aizu, March 2017.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Mike-PhD-17/MichaelConradMeyer_PhD_Thesis_2017.pdf]];
[[Slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Mike-PhD-17/MichaelConradMeyer_PhD_Slides_2017.pdf]]
-Abstract: Humans continue to demand higher performance from their computing systems, and as a result we have had aggressive increases in the scaling of technology, but this is showing signs of change. The power consumed by a chip is ever increasing, and recently the power efficiency of communications has become as important as the computational power of the cores. Typical electronic Networks-on-Chip (NoCs) are reaching their performance limits due to various factors.
One highly sought-after technology is Photonic Networks-on-Chip (PNoCs). PNoCs offer several benefits over conventional electrical NoCs, such as high-bandwidth support, distance-independent power consumption, lower latency, and improved performance-per-watt. Wavelength Division Multiplexing allows multiple parallel optical streams of data to be transferred concurrently through a single waveguide, and micro-ring resonators (MRs) can be switched at speeds as high as 40 GHz to realize wavelength-selective modulators or switches. These technologies allow multiple bits of data to travel concurrently through the same waveguide, in contrast to the one-bit-per-wire limitation of electronic circuits. Another benefit is that data is transferred in an end-to-end fashion once a path is configured, meaning that the data does not need to be buffered multiple times, which saves power.
The photonic domain is immune to transient faults caused by radiation, but is still susceptible to process variation (PV) and thermal variation (TV), as well as aging. Aging typically occurs faster in active components and in elements that have high thermal variation. In the optical domain, faults can occur in MRs, waveguides, routers, etc. Active components, such as photodetectors, have higher failure rates than passive components, e.g., waveguides. Moreover, a PNoC is highly vulnerable because a fault may expose a single point of failure: a faulty MR can cause a message to be misdelivered or lost. In this dissertation, a set of novel photonic routing algorithms and architectures are proposed for future on-chip optical networks.
First, a new fault-tolerant photonic switch capable of handling multiple faulty MRs is proposed. The proposed switch is based on a non-blocking 5-port optical router. It requires no MRs for travel in the opposite direction (e.g., East to West or North to South). The switch is also able to handle the hybrid spatial switching previously used in PHENIC.
Second, a fault-tolerant path configuration algorithm is proposed, which checks for MR faults and allocates the proper MRs to be used. This means that our previous two-state MRST must also include a faulty state. Additionally, the algorithm must use two MR Configuration Tables, one for standard use and another for the backup paths. This allows all routing decisions to be made within a single optical switch.
Third, a power estimation scheme for the optical layer is proposed, which is fast enough to be used for routing decisions. Because the calculation must be done quickly, the calculation itself must be simple.
Finally, I propose an architecture and routing algorithm pair that allows the network to make "strain"-based routing decisions. This strain value is based on the number of faulty MRs and the optical power of a node. This should improve the network's reliability and performance by avoiding nodes with high temperature, a high number of faults, or heavy traffic.
The proposed architectures and algorithms were evaluated with a discrete-event simulator, which incorporates detailed physical models of the photonic components. Results show that the proposed system achieves higher reliability with minimal sacrifices in overall system performance and energy. The resulting system is able to address the problems of process variation and thermal variation in optical components, and is more reliable than previously existing systems.
***Reliable Real-time Multi-core Vision System-on-Chip based on OASIS NoC [#ne29f409]
-Author: Akihito Kajikawa
-RPS (Presentation Date: August 10, 2016) [[slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/31.Kajikawa-MS-2018/Kajikawa-RPS-August102016-Slides.pdf]]; [[report.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/31.Kajikawa-MS-2018/Kajikawa-RPS-August102016-Report.pdf]]
-Abstract: TBC
***High-performance, Scalable Photonics On-chip Network for Many-core Systems-on-Chip [#u5e7acd1]
-Author: Achraf Ben Ahmed
-Degree: PhD Thesis, Graduate School of Computer Science and Engineering, University of Aizu, March 2016.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Achraf-PhD-16/AchrafBenAhmed_PhD_Thesis_2016.pdf]];
[[Slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Achraf-PhD-16/AchrafBenAhmed_PhD_Slides_2016.pdf]]
-Abstract: The continuously increasing demand for higher-performance computing systems and aggressive technology scaling have driven the trend of integrating a large number of cores on a single chip. In future generations of high-performance many-core systems, the efficiency of the communication infrastructure is as important as the computation efficiency of individual cores. Conventional electrical Networks-on-Chip (NoCs) are expected to reach their limits with increasing core counts because of high power dissipation and reduced performance. As indicated in the latest version of the ITRS roadmap, photonic wiring is a promising interconnect paradigm for future system-on-chip (SoC) designs that can provide broadband data transfer rates unmatchable by existing metal interconnects. When combined with Wavelength Division Multiplexing (WDM), multiple parallel optical streams of data are concurrently transferred through a single waveguide. This contrasts with Electronic Networks-on-Chip (ENoCs), which require a unique metal wire per bit stream. The key to saving power in on-chip photonic communication comes from the fact that once a photonic path is established, the optical data is transmitted in an end-to-end fashion without the need for buffering, repeating, or regenerating.
The photonic switching/routing techniques, configuration, and routing algorithm directly affect the performance and power characteristics of future many-core on-chip photonic communication. In particular, the control module and the path configuration algorithm, which orchestrate the different electrical control functions, play a significant role in how both electrical and photonic resources are utilized. In this dissertation, a set of novel photonic routing algorithms and architectures are proposed for future on-chip optical networks.
First, a new low-latency, non-blocking photonic switch/router (NBPS) and its control module, capable of handling all photonic communication configuration tasks, are proposed. The proposed approach is based on a new hybrid spatial switching mechanism for the photonic data stream transfer, which works by manipulating the state of the broadband switching elements. In addition, the NBPS is based on Wavelength-Selective Switching (WSS) for handling all communication configuration tasks.
Second, a new contention-aware path configuration algorithm and architecture for Electro-Assisted Photonic Network-on-Chip (EA-PNoC) is proposed. In addition to the main configuration tasks, the algorithm also decouples the Electronic Control Network (ECN) from the Photonic Communication Network (PCN) so that the photonic and electrical domains work independently of each other. The proposed algorithm orchestrates the different path-configuration packet processes and significantly alleviates contention in the ECN.
Third, a low-complexity routing and configuration algorithm for EA-PNoC is proposed. The approach is mainly based on photonic components augmented with a simple electronic control module and a so-called wavelength-shifting mechanism. The main merit of this new approach is to configure the path using photonic devices instead of the typical power-hungry electronic router. The proposed architectures and algorithms were evaluated with a discrete-event simulator, which incorporates detailed physical models of the photonic components. Results show that we could achieve better energy efficiency, as well as a considerable reduction in the blocking occurrence, which is the main source of latency and bandwidth degradation in conventional EA-PNoCs.
***Evaluation of Error Detection Mechanism for 3D-OASIS-Network-on-Chip System [#icf22be1]
-Author: Kajikawa Akihito
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2016.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Kajikawa-BS-16/Kaikawa-BS-16-gt.pdf]] ,
[[Slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Kajikawa-BS-16/Kaikawa-BS-16-slides.pdf]],
[[Technical Report>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Kajikawa-BS-16/Kajikawa-TR-2016.pdf]]
-Abstract: During the past decade, 3D-Network-on-Chips (3D-NoCs) have been showing their advantages over 2D-NoC systems. At the same time, concerns about their reliability have grown as well, due to the different kinds of faults that these systems may encounter. Therefore, a 3D-NoC must be tolerant of any kind of permanent failure or run-time malfunction. To achieve this goal, a fault-detection scheme is necessary to discover the presence of a fault before it propagates into the entire system and causes its collapse. Previously, 3D-Fault-Tolerant-OASIS (3D-FTO) was designed. 3D-FTO is able to recover from a large number of faults that can occur in links, input buffers, and the crossbar. However, this system lacks a fault detection mechanism, and the diagnosis of faults relies on assuming the presence of faults at a certain period of time. This makes fault recovery less efficient and diminishes the reliability of the system.
***Design and Analysis of Electrical Control Router for Hybrid Photonics NoC System [#j59bbe00]
-Author: Saito Ken
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2016.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/KenSaito-BS-16/KenSaito-BS-16-gt.pdf]] ,
[[Slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/KenSaito-BS-16/KenSaito-BS-16-slides.pdf]],
// (This should be checked again) [[TR.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/KenSaito-BS-16/KenSaito-BS-16-TR.pdf]]
-Abstract: Despite the huge bandwidth available from hybrid Photonic Network-on-Chip (PNoC) architectures, the Electronic Control Network (ECN) is considered the main source of latency and power consumption. This overhead might be caused by the use of an inappropriate message size or a non-optimized physical channel width. In this thesis, we design and evaluate a lightweight electrical control router for hybrid PNoC systems. The proposed router is optimized for latency rather than bandwidth, with an adequate configuration packet size and physical channel width.
***Power and Performance Comparison of Electronic 2D-NoC and Opto-Electronic 2D-NoC [#rf432ca4]
-Author: Okada Ryoga
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2016.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Okada-BS-16/Okada-BS-16-GT.pdf]],
[[Slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Okada-BS-16/Okada-BS-16-slides.pdf]],
// (This should be checked again) [[TR.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Okada-BS-16/Okada-BS-16-TR.pdf]],
-Abstract: Nowadays, the increasing complexity of emerging applications and improvements in process technology have enabled the design of many-core processors with tens to hundreds of cores on a single chip. Photonic Networks-on-Chip (PNoCs) have recently been proposed as an alternative approach with high performance-per-watt characteristics for intra-chip communication. In this thesis, we present a performance exploration of a hybrid Photonic Network-on-Chip and a conventional electronic Network-on-Chip to show the benefit of using photonic technologies.
***High-throughput Architecture and Routing Algorithms Towards the Design of Reliable Mesh-based Many-Core Network-on-Chip Systems [#m661012c]
-Author: Akram Ben Ahmed
-Degree: PhD Thesis, Graduate School of Computer Science and Engineering, University of Aizu, March 2015.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Akram-PhD-15/BenAhmed_Akram_DoctorThesis_March2015.pdf]], [[Slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Akram-PhD-15/BenAhmed_Akram_DoctorPresentation_01142015.pdf]]
-Abstract: Global interconnects are becoming the principal performance bottleneck for high-performance Systems-on-Chips (SoCs), since the main goal for these systems is to shrink the chip as much as possible while seeking more scalability, higher bandwidth, and lower latency. Conventional bus-based systems are no longer a reliable architecture for SoCs due to their lack of scalability and parallelism integration, high latency and power dissipation, and low throughput. During the last decade, the Network-on-Chip (NoC) interconnect has been proposed as a promising solution for future SoC designs. It offers more scalability than shared-bus-based interconnection and allows more processors to operate concurrently.
Despite the higher scalability and parallelism integration offered by NoC over traditional shared-bus-based systems, it is still not an ideal solution for future large-scale SoCs. This is due to limitations such as high power consumption, high communication cost, and low throughput. Recently, extending NoC to the third dimension (3D-NoC) has been proposed to deal with those problems, as a solution offering lower power consumption and higher speed.
As 3D-NoC architectures started to show their performance and energy-efficiency advantages over 2D-NoC systems, questions about their reliability in sustaining this performance growth began to arise. This is mainly due to challenges inherited from both 3D-ICs and NoCs. On one side are the complex nature of 3D-IC fabrics and the continuing shrinkage of semiconductor components; furthermore, the significant heterogeneity of 3D chips, which are likely to mix logic layers with memory layers and even more complex technologies, increases the probability of faults in a system. On the other side, the single-point-of-failure nature of the NoC raises a major concern about reliability, as it is the sole communication medium. As a result, 3D-NoC systems are becoming susceptible to a variety of faults caused by crosstalk, electromagnetic interference, radiation, oxide breakdown, and so on. A simple failure in a single transistor caused by one of these factors may compromise the reliability of the entire system, where the failure can manifest as corrupted message delivery, unmet timing requirements, or sometimes even the collapse of the entire system.
In this thesis, we propose 3D-Fault-Tolerant-OASIS (3D-FTO), a robust fault-tolerant 3D-NoC router architecture endowed with reliable and graceful routing algorithms. The proposed design handles a large number of faults in the input buffer, crossbar, and links (which are the components most susceptible to faults in 3D-NoC systems), leveraging the inherent structural redundancy in the architecture to work around errors. Contrary to previous works, the proposed system tolerates multiple faults in a single crossbar with no considerable performance degradation. In addition, the algorithms used are always minimal (as long as at least one minimal path exists), and with the aid of the Random-Access-Buffer (RAB) mechanism, deadlock-freedom is ensured with no significant area or power overhead.
The proposed 3D-FTO system was synthesized using Synopsys Design Compiler with a 45 nm CMOS process technology, and its layout was obtained using Cadence SoC Encounter. The evaluation results showed the ability of 3D-FTO to work around different kinds of faults, ensuring graceful performance degradation while minimizing the additional hardware complexity and remaining power-efficient.
***Architecture and Design of an Efficient Router for OASIS 3D Network-on-Chip System [#h6efd877]
-Author: Mitsunari Ishii
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2015.
-[[slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Ishi-BS-2015/GT2015_Ishii_Final_Slide.pdf]], [[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Ishi-BS-2015/GT2015_Ishii_Thesis_Final.pdf]]; [[Technical Report>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Ishi-BS-2015/MitsunariIshii_TR2014.pdf]]
-Abstract: Reliability has become one of the main problems that designers of three-dimensional Networks-on-Chip (3D-NoCs) are trying to deal with. This is mainly caused by their complex nature, which makes them vulnerable to failures. As a consequence, a lot of research has been conducted to make these systems fault-tolerant while minimizing performance degradation as much as possible. Previously, a reliable 3D Network-on-Chip system named 3D-OASIS-NoC (3D-ONoC) was designed in our research group. However, 3D-ONoC lacks a fault detection mechanism, and the diagnosis of faults relies on assuming their presence at a certain period of time. In this thesis, we propose a reliable Network Interface (NI) for the 3D-ONoC system in which an efficient error detection mechanism is implemented. From the evaluation results, we verified that the proposed NI can correctly detect the presence of faults and efficiently resend the faulty flits to their destination.
***Design and Evaluation of Efficient Error Detection Mechanism for OASIS 3D-NoC [#qa684f30]
-Author: Yuuki Tanaka
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2015.
-[[slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Tanaka-BS-2015/GT2015_YukiTanaka_slides_Final.pdf]], [[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Tanaka-BS-2015/GT2015_YuukiTanaka_Final.pdf]], [[technicalReport.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/treport/YukiTanaka-TR2015.pdf]]
-Abstract: High-performance 3D Network-on-Chips (3D-NoCs) have become viable solutions for future many-core systems. The Through-Silicon Via (TSV) is a prominent element of 3D-NoC design, supporting good performance and low power consumption. The 3D-OASIS Network-on-Chip (3D-ONoC) was previously proposed in our laboratory. In this thesis, we integrate TSV connections in order to obtain a reliable 3D-ONoC router. The performance evaluation shows that the proposed router correctly delivers messages via the integrated TSV connections without any timing violations. The area of the proposed router is 51939 μm², its power is 403 μW, and its speed is 490.1 MHz.
***Towards the Design of Dependable Real-Time System for Remote Health Monitoring of Elderly People [#xdc86a1d]
-''Author'': Yumiko Kimezawa
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2013, Ref. 23YK-MT12.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kimesawa-MS-12/m5151117_MS_thesis_slides.pdf]]; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kimesawa-MS-12/m5151117_MS_thesis.pdf]]
-''Abstract'': Recent technological advances in wireless networking, microelectronics and the Internet
allow computer and biomedical scientists to fundamentally modernize and change
the way health care services are deployed.
Electrocardiography is a commonly used, non-invasive procedure for recording
electrical changes in the heart. The record, which is called an electrocardiogram (ECG or EKG), shows the series of waves that relate to the electrical impulses which occur during each beat of the heart. An effective approach to speed up this and other biomedical operations is to integrate a very high number of processing elements in a single chip so that the massive scale of fine-grain parallelism inherent in several biomedical applications can be exploited efficiently.
This thesis exploits a parallel processing approach to process multi-lead ECG computational kernels in parallel. Our idea is to implement the traditional, bulky multi-lead electrocardiograph on a programmable embedded multicore SoC, which is smaller and more efficient. The implemented multicore SoC system is a high-performance device that incorporates multiple building blocks from multiple sources.
The solution presented in this thesis paves the way for real-time diagnosis of heart-related diseases. The proposed system was designed in hardware and evaluated with several ECG data sets. Prototyping the multicore SoC on an FPGA involves building a functional system model that lets the designer evaluate various aspects of a design and provides a realistic projection of the final product implementation.
*** Interactive Real-time Interface for Smart Remote Health Monitoring and Analysis [#b5b64f3b]
-''Author'': Achraf Ben Ahmed
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2013, Ref. 22ABA-MT12.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Achraf-MS-12/m5151161_MS_thesis_slides.pdf]]; [[thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Achraf-MS-12/m5151161_2012_MS_thesis.pdf]]
-''Abstract'': Recent technological advances in sensors, low-power microelectronics, and wireless networking have enabled the development of single-chip solutions for computationally intensive biomedical applications with potential health benefits for a large number of individuals.
One important application in this respect is the accurate, real-time remote analysis of human heart activity, which has always been a challenging problem for biomedical engineers. Despite the decreased mortality rate, heart disorders (such as cardiovascular disease) are one of the main causes of death around the world. As a result, detection of irregularities in the rhythms of the heart is a growing concern in medical research. In addition to detection, the collection and visualization of such data are crucial because of the huge amount of data that an electrocardiogram produces, especially when recordings are made over a long time. Another concern is the capability to monitor and analyze the produced data in real time.
In this thesis, an interactive real-time (IRT) interface, integrated with a multi-lead period-peak detection (PPD) algorithm for ECG processing, has been designed to overcome the limitations of existing ECG monitoring systems.
The evaluation results show that our proposed monitoring platform is scalable, as it can handle large amounts of data coming from many nodes at the same time. Moreover, the interface was built following the MVC (Model-View-Controller) pattern and uses the SSL (Secure Sockets Layer) protocol for data exchange, which makes the interface secure and ensures the data privacy of the nodes. The simulation also shows that real-time monitoring is ensured by our tool, which contains a live visualization mode that is activated as soon as new data arrives.
*** A Quantitative Performance Study of Shared Memory Multicore System [#a24ac5fa]
-''Author'': Takayuki Ochi
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2013, Ref. 21TO-GT12.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Ouchi-BS-12/s1160053_GT2012_slides.pdf]]
-''Abstract'': Systems-on-chip (SoCs) have evolved from fairly simple unicore, single-memory designs to complex heterogeneous multicore SoC architectures consisting of a large number of IP blocks on the same silicon. To meet the high computational demands posed by the latest consumer electronic devices, most current systems are based on this paradigm, which represents a real revolution in many aspects of computing. However, multicore systems are more complex, and many parameters and tradeoffs affect the overall performance. Those parameters are important for software and hardware developers and should be carefully studied and selected.
In this thesis, we evaluate the performance of a multicore architecture by software simulation. We focus on cache size, interconnect bus width, processor frequency, core/interconnect bus, and cache schemes. Moreover, we study the performance of several interconnect models, as they are one of the most crucial factors in multicore architectures. The simulation results and analysis show the influence of these parameters on multicore architectures.
*** Hardware Prototyping and Evaluation of Distributed Routing Core Network-Interface for OASIS NoC Architecture [#gddc29fc]
-''Author'': Shuu Endou
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2013, Ref. 20SE-GT12.
-[[&ref(pdf-download.gif,,40%); slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Endou-BS-12/s1170180_GT2012_slides.pdf]]
-''Abstract'': Network-on-chip (NoC) has been presented as a promising solution for the System-on-Chip (SoC) interconnect bottleneck problem. One of the important components of a NoC is the Network Interface (NI). The main functions of an NI are flitization, the conversion of a packet into flits, and deflitization, the conversion of flits into a packet. The NI helps data transmission from a source core to a destination core by adding some control information to packets.
This thesis proposes the design and evaluation of a Core-Network-Interface (CNI) for distributed routing in a real NoC architecture. The CNI was prototyped on an Altera Cyclone II FPGA board. The CNI was also evaluated in terms of complexity and performance on a 2x2 mesh NoC configuration.
*** OASIS Network-on-Chip Prototyping on FPGA [#n228e7d6]
-''Author'': Kenichi Mori
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 19KM-MT11.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Mori-MS-11/m5141120_2011_MS_slides.pdf]]
; [[thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Mori-MS-11/m5141120_2011_MS_thesis.pdf]];
[[Technical Report>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Mori-MS-11/m5141120_2011_MS_tr.pdf]]
-''Abstract'': Network-on-Chip (NoC) architectures provide a good way of realizing efficient
interconnections and largely alleviate the limitations of bus-based solutions.
NoC has emerged as a solution to problems exhibited by the shared bus
communication approach in System-On-Chip (SoC) implementations. This includes the lack
of scalability, clock skew, lack of support for concurrent communication, and power
consumption.
Network-on-Chip communication is packet based, and the communication
requirements of this paradigm are affected by the selection of architecture parameters such as
topology, mapping, routing, and buffer size. For topology, there are two choices: regular topology and custom topology. These choices also affect hardware
complexity and performance.
The challenges in this thesis are designing a prototype NoC with a real application
that includes parallel execution, inserting Short Pass Links (SPL) into a regular
2D-mesh NoC topology as an optimization technique, and prototyping on FPGA to
evaluate performance and hardware complexity accurately. The optimization is
based on our NoC design named OASIS. A JPEG encoder, selected
as the target real application, is divided into 8 tasks that are mapped to the nodes. Then,
an appropriate Network Interface (NI) is designed for the application tasks, and the
communication delay is improved by employing the SPL insertion algorithm. Finally, I evaluated
the communication performance improvement and the change in hardware utilization.
I prototyped the system in hardware and evaluated its performance in terms of latency
and power using Dimension reversal transactions, Hotspot transactions, and a parallelized
JPEG encoder. From the performance evaluation results, I concluded that, compared with
the original base architecture, the execution time in ONoC with SPL decreased by 29.7%
for Dimension reversal, by 16.9% for Hotspot,
and by 43.7% for the JPEG encoder. The area of these three test benches in ONoC
with SPL increased by less than 5%: 2.93% for Dimension reversal, 2.60% for Hotspot,
and 4.28% for the JPEG encoder. The power consumption
increased slightly, by 0.49% on average. The results indicate that the architecture
is effective in balancing the power and performance of NoC design.
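As a rough illustration of why an extra link can shorten communication in a regular 2D mesh, the sketch below computes minimum hop counts by breadth-first search, with and without one added link. The 4x4 mesh size, the node numbering, and the link endpoints are assumptions made for this example; it is not the SPL insertion algorithm used in the thesis.

```python
from collections import deque

def mesh_links(width, height):
    """Adjacency sets of a width x height 2D mesh (node id = y * width + x)."""
    adj = {n: set() for n in range(width * height)}
    for y in range(height):
        for x in range(width):
            n = y * width + x
            if x + 1 < width:            # link to the right neighbour
                adj[n].add(n + 1)
                adj[n + 1].add(n)
            if y + 1 < height:           # link to the neighbour below
                adj[n].add(n + width)
                adj[n + width].add(n)
    return adj

def hops(adj, src, dst):
    """Minimum hop count from src to dst, found by breadth-first search."""
    dist = {src: 0}
    pending = deque([src])
    while pending:
        node = pending.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                pending.append(nxt)
    raise ValueError("destination unreachable")

def add_link(adj, a, b):
    """Copy of adj with one extra bidirectional link (a hypothetical shortcut)."""
    new_adj = {n: set(neigh) for n, neigh in adj.items()}
    new_adj[a].add(b)
    new_adj[b].add(a)
    return new_adj

base = mesh_links(4, 4)
with_shortcut = add_link(base, 0, 15)    # corner-to-corner link, chosen arbitrarily
print(hops(base, 0, 15), hops(with_shortcut, 0, 15))
```

A real insertion algorithm would choose the link endpoints from the traffic of the mapped application rather than picking them by hand.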
*** On the Design of a 3D Network-on-Chip for Many-core SoC [#k7419422]
-''Author'': Akram Ben Ahmed
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 18ABA-MT11.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Akram-MS-11/m51411532011_MS_thesis_slides.pdf]]
; [[thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Akram-MS-11/m5141153_2011_MS_thesis.pdf]];
[[Technical Report>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Akram-MS-11/m5141153_2011_MS_tr.pdf]]
-''Abstract'': Global interconnects are becoming the principal performance bottleneck for high
performance Systems-on-Chip (SoCs), since the main goal for these systems is to
shrink the chip as much as possible while seeking at the same time
more scalability, higher bandwidth and lower latency. Conventional bus-based systems
are no longer a reliable architecture for SoCs due to a lack of scalability and parallelism
integration, high latency and power dissipation, and low throughput. During the last
decade, Network-on-Chip (NoC) has been proposed as a promising solution for future
system-on-chip design. It offers more scalability than the shared-bus based interconnection
and allows more processors to operate concurrently.
Despite the higher scalability and parallelism integration offered by
NoC over traditional shared-bus based systems, it is still not an ideal
solution for future large scale SoCs, due to some limitations such
as high power consumption, high communication cost, and low throughput. Recently,
extending NoC to the third dimension (3D-NoC) has been proposed to deal with those
problems, as a solution offering lower power consumption and higher speed.
In this thesis, a 3D-NoC named OASIS (3D-ONoC for short) has been designed
to overcome the limitations of the 2D-OASIS previously developed in our research group. In
this thesis we describe the 3D OASIS-NoC architecture in a fair amount of detail
and present evaluation results and a comparison between 3D and 2D OASIS.
Evaluation results show that, despite the increased hardware complexity, 3D ONoC
reduces the number of hops by 40% and the average stall count by 74%. As a result,
the execution time improved by 36%. By increasing the traffic load with the Matrix application, the execution time improvement could be further raised from the 36% obtained with one
matrix multiplication to more than 41% with 1, 2, 3 and 4 matrix multiplications.
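The hop reduction can be motivated with a small calculation. The sketch below compares the average minimal hop count of a 2D mesh and a 3D mesh with the same number of nodes. The 8x8 and 4x4x4 sizes and the assumption of minimal (Manhattan) routing under uniform traffic are illustrative choices, not the configuration evaluated in the thesis, so the resulting figures are not expected to match the 40% reported above.

```python
from itertools import product

def average_hops(dims):
    """Average Manhattan distance over all ordered pairs of distinct mesh nodes."""
    nodes = list(product(*(range(d) for d in dims)))
    total = 0
    pairs = 0
    for a in nodes:
        for b in nodes:
            if a != b:
                total += sum(abs(i - j) for i, j in zip(a, b))
                pairs += 1
    return total / pairs

avg_2d = average_hops((8, 8))       # 64 nodes on a single layer
avg_3d = average_hops((4, 4, 4))    # 64 nodes on four stacked layers
print(f"2D: {avg_2d:.2f} hops, 3D: {avg_3d:.2f} hops")
```

Stacking the layers shortens the longest dimension, which is why the average distance drops even though the node count is unchanged.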
*** Design of Parametrizable Network-on-Chip [#g46f826b]
-''Author'': Shohei Miura
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 17SM-MT11.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Miura-MS-11/m5141118_2011_MS_slides.pdf]]
; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Miura-MS-11/m5141118_2011_MS_thesis.pdf]]
-''Abstract'': As deep sub-micron processing technologies have advanced, LSI designers can integrate many IP cores on a single chip as a System-on-Chip (SoC). Many application cores work concurrently on a single chip, and parallel processing has grown in importance. Network-on-Chip (NoC) has been proposed as a more scalable architecture that achieves better performance in parallel processing than traditional multicore interconnection architectures. One important design challenge of NoC is the buffer design. Buffers usually occupy the most significant portion of the NoC area and consume a large amount of power. They also affect NoC performance. Therefore, buffer resources should be allocated to each buffer on demand: channels with heavy traffic get large buffers, and the other channels get smaller ones to reduce the buffer overhead. Since traffic load depends on the target applications, methodologies to monitor traffic load on a NoC-based SoC are necessary, and the traffic should be measured again whenever the target applications connected to the NoC change.
In this thesis, we propose the architecture and design of a Parameterizable Network-on-Chip system (PNoC), which monitors traffic and focuses on buffer depth.
In PNoC, monitoring probes are embedded in routers and record traffic load and congestion information during execution. After simulation, we obtain the traffic information, compute new buffer depths, and then reconfigure the buffers. Using this flow, we can obtain a low buffer overhead and reduce the NoC area without a drop in performance. The proposed architecture is implemented on an Altera Stratix III development board. The target application cores are several Altera Nios II processors, which communicate with each other using several traffic patterns. The proposed architecture is evaluated on application processing time and system area. Comparing wormhole-type and virtual cut-through (VCT) type NoCs with a NoC having optimal buffers, the application processing time of the latter is similar to the VCT type and better than the wormhole type, and its system area is 22% smaller than the VCT type. Therefore, PNoC, the proposed architecture, achieves a low buffer overhead, and the NoC design parameters become better optimized for the target application.
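The monitor, compute, and reconfigure flow described above can be sketched as follows. The sizing rule (each channel gets as many slots as its recorded peak occupancy, clamped to a range), the channel names, and the numbers are all hypothetical; the abstract does not state how PNoC derives the new depths.

```python
def resize_buffers(peak_occupancy, min_depth=2, max_depth=16):
    """Map each channel's recorded peak occupancy (in flits) to a new buffer depth.

    Hypothetical rule: allocate what the channel was observed to need,
    clamped to the range [min_depth, max_depth].
    """
    return {
        channel: max(min_depth, min(max_depth, peak))
        for channel, peak in peak_occupancy.items()
    }

# Made-up monitoring results for three input channels of one router.
observed = {"north": 1, "east": 7, "local": 30}
new_depths = resize_buffers(observed)

# Slots saved compared with giving every channel the maximum depth of 16.
saved = sum(16 - depth for depth in new_depths.values())
print(new_depths, saved)
```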
*** Architecture and Design of Core Network Interface for Distributed Routing in OASIS NoC [#sf0a04d4]
-''Author'': Ryuya Okada
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 16RO-GT09
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Okada-BS-11/s1160048_GT2011-slides.pdf]]
-[[&ref(pdf-download.gif,,40%); Technical Report>http://webfs-int.u-aizu.ac.jp/~benab/publications/treport/RyuyaOkada-TR2011.pdf]]
-''Abstract'': The main functions of a Network Interface (NI) are flitization,
i.e., conversion of a packet into flits, and deflitization,
i.e., conversion of flits into a packet. They
help data transmission from source core to destination
one by adding some control information to packets.
There are two types of NIs: source routing NIs
and distributed routing NIs. The former has a path information
table holding the complete route information. In
the latter, the packet header is compact. We propose an
architecture and a design of a Core Network Interface
(NI) for distributed routing on an Altera FPGA board.
The designed NI occupies less than 1% of area utilization
of Cyclone II FPGA. In addition, the data transmission
delay from core to router and from router to
core path is equal to 3 + 4(N - 1) clock cycles / packet
(N = Number of flits).
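A minimal sketch of flitization and deflitization, together with the delay formula quoted above. The flit width of 4 bytes and the head/body/tail tags are assumptions for illustration and do not reflect the actual flit format of the designed NI; only the expression 3 + 4(N - 1) comes from the abstract.

```python
def flitize(payload, flit_bytes=4):
    """Split a packet payload into (kind, data) flits; the first flit is the head."""
    if not payload:
        raise ValueError("payload must not be empty")
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    flits = []
    for index, chunk in enumerate(chunks):
        if index == 0:
            kind = "head"
        elif index == len(chunks) - 1:
            kind = "tail"
        else:
            kind = "body"
        flits.append((kind, chunk))
    return flits

def deflitize(flits):
    """Reassemble the packet payload from its flits."""
    return b"".join(data for _, data in flits)

def transfer_delay_cycles(num_flits):
    """Core-to-router (or router-to-core) delay from the abstract: 3 + 4(N - 1)."""
    return 3 + 4 * (num_flits - 1)

packet = b"0123456789"
flits = flitize(packet)
print(len(flits), transfer_delay_cycles(len(flits)), deflitize(flits) == packet)
```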
*** Performance and Complexity Study of Multi-QueueCore Systems [#c632ac33]
-''Author'': Tomotaka Kasahara
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 15TK-GT09
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kasahara-BS-11/s1160056_GT2011.pdf]]
-''Abstract'': A multi-core system can improve hardware performance,
but such systems have some problems. When
several cores try to access a shared peripheral at the same
time, they cannot all reach the shared memory through the shared
bus. Using a Network-on-Chip solves this problem. In
this work, we analyze a multi-core system designed
with the Qsys tool. Qsys can use a Network-on-Chip to
connect the processors to each other.
*** Development of Parallel Queue Processor Architecture and its Integrated Development Environment [#nb21487b]
-''Author'': Hiroki Hoshino
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 14HH-MT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://www.u-aizu.ac.jp/~benab/publications/theses/Hoshino-MS-10/m5131139_2010_MS_Presentation.pdf]]
; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Hoshino-MS-10/m5131139_2010_MS_thesis.pdf]]
-''Abstract'': High-performance processors have long been in demand. Instruction
level parallelism (ILP) is one of the essential ingredients of a high
performance processor. There are two main techniques to exploit ILP: VLIW and
superscalar. On the software side, VLIW needs a complex compiler to generate
the VLIW instructions. The superscalar scheme needs a large area for finding ILP with
a big instruction window and register renaming techniques.
The queue instruction set processor architecture is an approach to obtaining ILP with
simple techniques. The intermediate results are saved in the queue register, which
follows the first-in first-out rule, instead of a random access register. An instruction
implicitly reads its operands from the head of the queue register. The execution result
is implicitly written to the tail of the queue. There is no need to specify register
numbers in the instructions. The instructions for the queue processor are generated by
traversing the directed acyclic graph in a level-order manner. The queue processor has
promising advantages: high ILP and short instruction width.
In this thesis, the superscalar, out-of-order, produced order parallel queue processor
(QC-3) is designed. The design is implemented in Verilog HDL to obtain highly accurate
evaluation results with vendor-provided synthesis tools. The queue
compiler, the assembler and the simulator are also proposed. This design suite is useful
for developing programs for the queue processor.
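The implicit operand handling described above can be sketched with a small interpreter: operands are taken from the head of a FIFO queue and results are appended to its tail, so no instruction names a register. The three-operation instruction set and the textual format are assumptions for illustration and are not the QC-3 instruction set.

```python
from collections import deque
import operator

# Hypothetical two-operand instructions for the example.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_queue_program(program, memory):
    """Execute queue instructions; returns the value left at the head of the queue."""
    queue = deque()
    for instruction in program:
        name, *args = instruction.split()
        if name == "ld":
            queue.append(memory[args[0]])     # load: write to the tail
        else:
            left = queue.popleft()            # operands come from the head
            right = queue.popleft()
            queue.append(OPS[name](left, right))
    return queue.popleft()

# (a + b) * (c - d), with the instructions already in level order.
program = ["ld a", "ld b", "ld c", "ld d", "add", "sub", "mul"]
result = run_queue_program(program, {"a": 1, "b": 2, "c": 7, "d": 3})
print(result)
```

Note that `add` and `sub` do not depend on each other, which is the parallelism a queue processor can exploit without register renaming.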
***Design and Evaluation of Dual Mode Processor Architecture [#p5e33c44]
-''Author'': Taichi Maekawa
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 13TM-MT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Maekawa-MS-10/m5131144_Maekawa_MS_presentation.pdf]]
; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Maekawa-MS-10/m5131144_Maekawa_MS_thesis.pdf]]
-''Abstract'': Current processor architectures implement many techniques and mechanisms,
including register renaming and scheduling windows, to achieve high performance
by exploiting several kinds of parallelism. However, the added or modified modules that
realize these techniques and mechanisms increase area and power consumption.
Queue processors have been developed to overcome these challenges
as high-performance, small-area, low-power processors.
However, queue processors cannot execute existing programs,
because they are not compatible with programs
written for other processors. At present, users must write queue programs
from scratch and cannot use any existing programs. This thesis presents the 32-bit dual
execution processor (DEP32), which can execute both queue programs and Java byte code
without a considerable hardware increase over the base queue processor, to reduce the incompatibility
of the queue processor. The main feature of Java programs is that
they use Java byte codes. Since almost all Java byte codes are supported by the
Java virtual machine, except on Java processors, almost all Java programs can be executed on
many processors. Java programs also have many advantages: an object-oriented
paradigm, robustness, and security. Thus, Java programs have been used on the Internet.
Recently, Java programs are also used in embedded systems because they
are portable. Supporting Java byte code can reduce the incompatibility of
queue processors. In addition, the queue and Java computation models
have many analogies. The DEP32 can overcome the incompatibility of queue processors
by supporting the Java computation model without a considerable hardware increase
over the base queue processor. The DEP32 is designed in Verilog HDL and evaluated
with Altera tools. From the design and evaluation results, the DEP32 core, which executes
both the queue and Java computation models, increases hardware area by 7% and
power consumption by 9% over the base queue processor.
*** Produced Order Queue Compiler Design [#x1b7d493]
-''Author'': Masashi Masuda
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 12MM-MT10
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Masuda-MS-10/m5131145_Masashi_MS_Presentation.pdf]]
; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Masuda-MS-10/m5131145_Masashi_MS_thesis.pdf]]
-''Abstract'': A queue processor arranges high-speed registers in a first-in-first-out queue. All read
accesses are performed at the head of the queue and all write accesses are performed
at the tail of the queue. This characteristic allows the exploitation of maximum parallelism
and improves code density. Compiling for the QueueCore requires a new
approach since the concept of registers disappears. We propose a new efficient code
generation algorithm for the QueueCore.
Our queue compiler translates any program written in the C language into queue processor
assembly code. The queue compiler design is completely different from any
other existing compiler, due to the special characteristics of queue computing.
First, the compiler generates the data flow graphs from the input program. Then a set
of custom analyses and transformations is performed to compute the offset reference
values. Finally, the data flow graphs are scheduled in a level-order manner to generate
the final instruction sequence for the target queue machine. The queue compiler can
compile any program into queue object code.
For a set of numerical benchmark programs, our compiler extracts more parallelism
than the optimizing compiler for a RISC machine by a factor of 1.38. Through the use
of QueueCore’s reduced instruction set, we are able to generate 20% and 26% denser
code than two embedded RISC processors.
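The level-order scheduling step can be sketched for the simplest case, a pure expression tree. The nested-tuple tree representation and the `ld` mnemonic are assumptions made for this example. Shared subexpressions, which turn the tree into a graph and require the offset reference values mentioned above, are not handled here.

```python
def level_order_schedule(tree):
    """Emit queue instructions for an expression tree: deepest level first, left to right.

    A tree is either a variable name (leaf) or a tuple (op, left, right).
    """
    levels = []
    current = [tree]
    while current:                        # collect the nodes level by level
        levels.append(current)
        below = []
        for node in current:
            if isinstance(node, tuple):
                below.extend(node[1:])    # children, in left-to-right order
        current = below

    program = []
    for level in reversed(levels):        # deepest level is emitted first
        for node in level:
            if isinstance(node, tuple):
                program.append(node[0])
            else:
                program.append(f"ld {node}")
    return program

# (a + b) * (c - d)
tree = ("mul", ("add", "a", "b"), ("sub", "c", "d"))
print(level_order_schedule(tree))
```

Because every node on a level is emitted before any node above it, the results a node needs are already at the head of the queue, in order, when that node executes.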
*** OASIS NoC Topology Optimization with ShortPath Link [#r937f7e8]
-''Author'': Takahiro Uesaka
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 11TU-GT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Uesaka-BS-10/s1150030_GT2010.pdf]]
-''Abstract'':
*** Shared Memory MultiQueueCore Processor Design [#de002c94]
-''Author'': Shunichi Kato
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 10SK-GT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kato-BS-10/s1150059_GT2010.pdf]]
;[[''Thesis''>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kato-BS-10/graduation_thesis_final_edition.pdf]]
-''Abstract'': Multi-core systems have been proposed because
processor performance can no longer be improved simply by increasing
the clock frequency. Synchronization
mechanisms and memory arbitration are very important issues
in constructing a multi-core system. We implemented
bus arbitration to control the memory accesses in a
multi-core system. All processor cores in the system are
connected via a shared bus and communicate using the
shared memory. The number of memory accesses
affects the performance improvement in our shared memory
multi-queue core system. In this thesis, we discuss the
design of the Bus Arbitrator Mechanism and the performance
results.
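One common way to arbitrate a shared bus is round-robin granting, sketched below. The abstract does not state which policy the Bus Arbitrator Mechanism uses, so the round-robin policy, the class name, and the four-core setup are assumptions for illustration only.

```python
class RoundRobinArbiter:
    """Grant the shared bus to one requesting core per cycle, rotating priority."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.last_granted = num_cores - 1    # core 0 has top priority at first

    def grant(self, requests):
        """requests: set of core ids asking for the bus; returns a core id or None."""
        for offset in range(1, self.num_cores + 1):
            core = (self.last_granted + offset) % self.num_cores
            if core in requests:
                self.last_granted = core     # the winner gets lowest priority next
                return core
        return None                          # nobody requested the bus

arbiter = RoundRobinArbiter(4)
# Cores 0, 2 and 3 keep requesting the shared memory on every cycle.
grants = [arbiter.grant({0, 2, 3}) for _ in range(4)]
print(grants)
```

Rotating the priority keeps any one core from monopolizing the shared memory, at the cost of making each core wait its turn when many cores request at once.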
*** Multicore SoC Architecture for Realtime Data Intensive ECG Processing [#g858d419]
-''Author'': Yumiko Kimezawa
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 9UK-GT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kimezawa-BS-10/s1150072_GT2010_Feb122011.pdf]]
-''Abstract'':
*** Development Environment for Single Chip Computer intended for Queue Computing Development and Education [#ic077e8a]
-''Author'': Yuuki Omoto
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 8YO-GT09
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Omoto-BS-09/GT2009_Yuuki_Omoto_final.pdf]]
-''Abstract'': Queue-based system architecture offers attractive
advantages, especially for the development of embedded
computer systems, because it features high performance,
low complexity, and low power consumption. In
this research, we present the architecture and design of a
complete, simple computer system (named QSoC) intended
for the development and teaching of Queue computing.
The QSoC architecture was synthesized for a
functional FPGA device equipped with several peripheral
components. We present the architecture description
and evaluation results.
*** Architecture and Design of Application Specific Multicore SoC [#x3b36438]
-''Author'': Haga Yasuyoshi
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 7HY-GT09.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Haga-BS-09/GT2009_YasuyoshiHaga_final.pdf]]
-''Abstract'': Electrocardiography is an interpretation of the electrical
activity of the heart over time captured and externally
recorded by electrodes. It is an essential practice in heart
medicine, which faces computational challenges, especially
with 12 lead signals or more. In this research, we
exploit parallel processing techniques to process electrocardiography
computation kernels in parallel. This work
is part of the BANSMOM project. This
thesis presents a hardware implementation of the electrocardiogram
(ECG) processing system. Our system is
based on a MultiCore System on a Chip (MCSoC) architecture.
We provide a prototype system for 1-lead ECG
signal processing. This system is implemented on an FPGA
(the target device is an Altera Stratix III). Logic
synthesis results in 14% logic utilization. In addition, this
system has been simulated on an FPGA board using actual
data from the PhysioBank database.
*** Development of User Friendly Assembler for Queue Computers [#hfa70628]
-''Author'': Reo Honjoya
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 6RO-GT09.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Reo-BS-09/GT2009_ReoHonjoya_final.pdf]]
-''Abstract'':
*** Optimizations Techniques and FPGA Prototyping of OASIS Network-on-Chip [#b924e273]
-''Author'': Kenichi Mori
-''Degree and Year'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 5MK-GT09.
- [[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Mori-BS-09/GT2009_KenichiMori_final.pdf]]
-''Abstract'': Current Systems-on-Chip (SoCs) execute applications
which demand extensive parallel processing. Networks-on-Chip
(NoCs) provide a good way of realizing efficient
interconnections, and largely alleviate the limitations of
bus-based solutions. In this paper, we propose an optimized
NoC (ONoC) which is optimized to transmit accurate
data. We verified the design by RTL-level simulation and estimated
hardware performance to evaluate hardware cost,
accuracy and speed.
*** Architecture and Design of Parameterizable Network-on-Chip [#m23b887b]
-''Author'': Shohei Miura
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 4MS-MT09.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Miura-BS-09/GT2009_ShoheiMiura_final.pdf]]
-''Abstract'': Network-on-Chip (NoC) largely overcomes the limitations
of the bus-based architecture. A NoC system consumes
a large silicon area and a lot of power because it
tends to become a larger system than a normal System-on-Chip
(SoC). Reducing the area is necessary in order
to improve the performance of a NoC system. Buffers in
the switches account for most of the area and power consumption.
The appropriate buffer size depends on the traffic load
through the channels in a system; however, an evaluation environment
for NoC that monitors traffic information and
seeks hot spots has not yet been established.
This paper presents Parameterizable Network-on-Chip
(PNoC), which monitors traffic information and traffic
load and finds the appropriate buffer size for target applications.
With several traffic patterns, including heavy
and light traffic, the simulation
results show that the appropriate buffer size differs for
each traffic pattern and each amount of traffic.
As a result, when comparing normal NoC systems
and PNoC systems, the area and power consumption
of PNoC are reduced, and system performance is not decreased;
ALUT counts are reduced by up to 63%, and register
counts are reduced by up to 84%.
*** Graph Transformation Methods and Theoretical Performance Evaluation of Queue Computation Models [#jce863a7]
-''Author'': Masashi Masuda
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2009, Ref. 3MM-GT08.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Masuda-BS-08/masuda-pres-2008.pdf]]
-''Abstract'': A queue is used to store intermediate calculation results
in a First In First Out (FIFO) data structure. A
queue system can be classified into three main models
according to the rules of enqueueing and dequeueing.
These models are called the Produced-Consumed
Order Queue Computation Model, the Consumed Order
Queue Computation Model, and the Produced Order
Queue Computation Model. There are problems in
generating queue programs, and these problems are
named the Multiple Data Produced problem, the Cross
Arc problem, and the Instruction Hole problem. This
thesis presents solutions for these problems and a comparison
of the three queue computation models’ fundamental
characteristics (instruction count, instruction level
parallelism, execution cycles, program size).
*** Advanced Hardware Optimization Algorithms for High Performance Queue Processor Architecture [#le7e4395]
-''Author'': Hiroki Hoshino
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2009, Ref. 2HH-GT08.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Hoshino-BS-08/hoshino-pres-2008.pdf]]
-''Abstract'': Instruction level parallelism (ILP) is important for improving
the performance of general processors. ILP allows
the instructions of a sequential program to be executed in
parallel. However, aggressive compiler optimization
and some larger hardware mechanisms are needed
to find and exploit ILP.
In this research, the queue-based instruction set architecture
is used. This architecture offers an attractive option
in the design of embedded systems. Instructions
for the queue machine are generated using level-order
traversal, which allows us to find all available parallelism
in programs. Thus the hardware executes instructions in
parallel with little effort.
Some optimization and design issues for the queue processor
architecture (QC) have been proposed. This processor
implemented the offset references, the memory
extension instruction, the pipelined structure and the
floating point execution unit. However, the optimized
QC cannot reuse data in the queue register. Also, if the
queue register is full of available data, there is a critical
problem: the processor cannot execute any further.
This research describes solutions to
the data reuse problem and the queue register overflow
problem.
*** Research on Hardware Design of Dual-Mode Processor Architecture [#l898c890]
-''Author'': Taichi Maekawa
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2009, Ref. 1TM-GT08.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Maekawa-BS-08/maekwawa_GT2008_presentation.pdf]]
-''Abstract'': I present the architecture and preliminary evaluation results
of a novel dual-mode processor architecture which
supports queue and stack computation models in a single
core. The core is highly adaptable in both functionality
and configuration. It is based on a reduced bit produced
order queue computation instruction set architecture and
operates in either the Queue or the Stack execution model. This is
achieved via a so-called dynamic switching mechanism
implemented in hardware.
The current design focuses on the ability to execute
Queue programs and also to support Stack based programs
without considerable increase in hardware to the
base architecture. The architecture description and design
results are presented in a fair amount of detail.
***[[MIT Theses>http://www-mtl.mit.edu/researchgroups/icsystems/theses.html]] [#r6e4b43b]
*Technical Reports [#h83accbe]
-https://adaptive.u-aizu.ac.jp/?page_id=6646
終了行:
CENTER:[[ASL &ref(download.png,,30%); Templates for Technical Reports>http://webfs-int.u-aizu.ac.jp/~benab/publications/ASL-LATEX-TEMPLATE.zip]]
----
CENTER:[[&ref(ghat.jpg,,60%);>http://aslweb.u-aizu.ac.jp/benlab/index.php?Theses]]
#CONTENTS
***AY2020 Design of Hand Gesture Recognition based on Deep Neural Network [#j9344d7b]
-Author: AGEISHI Naoto
-Degree: BS
-[[Thesis>]]
-[[slides>]]
-[[Source Code>]]
***AY2020 Design of Interactive Software Interface for AI-Enabled Real-time Biomedical System [#j1fd1c28]
-Author:AOYAMA Naoki
-Degree: BS
-[[Thesis>]]
-[[slides>]]
-[[Source Code>]]
***AY2020 Hardware Acceleration of Convolution Neural Network on FPGA for Real-time Biomedical System [#c4e1bd46]
-Author:OKADA Yuki
-Degree: BS
-[[Thesis>]]
-[[slides>]]
-[[Source Code>]]
***AY2020 Research on Collaborative Learning Algorithm for AI-Enabled Real-time Biomedical System [#z9be34df]
-Author:PHEA Sinchhean
-Degree: BS
-[[Thesis>]]
-[[slides>]]
-[[Source Code>]]
***”Performance Study of Character Recognition with Feed-Forward Neural Network”, Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2018. [#i947aeb8]
-Author:Masaki Yamada,
-Degree: BS
-[[Thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/37.s1220042_Yamada_Masaki-BS-18/s1220042_thesis.pdf]]
-[[slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/37.s1220042_Yamada_Masaki-BS-18/s1220042_slides.pptx]]
-[[Source Code>https://drive.google.com/drive/folders/1OKikrf5IU8kyokuvnxWFatTbCjH1NoCy?usp=sharing]]
-''Abstract'': Neural Networks (NNs) in embedded systems are usually implemented on microcontrollers. A NN implementation on a microcontroller lacks the
performance enhancement of parallel design. The decision of implementing an NN architecture on FPGA benefits from the parallelization and
configurability. Feed Forward Neural Networks (FFNN) with floating point (FP) precision performs a large number of primary products and sums. Also, for
each neuron of FFNN within the hidden layers, a non-linear function computation is required to determine the activation value of the neuron. Without
a specialized FP hardware, such computations can reduce the performance of the system. In this thesis, I present a performance study of character recognition
with Feed-Forward Neural Network towards efficient hardware implementation on FPGA (Fig. 3).
***”Design of a Leaky, Integrate and Fire (LIF) Neuron Core for NASH System”, Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2018. [#x191b349]
-Author: Kanta Suzuki
-Degree: BS
-[[Thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/38-s1220215_Suzuki_Kanta-BS-18/s1220215.pdf]]
-[[slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/38-s1220215_Suzuki_Kanta-BS-18/s1220215_slides.pptx]]
-[[Source Code>https://drive.google.com/drive/folders/1vrKp6e_SlpVLZCXpXjd8W0gCg-j7vBF7?usp=sharing]]
-''Abstract'': Current interest in neuromorphic computer architectures is enormous, due to their conceptual attractiveness
and their potential applications to sensor networks, robotics, computer vision and other fields. Certain types of neural networks (e.g., CNNs) are so resource intensive
that they require server-class computers to model, train and implement them. As a result, their efficiency is not even close to that of the biological brain. We therefore
develop the Neuro-inspired ArchitectureS in Hardware (NASH) System, which uses a spiking model. My research goal is the design of a Leaky Integrate-and-Fire (LIF) Neuron
Core for the NASH System.
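For readers unfamiliar with the leaky integrate-and-fire model named above, a minimal discrete-time sketch follows. It shows the textbook behavior only; the function name, the multiplicative leak factor, and the parameter values are assumptions and do not describe the NASH core itself.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, v_reset=0.0):
    """Simulate one LIF neuron in discrete time and return a 0/1 spike train.

    Assumed update rule per time step:
      1. leak:      the membrane potential decays by the factor `leak`
      2. integrate: the input for this step is added
      3. fire:      if the potential reaches `threshold`, emit a spike and reset
    """
    v = v_reset
    spikes = []
    for current in input_current:
        v = leak * v + current
        if v >= threshold:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes

# Hypothetical constant input; the neuron fires on every second step here.
spikes = simulate_lif([0.75, 0.75, 0.75, 0.75], threshold=1.0, leak=0.5)
print(spikes)
```

With a weaker input the leak can keep the potential below the threshold indefinitely, so the neuron never fires.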
***”Study of a Neuro-inspired Architecture in Hardware”, Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2018. [#c9d08113]
-Author: Kosuke Takakuwa
-Degree: BS
-[[Thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/39-s1220236_Takakuwa_Kosuke-BS-18/thesis.pdf]]
-[[slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/39-s1220236_Takakuwa_Kosuke-BS-18/takakuwa_slides.pptx]]
-''Abstract'': Our brain is a low-power, fault-tolerant, and high-performance
machine. The spiking neural network (SNN) is considered to be the third generation of neural network
models. It can mimic the key functions of the human brain. SNN simulations are a flexible and powerful
method for investigating the behavior of neuronal systems. However, simulation of spiking neural
networks in software is slow. This thesis studies our proposed neuro-inspired architecture for implementing SNNs in hardware.
***Title: 高性能なメニーコアシステムオンチップの為のマイクロリングの障害耐性を持つ光通信オンチップネットワーク [Micro-ring Fault-resilient Photonic On-chip Network for Reliable High-performance Many-core Systems-on-Chip] [#e3734b91]
-Author: Michael Conrad Meyer
-Degree: PhD Thesis, Graduate School of Computer Science and Engineering, University of Aizu, March 2017.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Mike-PhD-17/MichaelConradMeyer_PhD_Thesis_2017.pdf]];
[[Slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Mike-PhD-17/MichaelConradMeyer_PhD_Slides_2017.pdf]]
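-Abstract-J (English translation): People have continually demanded higher performance from computing systems, and as a result technology has scaled aggressively. However, this trend is now showing signs of change. The power consumed on a single IC chip keeps increasing, and in recent years power efficiency has come to be regarded as being as important as the capability of the CPU cores themselves. Electronic Networks-on-Chip (NoCs) are now approaching their performance limits because of various factors.
Photonic Networks-on-Chip (PNoCs) are currently one of the most closely watched research topics. PNoCs offer several advantages over conventional electronic NoCs, including high-bandwidth support, distance-independent power consumption, latency, and performance per watt. Wavelength Division Multiplexing carries multiple parallel optical streams in a single waveguide, and MRs can be switched at speeds of 40 GHz and used as wavelength-selective modulators or switches. This means that multiple bits of data can travel through the same waveguide simultaneously, unlike electronic circuits, which are limited to one bit per wire. Another advantage is that once a path has been established, data travels from source to destination in one pass. The data therefore does not need to be buffered repeatedly, which also reduces power consumption.
The photonic domain does not suffer transient faults caused by radiation, but it is affected by process variation (PV) and thermal variation (TV) as well as aging. Aging occurs in active components and in parts with high temperature variation. In the optical domain, faults occur in MRs, waveguides, routers, and so on. Active devices such as photodetectors have higher failure rates than passive devices such as waveguides. Moreover, because a PNoC is highly vulnerable, a fault may expose a single point of failure, and a faulty MR may cause messages to be misdelivered or lost. This dissertation proposes novel photonic routing algorithms and architectures for future on-chip optical networks.
First, a new fault-tolerant photonic switch capable of handling multiple faulty MRs is proposed. The switch is based on a non-blocking 5-port optical router. It requires no MRs for traveling in the opposite direction (East to West or North to South). The switch can also handle the hybrid spatial switching used in PHENIC.
Second, a fault-tolerant path configuration algorithm is proposed that checks for MR faults and allocates the appropriate MRs. This means that the previous two-state MRST must also have a faulty state. In addition, the algorithm must use two MR configuration tables, one for normal use and one for backup paths. As a result, all routing decisions are made within a single optical switch.
Third, a power estimation scheme for the optical layer is proposed. It must be fast enough to be used for routing decisions, and because of this speed requirement the calculation itself must be simple.
Finally, an architecture and routing algorithm are proposed that let the network make routing decisions based on "strain". The strain value is based on the number of faulty MRs and the optical power of a node. Avoiding hot nodes, nodes with many faults, and heavy traffic improves the reliability and performance of the network.
The proposed algorithms and architectures were evaluated with a discrete-event simulator that incorporates detailed physical models of the photonic components. The results show high reliability with minimal sacrifice in performance and power. The resulting system also addresses process variation and thermal variation in optical components and is more reliable than previously existing systems.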
-Abstract-E: Humans continue to demand higher performance from their computing systems, and as a result technology scaling has increased aggressively, but this is showing signs of change. The power consumed by a chip is ever increasing, and recently the power efficiency of communications has become as important as the computational power of the cores. Typical electronic Networks-on-Chip (NoCs) are reaching their performance limits because of various factors.
One highly sought-after technology is Photonic Networks-on-Chip (PNoCs). PNoCs offer several benefits over conventional electrical NoCs, such as high-bandwidth support, distance-independent power consumption, lower latency, and improved performance-per-watt. Wavelength Division Multiplexing allows multiple parallel optical streams of data to travel concurrently through a single waveguide, and micro-rings (MRs) can be switched at speeds as high as 40 GHz to realize wavelength-selective modulators or switches. These technologies allow multiple bits of data to travel concurrently through the same waveguide, in contrast to the one-bit-per-wire limitation of electronic circuits. Another benefit is that data is transferred end to end once a path is configured, so the data does not need to be buffered multiple times, which saves power.
The photonic domain is immune to transient faults caused by radiation, but is still susceptible to process variation (PV) and thermal variation (TV) as well as aging. Aging typically occurs faster in active components and in elements that have high thermal variation. In the optical domain, faults can occur in MRs, waveguides, routers, etc. Active components, such as photodetectors, have higher failure rates than passive components, e.g. waveguides. Moreover, a PNoC is highly vulnerable because a fault may expose a single point of failure, and a faulty MR can cause a message to be misdelivered or lost. In this dissertation, a set of novel photonic routing algorithms and architectures is proposed for future on-chip optical networks.
First, a new fault-tolerant photonic switch capable of handling multiple faulty MRs is proposed. The proposed switch is based on a non-blocking 5-port optical router. It requires no MRs to travel in the opposite direction (e.g. East to West or North to South). The switch is also able to handle the hybrid spatial switching previously used in PHENIC.
Second, a fault-tolerant path configuration algorithm is proposed, which checks for MR faults and allocates the proper MRs to be used. This means that our previous two-state MRST must also have a faulty state. Additionally, the algorithm must use two MR Configuration Tables, one for standard use and another for the backup paths. This keeps all of the routing decisions within a single optical switch.
Third, a power estimation scheme for the optical layer is proposed, which is fast enough to be used for routing decisions. Because the calculation must be done quickly, the calculation itself must be simple.
Finally, I propose an architecture and routing algorithm pair that allows the network to make "strain"-based routing decisions. This strain value is based on the number of faulty MRs and the optical power of a node. This should improve the network's reliability and performance by avoiding nodes with high temperature, a high number of faults, or a lot of traffic.
The proposed architectures and algorithms were evaluated with a discrete-event simulator, which incorporates detailed physical models of the photonic components. Results show that the proposed system achieves higher reliability with minimal sacrifices in overall system performance and energy. The resulting system is able to address the problems of process variation as well as temperature variation in optical components, and is more reliable than previously existing systems.
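As a rough illustration of the strain-based routing idea in the abstract above, the sketch below ranks candidate next hops by a strain value computed from the number of faulty MRs and the optical power of each node. The abstract does not give the actual formula, so the weighted sum, the weights, the names `NodeStatus`, `strain`, and `pick_next_hop`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class NodeStatus:
    faulty_mrs: int        # number of micro-rings marked faulty at this node
    optical_power: float   # estimated optical power at this node (arbitrary units)

def strain(status, fault_weight=1.0, power_weight=1.0):
    """Hypothetical strain metric: a weighted sum of fault count and optical power."""
    return fault_weight * status.faulty_mrs + power_weight * status.optical_power

def pick_next_hop(candidates, statuses):
    """Choose the candidate neighbor with the lowest strain; ties go to the first listed."""
    return min(candidates, key=lambda node: strain(statuses[node]))

# Hypothetical neighbor states for one routing decision.
statuses = {
    "east": NodeStatus(faulty_mrs=2, optical_power=0.5),
    "north": NodeStatus(faulty_mrs=0, optical_power=1.0),
}
print(pick_next_hop(["east", "north"], statuses))
```

A weighted sum is used only because it is cheap to evaluate, which matches the abstract's requirement that the estimate be simple enough to use during routing.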
***Reliable Real-time Multi-core Vision System-on-Chip based on OASIS NoC [#ne29f409]
-Author: Akihito Kajikawa
-RPS (Presentation Date: August 10, 2016) [[slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/31.Kajikawa-MS-2018/Kajikawa-RPS-August102016-Slides.pdf]]; [[report.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/31.Kajikawa-MS-2018/Kajikawa-RPS-August102016-Report.pdf]]
-Abstract: TBC
***High-performance, Scalable Photonics On-chip Network for Many-core Systems-on-Chip [#u5e7acd1]
-Author: Achraf Ben Ahmed
-Degree: PhD Thesis, Graduate School of Computer Science and Engineering, University of Aizu, March 2016.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Achraf-PhD-16/AchrafBenAhmed_PhD_Thesis_2016.pdf]];
[[Slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Achraf-PhD-16/AchrafBenAhmed_PhD_Slides_2016.pdf]]
-Abstract: The continuously increasing demand for higher-performance computing systems and aggressive technology scaling have driven the trend of integrating a large number of cores on a single chip. In future generations of high-performance many-core systems, the efficiency of the communication infrastructure is as important as the computation efficiency of individual cores. Conventional electrical Networks-on-Chip (NoCs) are expected to reach their limits with increasing core counts because of high power dissipation and reduced performance. As indicated in the latest version of the ITRS roadmap, photonic wiring is a promising interconnect paradigm for future system-on-chip (SoC) designs that can provide broadband data transfer rates unmatchable by existing metal interconnects. When combined with Wavelength Division Multiplexing (WDM), multiple parallel optical streams of data are concurrently transferred through a single waveguide. This contrasts with Electronic Networks-on-Chip (ENoCs), which require a unique metal wire per bit stream. The key to saving power in on-chip photonic communication is that once a photonic path is established, the optical data is transmitted end to end without the need for buffering, repeating, or regenerating.
The photonic switching/routing techniques, configuration, and routing algorithm directly affect the performance and power characteristics of future many-core on-chip photonic communication. In particular, the control module and the path configuration algorithm, which orchestrate the different electrical control functions, play a significant role in how both electrical and photonic resources are utilized. In this dissertation, a set of novel photonic routing algorithms and architectures is proposed for future on-chip optical networks.
First, a new low-latency, non-blocking photonic switch/router (NBPS) and its control module, capable of handling all photonic communication configuration tasks, are proposed. The proposed approach is based on a new hybrid spatial switching mechanism for the photonic data stream transfer and works by manipulating the state of the broadband switching elements. In addition, the NBPS is based on Wavelength-Selective-Switching (WSS) for handling all communication configuration tasks.
Second, a new contention-aware path configuration algorithm and architecture for Electro-Assisted Photonic Network-on-Chip (EA-PNoC) is proposed. In addition to the main configuration tasks, the algorithm also decouples the Electronic Control Network (ECN) from the Photonic Communication Network (PCN) so that the photonic and electric domains work independently of each other. The proposed algorithm orchestrates the processing of the different path configuration packets and significantly alleviates the contention in the ECN.
Third, a low-complexity routing and configuration algorithm for EA-PNoC is proposed. The approach is mainly based on photonic components augmented with a simple electronic control module and a so-called wavelength-shifting mechanism. The main merit of this new approach is to configure the path using photonic devices instead of the typical power-hungry electronic router. The proposed architectures and algorithms were evaluated with a discrete-event simulator, which incorporates detailed physical models of the photonic components. Results show that we could achieve better energy efficiency, as well as a considerable reduction in the blocking occurrence, which is the main source of latency and bandwidth degradation in conventional EA-PNoCs.
***Evaluation of Error Detection Mechanism for 3D-OASIS-Network-on-Chip System, [#icf22be1]
-Author: Kajikawa Akihito
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2016.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Kajikawa-BS-16/Kaikawa-BS-16-gt.pdf]] ,
[[Slides>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Kajikawa-BS-16/Kaikawa-BS-16-slides.pdf]],
[[Technical Report>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Kajikawa-BS-16/Kajikawa-TR-2016.pdf]]
-Abstract: During the past decade, 3D-Network-on-Chips (3D-NoCs) have been showing their advantages over 2D-NoC systems. At the same time, concerns about their reliability have grown because of the different kinds of faults that these systems may encounter. Therefore, a 3D-NoC must be tolerant of any kind of permanent failure or run-time malfunction. To achieve this goal, a fault-detection scheme is necessary to discover the presence of a fault before it propagates through the entire system and causes its collapse. Previously, 3D-Fault-Tolerant-OASIS (3D-FTO) was designed. 3D-FTO is able to recover from a large number of faults that can occur in links, input buffers, and the crossbar. However, in this system a fault detection mechanism is absent, and the diagnosis of faults relies on assuming the presence of faults at a certain period of time. This makes the fault recovery less efficient and diminishes the reliability of the system.
***Design and Analysis of Electrical Control Router for Hybrid Photonics NoC System [#j59bbe00]
-Author: Saito Ken
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2016.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/KenSaito-BS-16/KenSaito-BS-16-gt.pdf]] ,
[[Slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/KenSaito-BS-16/KenSaito-BS-16-slides.pdf]],
// (This should be checked again) [[TR.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/KenSaito-BS-16/KenSaito-BS-16-TR.pdf]]
-Abstract: Despite the huge bandwidth available from hybrid Photonic Network-on-Chip (PNoC) architectures, the Electronic Control Network (ECN) is considered the main source of latency and power consumption. This overhead might be caused by the use of an inappropriate message size or a non-optimized physical channel width. In this thesis, we design and evaluate a lightweight electrical control router for a hybrid PNoC system. The proposed router is optimized for latency rather than bandwidth, with an adequate configuration packet size and physical channel width.
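To see why message size and channel width interact, consider the number of channel-wide transfers needed to carry one configuration packet. The sketch below is plain ceiling arithmetic under the assumption that a packet is sent as whole channel-width flits; the function names and the example sizes are hypothetical and not taken from the thesis.

```python
def flits_needed(packet_bits, channel_width_bits):
    """Number of channel-wide transfers needed for one packet (ceiling division)."""
    if packet_bits <= 0 or channel_width_bits <= 0:
        raise ValueError("sizes must be positive")
    return -(-packet_bits // channel_width_bits)

def padding_bits(packet_bits, channel_width_bits):
    """Unused bits in the last transfer when the packet does not fill it."""
    return flits_needed(packet_bits, channel_width_bits) * channel_width_bits - packet_bits

# Hypothetical sizes: a 65-bit packet on a 32-bit channel needs a third, mostly empty flit.
for bits in (64, 65):
    print(bits, flits_needed(bits, 32), padding_bits(bits, 32))
```

A packet that is one bit too long for the channel costs an extra transfer, which is one way an ill-matched packet size can add latency.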
***Power and Performance Comparison of Electronic 2D-NoC and Opto-Electronic 2D-NoC, [#rf432ca4]
-Author: Okada Ryoga
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2016.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Okada-BS-16/Okada-BS-16-GT.pdf]],
[[Slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Okada-BS-16/Okada-BS-16-slides.pdf]],
// (This should be checked again) [[TR.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Okada-BS-16/Okada-BS-16-TR.pdf]],
-Abstract: Nowadays, the increasing complexity of emerging applications and improvements in process technology have enabled the design of many-core processors with tens to hundreds of cores on a single chip. Photonic Networks-on-Chip (PNoCs) have recently been proposed as an alternative approach with high performance-per-watt characteristics for intra-chip communication. In this thesis, we present a performance exploration of a hybrid Photonic Network-on-Chip and a conventional electronic Network-on-Chip to show the benefit of using photonic technologies.
***High-throughput Architecture and Routing Algorithms Towards the Design of Reliable Mesh-based Many-Core Network-on-Chip Systems, [#m661012c]
-Author: Akram Ben Ahmed
-Degree: PhD Thesis, Graduate School of Computer Science and Engineering, University of Aizu, March 2015.
-[[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Akram-PhD-15/BenAhmed_Akram_DoctorThesis_March2015.pdf]], [[Slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Akram-PhD-15/BenAhmed_Akram_DoctorPresentation_01142015.pdf]]
-Abstract: Global interconnects are becoming the principal performance bottleneck for high-performance Systems-on-Chips (SoCs). The main purpose of these systems is to shrink the chip as much as possible while seeking more scalability, higher bandwidth, and lower latency. Conventional bus-based systems are no longer a reliable architecture for SoCs because of their lack of scalability and parallelism integration, high latency and power dissipation, and low throughput. During the last decade, the Network-on-Chip (NoC) interconnect has been proposed as a promising solution for future SoC designs. It offers more scalability than shared-bus-based interconnection and allows more processors to operate concurrently.
Despite the higher scalability and parallelism integration offered by NoC over traditional shared-bus-based systems, it is still not an ideal solution for future large-scale SoCs. This is due to limitations such as high power consumption, high-cost communication, and low throughput. Recently, extending NoC to the third dimension (3D-NoCs) has been proposed to deal with those problems, as it offers lower power consumption and higher speed.
As 3D-NoC architectures started to show their performance and energy-efficiency advantages over 2D-NoC systems, questions about their reliability in sustaining this performance growth began to arise. This is mainly due to challenges inherited from both 3D-ICs and NoCs. On one side are the complex nature of 3D-IC fabrics and the continuing shrinkage of semiconductor components. Furthermore, the significant heterogeneity of 3D chips, which are likely to mix logic layers with memory layers and even more complex technologies, increases the probability of faults in a system. On the other side, the single-point-of-failure nature of NoCs is a major reliability concern because they are the sole communication medium. As a result, 3D-NoC systems are becoming susceptible to a variety of faults caused by crosstalk, electromagnetic interference, radiation, oxide breakdown, and so on. A simple failure in a single transistor caused by one of these factors may compromise the reliability of the entire system, where the failure can manifest as corrupted message delivery, unmet timing requirements, or sometimes even the collapse of the entire system.
In this thesis, we propose 3D-Fault-Tolerant-OASIS (3D-FTO), a robust fault-tolerant 3D-NoC router architecture endowed with reliable and graceful routing algorithms. The proposed design handles a large number of faults in the input buffer, crossbar, and links (which are the components most susceptible to faults in 3D-NoC systems), leveraging the inherent structural redundancy in the architecture to work around errors. Contrary to previous works, the proposed system tolerates multiple faults in a single crossbar with no considerable performance degradation. In addition, the algorithms used are always minimal (as long as at least one minimal path exists), and with the aid of the Random-Access-Buffer (RAB) mechanism, deadlock-freedom is ensured with no significant area or power overhead.
The proposed 3D-FTO system was synthesized using Synopsys Design Compiler with a 45 nm CMOS process technology, and its layout was obtained using Cadence SoC Encounter. The evaluation results showed the ability of 3D-FTO to work around different kinds of faults, ensuring graceful performance degradation while minimizing the additional hardware complexity and remaining power-efficient.
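The notion of a routing decision that stays minimal while avoiding faulty links can be illustrated with a small sketch for a 3D mesh. This is not the 3D-FTO routing algorithm, which the abstract does not specify; the coordinate tuples, the `faulty_links` set, and both function names are assumptions, and deadlock avoidance (handled in the thesis by the RAB mechanism) is not modeled.

```python
def productive_hops(current, destination):
    """Neighbors of `current` that are one hop closer to `destination` in a 3D mesh."""
    hops = []
    for axis in range(3):
        delta = destination[axis] - current[axis]
        if delta != 0:
            step = 1 if delta > 0 else -1
            neighbor = list(current)
            neighbor[axis] += step
            hops.append(tuple(neighbor))
    return hops

def next_hop(current, destination, faulty_links):
    """Return the first productive neighbor reachable over a healthy link, or None.

    `faulty_links` is a set of (from_node, to_node) pairs marked as faulty.
    Only the immediate link is checked; faults further along the path are ignored.
    """
    for neighbor in productive_hops(current, destination):
        if (current, neighbor) not in faulty_links:
            return neighbor
    return None

# Hypothetical case: the X-direction link is faulty, so the Y direction is used instead.
faults = {((0, 0, 0), (1, 0, 0))}
print(next_hop((0, 0, 0), (1, 1, 0), faults))
```

Because every candidate reduces the distance to the destination, any hop the function returns keeps the route minimal; it returns `None` when no healthy productive link is available.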
***Architecture and Design of an Efficient Router for OASIS 3D Network-on-Chip System, [#h6efd877]
-Author: Mitsunari Ishii
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2015.
-[[slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Ishi-BS-2015/GT2015_Ishii_Final_Slide.pdf]], [[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Ishi-BS-2015/GT2015_Ishii_Thesis_Final.pdf]]; [[Technical Report>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Ishi-BS-2015/MitsunariIshii_TR2014.pdf]]
-Abstract: Reliability has become one of the main problems that designers of three-dimensional Networks-on-Chips (3D-NoCs) are trying to deal with. This is mainly caused by their complex nature, which makes them vulnerable to failures. As a consequence, a lot of research has been conducted to make these systems fault-tolerant while minimizing the performance degradation as much as possible. Previously, a reliable 3D Network-on-Chip system named 3D-OASIS-NoC (3D-ONoC) was designed in our research group. However, in 3D-ONoC a fault detection mechanism is absent, and the diagnosis of faults relies on assuming their presence at a certain period of time. In this thesis, we propose a reliable Network Interface (NI) for the 3D-ONoC system in which an efficient error detection mechanism is implemented. From the evaluation results, we verified that the proposed NI can correctly detect the presence of faults and efficiently resend the faulty flits to their destination.
***Design and Evaluation of Efficient Error Detection Mechanism for OASIS 3D-NoC [#qa684f30]
-Author: Yuuki Tanaka
-Degree: Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, March 2015.
-[[slides.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Tanaka-BS-2015/GT2015_YukiTanaka_slides_Final.pdf]], [[Thesis.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Tanaka-BS-2015/GT2015_YuukiTanaka_Final.pdf]], [[technicalReport.pdf>http://web-ext.u-aizu.ac.jp/~benab/publications/treport/YukiTanaka-TR2015.pdf]]
-Abstract: High-performance 3D Network-on-Chips (3D-NoCs) have become viable solutions for future many-core systems. Through-Silicon-Via (TSV) is a prominent element of 3D-NoC design to support good performance and low power consumption. 3D-OASIS Network-on-Chip (3D-ONoC) was previously proposed in our laboratory. In this thesis, we integrate TSV connections in order to obtain a reliable 3D-ONoC router. The performance evaluation shows that the proposed router correctly delivers messages via the integrated TSV connections without any timing violations. The area of the proposed router is 51939μm, the power is 403μW, and the speed is 490.1MHz.
***Towards the Design of Dependable Real-Time System for Remote Health Monitoring of Elderly People [#xdc86a1d]
-''Author'': Yumiko Kimezawa
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2013, Ref. 23YK-MT12.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kimesawa-MS-12/m5151117_MS_thesis_slides.pdf]]; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kimesawa-MS-12/m5151117_MS_thesis.pdf]]
-''Abstract'': Recent technological advances in wireless networking, microelectronics and the Internet allow computer and biomedical scientists to fundamentally modernize and change the way health care services are deployed.
Electrocardiography is a commonly used, non-invasive procedure for recording electrical changes in the heart. The record, which is called an electrocardiogram (ECG or EKG), shows the series of waves that relate to the electrical impulses which occur during each beat of the heart. An effective approach to speed up this and other biomedical operations is to integrate a very high number of processing elements in a single chip so that the massive scale of fine-grain parallelism inherent in several biomedical applications can be exploited efficiently.
This thesis uses a parallel processing approach to run multi-lead ECG computational kernels in parallel. Our idea is to implement the traditional bulky multi-lead electrocardiograph on a programmable embedded multicore SoC, which is smaller and more efficient. The implemented multicore SoC system is a high-performance device that incorporates multiple building blocks from multiple sources.
The solution presented in this thesis paves the way for real-time processing and diagnosis of heart-related diseases. The proposed system was designed in hardware and evaluated with several ECG data sets. Prototyping of the multicore SoC on FPGA involves building a functional system model that lets the designer evaluate various aspects of a design and provides a realistic projection of the final product implementation.
*** Interactive Real-time Interface for Smart Remote Health Monitoring and Analysis [#b5b64f3b]
-''Author'': Achraf Ben Ahmed
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2013, Ref. 22ABA-MT12.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Achraf-MS-12/m5151161_MS_thesis_slides.pdf]]; [[thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Achraf-MS-12/m5151161_2012_MS_thesis.pdf]]
-''Abstract'': Recent technological advances in sensors, low-power microelectronics, and wireless networking have enabled the development of single-chip solutions for computationally intensive biomedical applications with potential health benefits for a large number of individuals.
One important application in this respect is the accurate, real-time remote analysis of human heart activity, which has always been a challenging problem for biomedical engineers. Despite the decreased mortality rate, heart disorders (like Cardiovascular Disease) are one of the main causes of death around the world. As a result, detection of irregularities in the rhythms of the heart is a growing concern in medical research. In addition to detection, the collection and visualization of such data is crucial because of the huge amount of data that an electrocardiogram produces, especially when records are made over a long time. Another concern is the capability to monitor and analyze the produced data in real time.
In this thesis, an interactive real-time (IRT) interface, integrated with a multi-lead period-peak detection (PPD) algorithm for ECG processing, has been designed to overcome the limitations of existing ECG monitoring systems.
The evaluation results show that our proposed monitoring platform is scalable, with the capability to handle huge amounts of data coming from many nodes at the same time. Moreover, the interface was built following the MVC (Model-View-Controller) pattern and uses the SSL (Secure Sockets Layer) protocol for data exchange, which secures the interface and ensures the data privacy of the nodes. The simulation also shows that real-time monitoring can be ensured by our tool, which contains a live visualization mode that is activated as soon as there is new incoming data.
*** A Quantitative Performance Study of Shared Memory Multicore System [#a24ac5fa]
-''Author'': Takayuki Ochi
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2013, Ref. 21TO-GT12.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Ouchi-BS-12/s1160053_GT2012_slides.pdf]]
-''Abstract'': Systems-on-chip (SoCs) have evolved from fairly simple unicore, single-memory designs to complex heterogeneous multicore SoC architectures consisting of a large number of IP blocks on the same silicon. To meet the high computational demands posed by the latest consumer electronic devices, most current systems are based on this paradigm, which represents a real revolution in many aspects of computing. However, multicore systems are more complex, and there are many parameters and tradeoffs that affect the overall performance. Those parameters are important for software and hardware developers and should be carefully studied and selected.
In this thesis, I evaluate the performance of a multicore architecture by software simulation. I focus on cache size, interconnect bus width, processor frequency, core/interconnect bus, and cache schemes. Moreover, I study the performance of several interconnect models, as they are one of the most crucial factors in multicore architectures. The simulation results and analysis show the influence of such parameters on multicore architectures.
*** Hardware Prototyping and Evaluation of Distributed Routing Core Network-Interface for OASIS NoC Architecture [#gddc29fc]
-''Author'': Shuu Endou
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2013, Ref. 20SE-GT12.
-[[&ref(pdf-download.gif,,40%); slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Endou-BS-12/s1170180_GT2012_slides.pdf]]
-''Abstract'': Network-on-Chip (NoC) has been presented as a promising solution to the System-on-Chip (SoC) interconnect bottleneck problem. One of the important NoC components is the Network Interface (NI). The main functions of an NI are flitization, the conversion of a packet into flits, and deflitization, the conversion of flits into a packet. They support data transmission from a source core to a destination core by adding control information to packets.
This thesis proposes the design and evaluation of a Core-Network-Interface (CNI) for distributed routing in a real NoC architecture. The CNI was prototyped on a Cyclone II Altera FPGA board. The CNI was also evaluated in terms of complexity and performance on a 2x2 mesh NoC configuration.
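Flitization and deflitization, as defined in the abstract above, can be sketched as a pair of inverse functions. The flit layout used here (a head flit carrying source, destination, and length, followed by body flits and a final tail flit) is an assumption for illustration; the actual CNI flit format is not given on this page, and the function names and the 4-byte flit payload are hypothetical.

```python
def flitize(payload, src, dst, flit_payload_bytes=4):
    """Split a packet payload into flits and prepend a head flit with control information."""
    chunks = [payload[i:i + flit_payload_bytes]
              for i in range(0, len(payload), flit_payload_bytes)]
    flits = [{"type": "head", "src": src, "dst": dst, "length": len(payload)}]
    for index, chunk in enumerate(chunks):
        kind = "tail" if index == len(chunks) - 1 else "body"
        flits.append({"type": kind, "data": chunk})
    return flits

def deflitize(flits):
    """Reassemble the packet from its flits and check it against the head flit."""
    head = flits[0]
    if head["type"] != "head":
        raise ValueError("first flit must be a head flit")
    payload = b"".join(flit["data"] for flit in flits[1:])
    if len(payload) != head["length"]:
        raise ValueError("payload length does not match head flit")
    return head["src"], head["dst"], payload

# Hypothetical packet from core 1 to core 3.
flits = flitize(b"hello world", src=1, dst=3)
print(len(flits), deflitize(flits))
```

The head flit is where the control information mentioned in the abstract lives; routers only need to inspect it, and the remaining flits follow the same path.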
*** OASIS Network-on-Chip Prototyping on FPGA [#n228e7d6]
-''Author'': Kenichi Mori
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 19KM-MT11.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Mori-MS-11/m5141120_2011_MS_slides.pdf]]
; [[thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Mori-MS-11/m5141120_2011_MS_thesis.pdf]];
[[Technical Report>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Mori-MS-11/m5141120_2011_MS_tr.pdf]]
-''Abstract'': Network-on-Chip (NoC) architectures provide a good way of realizing efficient interconnections and largely alleviate the limitations of bus-based solutions. NoC has emerged as a solution to problems exhibited by the shared-bus communication approach in System-on-Chip (SoC) implementations. These include the lack of scalability, clock skew, lack of support for concurrent communication, and power consumption.
Network-on-Chip communication is realized by packets. The communication requirements of this paradigm are affected by the selection of architecture parameters such as topology, mapping, routing, and buffer size. For topology, there are two choices: regular topology and custom topology. These choices also affect hardware complexity and performance.
The challenges in this thesis are designing a prototype NoC with a real application including parallel execution, inserting Short Pass Links (SPLs) into a regular 2D-mesh topology NoC as an optimization technique, and prototyping on FPGA to evaluate its performance and hardware complexity accurately. The optimization is carried out on our NoC design, named OASIS. A JPEG encoder, selected as the target real application, is divided into 8 tasks that are mapped to the nodes. Then, I design an appropriate Network Interface (NI) for the application tasks and improve the communication delay by employing the SPL insertion algorithm. Finally, I evaluated the improvement in communication performance and the resulting change in hardware utilization.
I prototyped the system in hardware and evaluated its performance in terms of latency and power using Dimension reversal transaction, Hotspot transaction, and a parallelized JPEG encoder. From the performance evaluation results, I concluded that the Dimension reversal execution time in ONoC with SPL decreased by 29.7% compared to the original base architecture, the Hotspot execution time decreased by 16.9%, and the JPEG encoder execution time decreased by 43.7%. The area of these three test benches in ONoC with SPL increased by less than 5%: the Dimension reversal by 2.93%, the Hotspot by 2.60%, and the JPEG encoder by 4.28%. The power consumption increased slightly, by 0.49% on average. The results indicate that the architecture is effective in balancing the power and performance of NoC design.
*** On the Design of a 3D Network-on-Chip for Many-core SoC [#k7419422]
-''Author'': Akram Ben Ahmed
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 18ABA-MT11.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Akram-MS-11/m51411532011_MS_thesis_slides.pdf]]
; [[thesis>http://web-ext.u-aizu.ac.jp/~benab/publications/theses/Akram-MS-11/m5141153_2011_MS_thesis.pdf]];
[[Technical Report>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Akram-MS-11/m5141153_2011_MS_tr.pdf]]
-''Abstract'': Global interconnects are becoming the principal performance bottleneck for high
performance Systems-on-Chip (SoCs), since the main goal for such systems is to
shrink the chip as much as possible while at the same time seeking
more scalability, higher bandwidth, and lower latency. Conventional bus-based systems
are no longer a reliable architecture for SoCs due to their lack of scalability and parallelism
integration, high latency and power dissipation, and low throughput. During the last
decade, Network-on-Chip (NoC) has been proposed as a promising solution for future
SoC design. It offers more scalability than shared-bus based interconnection
and allows more processors to operate concurrently.
Despite the higher scalability and parallelism integration offered by NoC
over traditional shared-bus based systems, it is still not an ideal
solution for future large scale SoCs, due to limitations such
as high power consumption, high communication cost, and low throughput. Recently,
extending NoC to the third dimension (3D-NoC) has been proposed to deal with these
problems, as it offers lower power consumption and higher speed.
In this thesis, a 3D-NoC named OASIS (3D-ONoC for short) has been designed
to overcome the limitations of the 2D-OASIS previously developed in our research group.
We describe the 3D OASIS-NoC architecture in a fair amount of detail
and present evaluation results and a comparison between 3D and 2D OASIS.
Evaluation results show that, despite the increased hardware complexity, 3D-ONoC
reduces the number of hops by 40% and the average stall count by 74%. As a result,
the execution time improves by 36%. When the traffic load is increased with the Matrix application, the execution-time improvement rises from the 36% obtained with one
matrix multiplication to more than 41% with up to 4 matrix multiplications.
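The hop-count reduction reported above can be motivated with a small calculation. The sketch below is not taken from the thesis: it assumes dimension-ordered routing on a mesh (so the hop count equals the Manhattan distance), uniform random traffic, and two hypothetical 64-node layouts (8x8 and 4x4x4). It does not attempt to reproduce the 40% figure, which depends on the thesis's own workload and configuration.

```python
from itertools import product

def average_hops(dims):
    """Average Manhattan distance between all (source, destination) pairs
    of a mesh with the given dimensions, e.g. (8, 8) or (4, 4, 4).
    Pairs with source == destination are included (distance 0)."""
    nodes = list(product(*(range(d) for d in dims)))
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return total / (len(nodes) ** 2)

# Hypothetical 64-node layouts (assumed sizes, not the thesis configuration).
hops_2d = average_hops((8, 8))
hops_3d = average_hops((4, 4, 4))
reduction = 1 - hops_3d / hops_2d
```

Under these assumptions, stacking the same number of nodes into three dimensions shortens each axis, which is why the average distance between nodes drops.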
*** Design of Parametrizable Network-on-Chip [#g46f826b]
-''Author'': Shohei Miura
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 17SM-MT11.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Miura-MS-11/m5141118_2011_MS_slides.pdf]]
; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Miura-MS-11/m5141118_2011_MS_thesis.pdf]]
-''Abstract'': As deep sub-micron process technologies have advanced, LSI designers can integrate many IP cores on a single chip as a System-on-Chip (SoC). Many application cores work concurrently on a single chip, and parallel processing has grown in importance. Network-on-Chip (NoC) has been proposed as a more scalable architecture that achieves better performance in parallel processing than traditional multicore interconnection architectures. One important design challenge of NoC is buffer design. Buffers usually occupy the most significant portion of the NoC area and consume a large amount of power. They also affect NoC performance. Therefore, buffer resources should be allocated to each buffer on demand: channels with heavy traffic get large buffers, and the other channels get smaller buffers to reduce the buffer overhead. Since traffic load depends on the target applications, methodologies to monitor traffic load on an NoC-based SoC are necessary, and the target applications connected to the NoC should be measured whenever the applications change.
In this thesis, we propose the architecture and design of a Parameterizable Network-on-Chip system (PNoC), which monitors traffic and focuses on buffer depth.
In PNoC, monitoring probes are embedded in routers and record traffic load and congestion information during execution. After simulation, we obtain the traffic information, compute new buffer depths, and then reconfigure the buffers. Using this procedure, we can keep buffer overhead low and reduce NoC area without a drop in performance. The proposed architecture is implemented on an Altera Stratix III development board. The target application cores are several Altera Nios II processors, which communicate with each other using several traffic patterns. The proposed architecture is evaluated on application processing time and system area. When NoCs of the wormhole type and the virtual cut-through (VCT) type are compared with an NoC having optimal buffers, the application processing time of the latter is similar to that of the VCT type and better than that of the wormhole type, and its system area is 22% smaller than that of the VCT type. Therefore, the proposed PNoC achieves low buffer overhead, and its NoC design parameters become better optimized for the target application.
*** Architecture and Design of Core Network Interface for Distributed Routing in OASIS NoC [#sf0a04d4]
-''Author'': Ryuya Okada
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 16RO-GT09
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Okada-BS-11/s1160048_GT2011-slides.pdf]]
-[[&ref(pdf-download.gif,,40%); Technical Report>http://webfs-int.u-aizu.ac.jp/~benab/publications/treport/RyuyaOkada-TR2011.pdf]]
-''Abstract'': The main functions of a Network Interface (NI) are flitization,
i.e., conversion of a packet into flits, and deflitization,
i.e., conversion of flits into a packet. They
support data transmission from a source core to a destination
core by adding control information to packets.
There are two types of NI: the source routing NI
and the distributed routing NI. The former has a path information
table holding the complete route information. In
the latter, the packet header is compact. We propose an
architecture and a design of a Core Network Interface
(NI) for distributed routing on an Altera FPGA board.
The designed NI occupies less than 1% of the area
of a Cyclone II FPGA. In addition, the data transmission
delay on the core-to-router and router-to-core
paths is equal to 3 + 4(N - 1) clock cycles per packet
(N = number of flits).
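The flitization, deflitization, and delay formula described above can be illustrated with a minimal sketch. Only the formula 3 + 4(N - 1) comes from the abstract; the `Flit` layout, the head/body/tail labels, the 4-byte flit size, and all function names are hypothetical choices made for illustration, not the thesis's actual NI design.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Flit:
    kind: str              # "head", "body", or "tail" (assumed labels)
    dest: Optional[int]    # destination carried by the head flit only (assumption)
    payload: bytes

def flitize(packet: bytes, dest: int, flit_bytes: int = 4) -> List[Flit]:
    """Split a packet into flits of at most flit_bytes bytes each."""
    chunks = [packet[i:i + flit_bytes]
              for i in range(0, len(packet), flit_bytes)] or [b""]
    flits = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            flits.append(Flit("head", dest, chunk))
        elif i == len(chunks) - 1:
            flits.append(Flit("tail", None, chunk))
        else:
            flits.append(Flit("body", None, chunk))
    return flits

def deflitize(flits: List[Flit]) -> bytes:
    """Reassemble the original packet from its flits."""
    return b"".join(f.payload for f in flits)

def ni_delay_cycles(n_flits: int) -> int:
    """Core<->router delay from the abstract: 3 + 4(N - 1) clock cycles."""
    if n_flits < 1:
        raise ValueError("a packet has at least one flit")
    return 3 + 4 * (n_flits - 1)

flits = flitize(b"abcdefghij", dest=5)
delay = ni_delay_cycles(len(flits))
```

With the assumed 4-byte flits, the 10-byte packet becomes three flits, so the formula gives 3 + 4(3 - 1) = 11 clock cycles.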
*** Performance and Complexity Study of Multi-QueueCore Systems [#c632ac33]
-''Author'': Tomotaka Kasahara
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 15TK-GT09
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kasahara-BS-11/s1160056_GT2011.pdf]]
-''Abstract'': A multi-core system can improve hardware performance,
but such systems have some problems. When
several cores try to access a shared peripheral at the same
time, they cannot all reach the shared memory through the shared
bus. Using a Network-on-Chip solves this problem. In
this work, we analyze a multi-core system designed
with the Qsys tool, which can use a Network-on-Chip to
connect the processors to each other.
*** Development of Parallel Queue Processor Architecture and its Integrated Development Environment [#nb21487b]
-''Author'': Hiroki Hoshino
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 14HH-MT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://www.u-aizu.ac.jp/~benab/publications/theses/Hoshino-MS-10/m5131139_2010_MS_Presentation.pdf]]
; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Hoshino-MS-10/m5131139_2010_MS_thesis.pdf]]
-''Abstract'': High performance processors have long been in demand. Instruction
level parallelism (ILP) is one of the key factors that enable processors to achieve high
performance. There are two main techniques to exploit ILP: VLIW and
superscalar. On the software side, VLIW needs a complex compiler to generate
its instructions. The superscalar scheme needs a large area to find ILP with
a big instruction window and register renaming techniques.
The queue instruction set processor architecture is an approach that obtains ILP with
simple techniques. Intermediate results are saved in the queue register, which
follows the first-in first-out rule, instead of in a random access register file. An instruction
implicitly reads its operands from the head of the queue register, and the execution result
is implicitly written to the tail of the queue. There is no need to specify register
numbers in the instructions. The instructions for the queue processor are generated by
traversing the directed acyclic graph in a level-order manner. The queue processor has
promising advantages: high ILP and short instruction width.
In this thesis, a superscalar, out-of-order, produced order parallel queue processor
(QC-3) is designed. The design is implemented in Verilog HDL to obtain highly accurate
evaluation results with commercial synthesis tools. A queue
compiler, an assembler, and a simulator are also proposed. This design suite is useful
for developing programs for the queue processor.
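The queue execution rule described above (operands read implicitly from the head, results written implicitly to the tail, code produced by level-order traversal) can be sketched as follows. This is a simplified illustration on a plain expression tree, not the QC-3 instruction set or the thesis's compiler; the tuple encoding and the `ld`/`add`/`sub`/`mul` mnemonics are assumptions, and graphs with shared values would need further handling that is not shown here.

```python
from collections import deque

# A node is ("ld", value) for an operand or (op, left, right) for an operation.
OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}

def level_order_program(root):
    """Emit nodes level by level, deepest level first, left to right."""
    levels = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        nxt = []
        for node in frontier:
            if node[0] != "ld":
                nxt.extend(node[1:])   # children of an operation node
        frontier = nxt
    return [node for level in reversed(levels) for node in level]

def run_queue_program(program):
    """Execute with a FIFO: read from the head, write to the tail."""
    q = deque()
    for node in program:
        if node[0] == "ld":
            q.append(node[1])
        else:
            x = q.popleft()
            y = q.popleft()
            q.append(OPS[node[0]](x, y))
    return q.popleft()

# (1 + 2) * (7 - 3)
tree = ("mul", ("add", ("ld", 1), ("ld", 2)), ("sub", ("ld", 7), ("ld", 3)))
program = level_order_program(tree)
mnemonics = [node[0] for node in program]
result = run_queue_program(program)
```

Note that no instruction names a register: the position of each value in the queue is fixed by the order in which the level-order traversal emits the instructions.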
***Design and Evaluation of Dual Mode Processor Architecture [#p5e33c44]
-''Author'': Taichi Maekawa
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 13TM-MT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Maekawa-MS-10/m5131144_Maekawa_MS_presentation.pdf]]
; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Maekawa-MS-10/m5131144_Maekawa_MS_thesis.pdf]]
-''Abstract'': Current processor architectures implement many techniques and mechanisms,
including register renaming and scheduling windows, to achieve high performance
through several kinds of parallelism. However, the modules added or modified to
realize these techniques and mechanisms increase area and power consumption.
Queue processors have been developed to overcome these challenges
as high performance, small area, and low power processors.
However, queue processors cannot execute existing programs,
because they are not compatible with programs
written for other processors. At present, users must write queue programs
from scratch and cannot reuse existing programs. This thesis presents the 32-bit dual
execution processor (DEP32), which can execute both queue programs and Java bytecode
without a considerable hardware increase over the base queue processor, in order to reduce the incompatibility
of the queue processor. The main feature of Java programs is that they
are distributed as Java bytecode. Since almost all Java bytecodes are supported by the
Java virtual machine (except on Java processors), almost all Java programs can be executed on
many processors. Java programs also have many advantages, such as an object-oriented
paradigm, robustness, and security, and thus have been widely used on the Internet.
Recently, Java programs have also been used in embedded systems because they
are portable. Supporting Java bytecode can therefore reduce the incompatibility of
queue processors. In addition, the queue and Java computation models
have many analogies. The DEP32 overcomes the incompatibility of queue processors
by supporting the Java computation model without a considerable hardware increase
over the base queue processor. The DEP32 is designed in Verilog HDL and evaluated
with Altera tools. The design and evaluation results show that the DEP32 core, which executes
both the queue and Java computation models, increases hardware area by 7% and
power consumption by 9% over the base queue processor.
*** Produced Order Queue Compiler Design [#x1b7d493]
-''Author'': Masashi Masuda
-''Degree'': Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 12MM-MT10
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Masuda-MS-10/m5131145_Masashi_MS_Presentation.pdf]]
; [[thesis>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Masuda-MS-10/m5131145_Masashi_MS_thesis.pdf]]
-''Abstract'': A queue processor arranges high-speed registers in a first-in-first-out queue. All read
accesses are performed at the head of the queue and all write accesses are performed
at the tail of the queue. This characteristic allows the exploitation of maximum
parallelism and improves code density. Compiling for the QueueCore requires a new
approach since the concept of registers disappears. We propose a new efficient code
generation algorithm for the QueueCore.
Our queue compiler translates any program written in the C language into the queue
processor's assembly code. The queue compiler design is completely different from any
other existing compiler; this is due to the special characteristics of queue computing.
First, the compiler generates the data flow graphs from the input program. Then a set
of custom analyses and transformations is performed to compute the offset reference
values. Finally, the data flow graphs are scheduled in a level-order manner to generate
the final instruction sequence for the target queue machine. The queue compiler can
compile any program into queue object code.
For a set of numerical benchmark programs, our compiler extracts more parallelism
than the optimizing compiler for a RISC machine by a factor of 1.38. Through the use
of QueueCore's reduced instruction set, we are able to generate 20% and 26% denser
code than two embedded RISC processors.
*** OASIS NoC Topology Optimization with ShortPath Link [#r937f7e8]
-''Author'': Takahiro Uesaka
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 11TU-GT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Uesaka-BS-10/s1150030_GT2010.pdf]]
-''Abstract'':
*** Shared Memory MultiQueueCore Processor Design [#de002c94]
-''Author'': Shunichi Kato
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 10SK-GT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kato-BS-10/s1150059_GT2010.pdf]]
;[[''Thesis''>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kato-BS-10/graduation_thesis_final_edition.pdf]]
-''Abstract'': Multi-core systems have been proposed because
processor performance can no longer be improved simply by increasing
the clock frequency. Synchronization
mechanisms and memory arbitration are very important issues
in constructing a multi-core system. We implemented
bus arbitration to control the memory accesses in a
multi-core system. All processor cores in the system are
connected via a shared bus and communicate using the
shared memory. The number of memory accesses
affects the performance improvement of our shared memory
multi-queue core system. In this thesis, we discuss the
design of the bus arbitration mechanism and present performance
results.
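As an illustration of how a bus arbiter can serialize memory accesses from several cores, the sketch below uses a round-robin policy. The abstract does not state which arbitration policy the thesis implements, so round-robin, the function name, and the request encoding are all assumptions made for this example.

```python
from typing import List, Optional

def round_robin_grant(requests: List[bool], last_granted: int) -> Optional[int]:
    """Return the index of the core granted the bus this cycle.

    requests[i] is True when core i wants the shared bus. The search
    starts just after the previously granted core, so every requesting
    core is eventually served. Returns None if nobody is requesting.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        core = (last_granted + offset) % n
        if requests[core]:
            return core
    return None

# Cores 0, 2 and 3 request the bus; core 0 was served last.
pending = [True, False, True, True]
first = round_robin_grant(pending, last_granted=0)       # core 2
second = round_robin_grant(pending, last_granted=first)  # core 3
third = round_robin_grant(pending, last_granted=second)  # back to core 0
```

Only one core holds the bus per grant, which is why the number of memory accesses directly limits how much speedup the extra cores can deliver.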
*** Multicore SoC Architecture for Realtime Data Intensive ECG Processing [#g858d419]
-''Author'': Yumiko Kimezawa
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2011, Ref. 9UK-GT10.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Kimezawa-BS-10/s1150072_GT2010_Feb122011.pdf]]
-''Abstract'':
*** Development Environment for Single Chip Computer intended for Queue Computing Development and Education [#ic077e8a]
-''Author'': Yuuki Omoto
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 8YO-GT09
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Omoto-BS-09/GT2009_Yuuki_Omoto_final.pdf]]
-''Abstract'': Queue based system architecture offers attractive
advantages, especially for the development of embedded
computer systems, because it features high performance,
low complexity, and low power consumption. In
this research, we present the architecture and design of a
complete, simple computer system (named QSoC) intended
for development and education of Queue computing.
The QSoC architecture was synthesized for a
functional FPGA device equipped with several peripheral
components. We present the architecture description
and evaluation results.
*** Architecture and Design of Application Specific Multicore SoC [#x3b36438]
-''Author'': Haga Yasuyoshi
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 7HY-GT09.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Haga-BS-09/GT2009_YasuyoshiHaga_final.pdf]]
-''Abstract'': Electrocardiography is an interpretation of the electrical
activity of the heart over time captured and externally
recorded by electrodes. It is an essential practice in heart
medicine, which faces computational challenges, especially
with 12 lead signals or more. In this research, we
exploit parallel processing techniques to process electrocardiography
computation kernels in parallel. This work
is part of a project named BANSMOM. This
thesis presents a hardware implementation of the electrocardiogram
(ECG) processing system. Our system is
based on a MultiCore System on a Chip (MCSoC) architecture.
We provide a prototype system for 1-lead ECG
signal processing. This system is implemented on an FPGA
(the target device is an Altera Stratix III). Logic
synthesis results in 14% logic utilization. In addition, this
system has been tested on an FPGA board using actual
data from the PhysioBank database.
*** Development of User Friendly Assembler for Queue Computers [#hfa70628]
-''Author'': Reo Honjoya
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 6RO-GT09.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Reo-BS-09/GT2009_ReoHonjoya_final.pdf]]
-''Abstract'':
*** Optimizations Techniques and FPGA Prototyping of OASIS Network-on-Chip [#b924e273]
-''Author'': Kenichi Mori
-''Degree and Year'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 5MK-GT09.
- [[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Mori-BS-09/GT2009_KenichiMori_final.pdf]]
-''Abstract'': Current Systems-on-Chip (SoCs) execute applications
which demand extensive parallel processing. Networks-on-Chip
(NoCs) provide a good way of realizing efficient
interconnections, and largely alleviate the limitations of
bus-based solutions. In this paper, we propose an optimized
NoC (ONoC) which is optimized to transmit accurate
data. We verified the design by RTL-level simulation and estimated
hardware performance to evaluate hardware cost,
accuracy and speed.
*** Architecture and Design of Parameterizable Network-on-Chip [#m23b887b]
-''Author'': Shohei Miura
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2010, Ref. 4MS-MT09.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Miura-BS-09/GT2009_ShoheiMiura_final.pdf]]
-''Abstract'': Network-on-Chip (NoC) largely overcomes the limitations
of the bus-based architecture. A NoC system consumes
a large silicon area and a lot of power because it
tends to become a larger system than a normal System-on-Chip
(SoC). Reducing the area is necessary in order
to improve the performance of a NoC system. Buffers in
the switches account for most of the area and power consumption.
The appropriate buffer size depends on the traffic load
through the channels in a system; however, an evaluation environment
for NoC that monitors traffic information and
seeks hot spots has not yet been established.
This paper presents Parameterizable Network-on-Chip
(PNoC), which monitors traffic information and traffic
load and finds the appropriate buffer size for target applications.
With several traffic patterns, including a huge
amount of traffic and a small amount of traffic, the simulation
results show that the appropriate buffer size differs for
each traffic pattern and each amount of traffic.
As a result, when comparing normal NoC systems
and PNoC systems, the area and power consumption
of PNoC are reduced, and system performance is not decreased;
ALUT counts are reduced by up to 63%, and register
counts are also decreased by up to 84%.
*** Graph Transformation Methods and Theoretical Performance Evaluation of Queue Computation Models [#jce863a7]
-''Author'': Masashi Masuda
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2009, Ref. 3MM-GT08.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Masuda-BS-08/masuda-pres-2008.pdf]]
-''Abstract'': A queue is used to store intermediate calculation results
in a First In First Out (FIFO) data structure. A
queue system can be classified into three main models
according to the rules of enqueueing and dequeueing.
These models are called the Produced-Consumed
Order Queue Computation Model, the Consumed Order
Queue Computation Model, and the Produced Order
Queue Computation Model. There are problems in
generating queue programs, and these problems are
named the Multiple Data Produced problem, the Cross
Arc problem, and the Instruction Hole problem. This
thesis presents solutions for these problems and a comparison
of the fundamental characteristics of the three queue computation models
(instruction count, instruction level
parallelism, execution cycles, program size).
*** Advanced Hardware Optimization Algorithms for High Performance Queue Processor Architecture [#le7e4395]
-''Author'': Hiroki Hoshino
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2009, Ref. 2HH-GT08.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Hoshino-BS-08/hoshino-pres-2008.pdf]]
-''Abstract'': Instruction level parallelism (ILP) is important to improve
the performance of general processors. ILP allows
the instructions of a sequential program to be executed in
parallel. However, aggressive compiler optimization
and larger hardware mechanisms are needed
to find and exploit ILP.
In this research, the queue based instruction set architecture
is used. This architecture offers an attractive option
in the design of embedded systems. Instructions
for the queue machine are generated using level order
traversal, which allows us to find all available parallelism
in programs. Thus the hardware executes instructions in
parallel with little effort.
Some optimization and design issues for the queue processor
architecture (QC) have been addressed previously. This processor
implements offset references, the memory
extension instruction, the pipelined structure, and the
floating point execution unit. However, the optimized
QC cannot reuse data in the queue register. Also, if the
queue register is full of live data, there is the critical
problem that the processor cannot continue execution.
This research describes solutions to these problems, namely
the data reuse problem and the queue register overflow
problem.
*** Research on Hardware Design of Dual-Mode Processor Architecture [#l898c890]
-''Author'': Taichi Maekawa
-''Degree'': Bachelor Thesis, School of Computer Science and Engineering, The University of Aizu, Feb. 2009, Ref. 1TM-GT08.
-[[&ref(pdf-download.gif,,40%); Slides>http://webfs-int.u-aizu.ac.jp/~benab/publications/theses/Maekawa-BS-08/maekwawa_GT2008_presentation.pdf]]
-''Abstract'': I present the architecture and preliminary evaluation results
of a novel dual-mode processor architecture which
supports queue and stack computation models in a single
core. The core is highly adaptable in both functionality
and configuration. It is based on a reduced bit produced
order queue computation instruction set architecture and
operates in either the Queue or the Stack execution model. This is
achieved via a so-called dynamic switching mechanism
implemented in hardware.
The current design focuses on the ability to execute
Queue programs and also to support Stack based programs
without considerable increase in hardware to the
base architecture. The architecture description and design
results are presented in a fair amount of detail.
***[[MIT Theses>http://www-mtl.mit.edu/researchgroups/icsystems/theses.html]] [#r6e4b43b]
*Technical Reports [#h83accbe]
-https://adaptive.u-aizu.ac.jp/?page_id=6646