# FPGA_final_project
#### Sabyasachi Mondal, Ravi Yadav
FPGA vs. CPU performance comparison and FPGA streamlining for computation-intensive tasks.

# Overview
We want to use an FPGA to implement an algorithm in hardware so that the computation runs more efficiently. CPU hardware is not flexible: code always runs on the same set of registers and the same ALU, so we cannot optimize the hardware for our code. Our objective is to hardwire a processing unit in the FPGA (something similar to a flexible ALU built from the CLBs) using high-level code.


# Background
CPUs are known for their general-purpose use: the same CPU can power all kinds of applications. ENIAC, in a sense the first computer, took days to reprogram but could already perform general-purpose computations. The trade-off of this generality is that only the code adapts to the task while the hardware stays fixed. A CPU can simulate any finite state machine, but its hardware cannot be reconfigured.

For application-specific needs such as signal processing, wiring a device to perform one particular computation can prove much more efficient. We can implement a multiplier at the hardware level if we want. Depending on the kind of data we have, we can implement hardware that processes exactly that type of data much faster. In a CPU the hardware is static, so every computation is translated into the same fixed instruction set, and those instructions are executed largely one at a time.

If we know we will be adding two 4x4 matrices, we can simply implement register-to-register adders that always deliver the sum in the cycle after the data is received. On a CPU we cannot simply do that!
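
For contrast, here is a minimal Python sketch of the same 4x4 addition as a CPU would carry it out: 16 additions issued one after another through shared arithmetic hardware. The function name `add_4x4` is hypothetical and used only for illustration. On the FPGA, the 16 additions could instead be wired as 16 independent adders working in the same cycle.

```python
def add_4x4(a, b):
    """Add two 4x4 matrices element by element (sequential, CPU-style)."""
    if len(a) != 4 or len(b) != 4 or any(len(row) != 4 for row in a + b):
        raise ValueError("both inputs must be 4x4")
    result = [[0] * 4 for _ in range(4)]
    for i in range(4):          # 16 additions, executed one after another
        for j in range(4):
            result[i][j] = a[i][j] + b[i][j]
    return result


a = [[1, 2, 3, 4]] * 4
b = [[10, 20, 30, 40]] * 4
print(add_4x4(a, b)[0])  # [11, 22, 33, 44]
```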

In this case we are going to use the FPGA to implement a processing unit in hardware, generated from high-level C code, that will be able to compute:
1. *The weight matrix of a neural network* [Future Application to develop a hardware optimized neural network]

and compare how the CPU performs in comparison to our FPGA hardware, which is wired up specifically for the kind of data we expect to provide as input.
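
To make the target computation more concrete, the sketch below shows one possible weight-matrix update in plain Python. It assumes a single linear layer `y = W x` trained with gradient descent on a half squared-error loss; this is an illustrative assumption, not the network the project settles on, and the name `update_weights` and its parameters are hypothetical.

```python
def update_weights(weights, inputs, targets, lr=0.1):
    """One gradient-descent step for a linear layer y = W x.

    Assumes the loss 0.5 * (y - target)**2 per output, so the gradient
    for weight w[k][j] is (y[k] - target[k]) * inputs[j].
    weights: n_out rows of n_in values; inputs: n_in values; targets: n_out values.
    """
    new_weights = []
    for row, target in zip(weights, targets):
        y = sum(w * x for w, x in zip(row, inputs))  # forward pass for one output
        error = y - target
        new_weights.append([w - lr * error * x for w, x in zip(row, inputs)])
    return new_weights


new_w = update_weights([[0.0, 0.0]], inputs=[1.0, 2.0], targets=[1.0], lr=0.1)
print(new_w)  # [[0.1, 0.2]]
```

The inner multiply-accumulate loop is the kind of fixed, repetitive arithmetic that could map well onto dedicated FPGA logic.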

# Implementation Strategy
First we need to determine the type of data we will be using in our project. Based on that, we need to decide which types of ports and hardware we can use in the FPGA.

After this we need to form a mental sketch of the hardware that, if implemented, would make the processing faster.

###### At this point we will do a project estimate analysis and select one of the above problem statements if needed (to fit within the time)

Next we need to know which high-level functions map to which hardware components, and write the code according to the hardware architecture defined in the previous step.

After this we are going to use the HLS tool Vitis to create the design and then use Vivado to generate the programmable hardware bitstream. Once the FPGA is configured with this bitstream, it will be used to process our data. We will use Python APIs to interact with the bitstream.

Next we implement the same algorithm in Python code, which runs on the CPU.

Finally, we can compare the runtimes and reach a conclusion on which is faster and why.
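
The runtime check could look roughly like the sketch below. It is a minimal timing helper built on the standard-library `time.perf_counter`; the names `best_runtime` and `cpu_sum_of_squares` are hypothetical, and the workload is only a stand-in for the real CPU implementation.

```python
import time


def best_runtime(fn, *args, repeats=5):
    """Return the shortest wall-clock time in seconds over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def cpu_sum_of_squares(n):
    """Stand-in CPU workload (assumption); replace with the real CPU implementation."""
    return sum(i * i for i in range(n))


cpu_time = best_runtime(cpu_sum_of_squares, 100_000)
print(f"CPU best of 5 runs: {cpu_time:.6f} s")
```

The call that drives the FPGA could be timed with the same helper. For a fair comparison, the measured FPGA time would need to include the transfer of data to and from the board, not just the computation itself.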

# Tasks
The tasks and their maximum estimated time:

1. Problem statement, solution plan brainstorming, and refresher on NN : *12 hrs*
2. Implementing the network in Python on the CPU : *16 hrs*
3. Pseudo code and solution adjustment : *6 hrs*
4. Vivado study of other solutions, available tools, code and hardware correlation : *16 hrs*
5. Writing the code in Vivado : *6 hrs*
6. Implementing code, checking hardware features, and making final adjustments : *16 hrs*
7. Bitstream generation and Python code for overlay : *2 hrs*
8. Drafting the report and analysis : *4 hrs*


# Resources used and Future thesis topics
A great resource for 32x32 image dataset: http://chaladze.com/l5/
Book to jumpstart or serve as a refresher: Programming Machine Learning (Perrotta, Paolo) [ISBN: 9781680507720]
Hardware based Neural networks : https://users.ece.cmu.edu/~pgrover/teaching/files/NeuromorphicComputing.pdf
https://www.amiq.com/consulting/2018/12/14/how-to-implement-a-convolutional-neural-network-using-high-level-synthesis/
https://wiki.nus.edu.sg/display/ee4218/Hardware+Implementation+Flow

https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/hls_stream_library.html
https://dspace.mit.edu/bitstream/handle/1721.1/91829/894228451-MIT.pdf?sequence=2&isAllowed=y
https://www.xilinx.com/publications/events/developer-forum/2018-frankfurt/accelerating-databases-with-fpgas.pdf
https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/vitis_hls_process.html#djn1584047476918

# Errors and Issues encountered
 [BD 41-759] The input pins (listed below) are either not connected or do not have a source port, and they don't have a tie-off specified. These pins are tied-off to all 0's to avoid error in Implementation flow.
Please check your design and connect them as needed: 
/color_filter/ap_start
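This occurs when the block-level control protocol is not set to `ap_ctrl_none` in the HLS design, so the `ap_start` pin is generated but left unconnected.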