# FPGA_final_project
#### Group4: Sabyasachi Mondal, Ravi Yadav
This project uses an FPGA to streamline computation-intensive tasks. In this case we take a hyperspectral image, of the kind generally analysed by satellites or drones, consisting mostly of single-band image data. This can be used for both maritime and vehicular navigation.

# Overview
We want to use an FPGA to implement an algorithm in hardware so that the computation runs more efficiently. CPU hardware is fixed: code always runs on the same set of registers and ALUs, so we can't optimize the hardware to suit our code. Our objective here is to build a processing unit in the FPGA (something similar to a flexible ALU built from CLBs) using high-level code.


# Background
CPUs are known for their general-purpose use: the same CPU can power all kinds of applications. A CPU can simulate any finite state machine, but its hardware can't be reprogrammed. Because the hardware is static, every workload is translated into the same fixed instruction set, and those instructions run one at a time on the CPU.

In an FPGA, by contrast, we can implement multiple multipliers or registers that work in parallel, or in a specific order, at the hardware level. Depending on the kind of data we expect to receive, we can implement hardware tailored to process exactly that type of data much faster.

As software designers, we can develop our own algorithms bottom-up, from the register level to high-level code (Python, for example), which can prove immensely powerful for task-specific algorithms. In our case we use Python as the host to drive our FPGA.

# Objective

Our objective is to develop better-integrated code so that our hardware and software work hand in hand to deliver the best result. We start from an algorithm in Python and consider how it can be optimized when run in the FPGA's logic. We develop the hardware in C++, write (burn) it to the FPGA, and use our Python code to drive it.
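
The host side of this flow can be sketched as follows. This is a minimal sketch, assuming the overlay exposes a PYNQ-style AXI DMA object with `sendchannel` and `recvchannel`; the helper name `run_transfer` is hypothetical and the buffers are assumed to be allocated already.

```python
def run_transfer(dma, in_buf, out_buf):
    """Push one input buffer through the IP and wait for the result.

    `dma` is assumed to behave like a PYNQ AXI DMA object, i.e. to expose
    `sendchannel` and `recvchannel`, each with `transfer()` and `wait()`.
    """
    # Queue both directions before waiting, so the IP can stream
    # results back while it is still consuming input.
    dma.sendchannel.transfer(in_buf)
    dma.recvchannel.transfer(out_buf)
    dma.sendchannel.wait()
    dma.recvchannel.wait()
    return out_buf
```

On a board, `dma` would typically come from the loaded overlay and the buffers from `pynq.allocate`; the exact instance names depend on the block design.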

In this case we use the FPGA to implement a hardware processing unit from high-level C++ code that can perform image processing (such as inversion and a color-specific background sieve) at a much faster rate:

1. *Perform image processing using registers, AXI streaming, and DMA* [future scope: multi-agent control]
    
    *1.a Implement image inversion and build / test IP*

    *1.b Implement image layer extraction using modified convolution (Roberts operator).*

We then compare how the CPU performs against our FPGA hardware, which is wired up specifically for the kind of data we expect to provide as input.
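
A CPU-side baseline for the inversion step might look like the sketch below. It assumes an 8-bit single-band image stored as a list of rows; the names `invert_image` and `time_call` are hypothetical helpers for illustration and are not part of the project code.

```python
import time

def invert_image(pixels, max_value=255):
    """Invert a single-band image given as a list of rows of integers."""
    return [[max_value - p for p in row] for row in pixels]

def time_call(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

image = [[0, 64, 128],
         [255, 200, 10]]
inverted = invert_image(image)
# inverted == [[255, 191, 127], [0, 55, 245]]
```

Timing the same call on the CPU and on the FPGA path with a helper like `time_call` gives a like-for-like comparison, provided the data transfer time is counted on the FPGA side as well.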

# Implementation Strategy
Previously we saw that the image resizer takes in the whole image at once, and that DMA makes the data transfer much faster. However, there were several instances where the CPU performed better and faster, specifically across a wider range of image dimensions, colors, and sizes.

We intend to implement the following:
1. Make multichannel operations faster at the hardware level, integrated with similar high-level software constructs
    
    *1.a High-level code structure to enable parallel operation and optimization*
    
    *1.b Maintain the same level of parallelism (multiple data streams and logical processing constructs) at the hardware level*

2. Make the FPGA capable of processing as wide a range of images as our CPU supports

    *2.a The CPU has large storage and the FPGA does not, so high-level Python code can feed large data to the DMA in chunks of the maximum acceptable size*

    *2.b Increase number of data channels into and out of FPGA for faster processing (higher utilization).*
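
The chunking idea in 2.a can be sketched in plain Python as below. This is an illustration only: `iter_chunks` is a hypothetical helper, and the chunk length of 4 is a toy value rather than a real DMA limit.

```python
def iter_chunks(data, max_len):
    """Yield consecutive slices of `data`, each at most `max_len` items long."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    for start in range(0, len(data), max_len):
        yield data[start:start + max_len]

pixels = list(range(10))
chunks = list(iter_chunks(pixels, 4))
# chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk would then be sent to the DMA in turn, so the host never hands the FPGA more than it can accept in a single transfer.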

This is how a typical OpenCV resizer works:
<Data Transfer Image>

Studying the resizer code shows that the 2D image is fed to the DMA and that internally the whole image is read row by row, column by column. The image array size is static because the FPGA has finite space.

This can be made more efficient and robust (accommodating any image width, and video) by implementing the following changes:
1. Multichannel image operation, where we use parallel threads for processing. Each thread is processed by its own logic entity (utilizing multiple CLBs), which is expected to be faster.
2. Chunking the data and sending it in packets from our high-level code, so that the FPGA can process an image much larger than its own memory or DMA allocation space.
3. Creating unrolled loops for read and write operations, along with function calls.

We use two streams of data, each with its own processing unit in our IP, which can be represented schematically as:
<image for our Implementation>
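
The two-stream arrangement can be modelled on the host as in the sketch below. How the data is actually divided between the streams is not stated here, so the even/odd row split is an assumption, and `split_two_streams`, `merge_two_streams`, and `process_row` are hypothetical names.

```python
def split_two_streams(rows):
    """Deal rows alternately into two streams (even rows, odd rows)."""
    return rows[0::2], rows[1::2]

def merge_two_streams(stream_a, stream_b):
    """Interleave two streams back into the original row order."""
    merged = []
    for i in range(len(stream_a)):
        merged.append(stream_a[i])
        if i < len(stream_b):
            merged.append(stream_b[i])
    return merged

def process_row(row):
    # Stand-in for one hardware processing unit: 8-bit inversion.
    return [255 - p for p in row]

image = [[0, 1], [2, 3], [4, 5]]
a, b = split_two_streams(image)
out = merge_two_streams([process_row(r) for r in a],
                        [process_row(r) for r in b])
# out == [[255, 254], [253, 252], [251, 250]]
```

In this Python model the two streams are processed one after the other; in the IP each stream would have its own processing unit working concurrently.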

In the background extraction technique we use a modified form of convolution to extract a layer or feature from the image, for example IR bands, which can serve as night-vision references for navigation.
<Image modified convolution>
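
For reference, the standard (unmodified) Roberts cross operator can be sketched as below. This is not the modified convolution used in the IP, only the textbook form it starts from; the function name `roberts_cross` and the use of |Gx| + |Gy| as the magnitude are choices made for this illustration.

```python
def roberts_cross(img):
    """Return the |Gx| + |Gy| Roberts cross response of a 2-D list image.

    The output is one row and one column smaller than the input because
    each result uses a 2x2 neighbourhood.
    """
    height, width = len(img), len(img[0])
    out = []
    for y in range(height - 1):
        row = []
        for x in range(width - 1):
            gx = img[y][x] - img[y + 1][x + 1]   # main diagonal difference
            gy = img[y][x + 1] - img[y + 1][x]   # anti-diagonal difference
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out

image = [[10, 10, 10],
         [10, 50, 50],
         [10, 50, 50]]
edges = roberts_cross(image)
# edges == [[40, 80], [80, 0]]
```

The response is largest along the boundary of the bright block and zero inside it, which is the property the layer extraction relies on.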

# Tasks
The tasks and the maximum actual time spent on each:

1. Problem statement and brainstorming for project selection : *24 hrs*
2. Design a basic model and build overlay : *6 hrs*
3. Python code adjustment and integration : *4 hrs*
4. Plan next stage of overlay design : *4 hrs*
5. Develop algorithm for FPGA using C++ : *4 hrs*
6. Optimize code and add synchronization of multiple channels : *24 hrs*
7. Implement block diagram : *4 hrs*
8. Upload code and test in IPy notebook : *3 hrs* 


# Resources used and Future project topics

#### Resources used
0. Images: https://serc.carleton.edu/earth_analysis/image_analysis/introduction/day_4_part_2.html
1. Image segmentation : https://theailearner.com/2020/11/29/image-segmentation-with-watershed-algorithm/
2. Operation with stream: https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/hls_stream_library.html#ivv1539734234667__ad398476
3. Stream Interface : https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/managing_interface_synthesis.html#ariaid-title32
3. Specialized Constructs : https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/special_graph_constructs.html?hl=template
4. Vitis Examples : https://github.com/Xilinx/Vitis_Accel_Examples/blob/master/cpp_kernels/README.md
5. Running Accelerator : https://pynq.readthedocs.io/en/v2.6.1/pynq_alveo.html#running-accelerators
6. Pragma Interfaces : https://www.xilinx.com/html_docs/xilinx2017_4/sdaccel_doc/jit1504034365862.html
7. AXI4 : https://ch.mathworks.com/help/hdlcoder/ug/getting-started-with-axi4-stream-interface-in-zynq-workflow.html
8. Interface of Streaming : https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/managing_interface_synthesis.html#ariaid-title34
9. Database in FPGA : https://dspace.mit.edu/bitstream/handle/1721.1/91829/894228451-MIT.pdf, https://www.xilinx.com/publications/events/developer-forum/2018-frankfurt/accelerating-databases-with-fpgas.pdf, https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/vitis_hls_process.html#djn1584047476918
10. Muxed Stream : https://liu.diva-portal.org/smash/get/diva2:1057270/FULLTEXT01.pdf
11. RAW,WAR,WAW.. : https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/vitis_hls_optimization_techniques.html#wen1539734225565__aa1299615
12. Loop Pipelining Roll Unroll : https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/vitis_hls_optimization_techniques.html#kcq1539734224846

#### Future scope
The image processing can serve as a stepping stone for controlling multi-agent systems, where each streaming interface is used for instruction input and output for one agent. Instead of running an RTOS in each bot, we can have the data streams from all bots processed in an IP designed to emulate an FSM for each agent and decide its actions. This can lead to higher robustness and fault tolerance, and lower costs.

# Error Logs and Issues encountered
The input pins (listed below) are either not connected or do not have a source port, and they don't have a tie-off specified. These pins are tied-off to all 0's to avoid error in Implementation flow.
Please check your design and connect them as needed: 
/color_filter/ap_start
This occurs when `ap_ctrl_none` is not specified in the design.

Can't find the custom IP in Vivado: add the IP zip path, open the IP Integrator view, and manually add the IP from the IP configuration window.

Can't connect an `hls::stream<>` object in the IP. Note: The hls::stream class should always be passed between functions as a C++ reference argument. For example, &my_stream.
IMPORTANT: The hls::stream class is only used in C++ designs. Array of streams is not supported.

Non-blocking writes are not allowed on non-FIFO interfaces such as `axis`; try a FIFO or `m_axi` interface instead.

The DMA transfer size must be less than 16383, so we can't feed very large datasets directly to a single DMA.
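
A quick way to see what this limit means in practice is to count the transfers an image needs. The sketch below reads the limit strictly (each transfer stays below 16383) and assumes one unit per 8-bit pixel; both are assumptions, as are the helper name `transfers_needed` and the 1920x1080 frame size used as an example.

```python
DMA_LIMIT = 16383  # limit quoted above; units assumed to be one per 8-bit pixel

def transfers_needed(total, limit=DMA_LIMIT):
    """Number of transfers if each one must stay strictly below `limit`."""
    per_transfer = limit - 1
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-total // per_transfer)

n = transfers_needed(1920 * 1080)
# n == 127
```

So a single full-HD single-band frame would already need on the order of a hundred transfers under these assumptions, which is why the data has to be chunked on the host.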

WARNING: [HLS 200-786] Detected dataflow-on-top in function  'color_filter' (../project_3/color_filter.cpp:45)  with default interface mode 'ap_ctrl_hs'. Overlapped execution of successive kernel calls will not happen unless interface mode 'ap_ctrl_chain' is used (or 'ap_ctrl_none' for a purely data-driven design).
Resolution: For help on HLS 200-786 see www.xilinx.com/cgi-bin/docs/rdoc?v=2020.2;t=hls+guidance;d=200-786.html

DMA stuck and not responding