Commit f4722f1e authored by Sabyasachi Mondal

Update README.md

parent 0c68653a
......@@ -2,59 +2,61 @@
#### Group4: Sabyasachi Mondal , Ravi Yadav
FPGA for streamlining computation-intensive tasks. In this case we take a hyperspectral image of the kind generally analysed by satellites or drones, mostly consisting of single-band image data. This can be used for both maritime and vehicular navigation.
# Overview
## Overview
We want to use an FPGA to implement an algorithm in hardware so that the computation runs more efficiently. CPU hardware is not flexible: the code always runs on the same set of registers and the same ALU, so we can't optimize the hardware for our code. Our objective here is to implement a processing unit in hardware (something similar to a flexible ALU built from the CLBs) in the FPGA using high-level code.
# Background
## Background
<b>*The FPGA should be able to process multiple streams in a synchronized manner. We want to take the streams coming from an image, process them through a convolution algorithm (Roberts cross matrix), and then use another function to filter out the relevant parts.*</b>
CPUs are known for their general-purpose use: the same CPU can power all kinds of applications. A CPU can simulate any finite state machine but can't be reprogrammed as hardware. In a CPU the hardware is static, so all data is handled by the same fixed instruction set, which runs one instruction at a time.
An FPGA, for example, may implement multiple multipliers or registers that work in parallel or in a specific order at the hardware level if we want. Depending on the kind of data we receive, we can implement hardware that processes exactly that type of data much faster.
We as software designers can develop our own algorithms bottom-up, from the register level to high-level code (Python, for example), which may prove immensely powerful for a task-specific algorithm. In our case we use Python as a host to drive our FPGA.
<b>*The FPGA should be able to process multiple streams in a synchronized manner. We want to take the streams coming from an image, process them through a convolution algorithm (Roberts cross matrix), and then use another function to filter out the relevant parts.*</b>
# Objective
## Objective
Our objective is to enable continuous data-stream processing in a pipeline that runs faster on the FPGA than on the CPU.
We try to implement an image filter that takes data streams and processes them on the fly, and the FPGA should work faster than the CPU. Our objective is not to make the image-processing algorithm itself fast.
We should be able to:
<b>
1. *Remove limitations on the length and size of data so the structure can be adapted for real-time continuous use*
1. <b>*Remove limitations on the length and size of data so the structure can be adapted for real-time continuous use*</b>
2. *Enable processing of multiple data streams in parallel using the strategies available in the FPGA for faster processing*
2. <b>*Enable processing of multiple data streams in parallel using the strategies available in the FPGA for faster processing*</b>
3. <b>*FPGA should be reasonably faster than our CPU for processing streams*</b>
3. *FPGA should be reasonably faster than our CPU for processing streams*
</b>
# Implementation Strategy
## Implementation Strategy
Previously we saw that the image resizer takes in the whole data at once. DMA makes the data transfer much faster, but with that approach we can't process an image or a stream of data that arrives indefinitely and requires processing.
We intend to implement the following:
1. *fast multichannel stream operations at a hardware level integrated with similar high level software constructs in python*
1. <b>*fast multichannel stream operations at a hardware level integrated with similar high level software constructs in python*</b>
*1.a High Level Code structure to enable parallel operation and optimization in functions*
*1.b Maintain same level of parallelism (multiple processing streams) in unrolled loops*
2. *make the FPGA capable of processing a continuous stream of data that is infeasible to store in a large space*
2. <b>*make the FPGA capable of processing a continuous stream of data that is infeasible to store*</b>
*2.a The CPU packs data and feeds it to the FPGA until the image is processed (but we can simply loop forever for continuous data)*
*2.b Synchronized operation between the packets of each stream, which is essential for processing multiple streams together.*
We try to read the rows of the image as a pack of 3 streams, process it in 2 separate blocks, and return the output as an array.
[Schematic streaming rows and output](https://mygit.th-deg.de/sm11312/fpga_final_project/-/blob/main/HLSolution.JPG)
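The packing step on the host side can be sketched in plain Python. This is only an illustration, not the project's code: the name `pack_rows` is hypothetical, and the choice to advance two rows per pack (so that neighbouring packs share one row) is an assumption that is not stated in this README.

```python
from typing import Iterator, Sequence, Tuple

Row = Sequence[int]


def pack_rows(image: Sequence[Row]) -> Iterator[Tuple[Row, Row, Row]]:
    """Yield packs of three consecutive rows (hypothetical helper).

    Assumption: each pack advances by two rows, so the last row of one
    pack is the first row of the next. Leftover rows that do not fill a
    complete pack are ignored in this sketch.
    """
    for i in range(0, len(image) - 2, 2):
        yield image[i], image[i + 1], image[i + 2]


# Tiny 5-row "image": every row holds its own row index twice.
demo_image = [[r, r] for r in range(5)]
demo_packs = list(pack_rows(demo_image))
```

Because `pack_rows` is a generator, the same loop could in principle be fed from an endless frame source instead of a finite image.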
This means we can store real-time data in frames and feed them continuously from our Python code. Each processing block consists of a 2x2 array of convolution weights that is applied to our stream of data, and we return the output.
[Convolution on streaming row](https://mygit.th-deg.de/sm11312/fpga_final_project/-/blob/main/RobertCross.JPG)
The (DMA1 + DMA2) streams are processed in PU1 and the (DMA2 + DMA3) streams in PU2. However, because the Roberts cross convolution needs the data as a 2x2 array, the two streams of each pair must enter and be processed in a synchronized manner.
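A software model of the two processing units may make the synchronization requirement clearer. The sketch below is an assumption-laden illustration in Python, not the HLS implementation: the names `roberts_pu` and `process_three_rows` are hypothetical, the kernels are the standard Roberts cross pair, and combining the two responses as `|gx| + |gy|` is an assumed choice that this README does not confirm.

```python
from typing import Iterable, Iterator, List, Tuple


def roberts_pu(top: Iterable[int], bottom: Iterable[int]) -> Iterator[int]:
    """Model of one processing unit fed by two row streams.

    zip() reads one pixel from each stream per step, which mirrors the
    requirement that both streams advance in lockstep.
    """
    prev_top = prev_bottom = None
    for t, b in zip(top, bottom):
        if prev_top is not None:
            # 2x2 window: [[prev_top, t], [prev_bottom, b]]
            gx = prev_top - b      # kernel [[1, 0], [0, -1]]
            gy = t - prev_bottom   # kernel [[0, 1], [-1, 0]]
            yield abs(gx) + abs(gy)  # assumed magnitude approximation
        prev_top, prev_bottom = t, b


def process_three_rows(r1, r2, r3) -> Tuple[List[int], List[int]]:
    """PU1 consumes streams 1 and 2, PU2 consumes streams 2 and 3."""
    return list(roberts_pu(r1, r2)), list(roberts_pu(r2, r3))


pu1_out, pu2_out = process_three_rows(
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
)
```

Each unit emits one value fewer than the row length, because the first pixel pair only fills the window.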
[CPU FPGA interconnection and data transfer](https://mygit.th-deg.de/sm11312/fpga_final_project/-/blob/main/CPU_FPGA.JPG)
On the higher level the interaction between CPU and FPGA looks like the schematic shown below:
On the higher level the interaction between CPU and FPGA looks like the schematic shown above.
We said we use two blocks to process the streams, but that does not mean we use one thread: we do not wait for the Nth data packet to be read before we start processing the (N+2)nd. Since the convolution algorithm needs two sequential data packets at a time, the first thread in the unrolled loop can process packets N and N+1 from Stream1 and Stream2, while its cloned thread reads packets N+2 and N+3. Due to loop unrolling and parallel processing, it looks something like this.
We use two blocks to process the streams, but that does not mean we use one thread: we do not wait for the Nth set of data to be processed before we start processing set N+1. Since the convolution algorithm does not wait for processing to finish, it can start to read and process the next (N+1) set of data from the stream as soon as the Nth set has been read. Due to loop unrolling, which leads to parallel processing, it looks something like this.
[Unravelling of streams in loop and parallel processing](https://mygit.th-deg.de/sm11312/fpga_final_project/-/blob/main/Parallel_process.JPG)
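As a rough software analogy for the unrolling described here, the sketch below compares a rolled loop with a copy unrolled by a factor of 2. Python still runs the two loop bodies one after the other; the point is only that they do not depend on each other, which is the property that lets an HLS tool place them side by side in hardware. All names (`window`, `rolled`, `unrolled_by_2`) and the unroll factor are hypothetical, and the window formula is an assumed Roberts cross magnitude.

```python
from typing import List, Sequence


def window(top: Sequence[int], bottom: Sequence[int], n: int) -> int:
    """Assumed Roberts cross response for the 2x2 window at column n."""
    return abs(top[n] - bottom[n + 1]) + abs(top[n + 1] - bottom[n])


def rolled(top: Sequence[int], bottom: Sequence[int]) -> List[int]:
    """Reference loop: one window per iteration."""
    return [window(top, bottom, n) for n in range(len(top) - 1)]


def unrolled_by_2(top: Sequence[int], bottom: Sequence[int]) -> List[int]:
    """Same computation with two independent windows per iteration."""
    count = len(top) - 1
    out = [0] * count
    n = 0
    while n + 1 < count:
        # Neither body reads the other's result, so hardware could
        # evaluate both in the same cycle.
        out[n] = window(top, bottom, n)
        out[n + 1] = window(top, bottom, n + 1)
        n += 2
    if n < count:
        # Leftover window when the count is odd.
        out[n] = window(top, bottom, n)
    return out


row_a = [0, 0, 0, 0]
row_b = [0, 0, 9, 9]
rolled_out = rolled(row_a, row_b)
unrolled_out = unrolled_by_2(row_a, row_b)
```

Both versions produce the same list, so unrolling changes how the work is scheduled and not what is computed.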
# What we achieved and the caveat :
## What we achieved and the caveat :
*We intended to build an architecture that can take multiple streams and process them at the same level of parallelism, and we were successful.*
*Our main goal is to ensure such an architecture runs faster on the FPGA, and it was reasonably fast; most importantly, it can be scaled up to handle multiple streams.*
......@@ -63,7 +65,7 @@ It is not very suitable for image processing tasks as arrays stored in memory do
*The CPU average for images was 10 s and the FPGA average about 6 s*
#### Future scope
## Future scope
*This is a new idea and has no previous references except implementation guides.*
*The image processing can serve as a stepping stone for controlling multi-agent systems, where each streaming interface can be used for instruction input and output for one agent.*
......@@ -71,7 +73,7 @@ It is not very suitable for image processing tasks as arrays stored in memory do
*We achieved good synchronization between the input streams in terms of pixel processing. We can consider the real-world environment as an array of pixels, with each pixel representing the coordinates of one bot. In this scenario we can process all inputs (pixels) from each bot and implement collision avoidance and basic navigation using the same architecture.*
# Tasks
#### Tasks
The Tasks and maximum actual time:
1. Problem statement and brainstorming for project selection : *24 hrs*
......@@ -84,9 +86,9 @@ The Tasks and maximum actual time:
8. Upload code and test in IPy notebook : *3 hrs*
# Resources used and Future project topics
#### Resources used and Future project topics
#### Resources used
##### Resources used
0. Images: https://serc.carleton.edu/earth_analysis/image_analysis/introduction/day_4_part_2.html
1. Image segmentation : https://theailearner.com/2020/11/29/image-segmentation-with-watershed-algorithm/
2. Operation with stream: https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/hls_stream_library.html#ivv1539734234667__ad398476
......@@ -103,7 +105,7 @@ The Tasks and maximum actual time:
12: Loop Pipelining Roll Unroll : https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/vitis_hls_optimization_techniques.html#kcq1539734224846
# Errors Logs and Issues encountered
### Errors Logs and Issues encountered
The input pins (listed below) are either not connected or do not have a source port, and they don't have a tie-off specified. These pins are tied-off to all 0's to avoid error in Implementation flow.
Please check your design and connect them as needed:
/color_filter/ap_start
......