Commit d4532a6c authored by Sabyasachi Mondal's avatar Sabyasachi Mondal

Update README.md

parent f4722f1e
@@ -27,7 +27,7 @@ We should be able to:
3. <b>*FPGA should be reasonably faster than our CPU for processing streams*</b>
## Implementation Strategy
Previously we saw that the image resizer takes in the whole image at once; DMA makes the data transfer much faster, but we cannot process an image or a stream of data that is received continuously and requires processing.
We intend to implement the following:
1. <b>*fast multichannel stream operations at a hardware level, integrated with matching high-level software constructs in Python*</b>
@@ -43,32 +43,32 @@
*2.b Synchronized operation between the packets of each stream, which is essential for processing multiple streams together.*
We read each row of the image as a pack of three streams, process it in two separate blocks, and return the output as an array.
[Schematic streaming rows and output](https://mygit.th-deg.de/sm11312/fpga_final_project/-/raw/main/HLSolution.JPG)
This means we can store real-time data in frames and feed them continuously from our Python code. The processing blocks each hold a 2x2 array of convolution weights that is applied to our stream of data, and we return the output.
[Convolution on streaming row](https://mygit.th-deg.de/sm11312/fpga_final_project/-/raw/main/RobertCross.JPG)
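As a point of reference, the 2x2 convolution that each processing block applies can be modeled in plain NumPy. This is a software sketch of the same computation; the function and kernel names here are ours, not taken from this repository:

```python
import numpy as np

# Roberts cross kernels (2x2); the processing blocks hold these as weights.
GX = np.array([[1, 0], [0, -1]])
GY = np.array([[0, 1], [-1, 0]])

def roberts_on_rows(row_a, row_b):
    """Apply both 2x2 kernels to every 2x2 window formed by two
    consecutive image rows and return one row of gradient magnitudes."""
    out = np.empty(len(row_a) - 1)
    for x in range(len(row_a) - 1):
        window = np.array([[row_a[x], row_a[x + 1]],
                           [row_b[x], row_b[x + 1]]], dtype=float)
        gx = np.sum(window * GX)   # response of the first kernel
        gy = np.sum(window * GY)   # response of the second kernel
        out[x] = np.hypot(gx, gy)  # gradient magnitude
    return out
```

An edge between two pixel columns then shows up as a non-zero magnitude at that position in the output row.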
The (DMA1 + DMA2) streams are processed in PU1 and the (DMA2 + DMA3) streams in PU2. However, because the Roberts convolution algorithm needs data to be processed as a 2x2 array, the streams must enter and be processed in a synchronized manner.
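The synchronization requirement can be sketched in Python as a lockstep read over the three streams, so that PU1 (streams 1+2) and PU2 (streams 2+3) always see packets with the same index. This is only a sketch; `pu` is a hypothetical stand-in for the real processing units:

```python
def pu(a, b):
    # Hypothetical stand-in for a processing unit that combines two
    # packets (the real PUs apply the Roberts convolution weights).
    return a + b

def process_in_lockstep(s1, s2, s3):
    """Read the three streams packet-by-packet in lockstep, so both
    processing units always operate on the same packet index."""
    for p1, p2, p3 in zip(s1, s2, s3):
        yield pu(p1, p2), pu(p2, p3)   # (PU1 output, PU2 output)
```

`zip` advances all three streams together, which is exactly the property the hardware needs: no PU can run ahead of the packet index the others are on.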
[CPU FPGA interconnection and data transfer](https://mygit.th-deg.de/sm11312/fpga_final_project/-/raw/main/CPU_FPGA.JPG)
At a high level, the interaction between the CPU and the FPGA looks like the schematic shown above.
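On the CPU side this interaction typically reduces to a few PYNQ calls per stream: allocate a DMA-visible buffer, start the send and receive transfers, and wait for completion. The overlay file name and DMA instance name below are placeholders, not taken from this repository:

```python
# Host-side sketch assuming the PYNQ framework on a Zynq board.
import numpy as np
from pynq import Overlay, allocate

ol = Overlay("design.bit")           # placeholder bitstream name
dma = ol.axi_dma_0                   # placeholder DMA instance; one per stream

in_buf = allocate(shape=(1920,), dtype=np.uint8)   # one image row
out_buf = allocate(shape=(1920,), dtype=np.uint8)

in_buf[:] = 128                      # fill with the row's pixel data
dma.sendchannel.transfer(in_buf)     # stream the row into the fabric
dma.recvchannel.transfer(out_buf)    # receive the processed row
dma.sendchannel.wait()               # block until both transfers finish
dma.recvchannel.wait()
```

With one such DMA object per channel, feeding three synchronized streams is a matter of issuing the three `transfer` calls before waiting on any of them.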
We use two blocks to process the streams, but that does not mean everything runs in a single sequence: we do not wait for the Nth set of data to finish processing before we start on the (N+1)th set. Since the convolution algorithm does not stall on processing, it can start to read and process the (N+1)th set of data from the stream as soon as the Nth set has been read. Loop unrolling makes it look something like this and leads to parallel processing.
[Unravelling of streams in loop and parallel processing](https://mygit.th-deg.de/sm11312/fpga_final_project/-/raw/main/Parallel_process.JPG)
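The "read N+1 while N is still being processed" behaviour can be mimicked in plain Python with a small producer/consumer pipeline. This only illustrates the idea; on the FPGA the overlap comes from loop unrolling and pipelining in the fabric, not from threads:

```python
import queue
import threading

def run_pipeline(stream, process, depth=4):
    """Read packets on one thread while another processes them, so
    reading packet N+1 overlaps the processing of packet N."""
    q = queue.Queue(maxsize=depth)

    def reader():
        for packet in stream:
            q.put(packet)          # keep reading; never wait on processing
        q.put(None)                # end-of-stream marker

    t = threading.Thread(target=reader)
    t.start()
    results = []
    while (packet := q.get()) is not None:
        results.append(process(packet))
    t.join()
    return results
```

The bounded queue plays the role of the hardware FIFO between the stream reader and the processing blocks.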
## What we achieved and the caveat:
<b>*We intended to build an architecture that can process multiple streams in parallel at the same level, and we were successful.*</b>
<b>*Our main goal was to ensure such an architecture runs faster on the FPGA; it was reasonably fast, and most importantly it can be scaled up to handle multiple streams.*</b>
It is not very suitable for image-processing tasks, since arrays stored in memory do a better job there; a Roberts convolution is faster in an OpenCV library.
<b>*The CPU average for images was about 10 s and the FPGA about 6 s.*</b>
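A comparison like the one above can be reproduced for any callable with a simple wall-clock helper (our sketch, not project code):

```python
import time

def average_runtime(fn, repeats=5):
    """Return the average wall-clock seconds of fn() over several runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats
```

Timing the CPU path (e.g. an OpenCV filter) and the FPGA path (buffer copy plus DMA round trip) with the same helper keeps the comparison fair, since both include their data-transfer overhead.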
## Future scope
*This is a new idea and has no previous references except implementation guides.*
<b>*The image processing can serve as a stepping stone for controlling multi-agent systems, where each streaming interface can be used for instruction input and output for each agent/bot.*</b>
*We achieved good synchronization between the input streams in terms of pixel processing. We can consider the real-world environment as an array of pixels, with each pixel representing the coordinates of a bot. In this scenario we can process all inputs (pixels) from the bots and implement collision avoidance and basic navigation using the same architecture.*