@@ -43,18 +43,22 @@ We intend to implement the following:
*2.b Synchronized operation between packets of each stream, which is essential for processing multiple streams together.*
We read the rows of the image as a pack of 3 streams (one row per stream), process them in 2 separate blocks and return the output as an array.

This means we can store real-time data in frames and feed them continuously from our Python code. Each processing block holds a 2x2 array of convolution weights, which is applied to the incoming stream data, and the result is returned as output.
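As a CPU-side reference, here is a minimal sketch of the Roberts cross operator in plain Python. It assumes the standard Roberts kernels `GX = [[1, 0], [0, -1]]` and `GY = [[0, 1], [-1, 0]]` and combines the two responses as `|gx| + |gy|`, a common cheap approximation of the gradient magnitude. The function names are hypothetical, and the weights and output combination used in the FPGA blocks may differ.

```python
# Minimal CPU reference for the Roberts cross operator (illustrative sketch).
# Assumption: the standard Roberts cross kernels; the FPGA blocks may use
# different weights or a different way of combining the two results.
GX = ((1, 0),
      (0, -1))
GY = ((0, 1),
      (-1, 0))


def apply_kernel(window, kernel):
    """Element-wise multiply a 2x2 pixel window with a 2x2 kernel and sum.

    Strictly this is correlation. Flipping these kernels for a true
    convolution only negates the result, so |gx| and |gy| are unchanged.
    """
    return sum(window[i][j] * kernel[i][j] for i in range(2) for j in range(2))


def roberts_cross(image):
    """Apply Roberts cross to a 2D list of pixel values.

    Returns a (rows - 1) x (cols - 1) list in which each value is
    |gx| + |gy|, a common approximation of the gradient magnitude.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows - 1):
        out_row = []
        for c in range(cols - 1):
            # 2x2 window spanning two adjacent rows and two adjacent columns
            window = ((image[r][c], image[r][c + 1]),
                      (image[r + 1][c], image[r + 1][c + 1]))
            gx = apply_kernel(window, GX)
            gy = apply_kernel(window, GY)
            out_row.append(abs(gx) + abs(gy))
        result.append(out_row)
    return result


if __name__ == "__main__":
    edge = [[0, 0, 10],
            [0, 10, 10],
            [10, 10, 10]]
    print(roberts_cross(edge))  # → [[10, 10], [10, 0]]
```

A pure-CPU version like this makes it straightforward to check the FPGA output row by row.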

The (DMA1 + DMA2) streams are processed in PU1 and the (DMA2 + DMA3) streams in PU2. Because the Roberts cross convolution operates on 2x2 windows that span two adjacent rows, the paired streams must enter and be processed in a synchronized manner.
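The stream pairing can be modelled in software. The following is a minimal sketch, not the HLS design. Each DMA stream is assumed to carry one image row as a Python iterator. `itertools.tee` stands in for fanning DMA2 out to both processing units, and `zip` consumes one pixel from each stream per step, so a PU only advances when both of its inputs have a value. This is the lockstep behaviour the 2x2 window needs. The names `dma1` to `dma3`, `processing_unit` and `process_pack` are hypothetical.

```python
import itertools


def processing_unit(upper_row, lower_row):
    """Hypothetical model of one PU that consumes two row streams in lockstep.

    zip() pulls one pixel from each stream per step, so the PU only advances
    when both inputs have data. The previous column is kept so that each new
    step completes one 2x2 window and emits |gx| + |gy| (Roberts cross).
    """
    prev = None
    for pixel_pair in zip(upper_row, lower_row):
        if prev is not None:
            # Window layout: a b (upper row), c d (lower row)
            (a, c), (b, d) = prev, pixel_pair
            gx = a - d  # kernel [[1, 0], [0, -1]]
            gy = b - c  # kernel [[0, 1], [-1, 0]]
            yield abs(gx) + abs(gy)
        prev = pixel_pair


def process_pack(row1, row2, row3):
    """Route a pack of three row streams to two PUs:
    (DMA1 + DMA2) -> PU1 and (DMA2 + DMA3) -> PU2.
    """
    dma1, dma3 = iter(row1), iter(row3)
    # DMA2 feeds both PUs, so duplicate it (stand-in for a hardware fan-out).
    dma2_to_pu1, dma2_to_pu2 = itertools.tee(iter(row2), 2)
    pu1 = processing_unit(dma1, dma2_to_pu1)
    pu2 = processing_unit(dma2_to_pu2, dma3)
    return [list(pu1), list(pu2)]


if __name__ == "__main__":
    print(process_pack([0, 0, 10], [0, 10, 10], [10, 10, 10]))  # → [[10, 10], [10, 0]]
```

Running the two PUs one after the other with `list()` is a simplification. On the FPGA they run concurrently, whereas here `tee` buffers the DMA2 values until PU2 reads them.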
![CPU FPGA interconnection and data transfer](https://mygit.th-deg.de/sm11312/fpga_final_project/-/raw/main/CPU_FPGA.JPG)

At a high level, the interaction between the CPU and the FPGA follows the schematic shown above.
We use two blocks to process the streams, but that does not mean each block handles only one set of data at a time: we do not wait for the Nth set of data to be fully processed before starting on set N+1. Because the convolution does not stall while a set is being processed, it can read and start processing set N+1 from the stream as soon as set N has been read. Thanks to loop unrolling, the execution looks like the figure below, which results in parallel processing.
![Unravelling of streams in loop and parallel processing](https://mygit.th-deg.de/sm11312/fpga_final_project/-/raw/main/Parallel_process.JPG)
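HLS tools call this kind of overlap, where set N+1 enters before set N has finished, loop pipelining. The back-of-the-envelope model below compares it with strictly sequential processing. It assumes each set of data takes `latency` cycles to pass through a PU and that a new set can start every `ii` cycles (the initiation interval). The function names and example numbers are hypothetical, not measurements from this project.

```python
def sequential_cycles(n_sets, latency):
    """Cycles if set N+1 only starts after set N has completely finished."""
    return n_sets * latency


def pipelined_cycles(n_sets, latency, ii=1):
    """Cycles if a new set starts every `ii` cycles while earlier sets are
    still in flight. The first result appears after `latency` cycles and
    each further set adds only `ii` cycles.
    """
    if n_sets == 0:
        return 0
    return latency + (n_sets - 1) * ii


if __name__ == "__main__":
    # Hypothetical numbers: 1920 sets per row, 4-cycle latency, II = 1.
    print(sequential_cycles(1920, 4))  # → 7680
    print(pipelined_cycles(1920, 4))   # → 1923
```

For long streams, the ratio between the two counts approaches `latency / ii`, which is where most of the gain from overlapping comes from.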

## What we achieved and the caveat
<b>*We intended to build an architecture that can process multiple streams in parallel at the same level, and we were successful.*</b>
...
...
@@ -65,6 +69,8 @@ It is not very suitable for image processing tasks as arrays stored in memory do
<b>*The average processing time for the images was 10 s on the CPU and about 6 s on the FPGA.*</b>
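For reference, the reported averages correspond to the following speedup. This is simple arithmetic on the two figures above, nothing more:

```math
S = \frac{t_\text{CPU}}{t_\text{FPGA}} \approx \frac{10\,\text{s}}{6\,\text{s}} \approx 1.67,
\qquad
1 - \frac{t_\text{FPGA}}{t_\text{CPU}} \approx 1 - \frac{6}{10} = 40\,\%
```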

## Future scope
*This is a new idea with no prior references apart from implementation guides.*