FastFlow is an open-source, structured parallel programming framework targeting shared-memory multi-core platforms and GPUs (via both CUDA and OpenCL). Support for FPGA and DSP accelerators is ongoing within the REPARA FP7 project (http://repara-project.eu/).
FastFlow provides the parallel application programmer with a set of ready-to-use, parametric algorithmic skeletons modelling the most common parallelism exploitation patterns. The skeletons can be almost freely nested to model increasingly complex parallelism exploitation patterns.
The framework is provided as a set of header files. The latest version of the FastFlow code can be downloaded from the SourceForge SVN repository with the following command:
svn co https://svn.code.sf.net/p/mc-fastflow/code fastflow
The FastFlow project web site is: http://calvados.di.unipi.it/fastflow
- Linux operating system (Mac OS and Windows can also be used, but they are not recommended for the tutorial session).
- A compiler supporting C++11 (any gcc newer than 4.7.x is fine)
- For running some of the tests provided in the tutorial session you also need OpenCV, ImageMagick, and an OpenCL SDK
Everything you need to compile and run the tests has been set up in the Linux virtual machines provided for the session.
- Introduction to FastFlow
- Structured Parallel Programming
- Stream concept
- FastFlow's building blocks
- FastFlow's core streaming patterns: pipeline and task-farm
- High-level data-parallel patterns
- ParallelFor* and Map
- Sobel filter application example
- Mandelbrot set application example
- Proposed exercise: finding the minimum and the index of the minimum value in an array
- Sketch of other high-level patterns
- The ff_mdf (macro) data-flow pattern
- A simple parallel work-flow computation
- The StencilReduceLoop pattern (OpenCL version)
- The image filtering application running on the CPU and on the GPU (using OpenCL)
- Targeting distributed systems (basic concepts)
- The image filtering application executed on 2 machines
Exercise 1 (pipeline computation)
- Modify the sequential code in order to implement a 3-stage pipeline:
- the first stage reads files from disk (file names are passed in the command line);
- the second stage compresses each input file in memory;
- the third stage writes the compressed memory file onto the disk.
Compile cmd: g++ -std=c++11 simplecomp.cpp -o simplecomp
To decompress the files you can use this simple code compdecomp.cpp
Possible solution here.
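The FastFlow version of the pipeline can be sketched as follows. This is a hedged sketch, not the provided solution: the Task fields and stage names are our own placeholders, the file I/O and in-memory compression bodies are left as stubs to be filled from simplecomp.cpp, and the FastFlow headers are assumed to be in the include path (e.g. -I ~/fastflow).

```cpp
// Sketch of the 3-stage FastFlow pipeline: Reader -> Compressor -> Writer.
#include <string>
#include <vector>
#include <ff/pipeline.hpp>
using namespace ff;

struct Task {
    std::string name;                 // input file name
    std::vector<unsigned char> data;  // file contents / compressed bytes
};

struct Reader : ff_node_t<Task> {
    Reader(int argc, char* argv[]) : files(argv + 1, argv + argc) {}
    Task* svc(Task*) {
        for (auto& f : files)              // one task per file name;
            ff_send_out(new Task{f, {}});  // read the file into data here
        return EOS;                        // close the stream
    }
    std::vector<std::string> files;
};

struct Compressor : ff_node_t<Task> {
    Task* svc(Task* t) {
        // compress t->data in memory here
        return t;                          // forward to the next stage
    }
};

struct Writer : ff_node_t<Task> {
    Task* svc(Task* t) {
        // write t->data to disk here
        delete t;
        return GO_ON;                      // keep accepting tasks
    }
};

int main(int argc, char* argv[]) {
    Reader r(argc, argv); Compressor c; Writer w;
    ff_Pipe<> pipe(r, c, w);               // wire the three stages
    return pipe.run_and_wait_end() < 0 ? 1 : 0;
}
```

The first stage generates the stream (returning EOS when done), the middle stage transforms each task, and the last stage consumes it (returning GO_ON to stay alive).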
Exercise 2 (pipeline and task-farm computation)
Solve Exercise 1 using a task-farm instead of a pipeline. The task-farm's Emitter reads the files from disk, the Workers compress them in parallel, and the Collector writes the compressed memory files received from the Workers to disk.
Try experimenting with both the default task-farm scheduling policy and the auto-scheduling policy (on-demand scheduling).
Possible solution here.
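The task-farm version can be sketched like this (again a hedged sketch, not the provided solution: the stage bodies are stubs, the worker count of 4 is an arbitrary choice, and FastFlow headers are assumed to be in the include path):

```cpp
// Sketch of the task-farm: Emitter -> N Workers -> Collector.
#include <memory>
#include <string>
#include <vector>
#include <ff/farm.hpp>
using namespace ff;

struct Task { std::string name; std::vector<unsigned char> data; };

struct Emitter : ff_node_t<Task> {
    Emitter(int argc, char* argv[]) : files(argv + 1, argv + argc) {}
    Task* svc(Task*) {
        for (auto& f : files)              // read each file from disk here
            ff_send_out(new Task{f, {}});
        return EOS;
    }
    std::vector<std::string> files;
};
struct Worker : ff_node_t<Task> {
    Task* svc(Task* t) { /* compress t->data in memory */ return t; }
};
struct Collector : ff_node_t<Task> {
    Task* svc(Task* t) { /* write compressed data to disk */ delete t; return GO_ON; }
};

int main(int argc, char* argv[]) {
    const int nworkers = 4;                // arbitrary; tune to your machine
    std::vector<std::unique_ptr<ff_node>> W;
    for (int i = 0; i < nworkers; ++i)
        W.push_back(std::unique_ptr<ff_node>(new Worker));
    Emitter e(argc, argv); Collector c;
    ff_Farm<Task> farm(std::move(W), e, c);
    // farm.set_scheduling_ondemand();     // switch from the default
    //                                     // round-robin to on-demand scheduling
    return farm.run_and_wait_end() < 0 ? 1 : 0;
}
```

Uncommenting set_scheduling_ondemand() is how you compare the two scheduling policies asked for above.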
Exercise 3 (data-parallel computation)
Consider the arrayminindex.cpp file, which implements a sequential computation on a vector of double elements whose size is given as an input parameter. The code finds the minimum value of the array and the index at which that minimum occurs. For example, given the following input array:
index:    0   1   2   3   4   5   6   7   8   9
values:  31  52  11  13   3  12  23  64   2  12
the result is <2, 8> (minimum value 2 at index 8).
Modify the sequential code in order to execute the computation in parallel using a ParallelFor* pattern.
Compile cmd: g++ -std=c++11 -I ~/fastflow arrayminindex.cpp -o arrayminindex
Possible solution here.