Programming Multi-core using FastFlow
FastFlow
FastFlow is an open-source, structured parallel programming framework originally conceived to support efficient stream parallel computation while targeting shared memory multi-core. It provides the parallel applications programmer with a set of ready-to-use, parametric algorithmic skeletons modelling the most common parallelism exploitation patterns. The skeletons provided may be almost freely nested to model more and more complex parallelism exploitation patterns.
FastFlow is provided as a set of header files. The last version of the FastFlow code can be download from the Sourceforge svn repository this way:
svn co https://svn.code.sf.net/p/mc-fastflow/code fastflow
Project Home
The FastFlow project web site is: http://calvados.di.unipi.it/fastflow
Requirements
- Linux operating system (it is possible to use also a Mac OS and Windows OSs but it is not recommended for the tutorial session).
- A c++11 compiler (gcc 4.7.x or icc 13.x).
- For running some of the tests provided in the tutorial session it is need OpenCV and ImageMagick.
Everything you need to compile and run the tests is prepared in the Virtual Machine provided during the session.
VM access
ssh -p 24 fastflow@gks-XXX.scc.kit.edu
the password will be provided during the tutorial session.
VM machines
gks-137
gks-138
gks-139
gks-141
gks-143
gks-144
gks-145
gks-146
gks-196
Training materials and documentation
- The FastFlow tutorial.
- Tests and examples contained in the FastFlow tutorial are available here.
- On-line reference manual available at FastFlow project site.
Session Agenda
Hands-on slides available here.
- Introduction to FastFlow
- Stream concept
- FastFlow's building blocks
- FastFlow's core streaming patterns: pipeline and task-farm
- How to build a pipeline based application
- How to build a task-farm based application
- The image filtering application example using ImageMagick
- Proposed exercise: simple files compressor using miniz.c
- High-level data-parallel patterns
- ParallelFor ParallelForReduce and Map
- Sobel filter apllication example
- Proposed exercise: a simple matrix-based computation
- High-level data-flow pattern
- The ff_mdf pattern
- A simple parallel work-flow computation
- Proposed exercise: matrix multiplication using Strassen's algorithm.
Proposed Exercises
Exercise1 (stream parallel computation)
Consider the simplecomp.cpp file implementing a very naive file compressor using the minzip routines. It compresses the entire file in memory and then writes the compressed memory file into disk.
- Modify the sequential code in order to implement a 3-stage pipeline:
- the first stage reads files from disk (file names are passed in the command line);
- the second stage compresses each input file in memory;
- the third stage writes the compressed memory file onto the disk.
Exercise2 (task-farm computation)
Solve the Exercise1 using a task-farm instead of a pipeline. The task-farm's Emitter reads files from the disk, the Workers compress them in parallel and finally the Collector stores the compressed memory file received from workers onto the disk.
Try experimenting with the default task-farm scheduling policy and with the auto-scheduling policy (ondemand scheduling).sk-farm computation)
Exercise 3 (data-parallel computation)
Consider the matcomputation.cpp file implementing a sequential computation on a square matrix whose size is given as input parameter. As an example, having in input the following 3x3 matrix, the result is computed as:
--------- 1 2 3 4 5 6 1 + 2*4 + 3*7 + 5 + 6*8 + 9 = 92 7 8 9 ---------
elements on the diagonal are just summed up while corresponding elements on the rows and columns are multiplied.
Modify the sequential code in order to execute the computation in parallel using a ParallelForReduce.
Exercise 4 (task-parallel computation)
The so called Strassen's algorithm computes the matrix multiplication (C=AxB) by dividing the matrix in 4 blocks and computing the following instructions:
A= | A11 A12 | B= | B11 B12 | C= | C11 C12 | | A21 A22 | | B21 B22 | | C21 C22 |
S1 = A11 + A22 S2 = B11 + B22 P1 = S1 * S2 S3 = A21 + A22 P2 = S3 * B11 S4 = B12 - B22 P3 = A11 * S4 S5 = B21 - B11 P4 = A22 * S5 S6 = A11 + A12 P5 = S6 * B22 S7 = A21 - A11 S8 = B11 + B12 P6 = S7 * S8 S9 = A12 - A22 S10 = B21 + B22 P7 = S9*S10 C11 = P1 + P4 - P5 + P7 C12 = P3 + P5 C21 = P2 + P4 C22 = P1 - P2 + P3 + P6
The sequential code implementing Strassen's algorithm is provided in the strassen.cpp file. Write a parallel version of the sequential code using the macro-data-flow pattern (ff_mdf).