Programming Multi-core using FastFlow

From Gridkaschool


FastFlow is an open-source, structured parallel programming framework originally conceived to support efficient stream parallel computation while targeting shared memory multi-core. It provides the parallel applications programmer with a set of ready-to-use, parametric algorithmic skeletons modelling the most common parallelism exploitation patterns. The skeletons provided may be almost freely nested to model more and more complex parallelism exploitation patterns.

FastFlow is provided as a set of header files. The last version of the FastFlow code can be download from the Sourceforge svn repository this way:

svn co fastflow

Project Home

The FastFlow project web site is:


  • Linux operating system (it is possible to use also a Mac OS and Windows OSs but it is not recommended for the tutorial session).
  • A c++11 compiler (gcc 4.7.x or icc 13.x).
  • For running some of the tests provided in the tutorial session it is need OpenCV and ImageMagick.

Everything you need to compile and run the tests is prepared in the Virtual Machine provided during the session.

VM access

ssh -p 24

the password will be provided during the tutorial session.

VM machines










Training materials and documentation

Session Agenda

Hands-on slides available here.

  • Introduction to FastFlow
  • Stream concept
  • FastFlow's building blocks
  • FastFlow's core streaming patterns: pipeline and task-farm
    • How to build a pipeline based application
    • How to build a task-farm based application
    • The image filtering application example using ImageMagick
    • Proposed exercise: simple files compressor using miniz.c
  • High-level data-parallel patterns
    • ParallelFor ParallelForReduce and Map
    • Sobel filter apllication example
    • Proposed exercise: a simple matrix-based computation
  • High-level data-flow pattern
    • The ff_mdf pattern
    • A simple parallel work-flow computation
    • Proposed exercise: matrix multiplication using Strassen's algorithm.

Proposed Exercises

Exercise1 (stream parallel computation)

Consider the simplecomp.cpp file implementing a very naive file compressor using the minzip routines. It compresses the entire file in memory and then writes the compressed memory file into disk.

  • Modify the sequential code in order to implement a 3-stage pipeline:
    • the first stage reads files from disk (file names are passed in the command line);
    • the second stage compresses each input file in memory;
    • the third stage writes the compressed memory file onto the disk.

Exercise2 (task-farm computation)

Solve the Exercise1 using a task-farm instead of a pipeline. The task-farm's Emitter reads files from the disk, the Workers compress them in parallel and finally the Collector stores the compressed memory file received from workers onto the disk.

Try experimenting with the default task-farm scheduling policy and with the auto-scheduling policy (ondemand scheduling).sk-farm computation)

Exercise 3 (data-parallel computation)

Consider the matcomputation.cpp file implementing a sequential computation on a square matrix whose size is given as input parameter. As an example, having in input the following 3x3 matrix, the result is computed as:

   1 2 3 
   4 5 6        1 + 2*4 + 3*7 + 5 + 6*8 + 9 = 92
   7 8 9

elements on the diagonal are just summed up while corresponding elements on the rows and columns are multiplied.

Modify the sequential code in order to execute the computation in parallel using a ParallelForReduce.

Exercise 4 (task-parallel computation)

The so called Strassen's algorithm computes the matrix multiplication (C=AxB) by dividing the matrix in 4 blocks and computing the following instructions:

A=  | A11 A12 |  B= | B11 B12 |  C= | C11 C12 |
    | A21 A22 |     | B21 B22 |     | C21 C22 |
 S1  = A11 + A22
 S2  = B11 + B22
 P1  = S1 * S2
 S3  = A21 + A22
 P2  = S3 * B11
 S4  = B12 - B22
 P3  = A11 * S4
 S5  = B21 - B11
 P4  = A22 * S5
 S6  = A11 + A12
 P5  = S6 * B22
 S7  = A21 - A11
 S8  = B11 + B12
 P6  = S7 * S8
 S9  = A12 - A22
 S10 = B21 + B22
 P7  = S9*S10
 C11 = P1 + P4 - P5 + P7
 C12 = P3 + P5
 C21 = P2 + P4
 C22 = P1 - P2 + P3 + P6

The sequential code implementing Strassen's algorithm is provided in the strassen.cpp file. Write a parallel version of the sequential code using the macro-data-flow pattern (ff_mdf).

Possible solution of exercises