FastFlow Tutorial

From Gridkaschool
Revision as of 18:03, 10 September 2015 by Mtorquati (talk | contribs) (Proposed Exercises)

FastFlow

FastFlow is an open-source, structured parallel programming framework targeting shared-memory multi-core systems and GPUs (using both CUDA and OpenCL). Support for FPGA and DSP accelerators is under development within the REPARA FP7 project (http://repara-project.eu/).

FastFlow provides the parallel application programmer with a set of ready-to-use, parametric algorithmic skeletons that model the most common parallelism exploitation patterns. The skeletons can be nested almost freely to model increasingly complex patterns.

The framework is provided as a set of header files. The latest version of the FastFlow code can be downloaded from the Sourceforge svn repository using the following command:

svn co https://svn.code.sf.net/p/mc-fastflow/code fastflow

Project Home

The FastFlow project web site is: http://calvados.di.unipi.it/fastflow

Requirements

  • Linux operating system (Mac OS and Windows can also be used, but they are not recommended for the tutorial session).
  • A compiler supporting C++11 (gcc newer than 4.7.x is fine).
  • To run some of the tests provided in the tutorial session, OpenCV, ImageMagick and an OpenCL SDK are needed.

Everything you need to compile and run the tests has been prepared in the Virtual Machine provided during the session (Linux VMs).

Documentation

  • You can find the tutorial slides here.
  • The FastFlow tutorial and example source code are here.

Session Agenda

  • Introduction to FastFlow
  • Structured Parallel Programming
  • Stream concept
  • FastFlow's building blocks
  • FastFlow's core streaming patterns: pipeline and task-farm
    • How to build a pipeline based application
    • How to build a task-farm based application
    • The image filtering application example using ImageMagick
    • Proposed exercise: simple file compressor using miniz.c
  • High-level data-parallel patterns
    • ParallelFor* and Map
    • Sobel filter application example
    • Mandelbrot set application example
    • Proposed exercise: finding the minimum and the index of the minimum value in an array
  • Sketch of other high-level patterns
    • The ff_mdf (macro) data-flow pattern
    • A simple parallel work-flow computation
    • The StencilReduceLoop pattern (OpenCL version)
    • The image filtering application running on the CPU and on the GPU (using OpenCL)
  • Targeting distributed systems (basic concepts)
    • The image filtering application executed on 2 machines

Proposed Exercises

Exercise 1 (pipeline computation)

Consider the simplecomp.cpp file, which implements a very naive file compressor using the miniz routines. It compresses the entire file in memory and then writes the compressed data to disk.

  • Modify the sequential code to implement a 3-stage pipeline:
    • the first stage reads the files from disk (file names are passed on the command line);
    • the second stage compresses each input file in memory;
    • the third stage writes the compressed in-memory file to disk.

Compile cmd: g++ -std=c++11 simplecomp.cpp -o simplecomp
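The three-stage structure requested above can be sketched in plain C++11 before moving to FastFlow. The sketch below is not FastFlow code and does not use miniz: it uses std::thread and a hand-written blocking queue, and the names Channel, Task, fake_read, fake_compress and run_pipeline are hypothetical stand-ins introduced only for illustration. It shows the three ideas the exercise relies on: each stage runs concurrently, items flow through FIFO channels, and an end-of-stream signal is propagated from stage to stage.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking FIFO connecting two stages (illustrative, not a FastFlow class).
template <typename T>
class Channel {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    // Signals end of stream to the consumer.
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

struct Task { std::string name; std::string data; };

// Hypothetical stand-ins for reading a file and for the miniz compression call.
std::string fake_read(const std::string& name) { return "contents of " + name; }
std::string fake_compress(const std::string& in) { return "Z(" + in + ")"; }

// Runs read -> compress -> write as three concurrent stages.
// Returns what the third stage would have written, in arrival order.
std::vector<Task> run_pipeline(const std::vector<std::string>& files) {
    Channel<Task> c12, c23;
    std::vector<Task> written;

    std::thread reader([&] {
        for (const auto& f : files) c12.push(Task{f, fake_read(f)});
        c12.close();  // propagate end of stream to stage 2
    });
    std::thread compressor([&] {
        Task t;
        while (c12.pop(t)) { t.data = fake_compress(t.data); c23.push(std::move(t)); }
        c23.close();  // propagate end of stream to stage 3
    });
    std::thread writer([&] {
        Task t;
        while (c23.pop(t)) written.push_back(std::move(t));
    });

    reader.join(); compressor.join(); writer.join();
    assert(written.size() == files.size());
    return written;
}
```

Because every stage is a single thread and the channels are FIFO, the output order matches the input order. When compiling the sketch on its own with gcc, the -pthread flag is typically needed.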

To decompress the files, you can use the simple program compdecomp.cpp.

Possible solution here.

Exercise 2 (pipeline and task-farm computation)

Solve Exercise 1 using a task-farm instead of a pipeline. The task-farm's Emitter reads the files from disk, the Workers compress them in parallel, and the Collector writes the compressed in-memory files received from the Workers to disk.

Experiment with both the default task-farm scheduling policy and the auto-scheduling policy (on-demand scheduling).
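To make the difference between the two scheduling policies concrete, here is a plain C++11 sketch that does not use FastFlow or miniz. It assumes, for illustration only, that the default policy behaves like a static round-robin assignment of tasks to Workers; with on-demand scheduling an idle Worker claims the next unprocessed task. The names Policy, run_farm and fake_compress are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for compressing one in-memory file with miniz.
std::string fake_compress(const std::string& in) { return "Z(" + in + ")"; }

enum class Policy { RoundRobin, OnDemand };

// Farm sketch: the caller plays Emitter and Collector, nworkers threads play the Workers.
std::vector<std::string> run_farm(const std::vector<std::string>& inputs,
                                  unsigned nworkers, Policy policy) {
    assert(nworkers > 0);
    std::vector<std::string> outputs(inputs.size());
    std::atomic<std::size_t> next(0);  // shared cursor, used only by OnDemand

    auto worker = [&](unsigned id) {
        if (policy == Policy::RoundRobin) {
            // Static assignment: worker id takes tasks id, id+n, id+2n, ...
            for (std::size_t i = id; i < inputs.size(); i += nworkers)
                outputs[i] = fake_compress(inputs[i]);
        } else {
            // On-demand: whichever worker is free claims the next task.
            for (std::size_t i = next.fetch_add(1); i < inputs.size();
                 i = next.fetch_add(1))
                outputs[i] = fake_compress(inputs[i]);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < nworkers; ++w) pool.emplace_back(worker, w);
    for (auto& t : pool) t.join();
    return outputs;  // slot i holds the result for inputs[i]
}
```

Both policies produce the same results; they differ in load balance. When files have very different sizes, a static assignment can leave some Workers idle while one is still busy with a large file, whereas on-demand claiming keeps all Workers busy until the tasks run out.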

Possible solution here.

Exercise 3 (data-parallel computation)

Consider the arrayminindex.cpp file, which implements a sequential computation on a vector of double elements whose size is given as an input parameter. The code finds the minimum value in the array and the index of that value. For example, given the following input array:

    0   1   2   3   4   5   6   7   8   9    index
   --------------------------------------
   31  52  11  13   3  12  23  64   2  12    values
   --------------------------------------

the result is <2, 8> (the minimum value is 2 and it is found at index 8).

Modify the sequential code in order to execute the computation in parallel using a ParallelFor* pattern.

Compile cmd: g++ -std=c++11 -I ~/fastflow arrayminindex.cpp -o arrayminindex
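The parallel structure behind this exercise can be sketched in plain C++11, without FastFlow: split the index range into chunks, let each thread compute a partial <minimum, index> pair over its chunk, then combine the partial results sequentially. The function name parallel_min_index and the chunking scheme are assumptions made for illustration; they are not taken from arrayminindex.cpp or from the FastFlow API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Returns <minimum value, index of its first occurrence>, computed by nthreads workers.
std::pair<double, std::size_t> parallel_min_index(const std::vector<double>& a,
                                                  unsigned nthreads) {
    assert(!a.empty() && nthreads > 0);
    const std::size_t n = a.size();
    const std::size_t chunk = (n + nthreads - 1) / nthreads;  // ceil(n / nthreads)

    // One partial result per thread, seeded with element 0 so that
    // threads with an empty chunk still hold a valid candidate.
    std::vector<std::pair<double, std::size_t>> partial(
        nthreads, std::make_pair(a[0], std::size_t(0)));

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(n, begin + chunk);
            // Strict '<' keeps the first occurrence within the chunk.
            for (std::size_t i = begin; i < end; ++i)
                if (a[i] < partial[t].first) partial[t] = std::make_pair(a[i], i);
        });
    }
    for (auto& th : pool) th.join();

    // Sequential reduction of the partial results; ties go to the lower index.
    std::pair<double, std::size_t> best = partial[0];
    for (unsigned t = 1; t < nthreads; ++t) {
        if (partial[t].first < best.first ||
            (partial[t].first == best.first && partial[t].second < best.second))
            best = partial[t];
    }
    return best;
}
```

The key design point is that the reduction variable is a pair, not a single number: comparing only the values would lose the index, and updating a shared pair from several threads would be a data race. Keeping one private partial result per thread and merging at the end avoids both problems.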

Possible solution here.