Performance Instrumented CFD for Windows

Figure: Formula 1 simulation

ParaTools has created a distribution of OpenFOAM 2.0.x for 64-bit Windows that is instrumented for performance profiling with the TAU Performance System®. Binary distributions of the instrumented version are available on this page as a free download.

The instrumented version of OpenFOAM automatically generates performance profiles of your CFD model runs, even when it is launched from within a GUI such as Caedium. You can use this version to better understand what OpenFOAM is doing when it runs. For example, the profiles show which routines take the longest to complete and which are called most often.
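As a rough illustration of those two questions (longest running, most called), the following Python sketch ranks a handful of made-up profile records. The record layout, the function names, and the timings are all hypothetical and are not TAU output; real profiles should be browsed with ParaProf.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    name: str
    calls: int             # number of times the function was entered
    exclusive_usec: float  # time spent in the function itself, excluding callees


def top_by(profiles, key, n=3):
    """Return the n records with the largest value of the given attribute."""
    return sorted(profiles, key=lambda p: getattr(p, key), reverse=True)[:n]


# Made-up sample records for illustration only (not real TAU data).
SAMPLE = [
    FunctionProfile("solve_pressure", calls=500, exclusive_usec=9.2e6),
    FunctionProfile("assemble_matrix", calls=1500, exclusive_usec=4.1e6),
    FunctionProfile("interpolate_face", calls=120000, exclusive_usec=2.7e6),
    FunctionProfile("write_fields", calls=10, exclusive_usec=0.8e6),
]

if __name__ == "__main__":
    print("Longest running:", [p.name for p in top_by(SAMPLE, "exclusive_usec", 2)])
    print("Most called:", [p.name for p in top_by(SAMPLE, "calls", 2)])
```

The two rankings can differ: a routine that is called very often may still account for less total time than a rarely called but expensive one.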

This work was sponsored by the Microsoft Developer and Platform Evangelism Team.

Download

Installation Instructions

%WM_PROJECT_DIR% in these instructions is the top-level folder where you install OpenFOAM (e.g. C:\OpenFOAM). You may need to adjust these instructions to match your machine's configuration.

  1. Install the free Microsoft HPC SDK.

  2. Download OpenFOAM with TAU.

  3. Extract the zip file to %WM_PROJECT_DIR%.

  4. Edit %WM_PROJECT_DIR%\shell\setvars.bat and change the line "set WM_PROJECT_DIR=C:\OpenFOAM" so that the path matches the folder where you extracted the zip file.
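Step 4 can be done in any text editor. For scripted installs, the edit could be sketched in Python as below. Only the "set WM_PROJECT_DIR=C:\OpenFOAM" line comes from the instructions above; the helper name set_project_dir, the sample file content, and the target path D:\tools\OpenFOAM are assumptions made for the example.

```python
import re


def set_project_dir(setvars_text: str, install_dir: str) -> str:
    """Return setvars.bat content with WM_PROJECT_DIR pointing at install_dir."""
    # Match the whole assignment up to, but not including, the line ending,
    # so that Windows CRLF line endings are preserved.
    pattern = re.compile(r"^([ \t]*set[ \t]+WM_PROJECT_DIR=)[^\r\n]*",
                         re.IGNORECASE | re.MULTILINE)
    # A function replacement keeps backslashes in the Windows path literal.
    updated, count = pattern.subn(lambda m: m.group(1) + install_dir, setvars_text)
    if count == 0:
        raise ValueError("no 'set WM_PROJECT_DIR=' line found")
    return updated


if __name__ == "__main__":
    # Hypothetical file content; the real setvars.bat contains more lines.
    sample = "@echo off\r\nset WM_PROJECT_DIR=C:\\OpenFOAM\r\n"
    print(set_project_dir(sample, r"D:\tools\OpenFOAM"))
```

To apply it to the real file, read %WM_PROJECT_DIR%\shell\setvars.bat, pass its text through the function, and write the result back, keeping a backup copy of the original.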

Examples

These results are from a 32-node Cray cluster. Each node has dual-socket 3.0 GHz Harpertown quad-core CPUs and 16 GB of RAM. The interconnect fabric is 20 Gbps InfiniBand.

Open Wheel Race Car (32 cluster nodes)

Open wheel race cars, such as those found in Formula 1 (F1), are characterized by complex aerodynamics. Symscape provides the complete description of this RANS flow simulation. Performance results are given here in ParaProf Packed Profile (PPK) format. ParaProf is available as part of the TAU Performance System.

f1-cluster-complete

Geometry

Mesh

Shape: Volume
Volume Elements: 612966
Faces: 400
Face Elements: 67894
Edges: 944
Edge Elements: 7669
Vertices: 544
Vertex Elements: 544
Nodes: 120066

Exclusive Function Time by Cluster Node


Mean Function Time Over All Cluster Nodes

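For readers who want to reproduce a mean of this kind from per-node figures, the arithmetic can be sketched as follows. This is a minimal illustration, not ParaProf's implementation: it assumes a function that was not recorded on a node counts as zero time there, and the function names and timings are invented.

```python
from collections import defaultdict


def mean_exclusive_time(per_node):
    """Average exclusive time per function over all nodes.

    per_node is a list with one dict per cluster node, mapping a function
    name to its exclusive time in seconds. A function missing from a node
    is treated as 0 seconds on that node (an assumption of this sketch).
    """
    totals = defaultdict(float)
    for node in per_node:
        for name, seconds in node.items():
            totals[name] += seconds
    node_count = len(per_node)
    return {name: total / node_count for name, total in totals.items()}


if __name__ == "__main__":
    # Invented timings for two nodes.
    nodes = [{"solve": 4.0, "io": 1.0}, {"solve": 6.0}]
    print(mean_exclusive_time(nodes))
```

Comparing the per-node values with the mean is one way to spot load imbalance: a function whose time on one node is far above the mean is a candidate for closer inspection.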

Cyclone Separator (1 cluster node)

Cyclone separators are used in many industries to separate particles from a fluid, where the fluid is usually air or water. The types of particles vary widely, from wood chips to dust. Symscape provides the complete description of this RANS flow simulation. Performance results are given here in ParaProf Packed Profile (PPK) format. ParaProf is available as part of the TAU Performance System.

cyclone

Geometry

Mesh

Shape: Volume
Volume Elements: 12601
Faces: 29
Face Elements: 3250
Edges: 64
Edge Elements: 470
Vertices: 40
Vertex Elements: 40
Nodes: 2989

Mean Function Time


Questions

  • Why do some functions show as "addr=<hex string>" in the profile output?
    • This is caused by a problem in how GNU BFD handles very long nested template names. We are investigating several solutions and hope to have this resolved soon.
  • Is the instrumented version compiled with optimization?
    • Yes. The instrumented distribution was compiled with "-O3 -DNDEBUG".
  • Why does instrumentation make OpenFOAM so much larger?
    • These distributions use compiler-based instrumentation, which relies on debugging symbols to discover the name and source code location of a function when it is called. Debugging symbols contribute significantly to the program size.
  • Why does instrumentation make OpenFOAM so much slower?
    • These distributions use compiler-based instrumentation, which parses the program's debug symbols at runtime to determine the name of a function when it is called. TAU implements many clever tricks to make this as fast as possible, but the overhead is still significant. We are working on new features for TAU's source-based instrumentation that will eliminate this overhead.
  • Why did you use compiler-based instrumentation instead of source-based instrumentation?
    • Ideally we would have used source-based instrumentation. Source-based instrumentation works by parsing the program source code before it is compiled and inserting calls to the TAU profiling API. This reduces runtime overhead because the function name and call site are resolved at compile time. However, the Windows API headers contain Microsoft-specific syntax that TAU is unable to parse. MinGW implements workarounds in its preprocessor to cope with this syntax, and we are working on porting these workarounds to TAU's parser.
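The difference between the two approaches described in the answers above can be illustrated with a loose Python analogy. This is not how TAU is implemented: the decorators, the SYMBOLS table, and the use of id() as a stand-in for a function address are all hypothetical. The first style resolves a name from an address-like key on every call, and falls back to the raw key when the lookup fails; the second fixes the name when the instrumentation is inserted.

```python
import functools
from collections import Counter

calls = Counter()   # name -> number of recorded calls
SYMBOLS = {}        # hypothetical symbol table: address-like key -> name


def register_symbol(func):
    """Record a name for func, standing in for debugging symbols."""
    SYMBOLS[id(func)] = func.__name__
    return func


def instrument_by_address(func):
    """Runtime-lookup style: the hook only knows an address-like key."""
    address = id(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The name is resolved on every call; an unknown key is reported raw.
        name = SYMBOLS.get(address, f"addr=<{address:#x}>")
        calls[name] += 1
        return func(*args, **kwargs)

    return wrapper


def instrument_by_name(name):
    """Source-insertion style: the name is fixed when the wrapper is created."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls[name] += 1  # no lookup needed at call time
            return func(*args, **kwargs)
        return wrapper
    return decorate


@instrument_by_address
@register_symbol
def resolved(x):
    return x * 2


@instrument_by_address
def unresolved(x):  # never registered, so only its key is known
    return x + 1


@instrument_by_name("scale")
def scale(x):
    return x * 3


if __name__ == "__main__":
    resolved(1)
    unresolved(1)
    scale(1)
    print(dict(calls))
```

In this analogy the per-call dictionary lookup plays the role of the runtime symbol resolution, and the unregistered function shows up under an "addr=<...>" key in the same way an unresolved symbol would.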

