Performance Instrumented Ray Tracing for Windows


This is a demonstration of how to use MinGW and the TAU Performance System to cross-compile a parallel Linux application for 64-bit Windows with Microsoft MPI.

This work was sponsored by the Microsoft Developer and Platform Evangelism Team.

Download

Compile

These instructions guide you through cross-compiling Tachyon on a Linux machine for 64-bit Windows with Microsoft MPI. $TACHYON_BUILD is the folder on your Linux machine in which Tachyon will be built.

  1. Install TAU and the MinGW-w64 cross compiler for Windows 64-bit.

  2. Download the Tachyon source code to $TACHYON_BUILD.

  3. Download this patch to $TACHYON_BUILD.

  4. Unpack the source code:
       cd $TACHYON_BUILD
       tar xvzf tachyon-0.99b2.tar.gz
    
  5. Patch the source code:
       cd $TACHYON_BUILD/tachyon
       gunzip -c $TACHYON_BUILD/paratools-v1-mingw-tachyon.patch.gz | patch -p1
       chmod +x winpack.sh
    
  6. Specify your MPI installation directory (e.g. $HOME/software/ms-hpc-2008-sp2):

       export MPIDIR=/path/to/msmpi
    
  7. Specify your TAU Makefile:
       export TAU_MAKEFILE=/path/to/Makefile.tau-mingw-w64-mpi-pdt
    
  8. Cross-compile Tachyon:
       cd $TACHYON_BUILD/tachyon/unix
       make mingw-w64-msmpi-tau
    
  9. Package Tachyon for transfer to Windows:
       cd $TACHYON_BUILD
       ./winpack.sh
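
Steps 6 through 8 depend on three settings: $TACHYON_BUILD, MPIDIR, and TAU_MAKEFILE. The following is a minimal sketch of a pre-flight check that could be run before the make step. The function name check_build_env is hypothetical and is not part of Tachyon, TAU, or the patch; it only checks that each setting is present and points at something that exists.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Tachyon, TAU, or the patch):
# report every build setting that is unset or points at a missing path.
check_build_env() {
    rc=0
    # TACHYON_BUILD and MPIDIR are expected to be existing directories.
    for name in TACHYON_BUILD MPIDIR; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name is not a directory: $value" >&2
            rc=1
        fi
    done
    # TAU_MAKEFILE is expected to be an existing file.
    if [ -z "${TAU_MAKEFILE:-}" ]; then
        echo "TAU_MAKEFILE is not set" >&2
        rc=1
    elif [ ! -f "$TAU_MAKEFILE" ]; then
        echo "TAU_MAKEFILE is not a file: $TAU_MAKEFILE" >&2
        rc=1
    fi
    return $rc
}
```

One possible use is to gate the build on it, for example `check_build_env && make mingw-w64-msmpi-tau` from $TACHYON_BUILD/tachyon/unix.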
    

Install

You will need both the Tachyon executable file and the supporting MinGW-w64 and TAU libraries. The patch file creates a script in the tachyon folder that will automatically collect all the necessary files into a zip file called "tachyon.zip". %TACHYON_HOME% in these instructions is the Tachyon installation directory on the Windows machine.

  1. Transfer $TACHYON_BUILD/tachyon.zip (or one of the pre-built versions) to your Windows cluster.

  2. Extract the zip file to %TACHYON_HOME%.

  3. Share %TACHYON_HOME% to all cluster nodes with read/write privileges.

Examples

%TACHYON_HOME%\scenes contains many example scenes you can use to test your installation. Use the job command to submit new jobs to your cluster. Each of the following examples shows the rendered image and the command executed to produce it. These examples were executed on a 32-node Cray cluster. Each node has dual-socket 3.0 GHz Harpertown quad-core CPUs and 16 GB of RAM. The interconnect fabric is 20 Gbps InfiniBand. In these examples, %TACHYON_HOME% is \\cray03\tachyon

STMVAO-WHITE

[Image: rendered stmvao-white scene]

job submit /jobname:tachyon /name:tachyon /nodegroup:ComputeNodes /numprocessors:32 mpiexec -wdir \\cray03\tachyon\scenes -env PROFILEDIR \\cray03\tachyon -env PATH \\cray03\tachyon tachyon.exe stmvao-white.dat -o \\cray03\tachyon\stmvao-white.bmp -format BMP -aasamples 4 -trans_vmd -rescale_lights 0.4 -add_skylight 0.9 -skylight_samples 32 -res 1024 1024

Mean Function Time Over All Cluster Nodes


DNA

[Image: rendered dna scene]

job submit /jobname:tachyon /name:tachyon /nodegroup:ComputeNodes /numprocessors:32 mpiexec -wdir \\cray03\tachyon\scenes -env PROFILEDIR \\cray03\tachyon -env PATH \\cray03\tachyon tachyon.exe dna.dat -o \\cray03\tachyon\dna.bmp -format BMP -aasamples 4 -trans_vmd -rescale_lights 0.4 -add_skylight 0.9 -skylight_samples 32 -res 1024 1024
  • Performance results in ParaProf Packed Profile (PPK) format: dna.ppk

Mean Function Time Over All Cluster Nodes


FOG

[Image: rendered fog scene]

job submit /jobname:tachyon /name:tachyon /nodegroup:ComputeNodes /numprocessors:32 mpiexec -wdir \\cray03\tachyon\scenes -env PROFILEDIR \\cray03\tachyon -env PATH \\cray03\tachyon tachyon.exe fog.dat -o \\cray03\tachyon\fog.bmp -format BMP -aasamples 4 -trans_vmd -rescale_lights 0.4 -add_skylight 0.9 -skylight_samples 32 -res 1024 1024
  • Performance results in ParaProf Packed Profile (PPK) format: fog.ppk

Mean Function Time Over All Cluster Nodes

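
The three job commands above differ only in the scene name. As a rough sketch, a small POSIX shell function can generate the command text for any scene so it can be pasted into a Windows prompt. The function name tachyon_job_command is hypothetical. The share path and every flag are copied from the examples above and would need adjusting for another cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the job submission command for one scene.
# Only the scene name varies; everything else mirrors the examples above.
tachyon_job_command() {
    scene=$1
    # Single quotes keep the UNC backslashes literal.
    share='\\cray03\tachyon'
    render_flags='-format BMP -aasamples 4 -trans_vmd -rescale_lights 0.4 -add_skylight 0.9 -skylight_samples 32 -res 1024 1024'
    # printf '%s' does not interpret backslashes in its arguments.
    printf '%s %s %s %s\n' \
        'job submit /jobname:tachyon /name:tachyon /nodegroup:ComputeNodes /numprocessors:32' \
        "mpiexec -wdir ${share}\\scenes -env PROFILEDIR ${share} -env PATH ${share}" \
        "tachyon.exe ${scene}.dat -o ${share}\\${scene}.bmp" \
        "$render_flags"
}
```

For example, `tachyon_job_command dna` prints the DNA command shown above.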

Questions

  • Why does instrumentation make Tachyon larger?
    • These distributions use compiler-based instrumentation, which relies on debugging symbols to discover the name and source code location of a function when it is called. Debugging symbols contribute significantly to the program size.
  • Why does instrumentation make Tachyon slower?
    • These distributions use compiler-based instrumentation, which parses the program's debug symbols at runtime to determine the name of a function when it is called. TAU implements many clever tricks to make this as fast as possible, but the overhead is still significant. TAU's source-based instrumentation features will eliminate this overhead.
  • Why did you use compiler-based instrumentation instead of source-based instrumentation?
    • Ideally we would have used source-based instrumentation. Source-based instrumentation works by parsing the program source code before it is compiled and inserting calls to the TAU profiling API. This reduces runtime overhead because the function name and call site are resolved at compile time. However, the Windows API headers contain Microsoft-specific syntax that TAU is unable to parse. MinGW implements workarounds in its preprocessor to cope with this syntax, and we are working on porting these workarounds to TAU's parser.
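
For readers who want to experiment with the two instrumentation modes discussed above, the sketch below shows how TAU's compiler wrapper scripts are commonly told which mode to use through the TAU_OPTIONS environment variable. This is an assumption about a typical TAU setup. The text above does not say how the patched Tachyon build selects compiler-based instrumentation, so check the patch's Makefile target before relying on it.

```shell
# Assumption: the build invokes TAU's compiler wrappers, which read TAU_OPTIONS.
# Compiler-based instrumentation, the mode these distributions use:
export TAU_OPTIONS="-optCompInst"

# Source-based instrumentation through PDT would instead be requested with:
#   export TAU_OPTIONS="-optPDTInst"
# As explained above, this mode does not yet work with the Windows API headers.
```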

