ThreadSpotter

Today's complex computer architectures and their deep memory hierarchies are a poor match for most applications. Because of the wide gap between processor and memory speeds, processors often spend more than half of their time waiting for data to arrive. Multicore processors intensify this problem by reducing the cache available per thread while allowing more concurrent threads to contend for memory bandwidth.

ThreadSpotter automatically analyzes an application while it is running, lists its performance problems in order of importance, suggests fixes, and gives the developer the insights and statistics needed to assess and fix those problems quickly.

ThreadSpotter makes performance experts more productive and helps less experienced developers learn which techniques work well with the underlying hardware.

Optimization Workflow

ThreadSpotter works in conjunction with most compiled languages and most parallelization paradigms. First, it helps the developer optimize the code while it is still sequential. Then, after the code is parallelized, it helps optimize the thread interactions. Finally, it helps the developer find the optimal thread placement and assists in removing cache pollution effects. No other tool covers all those bases in such a straightforward way.

  • ThreadSpotter goes well beyond collecting raw performance data: it identifies and classifies specific issues and shows the developer how to remove them.

  • ThreadSpotter offers a solid, detailed understanding of performance problems to allow a quick resolution on the system under examination. It also models other execution environments and provides performance optimization guidance for other systems.

  • ThreadSpotter models thread communication and interaction effects, giving advice on how to resolve the resulting performance issues.
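One thread interaction effect of this kind is false sharing, where threads update different variables that happen to sit on the same cache line. The following is a minimal sketch of how such lines could be flagged from an access trace. It is only an illustration and not ThreadSpotter's actual analysis; the trace format `(thread_id, address, is_write)`, the function name `find_false_sharing`, and the 64-byte line size are all assumptions.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache line size in bytes


def find_false_sharing(accesses, line_size=LINE_SIZE):
    """Return cache lines that look falsely shared.

    accesses: iterable of (thread_id, address, is_write) tuples
    (a hypothetical trace format chosen for this sketch).
    A line is flagged when at least two threads touch it, at least one
    of them writes to it, and no single address is used by two threads.
    """
    touched = defaultdict(lambda: defaultdict(set))  # line -> thread -> addresses
    writers = defaultdict(set)                       # line -> threads that wrote
    for thread_id, address, is_write in accesses:
        line = address // line_size
        touched[line][thread_id].add(address)
        if is_write:
            writers[line].add(thread_id)

    suspects = []
    for line, per_thread in touched.items():
        if len(per_thread) < 2 or not writers[line]:
            continue  # private line, or read-only sharing (harmless)
        seen = set()
        overlap = False
        for addresses in per_thread.values():
            if seen & addresses:
                overlap = True  # threads really share data: true sharing
                break
            seen |= addresses
        if not overlap:
            suspects.append(line)
    return sorted(suspects)


trace = [
    (0, 0, True), (1, 8, True),        # line 0: neighbouring counters, both written
    (0, 128, True), (1, 128, False),   # line 2: same address, true sharing
    (0, 256, False), (1, 264, False),  # line 4: read-only, no invalidations
]
print(find_false_sharing(trace))
```

A typical fix for a flagged line is to pad or align the per-thread data so that each thread's variables occupy their own cache line.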

How does it work?

ThreadSpotter efficiently monitors the execution of unmodified application binaries and captures sparse memory fingerprints that represent the essence of the application's locality properties. No restart or recompilation of the application is necessary.

The memory fingerprint carries all the information ThreadSpotter needs to extrapolate the relevant metrics for any cache. It can also accurately predict these metrics for cache configurations that differ from the architecture where the fingerprint was acquired.
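To illustrate why one locality profile can be evaluated against many cache sizes, here is a minimal sketch based on LRU stack distances (reuse distances). It is a textbook technique used purely as an illustration, not ThreadSpotter's actual method: it processes a complete address trace rather than a sparse sample, and it assumes a fully associative LRU cache. The function names and the 64-byte line size are assumptions.

```python
from collections import OrderedDict


def stack_distances(trace, line_size=64):
    """Return the LRU stack distance of every access in an address trace.

    The distance is the number of distinct cache lines touched since the
    previous access to the same line, or None for a first touch.
    """
    stack = OrderedDict()  # cache lines, least recently used first
    distances = []
    for address in trace:
        line = address // line_size
        if line in stack:
            keys = list(stack)
            distances.append(len(keys) - 1 - keys.index(line))
            stack.move_to_end(line)  # line becomes most recently used
        else:
            distances.append(None)
            stack[line] = True
    return distances


def miss_ratio(distances, cache_lines):
    """Predict the miss ratio of a fully associative LRU cache.

    An access hits only if its stack distance is smaller than the
    number of lines the cache can hold.
    """
    if not distances:
        return 0.0
    misses = sum(1 for d in distances if d is None or d >= cache_lines)
    return misses / len(distances)


# Collect the profile once, then evaluate it for several cache sizes.
trace = [0, 64, 128, 0, 64, 128]  # three lines, visited twice in order
profile = stack_distances(trace)
for cache_lines in (2, 3, 4):
    print(cache_lines, miss_ratio(profile, cache_lines))
```

The profile is computed once, independently of any cache size, so the same data answers "what if" questions about caches other than the one on the machine where the trace was taken.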

Fortunately, the user does not need to understand the details of the underlying memory system. ThreadSpotter uses this information to guide the user in modifying the code so that it uses the memory system effectively and application performance increases.

ThreadSpotter then correlates all problems to their respective source code instructions and data structures. It captures and displays the call stack leading up to the location of each problem, and uses this information in the analysis to present relevant advice and specific metrics for each unique call path leading to a particular instruction. In other words, it understands the hardware intricacies and their relation to performance so you can concentrate on your application.
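The idea of reporting metrics per call path can be sketched as a simple aggregation: the same instruction may behave well when reached from one caller and badly when reached from another. The sample format `(call_stack, instruction, missed)` and the function name below are hypothetical, and the sketch makes no claim about how ThreadSpotter stores or attributes its data.

```python
from collections import Counter


def misses_by_call_path(samples):
    """Count cache misses per (call stack, instruction) pair.

    samples: iterable of (call_stack, instruction, missed) tuples, where
    call_stack is a tuple of function names from outermost to innermost
    (a hypothetical format chosen for this sketch).
    Returns the pairs sorted by miss count, highest first.
    """
    totals = Counter()
    for call_stack, instruction, missed in samples:
        if missed:
            totals[(call_stack, instruction)] += 1
    return totals.most_common()


samples = [
    (("main", "solve", "axpy"), "axpy.c:12", True),
    (("main", "solve", "axpy"), "axpy.c:12", True),
    (("main", "setup", "axpy"), "axpy.c:12", True),
    (("main", "setup", "axpy"), "axpy.c:12", False),
]
for (call_stack, instruction), count in misses_by_call_path(samples):
    print(" > ".join(call_stack), instruction, count)
```

Keeping the call stack in the key is what separates the two paths into `axpy.c:12`; keyed on the instruction alone, they would be merged into a single figure.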

Downloads (version 1.3.7)

System Requirements

  • OS: Linux 2.6

  • CPU: All x86/x86-64 processors from Intel and AMD

  • HPC Systems: x86/x86-64 Linux Clusters, Cray XT/XE/XK Systems

  • Supported languages: Compiled languages, e.g. C, C++, Fortran, Ada, etc.

Documentation