How to use Charm++'s Projections
This page explains how to use Charm++'s performance analysis tool, Projections with Xyst.
How to analyze Xyst performance with Charm++'s Projections tool
To enable performance analysis of Xyst with Charm++ do
git clone https://codeberg.org/xyst/xyst.git && cd xyst mkdir build && cd build cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -GNinja -DCHARM_OPTS="-DTRACING=true -DTRACING_COMMTHREAD=true" -Wno-dev -DRUNNER_ARGS="--bind-to none -oversubscribe" -DPOSTFIX_RUNNER_ARGS=+setcpuaffinity -DEXTRA_LINK_ARGS="-tracemode projections" ../src ninja
The above will build Charm++ enabling performance tracing and will pass an extra link argument to Xyst executables. This instructs Charm++ to produce information about all Charm++ events, e.g., entry method calls and message packing, during the execution of Xyst executables.
Once the above went fine, performance can be analyzed by first collecting some data:
./charmrun +p32 Main/inciter -i ../../tmp/problems/sedov/sedov01.exo -c ../../tmp/problems/sedov/sedov_riecg.q Running as 32 OS processes: Main/inciter -i ../../tmp/problems/sedov/sedov01.exo -c ../../tmp/problems/sedov/sedov_riecg.q charmrun> /usr/bin/setarch x86_64 -R mpirun -np 32 Main/inciter -i ../../tmp/problems/sedov/sedov01.exo -c ../../tmp/problems/sedov/sedov_riecg.q Charm++> Running on MPI version: 3.1 Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE) Charm++> Running in non-SMP mode: 32 processes (PEs) Converse/Charm++ Commit ID: fa84486 Charm++: Tracemode Projections enabled. Trace: traceroot: Main/inciter Isomalloc> Synchronized global address space. CharmLB> Load balancer assumes all CPUs are same. Xyst> Load balancing off Charm++> Running on 1 hosts (2 sockets x 16 cores x 1 PUs = 32-way SMP) Charm++> cpu topology info is gathered in 0.002 seconds. ...
This run will produce log and sts files in the build folder where the executable resides. Projections can then be used to analyze performance data in detail. Example screenshots are displayed below.
Example average CPU utilization profile between 4s and 12.3s of a run taking 50 time steps with the RieCG solver computing the Sedov problem on 32 CPUs. The colors correspond to various tasks during a time step.
Example projection timelines during the same simulation as above. The vertical axis displays the 32 CPUs and the horizontal axis measures wall-clock time. The colors correspond to various tasks. Clearly visible is the time stepping in the middle of the figure in mostly white and light blue, followed by saving the checkpoint at the end of time stepping in purple.