Understanding the performance of parallel code is tricky, and Julia can make it even more opaque: with asynchronous tasks, multithreading, distributed computing, garbage collection, GPU support and calls into many external libraries, it can be hard to get a full picture of what your code is doing. This talk will describe how to use Nvidia Nsight Systems to understand what your parallel Julia code is doing.
Nvidia Nsight Systems is a powerful profiling tool for analyzing code performance, especially for parallel or asynchronous code, and it is useful even on machines without a GPU. This talk will give a short overview of Nsight Systems and of the NVTX.jl package for instrumenting Julia code, using examples of both multithreaded and MPI-distributed code.
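As a rough illustration of what instrumenting multithreaded Julia code with NVTX.jl might look like, here is a minimal sketch. It assumes the NVTX.jl package is installed and uses its `NVTX.@range` and `NVTX.@mark` macros; the function `partial_sums`, the range and mark names, and the problem size are hypothetical and chosen only for this example, not taken from the talk.

```julia
using NVTX  # assumes NVTX.jl is installed in the active environment

# Hypothetical workload: sum a vector in chunks, one chunk per loop iteration.
function partial_sums(data, nchunks)
    chunksize = cld(length(data), nchunks)
    chunks = collect(Iterators.partition(eachindex(data), chunksize))
    sums = zeros(eltype(data), length(chunks))
    Threads.@threads for i in eachindex(chunks)
        # Each iteration is wrapped in its own NVTX range.
        NVTX.@range "chunk sum" begin
            sums[i] = sum(@view data[chunks[i]])
        end
    end
    return sums
end

function main()
    data = rand(10^6)
    # An instantaneous marker for a single point in time.
    NVTX.@mark "data ready"
    # A range covering the whole threaded computation.
    NVTX.@range "partial_sums" begin
        partial_sums(data, 8)
    end
    return nothing
end

main()
```

If this were saved as a script (say `example.jl`, a hypothetical name), it could be run under the profiler with something like `nsys profile julia --threads=4 example.jl`, and the resulting report opened in the Nsight Systems GUI, where the ranges and marks should appear on the timeline. When no profiler is attached, the script should simply run as ordinary Julia code.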