This is a Python script to convert the output from many profilers into a dot graph.
It has the following features:
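- reads output from gprof, Linux perf, oprofile, Valgrind's callgrind, sysprof, xperf, VTune Amplifier XE, Very Sleepy, Java HPROF, and the Python profilers (profile, cProfile, hotshot);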
- prunes nodes and edges below a certain threshold;
- uses a heuristic to propagate time inside mutually recursive functions;
- uses color efficiently to draw attention to hot-spots;
- works on any platform where Python and graphviz are available, i.e., virtually anywhere.
If you want an interactive viewer for gprof2dot output graphs, check xdot.py.
- 2014-05-28: Derive callgrind's call ratios based on the call samples, and not call count.
- 2014-04-13: Support custom and straightforward JSON format.
- 2014-04-13: Drop AQtime support.
- 2013-05-17: Add option to show function samples
- 2013-05-17: Python 3 compatibility, based on independent work by Sebastian Pipping and Alex Volkov.
- 2013-04-09: Color theme for printing (by stefan.sydow)
- 2013-04-09: Remove Shark support as it has been deprecated upstream.
- 2013-04-09: Allow to derive total time from call stacks with perf (by David Flater)
- 2013-04-08: VTune Amplifier XE format support (by David Flater)
- 2012-11-25: sub-graph generation (by Mark Roth)
- 2011-10-23: linux perf support (w/ Mark Hills assistance)
- 2011-02-13: Handle multiple spaces in callgrind (by Suzev.Kirill)
- 2010-12-12: Handle empty/recent callgrind output
- 2010-12-12: Java HPROF support (by Russell Power)
- 2010-08-27: Basic xperf support.
- 2010-01-31: Initial callgrind support.
- 2009-12-07: Skew colorization curve to show more/less detail (by Cris Ewing).
- 2009-10-21: Support sysprof format.
- 2009-10-20: Support Very Sleepy format.
- 2009-08-09: Support hotshot profile format file (from tocer.deng).
- 2009-03-04: Support AQtime xml format.
- 2009-03-04: Support gprof indices > 10000 (by jkuzeja).
- 2009-01-15: Add MacOSX Shark output parser; use rounded corners (contributed by Tomas Carnecky).
- 2008-09-18: More complete theming. Black and white color theme.
- 2008-06-29: Added a heuristic to propagate time inside cycles. This makes it possible to determine the critical path for highly recursive code, such as the linux kernel code.
- 2008-04-08: Accept " " in oprofile profiles (contributed by Jakob Bornecrantz).
- 2008-01-31: Handle oprofile output with profile specifications.
- 2008-01-25: Propagate the self time spent in cycle member functions, which was being ignored, yielding wrong results (mainly when post-processing oprofile data).
- 2007-12-20: Allow more than a single input file for pstats (contributed by Daniele Varrazzo).
- 2007-11-18: Parse oprofile callgraph output.
- 2007-09-28: Get the total time not from the granularity time, but from the longest function call.
- 2007-07-14: Added the ability to read output generated by Python profilers.
- 2007-05-30: Handle the output produced by gprof with the static call graph option.
- 2007-05-16: Strip template parameters from function names; add command options to control this behavior.
- 2007-04-17: Fix bug parsing cycle entries, which have slightly different syntax than regular entries, and therefore are now parsed separately.
- 2007-04-02: Consistent handling of cycles.
- 2007-04-02: Handle output produced by non-GNU gprof.
- 2007-03-30: Initial import.
- Python: known to work with versions 2.7 and 3.3; it will most likely not work with earlier releases.
- Graphviz: tested with version 2.26.3, but should work fine with other versions.
apt-get install python graphviz
Usage:
    gprof2dot.py [options] [file] ...

Options:
  -h, --help            show this help message and exit
  -o FILE, --output=FILE
                        output filename [stdout]
  -n PERCENTAGE, --node-thres=PERCENTAGE
                        eliminate nodes below this threshold [default: 0.5]
  -e PERCENTAGE, --edge-thres=PERCENTAGE
                        eliminate edges below this threshold [default: 0.1]
  -f FORMAT, --format=FORMAT
                        profile format: axe, callgrind, hprof, json, oprofile,
                        perf, prof, pstats, sleepy, sysprof or xperf [default:
                        prof]
  --total=TOTALMETHOD   preferred method of calculating total time: callratios
                        or callstacks (currently affects only perf format)
                        [default: callratios]
  -c THEME, --colormap=THEME
                        color map: color, pink, gray, bw, or print [default:
                        color]
  -s, --strip           strip function parameters, template parameters, and
                        const modifiers from demangled C++ function names
  -w, --wrap            wrap function names
  --show-samples        show function samples
  -z ROOT, --root=ROOT  prune call graph to show only descendants of specified
                        root function
  -l LEAF, --leaf=LEAF  prune call graph to show only ancestors of specified
                        leaf function
  --skew=THEME_SKEW     skew the colorization curve.  Values < 1.0 give more
                        variety to lower percentages.  Values > 1.0 give less
                        variety to lower percentages
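The options above are normally combined in a shell pipeline that feeds gprof2dot's output to dot. As a minimal sketch, assuming gprof2dot.py and dot are both on the PATH and Python 3.7 or later is used, the same pipeline could be driven from Python with the standard subprocess module. The helper names build_commands and render are hypothetical and not part of gprof2dot.

```python
import subprocess


def build_commands(profile, fmt="prof", output="output.png",
                   node_thres=0.5, edge_thres=0.1):
    """Return the argv lists for the gprof2dot stage and the dot stage."""
    gprof2dot_cmd = [
        "gprof2dot.py",
        "-f", fmt,                 # profile format, e.g. "pstats" or "perf"
        "-n", str(node_thres),     # node threshold (percentage)
        "-e", str(edge_thres),     # edge threshold (percentage)
        profile,
    ]
    dot_cmd = ["dot", "-Tpng", "-o", output]
    return gprof2dot_cmd, dot_cmd


def render(profile, **kwargs):
    """Run gprof2dot.py and pipe its dot output into Graphviz."""
    gprof2dot_cmd, dot_cmd = build_commands(profile, **kwargs)
    # Capture the dot graph produced by gprof2dot.py on stdout.
    dot_source = subprocess.run(
        gprof2dot_cmd, check=True, capture_output=True
    ).stdout
    # Feed the graph to dot on stdin; dot writes the image itself.
    subprocess.run(dot_cmd, input=dot_source, check=True)
```

Calling render("output.pstats", fmt="pstats") would then be roughly equivalent to running gprof2dot.py -f pstats -n 0.5 -e 0.1 output.pstats | dot -Tpng -o output.png.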
/path/to/your/executable arg1 arg2
gprof path/to/your/executable | gprof2dot.py | dot -Tpng -o output.png
See Russell Power's blog post for details.
perf record -g -- /path/to/your/executable
perf script | gprof2dot.py -f perf | dot -Tpng -o output.png
opcontrol --callgraph=16
opcontrol --start
/path/to/your/executable arg1 arg2
opcontrol --stop
opcontrol --dump
opreport -cgf | gprof2dot.py -f oprofile | dot -Tpng -o output.png
python -m profile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
python cProfile (formerly known as lsprof)
python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
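When only part of a program is of interest, a pstats file can also be written from inside the program with the standard cProfile module. The following is a minimal sketch; busy_work is a hypothetical workload standing in for real application code, and profile_to_file is an illustrative helper name.

```python
import cProfile


def busy_work(n):
    """Hypothetical workload standing in for real application code."""
    return sum(i * i for i in range(n))


def profile_to_file(path="output.pstats"):
    """Profile busy_work and write the statistics in pstats format."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(100000)
    profiler.disable()
    # dump_stats writes a file that the pstats module can read back.
    profiler.dump_stats(path)
    return path
```

The resulting file can then be passed to gprof2dot.py -f pstats in the same way as one produced by python -m cProfile.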
python hotshot profiler
The hotshot profiler does not include a main function. Use the hotshotmain.py script instead.
hotshotmain.py -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
If you're not familiar with xperf then read this excellent article first. Then do:
Start xperf as
xperf -on Latency -stackwalk profile
Run your application.
Save the data:
xperf -d output.etl
Start the visualizer:
In Trace menu, select Load Symbols. Configure Symbol Paths if necessary.
Select an area of interest on the CPU sampling graph, right-click, and select Summary Table.
In the Columns menu, make sure the Stack column is enabled and visible.
Right click on a row, choose Export Full Table, and save to output.csv.
Then invoke gprof2dot as
gprof2dot.py -f xperf output.csv | dot -Tpng -o output.png
VTune Amplifier XE
Collect profile data as follows (this can also be done from the GUI):
amplxe-cl -collect hotspots -result-dir output -- your-app
Visualize profile data as:
amplxe-cl -report gprof-cc -result-dir output -format text -report-output output.txt
gprof2dot.py -f axe output.txt | dot -Tpng -o output.png
See also Kirill Rogozhin's blog post.
A node in the output graph represents a function and has the following layout:
+------------------------------+
|        function name         |
| total time % ( self time % ) |
|         total calls          |
+------------------------------+
- total time % is the percentage of the running time spent in this function and all its children;
- self time % is the percentage of the running time spent in this function alone;
- total calls is the total number of times this function was called (including recursive calls).
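The relationship between these three figures can be sketched for a toy, non-recursive call graph. This is an illustration only, not gprof2dot's own code: the function names total_times and node_label, the input dictionaries, and the exact label formatting are all assumptions made for the example.

```python
def total_times(self_time, calls):
    """Compute the total time of each function in an acyclic call graph.

    self_time: {function: time spent in the function alone}
    calls:     {caller: {callee: fraction of the callee's total time
                         that is attributed to this caller}}
    """
    cache = {}

    def total(name):
        if name not in cache:
            # Total time = own time + the share of each child's total time.
            cache[name] = self_time[name] + sum(
                fraction * total(callee)
                for callee, fraction in calls.get(name, {}).items()
            )
        return cache[name]

    return {name: total(name) for name in self_time}


def node_label(name, total, self_, run_time, ncalls):
    """Format the three label lines described above for one node."""
    return "%s\n%.2f%% (%.2f%%)\n%d" % (
        name, 100.0 * total / run_time, 100.0 * self_ / run_time, ncalls
    )
```

For example, if main spends 1 s itself and calls parse (3 s self) and render (4 s self), which each account for half of a 2 s helper, then main has a total time of 10 s and would be labelled with 100.00% total and 10.00% self time.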
An edge represents the calls between two functions and has the following layout:
           total time %
              calls
parent --------------------> children
- total time % is the percentage of the running time transferred from the children to this parent (if available);
- calls is the number of times the parent function called the children.
Note that in recursive cycles, the total time % in the node is the same for all functions in the cycle, and there is no total time % figure on the edges inside the cycle, since such a figure would make no sense.
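To make the notion of a recursive cycle concrete, the sketch below groups functions that can call each other, directly or indirectly. It uses a plain reachability check chosen for brevity and is not taken from gprof2dot; the names reachable and find_cycles and the adjacency-dict input are assumptions for the example.

```python
def reachable(graph, start):
    """Return the functions reachable from start through one or more calls."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def find_cycles(graph):
    """Group mutually recursive functions.

    graph: {caller: [callees]}. Returns a list of sorted member lists,
    one per cycle; functions that are not recursive are left out.
    """
    nodes = set(graph) | {c for callees in graph.values() for c in callees}
    reach = {n: reachable(graph, n) for n in nodes}
    cycles, assigned = [], set()
    for node in sorted(nodes):
        # A function is on a cycle only if it can reach itself.
        if node in assigned or node not in reach[node]:
            continue
        # Its cycle consists of every function it reaches that reaches it back.
        members = {m for m in reach[node] if node in reach[m]}
        assigned |= members
        cycles.append(sorted(members))
    return cycles
```

In a graph where a calls b and b calls a, both functions land in the same group, which is why they share a single total time % figure.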
The color of the nodes and edges varies according to the total time % value. In the default temperature-like color-map, functions where most time is spent (hot-spots) are marked as saturated red, and functions where little time is spent are marked as dark blue. Note that functions where negligible or no time is spent do not appear in the graph by default.
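A temperature-like mapping of this kind can be sketched with the standard colorsys module. The function below is an illustration under assumed hue and lightness values; it does not reproduce gprof2dot's actual color map, and temperature_color is a hypothetical name.

```python
import colorsys


def temperature_color(weight):
    """Map a weight in [0, 1] to '#rrggbb', from dark blue to saturated red."""
    weight = min(max(weight, 0.0), 1.0)       # clamp out-of-range input
    hue = (2.0 / 3.0) * (1.0 - weight)        # 2/3 is blue, 0 is red
    lightness = 0.25 + 0.25 * weight          # cold nodes are darker
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
    return "#%02x%02x%02x" % (round(r * 255), round(g * 255), round(b * 255))
```

Here the weight would be the total time % divided by 100, so a function that accounts for the whole running time maps to pure red.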
Frequently Asked Questions
How can I generate a call graph from gprof output?
gprof2dot.py generates a partial call graph, excluding nodes and edges with little or no impact on the total computation time. If you want the full call graph, set a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as:
gprof2dot.py -n0 -e0
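The effect of the two thresholds can be sketched as a simple filter over percentages. This is a simplified illustration rather than gprof2dot's implementation: the function name prune, the input dictionaries, and the choice to keep items exactly equal to the threshold are assumptions. The default values are the ones listed in the usage text.

```python
def prune(node_pct, edge_pct, node_thres=0.5, edge_thres=0.1):
    """Drop nodes and edges whose total time % is below the thresholds.

    node_pct: {function: total time %}
    edge_pct: {(caller, callee): total time %}
    """
    kept_nodes = {n: p for n, p in node_pct.items() if p >= node_thres}
    kept_edges = {
        (src, dst): p
        for (src, dst), p in edge_pct.items()
        # An edge survives only if both of its endpoints survive.
        if p >= edge_thres and src in kept_nodes and dst in kept_nodes
    }
    return kept_nodes, kept_edges
```

With both thresholds set to zero, as in the command above, nothing is filtered out and the whole graph is kept.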
For an even more complete call graph, also run gprof with the --static-call-graph option, which identifies, by static analysis of the binary machine code, other functions that could have been called but never were.
The node labels are too wide. How can I narrow them?
The node labels can get very wide when profiling C++ code, due to inclusion of scope, function arguments, and template arguments in demangled C++ function names.
If you do not need function and template argument information, pass the -s / --strip option to strip them.
If you want to keep all that information, or if the labels are still too wide, pass the -w / --wrap option to wrap the labels. Note that because dot does not wrap labels automatically, the label margins will not be perfectly aligned.
Why is there no output, or why is it all in the same color?
Likely, the total execution time is too short, so there is not enough precision in the profile to determine where time is being spent.
You can still force the whole graph to be displayed by setting a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as:
gprof2dot.py -n0 -e0
But to get meaningful results you will need to find a way to run the program for a longer time period, or run gprof with multiple profiles. See also:
- Gprof Manual: Statistical Sampling Error
- Gprof Manual: Answers to Common Questions: How do I analyze a program that runs for less than a second?
Why don't the percentages add up?
Your execution time is likely too short, which makes the round-off errors large.
See the question above for ways to increase the execution time.
Which options should I pass to gcc when compiling for profiling?
Options which are essential to produce suitable results are:
-g: produce debugging information
-fno-omit-frame-pointer: use the frame pointer (frame pointer usage is disabled by default on some architectures like x86_64 and for some optimization levels; it is impossible to walk the call stack without it)
Only if you're using gprof will you need:
-pg: generate profiling instrumentation code
But these days you'll get much better results with a sampling profiler.
You want the code you are profiling to be as close as possible to the code that you will be releasing. So you should include all options that you use in your release code, typically:
-O2: optimizations that do not involve a space-speed tradeoff
-DNDEBUG: disable debugging code in the standard library (such as the assert macro)
However, due to the profiling mechanism used by gprof (and other profilers), many of the optimizations performed by gcc interfere with the accuracy/granularity of the profiling. You should pass these options to disable those particular optimizations:
-fno-inline-functions: do not inline functions into their parents (otherwise the time spent on these functions will be attributed to the caller)
-fno-inline-functions-called-once: similar to above
-fno-optimize-sibling-calls: do not optimize sibling and tail recursive calls (otherwise tail calls may be attributed to the parent function)
If the granularity is still too low, you may pass these options to achieve finer granularity:
-fno-default-inline: do not make member functions inline by default merely because they are defined inside the class scope
-fno-inline: do not pay attention to the inline keyword
Note, however, that with these last options the timings of functions called many times will be distorted due to the function call overhead. This is particularly true for typical C++ code, which expects these optimizations to be done for decent performance.
See the full list of gcc optimization options for more information.
- Google Performance Tools
Profiling visualization tools
- Google's gprof2dot
- gprof filters
Call-graph generation tools
- pycallgraph -- a call graph generator for Python programs.