Measuring performance is one of the most challenging tasks in system administration.
It requires proper configuration and a good understanding of the results.
Fortunately, Linux systems offer a wide variety of tools for obtaining performance metrics.
In this blog post, we will focus on the instrumentation capabilities of the Linux kernel and some interesting methods of analyzing the results.
The kernel matters here because nearly all activity involving the CPU, memory, disks, or network interfaces passes through it.
Of course, there are exceptions where the kernel is not involved in an operation or is involved only minimally, such as a user-space TCP/IP stack built with the Data Plane Development Kit (DPDK).
Still, kernel events provide the most accurate and precise picture of what is happening in the system for most tasks.
Tools like top, iostat, mpstat, and iftop can answer questions like:
Which process is causing excessive disk writes?
How many CPU cores is a specific application using?
Is system utilization high, moderate, or low?
The average system administrator does not usually need to go beyond these tools.
But what if you need to answer more specific questions, for example:
How many times did a process invoke a certain syscall, and what were the results?
How successful was a process at opening an external device, like a USB drive?
How much time must a device wait before its interrupt is serviced?
ftrace is a kernel event tracing tool that can easily answer these questions.
It has been available since kernel version 2.6.27 and is essentially a keyhole into the kernel, allowing a user to subscribe to certain types of events or functions.
This excerpt from the tool’s readme file written by its author, Steven Rostedt, describes it perfectly:
Ftrace is an internal tracer designed to help out developers and system designers, to find what is going on inside the kernel.
It can be used for debugging or analyzing latencies and performance issues that take place outside of user-space.
— Steven Rostedt
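As a rough sketch of what subscribing to an event can look like, the shell functions below restrict tracing to one process, enable a single event, and count the recorded lines. This is an illustration under assumptions, not the only way to do it: it assumes debugfs is mounted at /sys/kernel/debug, that the commands run as root, that the kernel exposes syscall events such as syscalls/sys_enter_openat, and that the set_event_pid file is present (reasonably recent kernels). The function names are hypothetical.

```shell
#!/bin/sh
# Assumption: debugfs is mounted at /sys/kernel/debug and we run as root.
# TRACING_DIR can be overridden, e.g. to point at a scratch directory.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Limit event tracing to one PID and enable one event.
# Example: trace_event_for_pid 1234 syscalls/sys_enter_openat
trace_event_for_pid() {
    pid="$1"
    event="$2"
    # Writing to the trace file clears the buffer.
    echo > "$TRACING_DIR/trace"
    echo "$pid" > "$TRACING_DIR/set_event_pid"
    echo 1 > "$TRACING_DIR/events/$event/enable"
}

# Disable the event again.
stop_event() {
    echo 0 > "$TRACING_DIR/events/$1/enable"
}

# Count recorded events: non-empty lines of the trace file
# that are not '#' header lines.
count_events() {
    grep -c '^[^#]' "$TRACING_DIR/trace"
}
```

A typical session would call trace_event_for_pid, let the process run for a while, call stop_event, and then read the number printed by count_events.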
At the heart of ftrace lies a circular buffer where events are aggregated by tracers.
By tracer, we mean kernel code capable of collecting certain types of events.
ftrace is built on top of debugfs, so its environment is exposed in
For example, to get the list of available tracers, one can consult this file:
# cat /sys/kernel/debug/tracing/available_tracers
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
This represents the tracing facilities available by default on most machines.
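Once the list is known, a tracer is selected by writing its name to the current_tracer file in the same directory. The sketch below wraps that step in a small check against available_tracers. It assumes the same /sys/kernel/debug/tracing location and root privileges; the helper names tracer_available and use_tracer are made up for this example.

```shell
#!/bin/sh
# Assumption: debugfs is mounted at /sys/kernel/debug and we run as root.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Succeed if the named tracer is listed in available_tracers.
tracer_available() {
    for t in $(cat "$TRACING_DIR/available_tracers"); do
        [ "$t" = "$1" ] && return 0
    done
    return 1
}

# Switch to the named tracer, refusing names the kernel does not offer.
use_tracer() {
    if tracer_available "$1"; then
        echo "$1" > "$TRACING_DIR/current_tracer"
    else
        echo "tracer '$1' is not available" >&2
        return 1
    fi
}
```

For instance, use_tracer function_graph would activate the function graph tracer, and use_tracer nop would switch tracing back off.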
But the Linux kernel supports numerous other interesting tracers that can be observed by looking in the kernel/trace/trace_*.c files of the Linux source tree.
They can be activated by recompiling the kernel with the necessary configuration options.
[~/my-kernel-source/kernel/trace]$ ls trace_*.c
trace_benchmark.c trace_events_filter.c trace_hwlat.c trace_output.c trace_selftest_dynamic.c
trace_boot.c trace_events_hist.c trace_irqsoff.c trace_preemptirq.c trace_seq.c
trace_branch.c trace_events_inject.c trace_kdb.c trace_printk.c trace_stack.c
trace_clock.c trace_events_synth.c trace_kprobe.c trace_probe.c trace_stat.c
trace_dynevent.c trace_events_trigger.c trace_kprobe_selftest.c trace_recursion_record.c trace_syscalls.c
trace_eprobe.c trace_export.c trace_mmiotrace.c trace_sched_switch.c trace_uprobe.c
trace_event_perf.c trace_functions.c trace_nop.c trace_sched_wakeup.c
trace_events.c trace_functions_graph.c trace_osnoise.c trace_selftest.c
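Before recompiling, it can help to see which tracer options the running kernel was built with. The sketch below greps the kernel build configuration for options whose names end in TRACER. It assumes the distribution installs that configuration as /boot/config-<release>, which is common but not universal, and the name pattern is only a heuristic: the blk tracer, for example, is controlled by CONFIG_BLK_DEV_IO_TRACE and would not match. The function names are hypothetical.

```shell
#!/bin/sh
# Assumption: the kernel build configuration is installed under /boot.
KERNEL_CONFIG="${KERNEL_CONFIG:-/boot/config-$(uname -r)}"

# Print tracer-related options that are built in (=y).
enabled_tracers() {
    grep -E '^CONFIG_[A-Z0-9_]*TRACER=y' "$KERNEL_CONFIG"
}

# Print tracer-related options that exist but are switched off.
disabled_tracers() {
    grep -E '^# CONFIG_[A-Z0-9_]*TRACER is not set' "$KERNEL_CONFIG"
}
```

Options reported by disabled_tracers are candidates to switch on in the configuration before rebuilding the kernel.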
For example, by reading the comments in the trace_clock.c source code, we can figure out what it is about:
I’m providing these examples to illustrate just how low the entrance barrier is for getting started with Linux kernel tracing.
Out of the available standard tracers, we’ll concentrate on blk, which allows tracing block I/O operations.
So let’s take a deeper look!