
perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------
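
Because the 'simple' style prints a single number per run, as in the example above,
several runs can be averaged with a short script. The following is a minimal sketch
only: `mean_time` is a hypothetical helper name, not part of perf, and it assumes
one elapsed-time value per input line.

```shell
# Read one elapsed-time value per line on stdin and print the mean
# with three decimals. Blank lines are skipped; empty input fails.
mean_time() {
    awk 'NF { sum += $1; n++ }
         END { if (n) { printf "%.3f\n", sum / n } else { exit 1 } }'
}

# Hypothetical usage, assuming perf is installed:
#   for i in 1 2 3; do perf bench --format=simple sched pipe; done | mean_time
```

Keeping the parsing in a separate function lets the same helper be reused for any
suite whose 'simple' output is a single number per run.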

SUBSYSTEM
---------

'sched'::
	Scheduler and IPC mechanisms.

'syscall'::
	System call performance (throughput).

'mem'::
	Memory access performance.

'numa'::
	NUMA scheduling and MM benchmarks.

'futex'::
	Futex stressing benchmarks.

'epoll'::
	Eventpoll (epoll) stressing benchmarks.

'internals'::
	Benchmark internal perf functionality.

'all'::
	All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use multiple threads instead of multiple processes.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------
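
The task counts in the output above follow from the group count. As a small sketch,
assuming each group holds 20 senders and 20 receivers (which is what the sample
output suggests), the total can be computed as below. `messaging_tasks` is a
hypothetical helper used only for illustration.

```shell
# Total number of tasks started for a given number of groups.
# Assumption: 20 senders plus 20 receivers per group, as in the sample output.
messaging_tasks() {
    groups=$1
    per_side=20
    echo $((groups * per_side * 2))
}

messaging_tasks 10    # prints 400
messaging_tasks 20    # prints 800
```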

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec
---------------------
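
The two derived figures in the output above can be reproduced from the total time and
the loop count. The sketch below is illustrative only: `pipe_metrics` is a
hypothetical helper, and it assumes the ops/sec value is truncated to a whole number.

```shell
# Derive usecs/op and ops/sec from a total time in seconds and an
# operation count. ops/sec is truncated, not rounded.
pipe_metrics() {
    awk -v total="$1" -v ops="$2" 'BEGIN {
        printf "%.6f usecs/op\n", total * 1000000 / ops
        printf "%d ops/sec\n", int(ops / total)
    }'
}

pipe_metrics 8.091833 1000000   # 8.091833 usecs/op, 123581 ops/sec
pipe_metrics 0.016948 1000      # 16.948000 usecs/op, 59004 ops/sec
```

Note that the shorter run reports a higher cost per operation, so results taken with
different loop counts should not be compared directly.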

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating core system call throughput (reports both usecs/op and ops/sec).
It runs a single thread that repeatedly calls getppid(2), a simple system call whose
result is not cached by glibc.


SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify the copy function to use (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify the set function to use (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.
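
To illustrate the size units accepted by --size for *memcpy* and *memset*, the sketch
below converts a size string to a byte count. It is not perf's own parser:
`size_to_bytes` is a hypothetical helper, and treating each unit as a power of 1024
(1KB = 1024 bytes) is an assumption.

```shell
# Convert a size string such as "1MB" or "512kb" to a byte count.
# Assumption: units are powers of 1024 (1KB = 1024 bytes).
size_to_bytes() {
    # Upper-case the input so that unit matching is case insensitive.
    s=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    num=${s%%[!0-9]*}      # leading digits
    unit=${s#"$num"}       # whatever follows the digits
    [ -n "$num" ] || { echo "missing number: $1" >&2; return 1; }
    case "$unit" in
        B|"") mult=1 ;;
        KB) mult=1024 ;;
        MB) mult=$((1024 * 1024)) ;;
        GB) mult=$((1024 * 1024 * 1024)) ;;
        TB) mult=$((1024 * 1024 * 1024 * 1024)) ;;
        *) echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    echo $((num * mult))
}

# Hypothetical usage, assuming perf is installed:
#   size_to_bytes 2MB && perf bench mem memcpy --size 2MB --nr_loops 100
```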

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating the futex hash table.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]