# Benchstat > Benchstat computes statistical summaries and A/B comparisons of Go benchmarks. --- # Source: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat # Benchstat: Statistical Analysis for Go Benchmarks Benchstat computes statistical summaries and A/B comparisons of Go benchmarks. ## Overview Benchstat is a command-line tool from the golang.org/x/perf package that analyzes Go benchmark results. It provides statistical summaries with confidence intervals and performs A/B comparisons to determine if performance changes are statistically significant. ## Usage ``` benchstat [flags] inputs... ``` Each input file should be in the Go benchmark format (https://golang.org/design/14313-benchmark-format), such as the output of `go test -bench .`. ## Getting Started ### Installation ```bash go install golang.org/x/perf/cmd/benchstat@latest ``` ### Basic Example Collect benchmark results before a change: ```bash go test -run='^$' -bench=. -count=10 > old.txt ``` Collect benchmark results after the change: ```bash go test -run='^$' -bench=. -count=10 > new.txt ``` Compare the results: ```bash benchstat old.txt new.txt ``` ### Benchmark File Format Example benchmark file format: ``` goos: linux goarch: amd64 pkg: golang.org/x/perf/cmd/benchstat/testdata BenchmarkEncode/format=json-48 690848 1726 ns/op BenchmarkEncode/format=json-48 684861 1723 ns/op BenchmarkEncode/format=json-48 693285 1707 ns/op BenchmarkEncode/format=gob-48 372699 3069 ns/op BenchmarkEncode/format=gob-48 394740 3075 ns/op ``` The order of lines in the file does not matter, except that the output lists benchmarks in order of appearance. ## Output Format ### Example Comparison Output ``` $ benchstat old.txt new.txt goos: linux goarch: amd64 pkg: golang.org/x/perf/cmd/benchstat/testdata │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ Encode/format=json-48 1.718µ ± 1% 1.423µ ± 1% -17.20% (p=0.000 n=10) Encode/format=gob-48 3.066µ ± 0% 3.070µ ± 2% ~ (p=0.446 n=10) geomean 2.295µ 2.090µ -8.94% ``` ### Understanding the Output - **Median**: Shows the median value with 95% confidence interval - **vs base**: Percentage change compared to baseline - **p-value**: Probability that differences are due to random chance (lower = more significant) - **n**: Number of samples from each input file - **~**: No statistically significant difference detected ### Geomean Row The last row shows the geometric mean of each column, giving an overall picture of how benchmarks changed. Proportional changes in the geomean reflect proportional changes in the benchmarks. For n benchmarks, if sec/op for one increases by a factor of 2, then the sec/op geomean will increase by a factor of ⁿ√2. ## Filtering Benchstat has a flexible filtering system to configure which benchmarks are summarized and compared. Use the `-filter` flag to filter inputs. ### Filter Syntax Basic filter terms: ``` key:value - Match if key equals value key:"value" - Value is a double-quoted string (can contain spaces) "key":value - Keys may also be double-quoted key:/regexp/ - Match if key matches a regular expression key:(val1 OR val2 OR ...) - Short-hand for key:val1 OR key:val2 * - Match everything ``` Combining terms: ``` x y ... - Match if x, y, etc. all match x AND y - Same as x y x OR y - Match if x or y match -x - Match if x does not match (...) - Subexpression ``` ### Filter Keys - `.name` - The base name of a benchmark - `.fullname` - The full name of a benchmark (including configuration) - `.file` - The name of the input file or user-provided file label - `/{name-key}` - Per-benchmark sub-name configuration key - `{file-key}` - File-level configuration key - `.unit` - The name of a unit for a particular metric ### Filter Example ```bash benchstat -filter "/format:json goos:linux .unit:(ns/op OR B/op)" old.txt new.txt ``` This matches benchmarks with "/format=json" in the sub-name keys, file-level configuration "goos" equal to "linux", and extracts "ns/op" and "B/op" measurements. ## Configuring Comparisons Configure how benchstat groups and compares results using flags: - `-table KEYS` - Group results into tables by KEYS (default: `.config`) - `-row KEYS` - Group results into table rows by KEYS (default: `.fullname`) - `-col KEYS` - Compare across columns with different values of KEYS (default: `.file`) - `-ignore KEYS` - Keys to ignore when grouping results Each KEYS argument is a comma- or space-separated list of keys. ### Comparison Keys - `.name` - The base name of a benchmark - `.fullname` - The full name of a benchmark (including configuration) - `.file` - The name of the input file or user-provided file label - `/{name-key}` - Per-benchmark sub-name configuration key - `{file-key}` - File-level configuration key - `.config` - All file-level configuration keys ### Projection Examples #### Default Projection ```bash benchstat -table .config -row .fullname -col .file old.txt new.txt ``` Groups all benchmarks into one table (when they have same config), rows by full name, columns by file. #### Compare Encoding Formats Compare json encoding to gob encoding from the same file: ```bash benchstat -col /format new.txt ``` Output: ``` goos: linux goarch: amd64 pkg: golang.org/x/perf/cmd/benchstat/testdata │ json │ gob │ │ sec/op │ sec/op vs base │ Encode-48 1.423µ ± 1% 3.070µ ± 2% +115.82% (p=0.000 n=10) ``` #### Simplify by Benchmark Name ```bash benchstat -col /format -row .name new.txt ``` Groups rows by benchmark name rather than full name. #### Warning on Information Loss ```bash benchstat -row .name new.txt ``` Output: ``` goos: linux goarch: amd64 pkg: golang.org/x/perf/cmd/benchstat/testdata │ new.txt │ │ sec/op │ Encode 2.253µ ± 37% ¹ ¹ benchmarks vary in .fullname ``` Benchstat warns when projections strip away information, indicating that results were grouped in a potentially meaningless way. ## Sorting By default, benchstat sorts by order of first observation. Override using the following syntax: - `{key}@{order}` - Specifies built-in sort order: "alpha" (alphabetic) or "num" (numeric, understands metric/IEC prefixes like "2k", "1Mi") - `{key}@({value} {value} ...)` - Specifies a fixed value order ### Sorting Example Compare json improvement over gob: ```bash benchstat -col "/format@(gob json)" -row .name -ignore .file new.txt ``` Output: ``` goos: linux goarch: amd64 pkg: golang.org/x/perf/cmd/benchstat/testdata │ gob │ json │ │ sec/op │ sec/op vs base │ Encode 3.070µ ± 2% 1.423µ ± 1% -53.66% (p=0.000 n=10) ``` ## Custom File Labels Override file name labels by specifying input as `label=path` instead of just `path`. ```bash benchstat O=old.txt N=new.txt ``` Output: ``` goos: linux goarch: amd64 pkg: golang.org/x/perf/cmd/benchstat/testdata │ O │ N │ │ sec/op │ sec/op vs base │ Encode/format=json-48 1.718µ ± 1% 1.423µ ± 1% -17.20% (p=0.000 n=10) Encode/format=gob-48 3.066µ ± 0% 3.070µ ± 2% ~ (p=0.446 n=10) geomean 2.295µ 2.090µ -8.94% ``` ## Units Benchstat normalizes units: - "ns" → "sec" - "MB" → "B" This avoids creating nonsense units like "µns/op" that appear in Go's default metrics. ### Custom Unit Metadata Benchstat supports custom unit metadata for controlling statistics (see https://golang.org/design/14313-benchmark-format). The "assume" metadata is useful for controlling statistics: - **assume=nothing** (default): Non-parametric statistics - Median for summaries - Mann-Whitney U-test for A/B comparisons - **assume=exact**: For measurements with no noise (e.g., binary sizes) - Warns if there's any variation in measured values - Shows A/B comparisons even with single before/after measurement ## Best Practices for Benchmarking ### Run Count - Typically run benchmarks **at least 10 times** - More runs provide statistically significant results - Ideally, use **20+ runs** - Pick a number and stick to it ### Noise Reduction To reduce noise and get more reliable results: 1. **Run on idle machine** - Stop unnecessary background processes 2. **Disable power management** - Avoid battery mode and thermal throttling 3. **Interleave runs** - Mix before and after runs rather than running all before, then all after 4. **Pre-compile benchmarks** - Use `go test -c` to compile benchmark binary first 5. **See https://llvm.org/docs/Benchmarking.html** - Many additional tips on reducing benchmark noise ### Statistical Considerations 1. **Avoid multiple testing** - Don't rerun benchmarks until you see a significant change - Default α threshold is 0.05 (5% expected false positive rate) - Rerunning creates statistical bias 2. **Expect false positives** - With large numbers of benchmarks, ~5% will show "significant" differences even without actual changes 3. **Distinguish between significance and magnitude** - Statistically significant ≠ large change - With low-noise data, even tiny changes can be statistically significant - Large changes are easier to distinguish from noise ### Benchmark File Order The order of lines in a benchmark file does not matter, except that output lists benchmarks in order of appearance. ## Advanced Usage ### Multiple Comparisons Compare multiple files at once: ```bash benchstat old.txt new1.txt new2.txt ``` This creates separate columns for each comparison file. ### Filtering and Comparing Combine filtering with comparison: ```bash benchstat -filter "goos:linux" old.txt new.txt ``` ### Complex Projections For detailed statistics reference, see https://pkg.go.dev/golang.org/x/perf/benchproc/syntax ## Common Workflows ### Before/After Analysis ```bash # Collect baseline go test -run='^$' -bench=BenchmarkEncode -count=20 > baseline.txt # Make changes # ... edit code ... # Collect new results go test -run='^$' -bench=BenchmarkEncode -count=20 > current.txt # Compare benchstat baseline.txt current.txt ``` ### Compare Implementations ```bash # Collect results from different implementations benchstat impl1.txt impl2.txt impl3.txt ``` ### Filter by Platform ```bash benchstat -table "goos,goarch" -col .file old.txt new.txt ``` Creates separate tables for each operating system and architecture combination. ## Related Tools - **benchproc** - Processing and filtering benchmark results - **benchmath** - Mathematical functions for benchmark statistics - **See https://pkg.go.dev/golang.org/x/perf** - Full golang.org/x/perf documentation ## Source - **Repository**: https://github.com/golang/perf - **Package**: golang.org/x/perf/cmd/benchstat - **Documentation**: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat - **Design Document**: https://golang.org/design/14313-benchmark-format