Benchopt#
—A framework for reproducible, comparable benchmarks—
Benchopt is a benchmarking framework for machine learning and optimization. You bring datasets, metrics, and the methods to compare — Benchopt provides the loop that connects them and runs reproducible comparisons at scale. Out of the box:
Scale experiments: loop over grids of parameters, run in parallel locally or on HPC clusters, with native SLURM support.
Save time: cache results to avoid recomputing unchanged runs.
Trust comparisons: control randomness with seeds and stable protocols.
Integrate broadly: use implementations from Python, R, Julia, or binaries.
Share outcomes: merge and publish results from multiple runs, with easy interactive visualization.
Maintain and extend: modular design to easily add new datasets, solvers, and metrics, and CI tools to test them.
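To make the "scale experiments" and "trust comparisons" points concrete, here is a minimal plain-Python sketch of the underlying idea: enumerate a grid of parameters, give each run its own seeded random generator, and dispatch the runs to a pool of workers. It does not use the benchopt API. The grid, the `run_one` function, and the use of `concurrent.futures` are all illustrative assumptions.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parameter grid: names and values are illustrative only.
GRID = {"step_size": [0.1, 0.5], "n_samples": [10, 100]}
SEEDS = [0, 1]


def run_one(step_size, n_samples, seed):
    """Toy experiment: a noisy score that depends only on its inputs."""
    rng = random.Random(seed)  # local generator, so runs share no random state
    noise = sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
    return {"step_size": step_size, "n_samples": n_samples,
            "seed": seed, "score": step_size + noise}


def run_grid(max_workers=4):
    """Run every (parameters, seed) combination, possibly in parallel."""
    names = list(GRID)
    configs = [dict(zip(names, values), seed=seed)
               for values in itertools.product(*GRID.values())
               for seed in SEEDS]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as configs
        return list(pool.map(lambda cfg: run_one(**cfg), configs))


results = run_grid()
```

Because every run builds its generator from an explicit seed, calling `run_grid()` again returns the same scores regardless of the number of workers.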
Learn how to construct and run a benchmark with the following pages!
Get started
Install benchopt, run an existing benchmark, and write your own with minimal working examples
Benchmark workflow
Write an ML or optimization benchmark from scratch, run it, visualize it, and publish it
User guide
Full documentation of benchopt API and CLI
Examples
Gallery of use-cases crafted by the benchopt community
Tip
Want to create a new benchmark with benchopt? Use our templates to get started: ML benchmarks | Optimization benchmarks
Frequently asked questions (FAQ)#
Is benchopt only for optimization? (No!)
No! Despite its name, Benchopt supports machine learning, optimization, and infrastructure benchmarks. Most features in benchopt are common to all types of benchmarks. The main difference lies in how performance is evaluated:
Optimization benchmarks: Track iterative solvers’ convergence over time and iterations.
ML benchmarks: Compare estimators/models on prediction tasks (accuracy, F1, etc.).
Use our ML template to get started quickly.
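The two evaluation styles can be contrasted with a small plain-Python sketch. It is not benchopt code: the function names, the toy data, and the choice of gradient descent are assumptions made for illustration. The optimization-style evaluation records the objective value at every iteration, while the ML-style evaluation reduces a set of predictions to a single score.

```python
def objective(w, xs, ys):
    """Mean squared error of the 1-D linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent_curve(xs, ys, step_size=0.1, n_iter=10):
    """Optimization-style evaluation: record the objective at each iteration."""
    w, curve = 0.0, []
    for _ in range(n_iter):
        curve.append(objective(w, xs, ys))
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= step_size * grad
    curve.append(objective(w, xs, ys))  # value after the last update
    return w, curve


def accuracy(y_true, y_pred):
    """ML-style evaluation: one score summarizing a set of predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # toy data following y = 2 * x
w, curve = gradient_descent_curve(xs, ys)    # a convergence curve
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])   # 3 of 4 predictions match
```

Here `curve` is what an optimization benchmark would plot against time or iterations, and `acc` is the kind of scalar an ML benchmark would report per model.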
Is benchopt restricted to a fixed set of benchmarks?
No! Benchopt is a framework to write and run benchmarks, not a fixed set of benchmarks. You can create your own benchmark by writing a few lines of code, and share it with the community. See Write a benchmark for a complete guide on how to write a benchmark.
Can I run a benchmark in parallel and use a cache?
Benchopt can run the benchmarked methods in parallel, either with joblib, using -j 4 to run on multiple CPUs of a single machine, or with a more advanced distributed backend, detailed in Distributed run with Benchopt.
Moreover, benchopt caches results natively to avoid wasteful recomputation. The cache is automatically invalidated when the code of a solver, dataset, or objective changes. You can also bypass the cache with --no-cache to force re-running all combinations.
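The idea of a cache that is invalidated when code changes can be sketched in a few lines of plain Python. This is only an illustration of the principle, not how benchopt implements it: the `CodeAwareCache` class, the bytecode-based fingerprint, and the `use_cache` flag are hypothetical names chosen for this example.

```python
import hashlib
import json


def code_fingerprint(func):
    """Hash a function's bytecode and constants as a cheap proxy for its code."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(payload).hexdigest()


class CodeAwareCache:
    """Toy in-memory cache keyed by a function's code and its parameters."""

    def __init__(self):
        self._store = {}
        self.n_computed = 0  # counts real computations, for demonstration

    def run(self, func, use_cache=True, **params):
        key = (code_fingerprint(func), json.dumps(params, sort_keys=True))
        if use_cache and key in self._store:
            return self._store[key]
        self.n_computed += 1
        result = func(**params)
        self._store[key] = result
        return result


def solver(x):
    return x + 1


cache = CodeAwareCache()
first = cache.run(solver, x=3)    # computed
second = cache.run(solver, x=3)   # served from the cache


def solver(x):  # same name, different code: the old entry no longer matches
    return x + 2


third = cache.run(solver, x=3)                    # recomputed
forced = cache.run(solver, use_cache=False, x=3)  # cache bypassed on request
```

Only three of the four calls actually execute `solver`: the second call hits the cache, the third misses because the code changed, and the fourth is forced to recompute.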
How can I write a benchmark?
Learn how to Write a benchmark, including creating an objective, a solver, and a dataset.
Also take a look at our template repositories for Optimization and ML to easily start a new benchmark.
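As a rough mental model of how an objective, a solver, and a dataset divide the work, consider the plain-Python sketch below. It deliberately avoids the benchopt base classes: every class and method name here is hypothetical, and the real interfaces are the ones documented in Write a benchmark.

```python
class ToyDataset:
    """Provides the data: here, points on the line y = 3 * x."""

    def get_data(self):
        xs = [0.0, 1.0, 2.0, 3.0]
        return {"xs": xs, "ys": [3.0 * x for x in xs]}


class ToyObjective:
    """Holds the data and scores a candidate result (sum of squared errors)."""

    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def evaluate(self, w):
        return sum((w * x - y) ** 2 for x, y in zip(self.xs, self.ys))


class LeastSquaresSolver:
    """A method under comparison: closed-form 1-D least squares."""

    def run(self, xs, ys):
        return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


class ZeroSolver:
    """A trivial baseline that always answers 0."""

    def run(self, xs, ys):
        return 0.0


# The "loop that connects them": every solver is scored by the same objective.
data = ToyDataset().get_data()
objective = ToyObjective(**data)
scores = {type(solver).__name__: objective.evaluate(solver.run(**data))
          for solver in (LeastSquaresSolver(), ZeroSolver())}
```

Keeping the three roles separate means a new solver or dataset can be added without touching the scoring code.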
How can I reuse code in a benchmark?
For some solvers and datasets, it is handy to share some operations or pre-processing steps. Benchopt lets you factor out this shared code, as described in Reusing some code in a benchmark.
Example benchmarks#
Reproducing an existing benchmark should be as easy as running the following commands:
A benchmark comparing various optimizers on training NanoGPT models.
git clone https://github.com/benchopt/benchmark_nanogpt.git
benchopt run benchmark_nanogpt
which will produce an interactive HTML report to visualize the results.
A benchmark comparing various methods for unsupervised domain adaptation.
git clone https://github.com/scikit-adaptation/skada-bench.git
benchopt run skada-bench --config configs/Simulated.yml --no-plot
which will produce a parquet file with the results, which can be visualized
by following the instructions in the README.md of scikit-adaptation/skada-bench.
A minimal benchmark comparing various solvers on a toy problem.
benchopt run examples/minimal_benchmark
which will produce an interactive HTML report to visualize the results.
These examples illustrate the diversity of benchmarks that can be built with benchopt, from deep learning optimization to more classical machine learning tasks.
There are already many Available benchmarks that have been created using benchopt.
Join the community#
Join the benchopt Discord server and get in touch with the community!
Feel free to drop a message to get help with running/constructing benchmarks or (why not) discuss new features to be added and future development directions that benchopt should take.
Citing Benchopt#
Benchopt is a continuous effort to make ML and optimization benchmarks reproducible and transparent. Join this endeavor! If you use benchopt in a scientific publication, please cite:
@inproceedings{benchopt,
author = {Moreau, Thomas and Massias, Mathurin and Gramfort, Alexandre
and Ablin, Pierre and Bannier, Pierre-Antoine
and Charlier, Benjamin and Dagréou, Mathieu and Dupré la Tour, Tom
and Durif, Ghislain and F. Dantas, Cassio and Klopfenstein, Quentin
and Larsson, Johan and Lai, En and Lefort, Tanguy
and Malézieux, Benoit and Moufad, Badr and T. Nguyen, Binh and Rakotomamonjy,
Alain and Ramzi, Zaccharie and Salmon, Joseph and Vaiter, Samuel},
title = {Benchopt: Reproducible, efficient and collaborative optimization benchmarks},
year = {2022},
booktitle = {NeurIPS},
url = {https://arxiv.org/abs/2206.13424}
}