Benchopt

A framework for reproducible, comparable benchmarks

Python 3.10+

Benchopt is a benchmarking framework for machine learning and optimization. You bring the datasets, metrics, and methods to compare; Benchopt provides the loop that connects them and runs reproducible comparisons at scale. Out of the box:

  • Scale experiments: loop over a grid of parameters and run in parallel, locally or on HPC clusters, with native SLURM support.

  • Save time: cache results to avoid recomputing unchanged runs.

  • Trust comparisons: control randomness with seeds and stable protocols.

  • Integrate broadly: use implementations from Python, R, Julia, or binaries.

  • Share outcomes: merge and publish results from multiple runs, with easy interactive visualization.

  • Maintain and extend: a modular design makes it easy to add new datasets, solvers, and metrics, and CI tools help test them.
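As a rough mental model of the loop described above, the following stdlib-only sketch runs every combination of dataset parameters, solvers, and random seeds and records one metric per run. All names (`make_dataset`, `solver_mean`, `DATASET_GRID`, and so on) are hypothetical; this is not benchopt's API or implementation, which also handles caching, parallelism, and plotting for you.

```python
import itertools
import random
import statistics

TRUE_MEAN = 5.0  # known ground truth of the simulated data (hypothetical)


def make_dataset(n_samples, seed):
    # Seeded generator: the same (n_samples, seed) always yields the same data.
    rng = random.Random(seed)
    return [rng.gauss(TRUE_MEAN, 1.0) for _ in range(n_samples)]


def solver_mean(data):
    return statistics.fmean(data)


def solver_median(data):
    return statistics.median(data)


DATASET_GRID = {"n_samples": [10, 100]}
SOLVERS = {"mean": solver_mean, "median": solver_median}
SEEDS = [0, 1]


def metric(estimate):
    # Absolute error against the known ground truth.
    return abs(estimate - TRUE_MEAN)


def run_benchmark():
    results = []
    # One run per element of the Cartesian product: dataset params x solvers x seeds.
    for n_samples, (name, solver), seed in itertools.product(
        DATASET_GRID["n_samples"], SOLVERS.items(), SEEDS
    ):
        data = make_dataset(n_samples, seed)
        results.append(
            {"n_samples": n_samples, "solver": name, "seed": seed,
             "error": metric(solver(data))}
        )
    return results


if __name__ == "__main__":
    for row in run_benchmark():
        print(row)
```

Because every dataset is generated from an explicit seed, calling `run_benchmark()` twice gives identical results, which is the property that makes comparisons trustworthy.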

Learn how to construct and run a benchmark with the following pages!

Get started

Install benchopt, run an existing benchmark, and write your own with minimal working examples

Benchmark workflow

Write an ML or optimization benchmark from scratch, run it, visualize it, and publish it

User guide

Full documentation of the benchopt API and CLI

Examples

Gallery of use-cases crafted by the benchopt community

Tip

Want to create a new benchmark with benchopt? Use our templates to get started: ML benchmarks | Optimization benchmarks

Frequently asked questions (FAQ)

Is benchopt only for optimization?

No! Despite its name, Benchopt supports machine learning, optimization, and infrastructure benchmarks, and most of its features are shared across these benchmark types. The main difference between optimization and ML benchmarks lies in how performance is evaluated:

  • Optimization benchmarks: Track iterative solvers’ convergence over time and iterations.

  • ML benchmarks: Compare estimators/models on prediction tasks (accuracy, F1, etc.).
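To make the distinction concrete, here is a hypothetical toy contrast: the optimization-style evaluation records the objective after every gradient-descent step (the data behind a convergence curve), while the ML-style evaluation scores the final predictions once. The functions below are illustrative assumptions, not benchopt code.

```python
import time


def objective(x):
    # Toy objective f(x) = (x - 3)^2, minimized at x = 3.
    return (x - 3.0) ** 2


def gradient(x):
    return 2.0 * (x - 3.0)


def optimization_trace(n_iter, step_size=0.1, x0=0.0):
    # Optimization-style evaluation: record (iteration, elapsed time, objective)
    # after every step; a convergence curve is drawn from these points.
    x = x0
    start = time.perf_counter()
    trace = [(0, 0.0, objective(x))]
    for it in range(1, n_iter + 1):
        x -= step_size * gradient(x)
        trace.append((it, time.perf_counter() - start, objective(x)))
    return trace


def accuracy(y_true, y_pred):
    # ML-style evaluation: score the final predictions once with a metric.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    for it, elapsed, value in optimization_trace(5):
        print(it, f"{elapsed:.2e}s", value)
    print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
```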

Use our ML template to get started quickly.

Is benchopt restricted to a fixed set of benchmarks?

No! Benchopt is a framework to write and run benchmarks, not a fixed set of benchmarks. You can create your own benchmark by writing a few lines of code, and share it with the community. See Write a benchmark for a complete guide on how to write a benchmark.

Can I run a benchmark in parallel and use a cache?

Benchopt can run the different benchmarked methods in parallel, either with joblib (for example, -j 4 runs four jobs on the CPUs of a single machine) or with a more advanced distributed backend, as detailed in Distributed run with Benchopt.
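These runs can be parallelized because each (method, dataset, parameters) combination is independent. The sketch below illustrates that idea with the standard library's `ThreadPoolExecutor`; it is a loose analogy to `-j 4`, with hypothetical "solvers", and not how benchopt dispatches jobs. Threads keep the example portable, whereas joblib typically uses separate processes, which suits CPU-bound work better.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "solvers" and dataset sizes: every (solver, size) pair is an
# independent job, which is what makes running them in parallel safe.
SOLVERS = {"sum": sum, "max": max}
SIZES = [10, 100, 1000]


def run_one(config):
    solver_name, n_samples = config
    data = list(range(n_samples))
    return solver_name, n_samples, SOLVERS[solver_name](data)


def run_parallel(n_jobs=4):
    configs = list(itertools.product(SOLVERS, SIZES))
    # n_jobs workers, similar in spirit to `-j 4`. `map` returns results in
    # the order of `configs`, regardless of which job finishes first.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(run_one, configs))


if __name__ == "__main__":
    for row in run_parallel():
        print(row)
```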

Moreover, benchopt caches results natively to avoid wasteful recomputation. The cache is automatically invalidated when the code of a solver, dataset, or objective changes. You can also bypass the cache with --no-cache to force re-running all combinations.

How can I write a benchmark?

Learn how to Write a benchmark, including creating an objective, a solver, and a dataset.

Also take a look at our template repositories for optimization and ML benchmarks to quickly start a new benchmark.
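The sketch below shows how the three components hand data to each other, using plain Python classes whose method names mirror those in benchopt's templates (`get_data`, `set_data`, `get_objective`, `set_objective`, `run`, `get_result`, `evaluate_result`). It is an assumption-based illustration: the classes do not subclass benchopt's base classes, the `run_pipeline` driver is hypothetical, and the Write a benchmark guide remains the reference for exact signatures.

```python
import random

# Plain-Python stand-ins mirroring benchopt's Dataset/Objective/Solver method
# names. They do NOT subclass benchopt's base classes; a real benchmark would.


class Dataset:
    name = "Simulated"

    def __init__(self, n_samples=50, seed=0):
        # benchopt declares such parameters via a `parameters` class attribute;
        # plain constructor arguments are used here for simplicity.
        self.n_samples, self.seed = n_samples, seed

    def get_data(self):
        # Keys of the returned dict become arguments of Objective.set_data.
        rng = random.Random(self.seed)
        x = [rng.uniform(-1, 1) for _ in range(self.n_samples)]
        y = [2.0 * xi + rng.gauss(0, 0.1) for xi in x]
        return dict(x=x, y=y)


class Objective:
    name = "1D least squares"

    def set_data(self, x, y):
        self.x, self.y = x, y

    def get_objective(self):
        # Keys of the returned dict become arguments of Solver.set_objective.
        return dict(x=self.x, y=self.y)

    def evaluate_result(self, slope):
        # Mean squared error of the solver's result.
        n = len(self.x)
        mse = sum((yi - slope * xi) ** 2 for xi, yi in zip(self.x, self.y)) / n
        return dict(value=mse)


class Solver:
    name = "closed-form"

    def set_objective(self, x, y):
        self.x, self.y = x, y

    def run(self, n_iter):
        # Closed-form least-squares slope without intercept; n_iter is unused.
        num = sum(xi * yi for xi, yi in zip(self.x, self.y))
        den = sum(xi * xi for xi in self.x)
        self.slope = num / den

    def get_result(self):
        # Keys of the returned dict become arguments of Objective.evaluate_result.
        return dict(slope=self.slope)


def run_pipeline(dataset, objective, solver, n_iter=1):
    objective.set_data(**dataset.get_data())
    solver.set_objective(**objective.get_objective())
    solver.run(n_iter)
    return objective.evaluate_result(**solver.get_result())


if __name__ == "__main__":
    print(run_pipeline(Dataset(), Objective(), Solver()))
```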

How can I reuse code in a benchmark?

For some solvers and datasets, it is handy to share operations or pre-processing steps. Benchopt lets you factorize this code, as described in Reusing some code in a benchmark.
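As a minimal illustration of the idea, assuming two hypothetical solvers that need the same preprocessing, the shared step is written once as a helper and called from both, so the solvers are guaranteed to see identical inputs. The linked page describes where benchopt expects such shared code to live.

```python
# Hypothetical shared helper: in a real benchmark it would live in a module
# imported by every solver file, instead of being copy-pasted into each one.
def standardize(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]


class SolverA:
    def set_objective(self, x):
        self.x = standardize(x)  # shared preprocessing...


class SolverB:
    def set_objective(self, x):
        self.x = standardize(x)  # ...reused, so both solvers see identical inputs
```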

Example benchmarks

Reproducing an existing benchmark should be as easy as running the following commands:

A benchmark comparing various optimizers on training NanoGPT models.

git clone https://github.com/benchopt/benchmark_nanogpt.git
benchopt run benchmark_nanogpt

which will produce an interactive HTML report to visualize the results.

A benchmark comparing various methods for unsupervised domain adaptation.

git clone https://github.com/scikit-adaptation/skada-bench.git
benchopt run skada-bench --config configs/Simulated.yml --no-plot

which will produce a parquet file containing the results; see the README.md of scikit-adaptation/skada-bench for instructions on how to visualize them.

A minimal benchmark comparing various solvers on a toy problem.

benchopt run examples/minimal_benchmark

which will produce an interactive HTML report to visualize the results.

These examples illustrate the diversity of benchmarks that can be built with benchopt, from deep learning optimization to more classical machine learning tasks.

Many benchmarks have already been created with benchopt; see Available benchmarks.

Join the community

Join the benchopt Discord server and get in touch with the community!

Feel free to drop a message to get help with running/constructing benchmarks or (why not) discuss new features to be added and future development directions that benchopt should take.

Citing Benchopt

Benchopt is an ongoing effort to make ML and optimization benchmarks reproducible and transparent. Join this endeavor! If you use benchopt in a scientific publication, please cite:

@inproceedings{benchopt,
   author    = {Moreau, Thomas and Massias, Mathurin and Gramfort, Alexandre
                and Ablin, Pierre and Bannier, Pierre-Antoine
                and Charlier, Benjamin and Dagréou, Mathieu and Dupré la Tour, Tom
                and Durif, Ghislain and F. Dantas, Cassio and Klopfenstein, Quentin
                and Larsson, Johan and Lai, En and Lefort, Tanguy
                and Malézieux, Benoit and Moufad, Badr and T. Nguyen, Binh and Rakotomamonjy,
                Alain and Ramzi, Zaccharie and Salmon, Joseph and Vaiter, Samuel},
   title     = {Benchopt: Reproducible, efficient and collaborative optimization benchmarks},
   year      = {2022},
   booktitle = {NeurIPS},
   url       = {https://arxiv.org/abs/2206.13424}
}