Benchopt: Benchmark repository for optimization


Benchopt is a package to make the comparison of optimization algorithms simple, transparent and reproducible.

It is written in Python but supports solvers written in many programming languages. So far it has been tested with Python, R, Julia and compiled binaries written in C/C++ that can be called from a terminal command. If a solver can be installed via conda, it should just work in benchopt!

Benchopt is used through a command line as documented in the Command line interface (CLI). Once benchopt is installed, running and replicating an optimization benchmark is as simple as doing:

git clone https://github.com/benchopt/benchmark_logreg_l2
benchopt install --env ./benchmark_logreg_l2
benchopt run --env ./benchmark_logreg_l2

Running these commands will fetch the benchmark files, install the benchmark requirements in a dedicated environment called benchopt_benchmark_logreg_l2 and produce a benchmark plot for l2-regularized logistic regression:

[Figure: convergence curves from the l2-regularized logistic regression benchmark]

Install

This package can be installed through pip. To allow benchopt to automatically install solver dependencies, the installation needs to be done in a conda environment:

conda create -n benchopt python
conda activate benchopt

To get the latest release, use:

pip install benchopt

And to get the latest development version, you can use:

pip install -U -i https://test.pypi.org/simple/ benchopt

This will install the command line tool to run the benchmarks. Existing benchmarks can then be retrieved from GitHub or created locally. To discover which benchmarks are currently available, look for benchmark_* repositories on GitHub, such as the one for the Lasso (l1-regularized linear regression). This benchmark can be retrieved locally with:

git clone https://github.com/benchopt/benchmark_lasso.git

Run a benchmark

This section illustrates benchopt’s command line interface on the Lasso benchmark; the syntax is applicable to any benchmark. It assumes that you are in the parent folder of the benchmark_lasso folder. The --env flag specifies that everything is run in the benchopt_benchmark_lasso conda environment.

Installing benchmark dependencies: benchopt exposes a CLI to install solvers’ dependencies automatically. It only works inside a conda environment. To install all requirements of the benchmark, make sure a conda environment is activated and run:

benchopt install --env ./benchmark_lasso

Run a benchmark: to run benchmarks on all datasets and with all solvers, run:

benchopt run --env ./benchmark_lasso

The command benchopt run can also be used outside of a conda environment without the flag -e/--env. In that case, the benchmark will only run solvers that are currently installed.
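
For instance, to run the Lasso benchmark with whatever solvers are already installed in the current environment:

benchopt run ./benchmark_lasso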

Run only some solvers and datasets: to run only the sklearn and celer solvers, on the simulated and finance datasets, run:

benchopt run --env ./benchmark_lasso -s sklearn -s celer -d simulated -d finance

Run a solver or dataset with specific parameters: some solvers and datasets have parameters; by default all combinations are run. If you want to run a specific configuration, pass it explicitly, e.g., to run the python-pgd solver only with its parameter use_acceleration set to True, use:

benchopt run --env ./benchmark_lasso -s python-pgd[use_acceleration=True]

Set the number of repetitions: the benchmark is repeated 5 times by default for better precision. To run the benchmark 10 times, run:

benchopt run --env ./benchmark_lasso -r 10

Passing options through a configuration file: all options of benchopt run can be passed through a YAML configuration file, using --config <configuration_file_name.yml>. The options are defined with the same names as the CLI options. An example configuration file is:

objective-filter:
  - Lasso Regression[fit_intercept=False,reg=0.5]
dataset:
  - simulated
  - leukemia
solver:
  - celer
force-solver:
  - cd
n-repetitions: 1

When options are passed both via file and CLI, the CLI takes precedence.
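
For instance, assuming the options above are saved in a file named config_lasso.yml (a name chosen here only for illustration), the benchmark can be run with:

benchopt run --env ./benchmark_lasso --config config_lasso.yml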

Getting help: use

benchopt run -h

to get more details about the different options. You can also read the Command line interface (CLI).

Benchmark results

All the public benchmark results are available at Benchopt Benchmarks results.

Publish results: you can directly publish the results of a benchopt run to Benchopt Benchmarks results. See Publish benchmark results for the details.
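
As a sketch, publishing boils down to running the publish subcommand with a GitHub token; the exact flags and the token setup are described in the publication guide:

benchopt publish ./benchmark_lasso -t <your_github_token>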

Frequently asked questions (FAQ)

Write a benchmark

Learn how to Write a benchmark, including creating an objective, a solver, and a dataset.
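
As a rough illustration of what a benchmark component looks like, here is a minimal solver sketch for the Lasso using proximal gradient descent. The argument names of set_objective and the exact return convention of get_result depend on the benchmark’s Objective and on the benchopt version, so treat this as a sketch rather than a template:

import numpy as np
from benchopt import BaseSolver


class Solver(BaseSolver):
    # Name used to select the solver from the CLI, e.g. `-s python-pgd`.
    name = 'python-pgd'

    def set_objective(self, X, y, lmbd):
        # Receives the quantities exposed by the benchmark's Objective
        # (the argument names here follow the Lasso benchmark convention).
        self.X, self.y, self.lmbd = X, y, lmbd

    def run(self, n_iter):
        # Called with increasing n_iter values to sample the convergence curve.
        L = np.linalg.norm(self.X, ord=2) ** 2  # Lipschitz constant of the gradient
        w = np.zeros(self.X.shape[1])
        for _ in range(n_iter):
            grad = self.X.T @ (self.X @ w - self.y)
            w = w - grad / L
            # Soft-thresholding: proximal operator of lmbd * ||w||_1.
            w = np.sign(w) * np.maximum(np.abs(w) - self.lmbd / L, 0.)
        self.w = w

    def get_result(self):
        # The return convention is defined by the benchmark's Objective.
        return dict(beta=self.w)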

Curve sampling

Benchopt can sample convergence curves both for black-box solvers and for solvers that support callbacks. Learn How are the convergence curves sampled?. Note that the sampling strategy can also be tweaked on a per-solver basis, as described in: Changing the strategy to grow the stop_val.
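
For example, a solver can control how its stopping value grows between evaluations. A minimal sketch, assuming a get_next hook on the solver class (check the FAQ entry for the exact hook name and signature in your benchopt version):

from benchopt import BaseSolver


class Solver(BaseSolver):
    name = 'python-pgd'

    @staticmethod
    def get_next(stop_val):
        # Grow the stopping value geometrically instead of the default schedule.
        return max(stop_val + 1, int(1.5 * stop_val))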

Re-using code in a benchmark

For some solvers and datasets, it is necessary to share operations or pre-processing steps. Benchopt allows factoring out this code, as described in Reusing some code in a benchmark.
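
A common pattern, described in that FAQ entry, is to place shared helpers in a benchmark_utils package at the root of the benchmark folder; solvers and datasets can then import from it. A minimal sketch, with preprocess_data as a hypothetical helper name:

# benchmark_lasso/benchmark_utils/__init__.py  (hypothetical shared helper)
def preprocess_data(X, y):
    # Pre-processing shared by several solvers or datasets.
    return X - X.mean(axis=0), y


# In a solver or dataset file of the same benchmark:
from benchmark_utils import preprocess_data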

Parallel run

Benchopt can run the different benchmarked methods in parallel, either with joblib, using -j 4 to run on multiple CPUs of a single machine, or on a cluster with SLURM, as described in Running the benchmark on a SLURM cluster.
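
For instance, to run the Lasso benchmark with 4 parallel jobs on a single machine:

benchopt run --env ./benchmark_lasso -j 4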

Citing Benchopt

If you use Benchopt in a scientific publication, please cite the following paper:

@inproceedings{benchopt,
   author = {Moreau, Thomas and Massias, Mathurin and Gramfort, Alexandre and Ablin, Pierre
             and Bannier, Pierre-Antoine and Charlier, Benjamin and Dagréou, Mathieu and Dupré la Tour, Tom
             and Durif, Ghislain and F. Dantas, Cassio and Klopfenstein, Quentin
             and Larsson, Johan and Lai, En and Lefort, Tanguy and Malézieux, Benoit
             and Moufad, Badr and T. Nguyen, Binh and Rakotomamonjy, Alain and Ramzi, Zaccharie
             and Salmon, Joseph and Vaiter, Samuel},
   title  = {Benchopt: Reproducible, efficient and collaborative optimization benchmarks},
   year   = {2022},
   booktitle = {NeurIPS},
   url    = {https://arxiv.org/abs/2206.13424}
}

Other functionalities

How to contribute

Bug report and feature request

We use the benchopt GitHub repository to track all bugs and feature requests; feel free to open an issue if you have found a bug or wish to see a feature implemented.

Code contribution

The preferred way to contribute to benchopt is to fork the main repository on GitHub, then submit a “Pull Request” (PR).

In the first few steps, we explain how to locally install benchopt, and how to set up your git repository:

  1. Create an account on GitHub if you do not already have one.

  2. Fork the project repository: click on the ‘Fork’ button near the top of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository, see this guide.

  3. Clone your fork of the benchopt repo from your GitHub account to your local disk:

    git clone git@github.com:YourLogin/benchopt.git
    cd benchopt
    
  4. Add the upstream remote. This saves a reference to the main benchopt repository, which you can use to keep your repository synchronized with the latest changes:

    git remote add upstream https://github.com/benchopt/benchopt
    
  5. Check that the upstream and origin remote aliases are configured correctly by running git remote -v, which should display:

    origin  git@github.com:YourLogin/benchopt.git (fetch)
    origin  git@github.com:YourLogin/benchopt.git (push)
    upstream        https://github.com/benchopt/benchopt (fetch)
    upstream        https://github.com/benchopt/benchopt (push)
    

You should now have a working installation of benchopt, and your git repository properly configured. The next steps now describe the process of modifying code and submitting a PR:

  1. Synchronize your main branch with the upstream/main branch, more details on GitHub Docs:

    git switch main
    git fetch upstream
    git merge upstream/main
    
  2. Create a feature branch to hold your development changes:

    git switch -c my_feature
    

    and start making changes. Always use a feature branch. It’s good practice to never work on the main branch!

  3. Develop the feature on your feature branch on your computer, using Git to do the version control. When you’re done editing, add changed files using git add and then git commit:

    git add modified_files
    git commit
    

    to record your changes in Git, then push the changes to your GitHub account with:

    git push -u origin my_feature
    
  4. Follow these instructions to create a pull request from your fork.

Note

It is often helpful to keep your local feature branch synchronized with the latest changes of the main benchopt repository:

git fetch upstream
git merge upstream/main

Documentation

We are glad to accept any sort of documentation: function docstrings, reStructuredText documents (like this one), tutorials, etc. reStructuredText documents live in the source code repository under the doc/ directory.

You can edit the documentation using any text editor, and then generate the HTML output by typing, in a shell:

pip install benchopt[doc]
cd doc/
make html
firefox _build/html/index.html

Some available benchmarks

Note

Some benchmarks are briefly described in the list below. For a complete list of benchmarks, see GitHub repositories of the form benchopt/benchmark_*.

Notation: In what follows, \(n\) (or n_samples) stands for the number of samples and \(p\) (or n_features) stands for the number of features.

The data for the regression benchmarks below are

\[y \in \mathbb{R}^n, \quad X = [x_1^\top, \dots, x_n^\top]^\top \in \mathbb{R}^{n \times p}.\]

Ordinary least squares (OLS):

\[\min_w \frac{1}{2} \|y - Xw\|^2_2\]

Non-negative least squares (NNLS):

\[\min_{w \geq 0} \frac{1}{2} \|y - Xw\|^2_2\]

Lasso (l1-regularized least squares):

\[\min_w \frac{1}{2} \|y - Xw\|^2_2 + \lambda \|w\|_1\]

L2-regularized logistic regression:

\[\min_w \sum_{i=1}^{n} \log(1 + \exp(-y_i x_i^\top w)) + \frac{\lambda}{2} \|w\|_2^2\]

L1-regularized logistic regression:

\[\min_w \sum_{i=1}^{n} \log(1 + \exp(-y_i x_i^\top w)) + \lambda \|w\|_1\]

Huber regression:

\[\min_{w, \sigma} \sum_{i=1}^n \left(\sigma + H_{\epsilon}\left(\frac{x_i^\top w - y_i}{\sigma}\right)\sigma\right) + \lambda \|w\|_2^2\]

where

\[\begin{split}H_{\epsilon}(z) = \begin{cases} z^2, & \text {if } |z| < \epsilon, \\ 2\epsilon|z| - \epsilon^2, & \text{otherwise} \end{cases}\end{split}\]
Quantile regression:

\[\min_{w} \frac{1}{n} \sum_{i=1}^{n} PB_q(y_i - x_i^\top w) + \lambda \|w\|_1\]

where \(PB_q\) is the pinball loss:

\[\begin{split}PB_q(t) = q \max(t, 0) + (1 - q) \max(-t, 0) = \begin{cases} q t, & t > 0, \\ 0, & t = 0, \\ (q - 1) t, & t < 0 \end{cases}\end{split}\]

Linear independent component analysis (ICA): given some data \(X \in \mathbb{R}^{d \times n}\) assumed to be linearly related to unknown independent sources \(S \in \mathbb{R}^{d \times n}\) with

\[X = A S\]

where \(A \in \mathbb{R}^{d \times d}\) is also unknown, the objective of linear ICA is to recover \(A\) up to permutation and scaling of its columns. The objective in this benchmark quantifies the quality of the estimation of \(A\) with the so-called Amari distance.

Joint diagonalization: given \(n\) square symmetric positive matrices \(C^i\), this benchmark consists of solving the following problem:

\[\min_B \frac{1}{2n} \sum_{i=1}^n \log |\textrm{diag} (B C^i B^{\top}) | - \log | B C^i B^{\top} |\]

where \(|\cdot|\) stands for the matrix determinant and \(\textrm{diag}\) stands for the operator that keeps only the diagonal elements of a matrix. Optionally, the matrix \(B\) can be enforced to be orthogonal.

See benchmark_* repositories on GitHub for more.

Website contents