Get started

Contents

Get started#

Installation

Benchopt can be installed directly with pip.

pip install benchopt

To gain access to certain features, in particular to install the requirements of community benchmarks, benchopt needs to be installed in a conda environment.

The recommended way to use benchopt is within a conda environment to fully benefit from all its features. Hence, start by creating a dedicated conda environment and then activate it.

conda create -n benchopt python
conda activate benchopt

Benchopt is available on PyPi. Get the stable version via pip by running:

pip install -U benchopt

This will install the command line tool to run and process the benchmark.

Run an existing benchmark

To run an existing benchmark, you need to get the code, for instance by cloning a repository, and then use benchopt to install the requirements and run the benchmark.

git clone https://github.com/benchopt/template_benchmark_ml.git
benchopt install template_benchmark_ml
benchopt run template_benchmark_ml

This will generate a dashboard with the results of the benchmark.

Note that benchopt install will only work if benchopt is installed in a conda environment.

Let’s get the first steps with benchopt by comparing some solvers of the Lasso problem on the Leukemia dataset.

Start by cloning the Lasso benchmark repository and cd into it.

git clone https://github.com/benchopt/benchmark_lasso.git
cd benchmark_lasso

Then, use benchopt to install the requirements for the solvers skglm, scikit-learn, and the dataset Leukemia.

benchopt install -s skglm -s sklearn -d leukemia

Finally, run the benchmark:

benchopt run . -s skglm -s sklearn -d leukemia

Note

To explore all benchopt CLI features, refer to Command line interface (CLI) or run benchopt --help or benchopt COMMAND_NAME --help.

When the run is finished, benchopt automatically opens a window in your default browser and renders the results as a dashboard.

Dashboard of the Lasso benchmark results

The dashboard displays benchmark-defined metrics tracked throughout the benchmark run such as the evolution of the objective value over time.

Create your own benchmark

Build your own benchmark by specifying:

  • Dataset: specifies how to load data,

  • Objective: defines the evaluation metrics,

  • Solver: implements the method to evaluate.

This same workflow can be reused for ML, optimization, or infrastructure.

A benchopt benchmark has three ingredients: one or more datasets, an objective (your metric or evaluation protocol), and one or more solvers (the evaluated methods). Each is a single Python file. Here is the minimal structure:

my_benchmark/
├── objective.py
├── datasets/
│   └── my_dataset.py
└── solvers/
    └── my_solver.py

The tabs below show minimal examples for three common use cases. Once you are ready to go further, the Benchmark workflow section covers advanced features such as parallelization, seed control, cross-validation, or convergence tracking depending on your use case.

You have a dataset, a metric, and methods to compare. This is the simplest case: each solver runs once to completion and is evaluated with a given metric.

datasets/my_dataset.py: specifies how to load data, and potentially split it into training and test sets.

from benchopt import BaseDataset

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


class Dataset(BaseDataset):
    name = "My dataset"

    def get_data(self):
        # Replace with your own data loading/splitting logic
        X, y = make_classification()
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=0
        )
        # This dict's keys can be changed, e.g. to have dataloader instead
        return dict(
            X_train=X_train, y_train=y_train,
            X_test=X_test, y_test=y_test,
        )

objective.py: defines the evaluation metric(s) to compare methods.

from benchopt import BaseObjective

class Objective(BaseObjective):
    name = "My ML benchmark"
    sampling_strategy = "run_once"

    def set_data(self, X_train, y_train, X_test, y_test):
        self.X_train, self.y_train = X_train, y_train
        self.X_test, self.y_test = X_test, y_test

    def get_objective(self):
        return dict(X_train=self.X_train, y_train=self.y_train)

    def evaluate_result(self, model):
        return dict(test_score=model.score(self.X_test, self.y_test))

    def get_one_result(self):
        # for testing purpose, declare the minimal result to eval
        from sklearn.dummy import DummyClassifier
        return dict(
            model=DummyClassifier().fit(self.X_train, self.y_train)
        )

solvers/my_solver.py: implements the method to evaluate.

from benchopt import BaseSolver

class Solver(BaseSolver):
    name = "My solver"

    def set_objective(self, X_train, y_train):
        self.X_train, self.y_train = X_train, y_train

    def run(self, _):
        from sklearn.linear_model import LogisticRegression
        self.model = LogisticRegression().fit(
            self.X_train, self.y_train
        )

    def get_result(self):
        return dict(model=self.model)

With benchopt, you also get:

  • Reproducible runs (including controlled random seeds when relevant).

  • Parallel execution of solver/dataset combinations from the CLI.

  • Standardized result collection and comparison across methods.

You have an iterative solver and want to track its convergence over time. Benchopt will call run(n_iter) with increasing budgets and record the objective value at each step.

datasets/my_dataset.py

from benchopt import BaseDataset
import numpy as np

class Dataset(BaseDataset):
    name = "My dataset"

    def get_data(self):
        rng = np.random.default_rng(0)
        A = rng.standard_normal((100, 50))
        b = rng.standard_normal(100)
        return dict(A=A, b=b)

objective.py

from benchopt import BaseObjective
import numpy as np

class Objective(BaseObjective):
    name = "My optimization benchmark"

    def set_data(self, A, b):
        self.A, self.b = A, b

    def get_objective(self):
        return dict(A=self.A, b=self.b)

    def evaluate_result(self, x):
        residual = self.A @ x - self.b
        return dict(value=0.5 * np.dot(residual, residual))

    def get_one_result(self):
        # for testing purpose, declare the minimal result to eval
        return dict(x=np.zeros(self.A.shape[1]))

solvers/my_solver.py

from benchopt import BaseSolver
import numpy as np

class Solver(BaseSolver):
    name = "Gradient descent"

    def set_objective(self, A, b):
        self.A, self.b = A, b
        self.x = np.zeros(A.shape[1])

    def run(self, n_iter):
        grad = self.A.T @ (self.A @ self.x - self.b)
        step = 1 / np.linalg.norm(self.A) ** 2
        for _ in range(n_iter):
            self.x -= step * grad

    def get_result(self):
        return dict(x=self.x)

With benchopt, you also get:

  • Fair convergence comparisons using the same stopping budgets.

  • Cached datasets and validated solver/objective interfaces.

  • Automatic result aggregation and interactive plotting dashboards.

Note that benchopt also defines callbacks evaluate convergence curve at once. See 2. Using a callback for more details.

Benchopt is not limited to ML or optimization. You can benchmark infrastructure components such as data loading, preprocessing, or serving latency. Here is an example that measures dataloader throughput.

datasets/my_dataset.py

from benchopt import BaseDataset
import numpy as np

class Dataset(BaseDataset):
    name = "My dataset"
    parameters = {
        "data_size": [100_000],
    }

    def get_data(self):
        return dict(data=np.zeros((self.data_size, 100)))

objective.py

from benchopt import BaseObjective
import time

class Objective(BaseObjective):
    name = "Dataloader throughput"
    sampling_strategy = "run_once"

    parameters = {"batch_size": [64, 256]}

    def set_data(self, data):
        self.data = data

    def get_objective(self):
        return dict(data=self.data, batch_size=self.batch_size)

    def evaluate_result(self, dataloader):
        n_samples, t0 = 0, time.perf_counter()
        for batch in dataloader:
            n_samples += len(batch)

        runtime = time.perf_counter() - t0
        samples_per_second = n_samples / runtime
        return dict(
            samples_per_second=samples_per_second,
            runtime=runtime,
        )

    def get_one_result(self):
         # for testing purpose, declare the minimal result to eval
        return dict(dataloader=self.data)

solvers/my_solver.py

from benchopt import BaseSolver
import torch

class Solver(BaseSolver):
    name = "Pytorch dataloader"

    def set_objective(self, data, batch_size):
        self.data = torch.utils.data.TensorDataset(
            torch.from_numpy(data)
        )
        self.dataloader = torch.utils.data.DataLoader(
            self.data, batch_size=batch_size
        )

    def run(self, _):
       pass

    def get_result(self):
        return dict(dataloader=self.dataloader)

With benchopt, you also get:

  • Repeated, scriptable measurements from the same command-line workflow.

  • Easy parameter sweeps (for example batch_size, num_workers) and side-by-side comparisons.

  • Caching of generated data and benchmark outputs to avoid unnecessary reruns.

Once the benchmark is created, you can run it with:

benchopt run my_benchmark

Note

The examples above are intentionally minimal. The full Template benchmark ML shows additional features such as parameter sweeps, cross-validation splits, and dataset configuration – all optional, but useful as your benchmark grows.

What’s next?#