Write a benchmark#

A benchmark is composed of three elements: an objective function, a list of datasets, and a list of solvers.

A benchmark is defined in a folder that should follow a specific structure. For example:

my_benchmark/
├── objective.py  # contains the definition of the objective
├── datasets/
│   ├── dataset1.py  # some dataset
│   └── dataset2.py  # some dataset
└── solvers/
    ├── solver1.py  # some solver
    └── solver2.py  # some solver

Examples of actual benchmarks are available in the benchopt organisation, such as Ordinary Least Squares (OLS), Lasso, or L1-regularized logistic regression.

Note

The simplest way to create a benchmark is to copy an existing folder and to adapt its content. A benchmark template is provided as a GitHub template repository here.

1. Objective#

The objective function is defined through a Python class, Objective, in objective.py. This class is used to monitor the quantities of interest along the iterations of the solvers, typically by evaluating the objective function to be minimized by the solvers. An objective class should define four methods:

  • set_data(**data): specifies the data. The data can be seen as a dictionary of Python variables without any constraint. In the following example, the data contains only one variable X. This data is provided by the Dataset.get_data() method of a dataset.

  • get_objective(): returns the information that each solver will need to provide a result. This information is also passed as a dictionary, which serves as input for the Solver.set_objective method of the solvers.

  • evaluate_result(X_hat): evaluates the output of the different methods, here called X_hat. This method takes a dictionary as input, which is provided by the Solver.get_result method. All other parameters should be stored in the class with the set_data method. evaluate_result should return a float (understood as the objective value) or a dictionary. If a dictionary is returned, it should contain a key called value (the objective value); all other keys should have float values, allowing more than one value of interest to be tracked (e.g. train and test errors).

  • get_one_result(): returns one solution that can be returned by a solver. This defines the shape of the solution and will be used to test that the benchmark works properly.

An objective class needs to inherit from a base class, benchopt.BaseObjective.

Note

Multiple metrics can be returned by Objective.evaluate_result as long as they are stored in a dictionary, with the key value corresponding to the main metric to track.

Example#

from benchopt import BaseObjective
import numpy as np


class Objective(BaseObjective):
    # Name of the Objective function
    name = 'Quadratic'

    # The first three methods below define the links between the Dataset,
    # the Objective and the Solver.
    def set_data(self, X):
        """Set the data from a Dataset to compute the objective.

        The arguments are the keys of the dictionary returned by
        ``Dataset.get_data``.
        """
        self.X = X

    def get_objective(self):
        "Returns a dict passed to ``Solver.set_objective`` method."
        return dict(X=self.X)

    def evaluate_result(self, X_hat):
        """Compute the objective value(s) given the output of a solver.

        The arguments are the keys in the dictionary returned
        by ``Solver.get_result``.
        """
        return dict(value=np.linalg.norm(self.X - X_hat))

    def get_one_result(self):
        """Return one solution for which the objective can be evaluated.

        This function is mostly used for testing and debugging purposes.
        """
        return dict(X_hat=1)
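
As described in the note above, evaluate_result can report several metrics at once. Below is a minimal sketch of an alternative evaluate_result for the same Objective class; the extra squared_error key is only illustrative, and value remains the main metric tracked.

    def evaluate_result(self, X_hat):
        """Return several metrics, with ``value`` as the main one."""
        # Main metric, stored under the mandatory ``value`` key.
        error = np.linalg.norm(self.X - X_hat)
        return dict(
            value=error,               # main metric to track
            squared_error=error ** 2,  # extra float-valued metric
        )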

2. Datasets#

A dataset defines what can be passed to an objective. More specifically, a dataset should implement one method:

  • get_data(): A method which outputs a dictionary that is passed as keyword arguments **data to the Objective.set_data method of the objective.

A dataset class also needs to inherit from a base class called benchopt.BaseDataset.

Example#

from benchopt import BaseDataset

import numpy as np


class Dataset(BaseDataset):
    # Name of the Dataset, used to select it in the CLI
    name = 'simulated'

    # ``get_data()`` is the only method a dataset should implement.
    def get_data(self):
        """Load the data for this Dataset.

        Usually, the data are either loaded from disk as arrays or Tensors,
        or a dataset/dataloader object is used to allow the models to load
        the data in more flexible forms (e.g. with mini-batches).

        The dictionary's keys are the kwargs passed to ``Objective.set_data``.
        """
        return dict(X=np.random.randn(10, 2))
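
As for solvers (described below), a dataset can also expose a parameters class attribute so that several variants of the data are benchmarked; each parameter param_name is then accessible as self.param_name. Here is a minimal sketch assuming the same simulated dataset as above (the n_samples values are only illustrative).

from benchopt import BaseDataset

import numpy as np


class Dataset(BaseDataset):
    name = 'simulated'

    # Each combination of parameters defines a variant of the dataset.
    # ``n_samples`` is accessible as ``self.n_samples`` in ``get_data``.
    parameters = {'n_samples': [10, 100]}

    def get_data(self):
        # The keys are the kwargs passed to ``Objective.set_data``.
        return dict(X=np.random.randn(self.n_samples, 2))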

3. Solvers#

A solver must define three methods:

  • set_objective(**objective_dict): Store information about the data and the objective, and initialize the required quantities. This method is called with the dictionary returned by the Objective.get_objective method.

  • run(stop_condition): Run the actual method to benchmark. This is where the core of the solver goes. This method takes one parameter controlling the stopping condition of the solver: either a number of iterations n_iter, a tolerance parameter tol, or a callback function that is called at each iteration. See the note below for more information on this parameter.

  • get_result(): Format the output of the method to be evaluated in the Objective. This method returns a dictionary that is passed to Objective.evaluate_result.

Example#

from benchopt import BaseSolver
import numpy as np


class Solver(BaseSolver):
    # Name of the Solver, used to select it in the CLI
    name = 'gd'

    # By default, benchopt will evaluate the result of a method after various
    # numbers of iterations. Setting the sampling_strategy controls how this is
    # done. Here, we use a callback function that is called at each iteration.
    sampling_strategy = 'callback'

    # Parameters of the method, that will be tested by the benchmark.
    # Each parameter ``param_name`` will be accessible as ``self.param_name``.
    parameters = {'lr': [1e-3, 1e-2]}

    # The three methods below define the necessary methods for the Solver, to
    # get the info from the Objective, to run the method and to return a
    # result that can be evaluated by the Objective.
    def set_objective(self, X):
        """Set the info from a Objective, to run the method.

        This method is also typically used to adapt the solver's parameters to
        the data (e.g. scaling) or to initialize the algorithm.

        The kwargs are the keys of the dictionary returned by
        ``Objective.get_objective``.
        """
        self.X = X
        self.X_hat = np.zeros_like(X)

    def run(self, cb):
        """Run the actual method to benchmark.

        Here, as we use a "callback", we need to call it at each iteration to
        evaluate the result as the procedure progresses.

        The callback implements a stopping mechanism, based on the number of
        iterations, the time and the evolution of the performance.
        """
        while cb():
            self.X_hat = self.X_hat - self.lr * (self.X_hat - self.X)

    def get_result(self):
        """Format the output of the method to be evaluated in the Objective.

        Returns a dict which is passed to the ``Objective.evaluate_result`` method.
        """
        return {'X_hat': self.X_hat}

Note

Sampling strategy:

A solver should also define a sampling_strategy as a class attribute (a minimal sketch of the 'iteration' strategy is given after this list). This sampling_strategy can be:

  • 'iteration': in this case the run method of the solver is parametrized by the number of iterations computed. The parameter is called n_iter and should be an integer.

  • 'tolerance': in this case the run method of the solver is parametrized by a tolerance that should decrease with the running time. The parameter is called tol and should be a positive float.

  • 'callback': in this case, the run method of the solver should call the provided callback function at each iteration. The callback computes and stores the objective and returns False once the computations should stop.

  • 'run_once': in this case, the run method of the solver is run only once during the benchmark.
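
For instance, here is a minimal sketch of a gradient-descent solver using the 'iteration' strategy instead of the callback used in the example above; the name and parameter values are only illustrative.

from benchopt import BaseSolver
import numpy as np


class Solver(BaseSolver):
    name = 'gd-iteration'

    # With the 'iteration' strategy, benchopt calls ``run`` several times,
    # with an increasing number of iterations ``n_iter``.
    sampling_strategy = 'iteration'

    parameters = {'lr': [1e-3, 1e-2]}

    def set_objective(self, X):
        self.X = X
        self.X_hat = np.zeros_like(X)

    def run(self, n_iter):
        # Perform ``n_iter`` gradient steps from scratch; benchopt evaluates
        # the result returned by ``get_result`` after each call to ``run``.
        X_hat = np.zeros_like(self.X)
        for _ in range(n_iter):
            X_hat -= self.lr * (X_hat - self.X)
        self.X_hat = X_hat

    def get_result(self):
        return {'X_hat': self.X_hat}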