Evaluating an iterative method#

Note

For ML benchmarks without iterative evaluation: If you are benchmarking machine learning methods where each solver runs once to completion (e.g., comparing scikit-learn estimators), you can skip performance curves entirely by setting sampling_strategy = "run_once" in your Solver class. See Setting up an ML benchmark for more details on setting up ML benchmarks.

  class Solver(BaseSolver):
      sampling_strategy = "run_once"

      def run(self, _):
          # Your solver code here
          pass

Note that this option can also be set globally for all solvers by setting the Objective.sampling_strategy class attribute to "run_once" in your Objective class.

Performance curves are primarily useful for iterative optimization methods, where you want to track how the objective value evolves as computational budget increases (e.g., number of iterations, tolerance, or CPU time).

The performance of each solver is evaluated for various computational budgets. The term “performance curve” refers to the evolution of this performance as a function of the computational budget.

With benchopt, there are different ways to vary the computational budget in order to sample the performance curve. The strategy is chosen by the sampling_strategy attribute of each solver or, if specified, by the strategy parameter of the solver’s stopping_criterion.

When varying the computational budget, it is often necessary to stop a solver once it has converged, to avoid wasting resources on already converged methods. This is handled by the stopping_criterion attribute of each solver, which is described in the last section of this document, When are the solvers stopped?.

Sampling strategies#

0. Running once (no performance curve)#

For benchmarks where solvers run to completion without intermediate evaluation (typical in ML benchmarks), use sampling_strategy = "run_once". The solver’s run method is called exactly once, and no convergence curve is generated. This is ideal for comparing final performance of different methods.

See Setting up an ML benchmark for a complete guide on setting up ML benchmarks.
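
For instance, a run-once solver wrapping a scikit-learn estimator could look like the sketch below. The X, y keys received by set_objective are an assumption about what this benchmark's Objective.get_objective returns; adapt them to your benchmark.

from benchopt import BaseSolver, safe_import_context

with safe_import_context() as import_ctx:
    from sklearn.linear_model import LogisticRegression


class Solver(BaseSolver):
    name = "sklearn-logreg"
    sampling_strategy = "run_once"

    def set_objective(self, X, y):
        # Store the data provided by Objective.get_objective
        # (the X, y keys are an assumption for this sketch).
        self.X, self.y = X, y
        self.clf = LogisticRegression()

    def run(self, _):
        # Called exactly once: fit the estimator to completion.
        self.clf.fit(self.X, self.y)

    def get_result(self):
        # Return the fitted coefficients for the Objective to evaluate.
        return {"beta": self.clf.coef_.ravel()}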

1. Using iterations or tolerance#

The first way is to use Solver.sampling_strategy = "iteration" or Solver.sampling_strategy = "tolerance". This is used for black-box solvers, where one can only get the result of the solver for a given number of iterations or for a given numerical tolerance. This sampling strategy creates curves by calling Solver.run(stop_val) several times with different values for the stop_val parameter:

  • if the solver’s sampling_strategy is "iteration", stop_val is the number of iterations passed to run. It starts from 1 and increases geometrically with a factor \(\rho=1.5\), by at least 1 at each step. Note that the first call uses a number of iterations of 0. From one call to the next, the value evolves as:

    \[\text{stop_val} = \max(\text{stop_val} + 1, \text{int}(\rho * \text{stop_val}))\]
  • if the solver’s sampling_strategy is "tolerance", the stop_val parameter corresponds to the numerical tolerance. It decreases geometrically by a factor \(\rho=1.5\) between each call to run, starting from 1 at the second call. Note that the first call uses a tolerance of 1e38. From one call to the next, the value evolves as:

    \[\text{stop_val} = \min(1, \max(\text{stop_val} / \rho, 10^{-15}))\]

In both cases, if the objective curve is flat (i.e., the variation of the objective between two points is numerically 0), the geometric rate \(\rho\) is multiplied by 1.2.
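
As a concrete illustration of the first formula, with the default rate \(\rho=1.5\) and no flat-curve adjustment, the successive values of stop_val in the "iteration" case are 1, 2, 3, 4, 6, 9, 13, 19, 28, and so on.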

Note that the solver is restarted from scratch at each call to Solver.run. For more advanced configurations, the evolution of stop_val can be controlled on a per-solver basis by implementing a Solver.get_next method, which receives the current value of the tolerance/number of iterations and returns the next one, as described in 3. Changing the strategy to grow the computational budget (stop_val).
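
For reference, here is a minimal sketch of a solver using the "iteration" strategy, restarted from scratch with a growing number of iterations. The set_objective keys (X, y, lmbd) and the ridge gradient-descent update are illustrative assumptions, not part of the benchopt API:

import numpy as np

from benchopt import BaseSolver


class Solver(BaseSolver):
    name = "gd"
    sampling_strategy = "iteration"

    def set_objective(self, X, y, lmbd):
        # The keys received here depend on the benchmark's Objective
        # (X, y, lmbd are an assumption for this sketch).
        self.X, self.y, self.lmbd = X, y, lmbd

    def run(self, n_iter):
        # Restarted from scratch at each call, with a growing n_iter.
        n_features = self.X.shape[1]
        L = np.linalg.norm(self.X, ord=2) ** 2
        x = np.zeros(n_features)
        for _ in range(n_iter):
            grad = self.X.T @ (self.X @ x - self.y) + self.lmbd * x
            x -= grad / (L + self.lmbd)
        self.x = x

    def get_result(self):
        return {'x': self.x}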

2. Using a callback#

Restarting the solver from scratch, though inevitable for black-box solvers, can be costly.

When a solver exposes the intermediate values of its iterates, the curve can be created in a single solver run by using sampling_strategy = "callback". In that case, the argument passed to Solver.run is a callable object, callback, created and managed by benchopt. As with sampling_strategy = "iteration", the objective is computed after a number of calls to the callback that grows geometrically: if the objective was computed after \(n\) calls, the objective and timing are computed again when reaching \(\max(n+1, \rho * n)\) calls to the callback. The callback ensures that the time spent computing the objective is not accounted for, and it also checks for convergence every time the objective is computed (as described in the next section). It returns False when the solver should be stopped. A classical usage pattern is:

def run(self, callback):
    self.x = ... # Initialize iterate

    while callback():
        self.x = ...  # Update iterate

def get_result(self):
    return {'x': self.x}

3. Changing the strategy to grow the computational budget (stop_val)#

The way benchopt varies the computational budget stop_val can be configured on a per-solver basis. This is done by implementing a get_next method in the Solver class. This method takes as input the previous value where the objective function has been logged, and outputs the next one. For instance, if a solver needs to be evaluated every 10 iterations, we would have

class Solver(BaseSolver):
    ...
    def get_next(self, stop_val):
        return stop_val + 10

This sets a linear growth for the solver’s computational budget, instead of the default geometric growth.

When are the solvers stopped?#

For each of the sampling strategies above, a solver keeps running (i.e., the callback returns True, and the number of iterations/tolerance passed to Solver.run keeps increasing/decreasing) until the StoppingCriterion.should_stop() method of its Solver.stopping_criterion returns True.

This method takes into account the maximal number of runs given by --max-runs, the timeout given by --timeout, and also tries to stop the solver once it has converged. The convergence of a solver is determined by the StoppingCriterion.check_convergence() method, based on the objective curve obtained so far. Four StoppingCriterion classes are implemented in benchopt:

  • SingleRunCriterion(stop_val) only calls the solver once with the given stop_val. This criterion is designed for methods that converge to a given value, when the goal is to benchmark the final performance of multiple solvers.

  • NoCriterion() runs the solver for a fixed number of steps, given by the --max-runs argument. This criterion deactivates the convergence checks.

  • SufficientDescentCriterion(eps, patience, key_to_monitor) considers that the solver has converged when the relative decrease of the objective has been smaller than the tolerance eps for more than patience calls to check_convergence. key_to_monitor is the key of the objective dictionary to monitor; by default, it is set to value.

  • SufficientProgressCriterion(eps, patience, key_to_monitor) considers that the solver has converged when the objective has not decreased by more than the tolerance eps for more than patience calls to check_convergence. key_to_monitor is the key of the objective dictionary to monitor; by default, it is set to value.
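
For instance, a solver can declare its own stopping criterion through the stopping_criterion class attribute. Here is a minimal sketch; the eps and patience values are purely illustrative:

from benchopt import BaseSolver
from benchopt.stopping_criterion import SufficientProgressCriterion


class Solver(BaseSolver):
    sampling_strategy = "iteration"

    # Stop once the monitored objective value has not improved by more than
    # eps for `patience` consecutive convergence checks.
    stopping_criterion = SufficientProgressCriterion(
        eps=1e-8, patience=10, key_to_monitor="value"
    )

    def run(self, n_iter):
        ...  # Solver code, run with a growing number of iterations.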

Setting the sampling strategy and stopping criterion globally#

Both the sampling strategy and stopping criterion can be set globally for all solvers by setting the corresponding class attributes in your Objective class:

from benchopt import BaseObjective
from benchopt.stopping_criterion import SufficientProgressCriterion


class Objective(BaseObjective):
    sampling_strategy = "callback"  # or "iteration", "tolerance", "run_once"
    stopping_criterion = SufficientProgressCriterion(
        eps=1e-5, patience=5, key_to_monitor="value"
    )

This avoids setting these attributes in each solver when the same strategy and criterion should be used for most solvers.