Configure Benchopt

Benchopt can be configured using configuration files. These files can either be created by hand or generated and modified with benchopt config.

There are two configuration levels. The first level is the global config of the benchopt client. It contains system-specific tweaks, user information such as the GitHub token, and output levels. The second level is the configuration of the benchmarks: each benchmark can have its own config for the kinds of plots it displays by default and other display tweaks.

To find the global config file used by the benchopt command, run benchopt config. The option --benchmark,-b <benchmark> displays the config file of a specific benchmark instead. See Config File Location for more details on how the config file path is resolved.

The files use the YAML format, and their structure is described in Config File Structure. The available settings are listed in Benchopt config settings for global settings and in Benchmark config settings for benchmark-specific settings.

The value of each setting can be read from the CLI with benchopt config [-b <benchmark>] get <name>. Similarly, a setting can be changed with benchopt config [-b <benchmark>] set <name> <value>.
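When these commands are driven from a Python script, the argument list can be assembled and passed to subprocess.run. The helper below is a hypothetical sketch (config_command is not part of benchopt). It only builds the command using the syntax above and adds -b when a benchmark is given. It does not run anything.

```python
from typing import List, Optional


def config_command(action: str, name: str, value: Optional[str] = None,
                   benchmark: Optional[str] = None) -> List[str]:
    """Build the argument list for `benchopt config [-b <benchmark>] get|set ...`.

    Hypothetical helper: it only assembles the command, it does not execute it.
    """
    if action not in ("get", "set"):
        raise ValueError(f"action must be 'get' or 'set', got {action!r}")
    if action == "set" and value is None:
        raise ValueError("'set' requires a value")
    cmd = ["benchopt", "config"]
    if benchmark is not None:
        # Target the config of a specific benchmark instead of the global one
        cmd += ["-b", benchmark]
    cmd += [action, name]
    if action == "set":
        cmd.append(value)
    return cmd
```

For example, subprocess.run(config_command("set", "conda_cmd", "mamba"), check=True) would set conda_cmd in the global config.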

Config File Location

For the global configuration file, the resolution order is the following:

  1. The file given by the environment variable BENCHOPT_CONFIG, if it points to an existing file;

  2. A file benchopt.yml in the current directory;

  3. Otherwise, the default file $HOME/.config/benchopt.yml.
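The resolution order above can be sketched as follows. This is an illustrative reimplementation, not benchopt's own code. The function name resolve_global_config is hypothetical, and the current directory, home directory and environment are passed explicitly so that the logic is easy to test.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_global_config(cwd: Path, home: Path,
                          env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the global config path following the documented order (sketch)."""
    env = os.environ if env is None else env
    # 1. BENCHOPT_CONFIG, but only if it points to an existing file
    env_path = env.get("BENCHOPT_CONFIG")
    if env_path and Path(env_path).is_file():
        return Path(env_path)
    # 2. benchopt.yml in the current directory
    local = Path(cwd) / "benchopt.yml"
    if local.is_file():
        return local
    # 3. Fallback: $HOME/.config/benchopt.yml (it may not exist yet)
    return Path(home) / ".config" / "benchopt.yml"
```

In this sketch, an unset or dangling BENCHOPT_CONFIG is skipped silently and the next candidate is tried.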

Benchmark configuration files are usually located in the benchmark folder and named benchopt.yml. If this file does not exist, the global config file is used instead.

Config File Structure

Benchopt config files follow the YAML format. For instance, a global config file can contain:

debug: true  # Whether to activate debug logs. Default is false.
conda_cmd: mamba  # Command to use to install packages. Default is conda.
github_token: 0...0  # Token used to publish results on benchopt/results

Benchmark settings are grouped in a section named after the benchmark. For a benchmark named benchmark_bench, the config structure is:

benchmark_bench:
    plots:
        - objective_curve
        - bar_chart
        - boxplot
        - table
    data_paths:
        imagenet: /path/to/imagenet

Note that benchmark-specific settings can also be set in the benchmark's own config file, located in the benchmark folder. The global config file is used as a fallback when the benchmark config file does not exist.
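The lookup described above can be sketched with configs that are already parsed into plain dicts, so no YAML library is needed. This is an illustration under assumptions. The function name benchmark_settings is hypothetical. Settings are read as top-level keys in the benchmark's own file, as in the data_paths examples later on this page. Filling missing keys from DEFAULT_BENCHMARK_CONFIG, which is listed in Benchmark config settings, is assumed and not documented here.

```python
import copy
from typing import Any, Dict, Optional

# Values copied from benchopt.config.DEFAULT_BENCHMARK_CONFIG
DEFAULT_BENCHMARK_CONFIG: Dict[str, Any] = {
    "data_home": None, "data_paths": {}, "hf_repo": None,
    "plot_configs": {}, "plots": None,
}


def benchmark_settings(benchmark_name: str,
                       benchmark_file: Optional[Dict[str, Any]],
                       global_config: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch of the benchmark config lookup.

    benchmark_file is the parsed benchmark config file, or None if that file
    does not exist; global_config is the parsed global config file.
    """
    if benchmark_file is not None:
        # Settings are top-level keys in the benchmark's own file
        user = benchmark_file
    else:
        # Fallback: the section named after the benchmark in the global file
        user = global_config.get(benchmark_name, {})
    # Assumption: unspecified settings take their default values
    settings = copy.deepcopy(DEFAULT_BENCHMARK_CONFIG)
    settings.update(user)
    return settings
```

In this sketch, the global section is ignored entirely whenever the benchmark file exists. It does not merge the two sources.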

Benchopt config settings

This section lists the available global settings for benchopt.

benchopt.config.DEFAULT_GLOBAL_CONFIG = {'_bench_config_check': False, '_g_config_check': True, 'cache': None, 'conda_cmd': 'conda', 'data_dir': './data/', 'debug': False, 'default_timeout': 100, 'github_token': None, 'hf_token': None, 'raise_install_error': False, 'shell': 'bash --norc --noprofile', 'warn_nonunique_files': True}

These config options apply to benchopt globally and can be set in the global config file. They change benchopt’s behavior, in particular for logging, warnings and errors. The available options are:

  • debug, bool: If set to true, enables debug logs.

  • raise_install_error, bool: If set to true, raises an error when an installation fails.

  • github_token, str: token used to publish results on benchopt/results via GitHub.

  • hf_token, str: token used to publish results on Hugging Face.

  • conda_cmd, str: path to conda, useful if conda is not on $PATH. It can also be set to mamba to install benchmarks with mamba instead of conda. See Using mamba to install packages.

  • shell, str: the shell to use. Defaults to SHELL from the environment if it is set, and to 'bash' otherwise.

  • cache, str: where the benchmark caches are stored. By default, cache files are stored in the benchmark directory, under the folder __cache__. Setting this option results in the cache of benchmark B1 being stored in ${cache}/B1/.

  • default_timeout, int: default timeout in seconds for the benchmark runs. Default is 100 seconds.

  • warn_nonunique_files, bool: If set to true, emits a warning when a results file is about to be overwritten because a file with the same name already exists. Disabling it is mostly useful to silence this warning in tests.
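The effect of the cache option can be sketched as a small path computation. This is a hypothetical helper, not benchopt's code. It assumes that the benchmark name used in ${cache}/B1/ is the name of the benchmark folder.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def benchmark_cache_dir(benchmark_dir: PathLike,
                        cache: Optional[PathLike] = None) -> Path:
    """Sketch of where a benchmark's cache lives, given the `cache` setting."""
    benchmark_dir = Path(benchmark_dir)
    if cache is None:
        # Default: a __cache__ folder inside the benchmark directory
        return benchmark_dir / "__cache__"
    # With `cache` set, each benchmark gets a sub-folder named after it
    # (assumption: the benchmark name is the folder name)
    return Path(cache) / benchmark_dir.name
```

With cache set to a shared location such as a scratch disk, several benchmarks can keep their caches side by side without mixing them.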

Using mamba to install packages

When many packages need to be installed, conda can be slow or even fail to resolve the dependency graph. Using mamba can speed up this process and make it more reliable.

To use mamba instead of conda when installing benchmark requirements, mamba must be installed in the base conda environment, e.g. with conda install -n base mamba. Benchopt can then be configured to use it instead of conda, either by running benchopt config set conda_cmd mamba or by setting the environment variable BENCHOPT_CONDA_CMD=mamba.
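When benchopt is launched from a Python script, the environment-variable route can be scoped to a single child process without changing the global config. The helper name env_with_conda_cmd is hypothetical. The sketch only builds the environment and does not launch anything.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_conda_cmd(cmd: str = "mamba",
                       base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with BENCHOPT_CONDA_CMD set (sketch)."""
    # Copy so the current process environment is left untouched
    env = dict(os.environ if base is None else base)
    # Same effect as `BENCHOPT_CONDA_CMD=mamba` in the shell, for the child only
    env["BENCHOPT_CONDA_CMD"] = cmd
    return env
```

For example, subprocess.run(["benchopt", "install", "./my_benchmark"], env=env_with_conda_cmd(), check=True) would install a benchmark with mamba. Here ./my_benchmark is a placeholder path.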

Benchmark config settings

This section lists the available settings for a benchmark configuration.

benchopt.config.DEFAULT_BENCHMARK_CONFIG = {'data_home': None, 'data_paths': {}, 'hf_repo': None, 'plot_configs': {}, 'plots': None}

These config options are available for each benchmark and can be set in the benchmark’s config.yml file or in the global config file under the benchmark’s name. They change the behavior of benchopt for one benchmark, e.g. to configure the visualization or the data directory. The available options are:

  • plots, list: Select the plots to display for the benchmark. Each entry must be a valid plot kind. The list is written with one item per line, each item starting with a dash, as:

    plots:
    - objective_curve
    - suboptimality_curve
    - relative_suboptimality_curve
    - bar_chart
    
  • plot_configs, dict: saved views that can easily be displayed for the plots. Each view maps a name to the values it selects for the plot options. Common options for all plot kinds include:

    plot_kind, scale, with_quantiles, suboptimal_curve, relative_curve, hidden_curves.

    Other options are specific to the plot kind and defined as the options in the plot definition, prefixed with the plot kind. Typically, when the plot kind my-kind has an option dataset, the view should specify it as my-kind_dataset: my_dataset.

    Values that are not specified by the view are left unchanged when the view is selected in the interface. An example of views is:

    plot_configs:
      linear_objective:
          plot_kind: objective_curve
          scale: linear
      view2:
          plot_kind: boxplot
          boxplot_objective_column: objective_score_train
          scale: loglog
    

    These views can easily be created from the interactive HTML page by clicking the Save as view button in the plot controls. To keep them, download either the updated HTML file or the config file and store it in the benchmark’s repository, so that the saved views are automatically embedded in the next plot results.

  • data_home, str: the home directory in which the function get_data_path() looks for the data files defined in data_paths. If it is not set, the default is your_benchmark/data.

  • data_paths, dict: lets users store data files in custom locations. If you are writing your own benchmark, you can use this option to let users choose custom locations for data files. Otherwise, please refer to the documentation of the benchmark you use to know whether you can use this option and which paths you must define.

    For example, if the benchmark asks you to define a data path, you can proceed as follows in the benchmark’s config.yaml file:

    data_home: path/to/data/home
    
    data_paths:
        my_data: path/to/my/file.npz
    

    In your benchmark’s datasets, you can use get_data_path() to retrieve the paths:

    from benchopt.config import get_data_path
    
    path = get_data_path('my_data')
    
    # The "path" variable now contains "path/to/data/home/path/to/my/file.npz"
    

    If my_data is not listed in data_paths, the default path is {data_home}/my_data. Note that data_home can be set to a custom path and otherwise defaults to your_benchmark/data.

  • hf_repo, str: Hugging Face dataset repo id, e.g. ‘my-org/benchopt-results’, used to publish results files on Hugging Face with benchopt publish --hub huggingface. If it is not specified, users can still publish on Hugging Face with the --repo option; this setting only provides a default repo for the benchmark.
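The naming rule for plot-specific view options, and the way a view is applied when it is selected, can be modeled as follows. This sketch models the behavior documented for plot_configs. It is not the code of the HTML interface. The names view_key and apply_view are hypothetical, and the current state of the plot controls is represented as a plain dict.

```python
from typing import Any, Dict


def view_key(plot_kind: str, option: str) -> str:
    """Key of a plot-specific option in a view: '<plot_kind>_<option>'."""
    return f"{plot_kind}_{option}"


def apply_view(state: Dict[str, Any], view: Dict[str, Any]) -> Dict[str, Any]:
    """Options set by the view override the current state; others are kept."""
    return {**state, **view}
```

For instance, view_key("boxplot", "objective_column") gives the boxplot_objective_column key used in view2 above. Applying linear_objective to a state that has with_quantiles set only changes plot_kind and scale.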

Customising data file paths

If a benchmark exposes configurable data paths, you can set them in the benchmark config file using the data_home and data_paths keys. data_home sets a base directory (by default, your_benchmark/data), and each entry in data_paths is resolved relative to it.

data_home: /path/to/data_home/folder
data_paths:
    the_key: /path/to/the/file.ext

With this config, the benchmark will retrieve the file located at /path/to/data_home/folder/path/to/the/file.ext.
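The way a key is turned into a file path can be sketched as follows. This is an illustrative stand-in for get_data_path(), not its actual implementation, and resolve_data_path is a hypothetical name. The sketch assumes the entries in data_paths are relative paths, as in the my_data example above, and it does not model how benchopt combines absolute entries.

```python
from pathlib import Path
from typing import Dict, Optional, Union

PathLike = Union[str, Path]


def resolve_data_path(key: str, benchmark_dir: PathLike,
                      data_home: Optional[PathLike] = None,
                      data_paths: Optional[Dict[str, str]] = None) -> Path:
    """Sketch of how a data key maps to a file path."""
    # data_home defaults to the benchmark's data/ folder
    if data_home is None:
        home = Path(benchmark_dir) / "data"
    else:
        home = Path(data_home)
    # Keys missing from data_paths resolve to {data_home}/{key}
    rel = (data_paths or {}).get(key, key)
    # Assumes `rel` is a relative path, joined under `home`
    return home / rel
```

With data_home set to path/to/data/home and my_data mapped to path/to/my/file.npz, this returns path/to/data/home/path/to/my/file.npz. An unlisted key falls back to a file named after the key inside data_home.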

To know which keys a benchmark exposes, refer to its documentation.

Benchmark makers expose keys by calling benchopt.config.get_data_path() inside get_data():

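from benchopt import BaseDataset, config


class Dataset(BaseDataset):
    name = "my-dataset"  # hypothetical dataset name

    def get_data(self):
        # Resolve "the_key" using data_home and data_paths from the config
        path = config.get_data_path(key="the_key")
        ...  # load the file at `path` and return the data as a dict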