Configure Benchopt#
Benchopt can be configured using setting files. These files can either be created directly or generated and modified using benchopt config.
There are two configuration levels. The first level is the global config for the benchopt client. It contains the system-specific tweaks, user information such as the GitHub token, and the output levels. The second level is the configuration of the benchmarks. Each benchmark can have its own config for the kind of plots it displays by default and other display tweaks.
To get the global config file used by the benchopt command, run benchopt config. The option --benchmark, -b <benchmark> displays the config file for a specific benchmark instead. See Config File Location for more details on how the config file path is resolved.
The files follow the YAML format, as described in Config File Structure. The available settings are listed in Benchopt config settings for global settings and Benchmark config settings for benchmark-specific settings.
The value of each setting can be accessed with the CLI using benchopt config [-b <benchmark>] get <name>. Similarly, the setting value can be set using benchopt config [-b <benchmark>] set <name> <value>.
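For scripting, the same commands can be assembled programmatically. The following is a minimal sketch and not part of benchopt: the helper name config_command is hypothetical, and it only builds the argument list for the get and set subcommands described above.

```python
def config_command(action, name, value=None, benchmark=None):
    """Build the argv for `benchopt config [-b <benchmark>] get|set ...`.

    Hypothetical helper for illustration; it is not provided by benchopt.
    """
    if action not in ("get", "set"):
        raise ValueError("action must be 'get' or 'set'")
    if action == "set" and value is None:
        raise ValueError("'set' requires a value")

    cmd = ["benchopt", "config"]
    if benchmark is not None:
        # -b selects a benchmark-specific config instead of the global one.
        cmd += ["-b", str(benchmark)]
    cmd += [action, name]
    if action == "set":
        cmd.append(str(value))
    return cmd


print(config_command("set", "conda_cmd", "mamba"))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where the benchopt command is installed.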
Config File Location#
For the global configuration file, the resolution order is the following:
- The environment variable BENCHOPT_CONFIG is set to an existing file,
- A file benchopt.yml in the current directory,
- The default file is $HOME/.config/benchopt.yml.
Benchmark configuration files are usually located in the benchmark folder and named benchopt.yml. If no such file exists, the global config file is used by default.
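The resolution order for the global file can be sketched in a few lines of Python. This is an illustration under stated assumptions, not benchopt's actual code: the function name resolve_global_config_file and its parameters are hypothetical, and the sketch assumes that the default location is returned even when the file does not exist yet.

```python
import os
from pathlib import Path


def resolve_global_config_file(cwd=None, home=None, environ=None):
    """Sketch of the lookup order for the global config file."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    # 1. BENCHOPT_CONFIG, only if it points to an existing file.
    env_file = environ.get("BENCHOPT_CONFIG")
    if env_file and Path(env_file).is_file():
        return Path(env_file)

    # 2. A benchopt.yml file in the current directory.
    local_file = cwd / "benchopt.yml"
    if local_file.is_file():
        return local_file

    # 3. Default location (assumption: returned even if it is missing).
    return home / ".config" / "benchopt.yml"


print(resolve_global_config_file())
```

Passing cwd, home and environ explicitly keeps the function easy to test without touching the real environment.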
Config File Structure#
The config files for benchopt follow the YAML format:
debug: true # Activate or not debug logs. Default is false.
conda_cmd: mamba # Command to use to install packages. Default is conda.
github_token: 0...0 # Token used to publish results on benchopt/results
Benchmark settings are grouped in a section with the same name as the benchmark. For a benchmark named benchmark_bench, the config structure is:
benchmark_bench:
    plots:
        - objective_curve
        - bar_chart
        - boxplot
        - table
    data_paths:
        imagenet: /path/to/imagenet
Note that benchmark-specific settings can also be set in the config file of the benchmark, located in the benchmark folder. The global config file is used as a fallback if the benchmark config file does not exist.
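The fallback between the two files can be illustrated on already-parsed YAML content. The sketch below is not benchopt's implementation: the function benchmark_setting is hypothetical, and it assumes that the benchmark's own file stores its settings at the top level while the global file stores them under the benchmark's name.

```python
def benchmark_setting(name, benchmark_name, global_config,
                      benchmark_file_config=None, default=None):
    """Look up one benchmark setting from parsed YAML dictionaries.

    benchmark_file_config is None when the benchmark has no config file of
    its own; the section named after the benchmark in the global config is
    then used as the fallback.
    """
    if benchmark_file_config is not None:
        section = benchmark_file_config
    else:
        section = global_config.get(benchmark_name, {})
    return section.get(name, default)


# Parsed content of a global config file with a benchmark_bench section.
global_config = {
    "debug": True,
    "benchmark_bench": {
        "plots": ["objective_curve", "bar_chart", "boxplot", "table"],
        "data_paths": {"imagenet": "/path/to/imagenet"},
    },
}

# No benchmark file: the global section is used.
print(benchmark_setting("plots", "benchmark_bench", global_config))
# A benchmark file exists: it is used instead of the global section.
print(benchmark_setting("plots", "benchmark_bench", global_config,
                        benchmark_file_config={"plots": ["boxplot"]}))
```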
Benchopt config settings#
This section lists the available global settings for benchopt.
- benchopt.config.DEFAULT_GLOBAL_CONFIG = {'_bench_config_check': False, '_g_config_check': True, 'cache': None, 'conda_cmd': 'conda', 'data_dir': './data/', 'debug': False, 'default_timeout': 100, 'github_token': None, 'hf_token': None, 'raise_install_error': False, 'shell': 'bash --norc --noprofile', 'warn_nonunique_files': True}#
These are the config options available globally for benchopt, that can be set in the global config file. These options change benchopt’s behavior, in particular for logging, warnings and errors. The available options are:
- debug, bool: If set to true, enable debug logs.
- raise_install_error, bool: If set to true, raise an error when an install fails.
- github_token, str: token to publish results on benchopt/results via GitHub.
- hf_token, str: token to publish results on Hugging Face.
- conda_cmd, str: can be used to give the path to conda if it is not directly installed on $PATH. This can also be used to use mamba to install benchmarks instead of conda. See Using mamba to install packages.
- shell, str: can be used to specify the shell to use. Defaults to SHELL from env if it exists and 'bash' otherwise.
- cache, str: can be used to specify where the cache for the benchmarks should be stored. By default, the cache files are stored in the benchmark directory, under the folder __cache__. Setting this option results in the cache for benchmark B1 being stored in ${cache}/B1/.
- default_timeout, int: default timeout in seconds for the benchmark runs. Default is 100 seconds.
- warn_nonunique_files, bool: If set to True, raise a warning when a results file is about to be overwritten because a file with the same name already exists. Mostly useful to deactivate this warning in tests.
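The effect of the cache setting on the cache location can be sketched as follows. This is an illustration only: the function cache_dir is hypothetical, and it assumes that the benchmark name (B1 in the description above) is the name of the benchmark directory.

```python
from pathlib import PurePosixPath


def cache_dir(benchmark_dir, cache=None):
    """Return where the cache files of a benchmark would be stored.

    Sketch of the rule described for the `cache` setting; benchopt's own
    implementation may differ.
    """
    benchmark_dir = PurePosixPath(benchmark_dir)
    if cache is None:
        # Default: a __cache__ folder inside the benchmark directory.
        return benchmark_dir / "__cache__"
    # Custom location: one sub-folder per benchmark, named after it.
    return PurePosixPath(cache) / benchmark_dir.name


print(cache_dir("/home/user/B1"))                # /home/user/B1/__cache__
print(cache_dir("/home/user/B1", "/tmp/cache"))  # /tmp/cache/B1
```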
Using mamba to install packages#
When many packages need to be installed, conda can be slow or even fail to resolve the dependency graph. Using mamba can speed up this process and make it more reliable.
To use mamba instead of conda when installing benchmark requirements, it is necessary to have mamba installed in the base conda environment, e.g. using conda install -n base mamba. Then, benchopt can be configured to use this command instead of conda by either configuring the CLI using benchopt config set conda_cmd mamba or setting the environment variable BENCHOPT_CONDA_CMD=mamba.
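The two ways of selecting mamba can be pictured as a lookup that consults the environment before the config file. The sketch below rests on assumptions that this page does not state: that the environment variable takes precedence over the config file, and that the BENCHOPT_<NAME> pattern applies to settings other than conda_cmd. The function get_setting defined here is a stand-in for illustration, not benchopt's API.

```python
import os

# Subset of the defaults listed in DEFAULT_GLOBAL_CONFIG.
DEFAULTS = {"conda_cmd": "conda", "debug": False, "default_timeout": 100}


def get_setting(name, file_config, environ=None):
    """Resolve a global setting: environment, then config file, then default.

    The precedence is an assumption of this sketch. Values read from the
    environment are returned as strings, without type conversion.
    """
    environ = os.environ if environ is None else environ
    env_name = f"BENCHOPT_{name.upper()}"
    if env_name in environ:
        return environ[env_name]
    if name in file_config:
        return file_config[name]
    return DEFAULTS[name]


print(get_setting("conda_cmd", {}, environ={}))  # conda
print(get_setting("conda_cmd", {}, environ={"BENCHOPT_CONDA_CMD": "mamba"}))  # mamba
```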
Benchmark config settings#
This section lists the available settings for a benchmark configuration.
- benchopt.config.DEFAULT_BENCHMARK_CONFIG = {'data_home': None, 'data_paths': {}, 'hf_repo': None, 'plot_configs': {}, 'plots': None}#
These are the config options available for each benchmark, that can be set in the benchmark’s config.yml file or in the global config file under the benchmark’s name. These options change the behavior of benchopt for one benchmark. They can be used to configure the visualization or the data directory. The available options are:

- plots, list: Select the plots to display for the benchmark. Should be valid plot kinds. The list can simply be one item per line, with each item indented, as:

    plots:
        - objective_curve
        - suboptimality_curve
        - relative_suboptimality_curve
        - bar_chart

- plot_configs, dict: list of saved views that can be easily displayed for the plot. Each view corresponds to a name, with specified values to select options for the plots. Common options for all plot kinds include: plot_kind, scale, with_quantiles, suboptimal_curve, relative_curve, hidden_curves.

  Other options are specific to the plot kind and defined as the options in the plot definition, prefixed with the plot kind. Typically, when the plot kind my-kind includes an option dataset, the view should specify it as my-kind_dataset: my_dataset.

  Values that are not specified by the view are left as is when setting the view in the interface. An example of views is:

    plot_configs:
        linear_objective:
            plot_kind: objective_curve
            scale: linear
        view2:
            plot_kind: boxplot
            boxplot_objective_column: objective_score_train
            scale: loglog

  These views can be easily created from the interactive HTML page, by hitting the Save as view button in the plot controls and downloading either the new HTML file to save them or the config file in the benchmark’s repo, so that these saved views are embedded in the next plot results automatically.

- data_home, str: Allows users to define a home path where the function get_data_path() searches for data files defined in data_paths. If no paths are provided, the default path used is your_benchmark/data.

- data_paths, dict: Allows users to store some data files in custom locations. If you are writing your own benchmark, you can use this option to allow users to use custom locations for data files. Otherwise, please refer to the documentation of the benchmark you use to know if you can use this option and which paths you must define.

  For example, if the benchmark asks you to define a data path, you can proceed as follows in the benchmark’s config.yaml file:

    data_home: path/to/data/home
    data_paths:
        my_data: path/to/my/file.npz

  In your benchmark’s datasets, you can use get_data_path() to retrieve the paths:

    from benchopt.config import get_data_path
    path = get_data_path('my_data')
    # The "path" variable now contains "path/to/data/home/path/to/my/file.npz"

  If no paths are provided, the default path used is {data_home}/my_data. Note that data_home can be set to a custom path, or defaults to your_benchmark/data.

- hf_repo, str: Hugging Face dataset repo id, e.g. ‘my-org/benchopt-results’, used to publish results files on Hugging Face using benchopt publish --hub huggingface. If not specified, users can still publish on HF using the --repo option, but this setting provides a default repo for the benchmark.
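The way data_home and data_paths combine can be sketched without benchopt installed. The function resolve_data_path below is hypothetical and only mirrors the behavior described for get_data_path(): a key listed in data_paths is resolved under data_home, and a missing key falls back to {data_home}/{key}.

```python
from pathlib import PurePosixPath


def resolve_data_path(key, data_paths, data_home="your_benchmark/data"):
    """Sketch of how a data key maps to a file path (illustrative only)."""
    # Keys missing from data_paths fall back to {data_home}/{key}.
    relative = data_paths.get(key, key)
    return PurePosixPath(data_home) / relative


config = {
    "data_home": "path/to/data/home",
    "data_paths": {"my_data": "path/to/my/file.npz"},
}

print(resolve_data_path("my_data", config["data_paths"], config["data_home"]))
# path/to/data/home/path/to/my/file.npz
print(resolve_data_path("other", {}))
# your_benchmark/data/other
```

The sketch assumes that entries in data_paths are relative paths. With pathlib, joining an absolute right-hand path discards data_home, so absolute entries would need separate handling.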
Customising data file paths#
If a benchmark exposes configurable data paths, you can set them in the
benchmark config file using the data_home and data_paths keys.
data_home sets a base directory; each entry in data_paths is
resolved relative to it (default: the benchmark directory).
data_home: /path/to/data_home/folder
data_paths:
    the_key: /path/to/the/file.ext
With this config, the benchmark will retrieve the file located at
/path/to/data_home/folder/path/to/the/file.ext.
To know which keys a benchmark exposes, refer to its documentation.
Benchmark makers expose keys by calling benchopt.config.get_data_path()
inside get_data():
from benchopt import BaseDataset, config

class Dataset(BaseDataset):
    def get_data(self):
        # Resolve the file location from the key set in data_paths.
        path = config.get_data_path(key="the_key_name")
        ...