benchopt.datasets.simulated.make_correlated_data

benchopt.datasets.simulated.make_correlated_data(n_samples=100, n_features=50, n_tasks=1, rho=0.6, snr=3.0, w_true=None, density=0.2, X_density=1.0, random_state=None)

Generate a linear regression problem whose design matrix has decaying correlation \(\rho^{|i-j|}\) between features \(i\) and \(j\).

The data are generated according to:

\[y = X w^* + \epsilon\]

such that the signal-to-noise ratio is \(snr = \frac{\|X w^*\|}{\|\epsilon\|}\).
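
The ratio above can be enforced by rescaling a noise draw. The snippet below is a minimal NumPy sketch of that idea, not the library's own code; `signal` is a hypothetical stand-in for \(X w^*\), and Gaussian noise is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noiseless signal standing in for X @ w_star.
signal = rng.standard_normal(100)
snr = 3.0

# Draw Gaussian noise (an assumption), then rescale it so that
# ||signal|| / ||noise|| equals the requested snr.
noise = rng.standard_normal(signal.shape)
noise *= np.linalg.norm(signal) / (snr * np.linalg.norm(noise))

y = signal + noise
achieved_snr = np.linalg.norm(signal) / np.linalg.norm(noise)
```

Under this definition, `snr = np.inf` can only be satisfied by zero noise, so `y` would equal the noiseless signal.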

The generated features have mean 0, variance 1, and the expected correlation structure:

\[\mathbb E[x_i] = 0~, \quad \mathbb E[x_i^2] = 1 \quad \text{and} \quad \mathbb E[x_i x_j] = \rho^{|i-j|}\]
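
One standard way to obtain these moments is a stationary AR(1) recursion across columns. The sketch below illustrates that construction under stated assumptions; `ar1_features` is a hypothetical helper, and this is not claimed to be how `make_correlated_data` is implemented.

```python
import numpy as np


def ar1_features(n_samples, n_features, rho, rng):
    """Draw a matrix whose columns follow a stationary AR(1) process."""
    X = np.empty((n_samples, n_features))
    X[:, 0] = rng.standard_normal(n_samples)
    # Scaling the innovation by sqrt(1 - rho^2) keeps every column at unit variance.
    scale = np.sqrt(1.0 - rho ** 2)
    for j in range(1, n_features):
        X[:, j] = rho * X[:, j - 1] + scale * rng.standard_normal(n_samples)
    return X


rng = np.random.default_rng(0)
rho = 0.6
X = ar1_features(n_samples=200_000, n_features=5, rho=rho, rng=rng)

# Empirical second moments E[x_i x_j], to compare with rho ** |i - j|.
empirical = X.T @ X / X.shape[0]
idx = np.arange(X.shape[1])
expected = rho ** np.abs(idx[:, None] - idx[None, :])
```

With enough samples, `empirical` should be close to the Toeplitz matrix `expected`, up to sampling error.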

Parameters
n_samples: int

Number of samples in the design matrix.

n_features: int

Number of features in the design matrix.

n_tasks: int

Number of tasks.

rho: float

Correlation \(\rho\) between successive features. The cross-correlation \(C_{i, j}\) between feature \(i\) and feature \(j\) is \(\rho^{|i-j|}\). This parameter should lie in \([0, 1[\).

snr: float or np.inf

Signal-to-noise ratio.

w_true: np.array, shape (n_features,) or (n_features, n_tasks) | None

True regression coefficients. If None, a sparse array with standard Gaussian non-zero entries is simulated.

density: float

Proportion of non-zero elements in w_true when w_true is simulated (that is, when w_true is None).

random_state: int | RandomState instance | None (default)

Controls the random number generation used to simulate the data. Pass an int to make the output reproducible.

X_density: float in ]0, 1]

Proportion of elements of X which are non-zero.
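
The default behaviour described for `w_true` and `density` can be pictured with the following NumPy sketch for the single-task case. It is an illustration only: the rounding rule for the number of non-zero entries and the uniform choice of the support are assumptions, not details confirmed by this page.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, density = 50, 0.2

# Number of non-zero coefficients implied by the density
# (the rounding rule is an assumption).
n_nonzero = int(round(density * n_features))

# Pick the support uniformly at random without replacement (also an assumption).
support = rng.choice(n_features, size=n_nonzero, replace=False)

# Sparse coefficient vector with standard Gaussian non-zero entries.
w_true = np.zeros(n_features)
w_true[support] = rng.standard_normal(n_nonzero)
```

With the default values `n_features=50` and `density=0.2`, this sketch produces 10 non-zero coefficients.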

Returns
X: ndarray or CSC matrix, shape (n_samples, n_features)

A design matrix with Toeplitz covariance.

y: ndarray, shape (n_samples,) or (n_samples, n_tasks)

Observation vector/matrix.

w_true: ndarray, shape (n_features,) or (n_features, n_tasks)

True regression vector/matrix of the model.