benchopt.datasets.simulated.make_correlated_data
- benchopt.datasets.simulated.make_correlated_data(n_samples=100, n_features=50, n_tasks=1, rho=0.6, snr=3.0, w_true=None, density=0.2, X_density=1.0, random_state=None)
Generate a linear regression problem whose design matrix has a decaying correlation \(\rho^{|i-j|}\) between features i and j.
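One standard way to obtain the correlation \(\rho^{|i-j|}\) is to build the columns of X with a first-order autoregressive (AR(1)) recursion. The sketch below illustrates that technique under this assumption; it is not necessarily how benchopt implements it, and the helper `correlated_design` is hypothetical. Each new column keeps a fraction \(\rho\) of the previous one and adds fresh Gaussian noise with variance \(1 - \rho^2\), so every column has unit variance and the empirical correlation matrix approaches the Toeplitz matrix \(\rho^{|i-j|}\).

```python
import numpy as np


def correlated_design(n_samples, n_features, rho, rng):
    """Hypothetical helper: sample X whose columns follow an AR(1) recursion.

    x_{j+1} = rho * x_j + sqrt(1 - rho**2) * e_j with e_j ~ N(0, I),
    so each column has unit variance and E[x_i x_j] = rho ** |i - j|.
    """
    if not 0 <= rho < 1:
        raise ValueError("rho should be in [0, 1[")
    sigma = np.sqrt(1 - rho ** 2)  # innovation std keeping variance at 1
    X = np.empty((n_samples, n_features))
    X[:, 0] = rng.standard_normal(n_samples)
    for j in range(1, n_features):
        X[:, j] = rho * X[:, j - 1] + sigma * rng.standard_normal(n_samples)
    return X


rng = np.random.default_rng(0)
X = correlated_design(200_000, 5, rho=0.6, rng=rng)

# Empirical correlation between columns vs. theoretical Toeplitz matrix
emp_corr = np.corrcoef(X, rowvar=False)
idx = np.arange(5)
theo_corr = 0.6 ** np.abs(idx[:, None] - idx[None, :])
print(np.round(emp_corr, 2))
```

The recursion costs O(n_samples × n_features) and never forms the n_features × n_features covariance matrix, an advantage over factorizing that matrix when there are many features.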
The data are generated according to:
\[y = X w^* + \epsilon\]
such that the signal-to-noise ratio is \(snr = \frac{\|X w^*\|}{\|\epsilon\|}\).
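To reach a prescribed snr, one can draw Gaussian noise and rescale it so that its norm equals \(\|X w^*\| / snr\). The following NumPy sketch shows this rescaling; the helper `add_noise_with_snr` is hypothetical and the exact procedure used by benchopt may differ. For multi-task targets (2-D arrays), `np.linalg.norm` computes the Frobenius norm.

```python
import numpy as np


def add_noise_with_snr(signal, snr, rng):
    """Hypothetical helper: return signal + eps with ||signal|| / ||eps|| == snr.

    snr=np.inf means noiseless observations (eps = 0).
    """
    if snr <= 0:
        raise ValueError("snr should be positive (or np.inf)")
    if np.isinf(snr):
        return signal.copy()
    noise = rng.standard_normal(signal.shape)
    # Rescale the Gaussian draw so that its norm is ||signal|| / snr
    noise *= np.linalg.norm(signal) / (snr * np.linalg.norm(noise))
    return signal + noise


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_star = np.zeros(50)
w_star[:10] = rng.standard_normal(10)  # arbitrary sparse coefficients
Xw = X @ w_star
y = add_noise_with_snr(Xw, snr=3.0, rng=rng)

# Ratio ||X w*|| / ||eps|| recovered from the generated data
achieved_snr = np.linalg.norm(Xw) / np.linalg.norm(y - Xw)
```

Because the noise is rescaled after sampling, the ratio holds exactly for every draw rather than only in expectation.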
The generated features have mean 0, variance 1 and the expected correlation structure:
\[\mathbb E[x_i] = 0~, \quad \mathbb E[x_i^2] = 1 \quad \text{and} \quad \mathbb E[x_ix_j] = \rho^{|i-j|}\]
- Parameters:
- n_samples: int
Number of samples in the design matrix.
- n_features: int
Number of features in the design matrix.
- n_tasks: int
Number of tasks, i.e. the number of columns of y and w_true when n_tasks > 1.
- rho: float
Correlation \(\rho\) between successive features. The correlation \(C_{i, j}\) between features i and j is then \(\rho^{|i-j|}\). This parameter should be chosen in \([0, 1[\).
- snr: float or np.inf
Signal-to-noise ratio \(snr = \frac{\|X w^*\|}{\|\epsilon\|}\). With np.inf, no noise is added (\(\epsilon = 0\)).
- w_true: np.array, shape (n_features,) or (n_features, n_tasks) | None
True regression coefficients. If None, a sparse array whose non-zero entries are drawn from a standard Gaussian is simulated.
- density: float
Proportion of non-zero elements in w_true when it is simulated (i.e. when w_true is None).
- random_state: int | RandomState instance | None (default)
Seed or random number generator used to generate the data. Pass an int for reproducible output across calls.
- X_density: float in ]0, 1]
Proportion of non-zero entries in X. When it is smaller than 1, X is returned as a sparse CSC matrix.
- Returns:
- X: ndarray or CSC matrix, shape (n_samples, n_features)
A design matrix with Toeplitz covariance.
- y: ndarray, shape (n_samples,) or (n_samples, n_tasks)
Observation vector/matrix.
- w_true: ndarray, shape (n_features,) or (n_features, n_tasks)
True regression vector/matrix of the model.
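When w_true is None, density and n_tasks control the simulated coefficients. The sketch below is one possible way to simulate them, under stated assumptions: the non-zero rows are chosen uniformly at random and shared across tasks (the documentation above does not specify how the support is chosen), at least one row is kept non-zero, and a 1-D vector is returned when n_tasks == 1 (inferred from the documented shapes). The helper `simulate_sparse_w` is hypothetical.

```python
import numpy as np


def simulate_sparse_w(n_features, n_tasks, density, rng):
    """Hypothetical helper: sparse coefficients with N(0, 1) non-zero rows."""
    if not 0 < density <= 1:
        raise ValueError("density should be in ]0, 1]")
    w = np.zeros((n_features, n_tasks))
    # Assumption: keep at least one non-zero row
    n_nonzero = max(1, int(density * n_features))
    # Assumption: support drawn uniformly, shared by all tasks
    support = rng.choice(n_features, size=n_nonzero, replace=False)
    w[support] = rng.standard_normal((n_nonzero, n_tasks))
    # Assumption: single-task problems return a 1-D vector
    return w[:, 0] if n_tasks == 1 else w


rng = np.random.default_rng(0)
w_single = simulate_sparse_w(50, 1, density=0.2, rng=rng)
w_multi = simulate_sparse_w(50, 3, density=0.2, rng=rng)
```

Sharing the support across tasks yields row-sparse coefficient matrices, the structure targeted by multi-task penalties such as the group Lasso; an independent support per task would be an equally valid design choice.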