PCE Regression Class#

class tesuract.PCEReg(order=2, customM=None, mindex_type='total_order', coef=None, a=None, b=None, polytype='Legendre', fit_type='linear', fit_params={}, normalized=False, store_phi=False, input_range=None, use_sklearn_poly_features=False)[source]#

Class for performing multivariate polynomial regression

This class fits the coefficients of a a multivariate polynomial object, aka as a polynomial chaos model, using different linear regression algorithms from sklearn. Given labeled data pairs \((x_j,y_j)\) for \(j=1,\dots,n\), where \(x_j \in \mathbb{R}^d\) and \(y_j \in \mathbb{R}\), we look for

\[\text{arg} \min_{c} \sum_{j}d\left(f(\mathbf{x}_j;c),y_j\right) + \text{Regularizer}(c)\]

where \(f\) is the polynomial model with unknown coefficient parameters, i.e.,

\[f(\mathbf{x}_j;c) \doteq \sum_{i=1}^N c_i \Phi_i(\mathbf{x}_j)\]

where \(d\) is the error metric, which is typically a squared error, and the regularizer can be either \(\ell_1\) (lasso), \(\ell_2\) (ridge), or both (elastic net). Again, we note that the input must be on \([-1,1]\) for now since the polynomials are setup on that domain by default. One may use sklearn’s preprocessing utilities to make that transformation or use the DomainScaler class that comes with this library.

Below we list the parameters to construct the object followed by the class attributes and returns.

Parameters

orderint, default=2: Description of the order of the polynomials in the expansion. For total order, the order is the maximum polynomial order for each basis function per dimension.
customMnumpy.ndarray, default=None: An integer numpy aray of size (# of basis functions, # of dimensions). Each row represents the basis function and each column represents the order of the 1d polynomial, \(L_i\).
mindex_type{‘total_order’, ‘hyperbolic’}, default=’total_order’: Different types of multiindex generators. Total order produces basis vectors that have a maximum order as defined by order. Hyperbolic order is similar but generates fewer cross terms.
coefnumpy.ndarray, default=None: 1d numpy coefficient array if defining polynomial. Must be the same length as the number of basis elements, i.e. length of multiindex array.
polytype{‘Legendre’}, default=’Legendre’: A string representing the type of polynomial. So far we only include Legendre polynomial construction defined on [-1,1]
fit_type{‘linear’,’LassoCV’,’ElasticNetCV’,’OmpCV’,’quadrature’}, default=’linear’: A string defining the algorithm to solve the linear regression problem. All but the quadrature option utilizes sklearn’s linear regression algorithms. In order to use the quadrature routine, you must define the ‘w’ variable in the fit_params dictionary.
fit_paramsdefault={}: Dictionary to be passed to the particular fit type algorithm chosen above. See sklearn’s documentation for parameters. This dictionary will be passed as a **kwargs type input for the fit algorithm.
normalizedbool, default=False: Whether or not to use normalized polynomials such that \(\int_{-1}^{1}L_i(x)dx = 1\) or \(\frac{2}{2n+1}\) if True or False, respecively, where \(n\) is the order of the polynomial (only for Legendre polynomials).

Attributes

dimint: Dimension of the polynomial, defined after construction.
mindexndarray of shape (nbasis, dim): Integer array of the multiindex which describes the structure of the polynomial basis.
nPCTermsint: The number of basis terms of the multi-variate polynomial.
feature_importances_ndarray of shape (dim,): Sobol sensitivity indices for each dimension. This is computed after the fit function is called.
coefndarray of shape (nbasis,): coefficient array of polynomial function. It can be fed into the constructor, but for most cases it will be computed after self.fit is called.

Notes

As of now, the base class requires the input range to be on the default range \([-1,1]\) for all dimensions. We have included a useful preprocessing utility (DomainScaler) to transform the domain easily to the canonical target range from any input range. In the future this will be an option in the this class.

Examples

>>> from tesuract import PCEReg
>>> p = PCEReg(order=3)
>>> p.fit(X,y)

Methods

feature_importances:

compute Sobol total order indices, normalized to sum to 1

_compile(X)[source]#

Builds the multiindex using the PCEBuilder class. Private.

Parameters

Xnumpy array of size (nsamples, dim): data matrix in feature space.

Returns

selfself: returns object

_quad_fit(X, y)[source]#

Fit the coefficients of the polynomial model using quadrature points and weights

It is expected that the data matrix X represent the quadrature point and the weights are given in the fit_params dictionary with the key name ‘w’.

Parameters

Xnumpy.ndarray of size (nsamples, dim): Data matrix where each row is the first part of the data pairs (x,y)_i. X values must be between (-1,1), for now.
ynumpy.ndarray of size (nsamples,): 1d array of the data labels. Must be the same size as the number of rows of X.

fit(X, y)[source]#

Fit the polynomial using linear regression or quadrature solvers

The algorithm is determined by the fit_type option in the initialization and the options in fit_params.

Parameters

Xnumpy.ndarray of shape (nsamples, dim): data matrix feature space samples. Must be in [-1,1]
ynumpy.ndarray of shape (nsamples,): data labels.

Returns

selfself object: sets the internal coefficient array self.coef_

feature_importances()[source]#

Compute feature importances which are equivalent to the normalized Sobol total order sensitivity indices.

Parameters

Returns

feature_importances_numpy.ndarray of shape (dim,): array representing normalized Sobol total order indices.

joint_effects()[source]#

Compute Sobol joint effect sensitivity indices.

Parameters

Returns

joint_effects_numpy.ndarray of shape (dim,dim): the lower triangular part of the array contains the joint effect sensitivity indices.

predict(X)[source]#

After fitting, evaluates the polynomial for a single feature space sample or array of samples.

Parameters

Xnumpy.ndarray of shape (nsamples, dim): Samples to evaluate the fit polynomial. Must have self.coef_ set or defined already. Can take 1d or 2d array of samples.

Returns

ynumpy.ndarray of shape (nsamples,): Returns the scalar array of polynomial evaluations.

transform(X)#

Transform the given \(X\) coordinates to the polynomial space.

This method essentially performs a high dimensional kernel mapping onto a polynomial space spanned by the Legendre polynomials. This is similar to sklearn’s PolynomialFeatures, except here the features are multi-variate Legendre; the big difference being that the features are uncorrelated.

Alternatively, the resulting

Parameters

Xnumpy.ndarray of shape (nsamples, dim): feature matrix where each row is a sample of the feature space.

Returns

Phinumpy.ndarray of shape (nsamples, nPCTerms): returns the polynomial feature map that transforms each sample in \(X\) of dimension \(d\) to dimension \(nPCTerms\).

tesuract 0.2.0 documentation

PCE Regression Class

PCE Regression Class#