NcmStatsDist

NcmStatsDist — Abstract class for implementing N-dimensional probability distributions.

Properties

NcmStatsDistCV CV-type Read / Write / Construct
guint N Read
NcmStatsDistKernel * kernel Read / Write / Construct Only
double over-smooth Read / Write / Construct
gboolean print-fit Read / Write / Construct
double split-frac Read / Write / Construct
gboolean use-threads Read / Write / Construct

Object Hierarchy

    GEnum
    ╰── NcmStatsDistCV
    GObject
    ╰── NcmStatsDist
        ╰── NcmStatsDistKDE

Description

Abstract class to reconstruct an arbitrary N-dimensional probability distribution. This class provides the tools to perform a radial basis interpolation of a multidimensional function using a radial basis function, and then to generate a new sample from the interpolating function. The generated sample follows the original distribution but is simpler to obtain, since the kernels used are easier to sample from. For more information about radial basis interpolation, see [Radial Basis Function Interpolation, Wilna du Toit]. A brief description of the radial basis interpolation method is given below.

Given a d-dimensional function $g(x): \mathbf{R}^d \rightarrow \mathbf{R}$, a radial basis function $\phi(x, \Sigma)$ is used such that \begin{align} \label{Interpolation_eq} s(x) = \sum_{i=1}^n \lambda_i \phi(|x-x_i|, \Sigma_i), \quad x \in \mathbf{R}^d . \end{align} The variables $\lambda_i$ represent the weights and are found such that \begin{align} \label{eqnnls1} s(x_i) = g(x_i) , \end{align} where $x_i$ are the sample points. The values of $\phi(|x-x_i|, \Sigma_i)$ evaluated at the sample points are arranged in a symmetric $n \times n$ matrix $\Phi$. This function depends on the norm of the difference between points and on the covariance matrix $\Sigma_i$ associated with each point. The weights $\lambda_i$ are also organised in a matrix representation such that equation \eqref{eqnnls1} becomes \begin{align} \label{eqnnls} G = \lambda \times \Phi ,\end{align} where $G$ is a matrix containing all the function values $g(x_i)$. Once the $\lambda$ matrix is found, one may use $s(x)$ to sample values from $g(x)$, which is easier to do since $s(x)$ is a weighted sum of kernels.
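
Written out componentwise, the interpolation conditions \eqref{eqnnls1} form a linear system in the weights. The restatement below is only a sketch: it assumes $\lambda$ is treated as a row vector so that the product $\lambda \times \Phi$ is well defined, and the index convention chosen for $\Phi$ is an assumption made here for illustration. \begin{align} G_i = g(x_i) = \sum_{j=1}^n \lambda_j \Phi_{ji}, \qquad \Phi_{ji} = \phi(|x_i - x_j|, \Sigma_j), \qquad i = 1, \dots, n . \end{align}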

We want $s(x)$ to be a probability distribution so we can sample from it. Therefore the $\lambda$ matrix containing the weights is interpreted as a set of probabilities: its values must be non-negative and sum to one. To this end, the algorithm solves equation \eqref{eqnnls} for $\lambda$ as a least-squares problem using the non-negative least squares (NNLS) method, which can be found in the nnls.c file. The algorithm can then randomly choose a kernel $\phi(|x-x_i|, \Sigma_i)$ with the probability given by the corresponding entry of $\lambda$ and sample a point from it.
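
In symbols, the constrained fit described in this paragraph can be stated as follows. This is a sketch of the problem statement only; whether the sum-to-one condition is imposed inside the fit or by a later normalization is not specified here and is left open. \begin{align} \hat{\lambda} = \underset{\lambda}{\mathrm{arg\,min}} \; \| G - \lambda \times \Phi \|^2 \quad \text{subject to} \quad \lambda_i \geq 0 , \quad \sum_{i=1}^n \lambda_i = 1 . \end{align} Kernel $i$ is then selected with probability $\hat{\lambda}_i$.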

In this object, the radial basis interpolation function is not completely defined. One must choose one of the kernel implementations, the NcmStatsDistKernelST object or the NcmStatsDistKernelGauss object, which use a multivariate Student's t function and a Gaussian function as the kernel, respectively. After initializing the desired kernel object, one may use the methods of this class to compute the interpolation and to sample from the interpolated function.

The user must provide the following input values: over_smooth - ncm_stats_dist_set_over_smooth(), split_frac - ncm_stats_dist_set_split_frac(), $v(x)$ - ncm_stats_dist_prepare_interp(). The other parameters must be provided when the NcmStatsDistKDE or the NcmStatsDistVKDE object is initialized. To perform a calculation with this class, one needs to instantiate one of its subclasses (NcmStatsDistKDE or NcmStatsDistVKDE), passing a child object of the class NcmStatsDistKernel (NcmStatsDistKernelGauss or NcmStatsDistKernelST). For more information about the algorithm, see the description below.

- Since this class does not define what type of kernel will be used in the calculation (the fixed kernel in the NcmStatsDistKDE class or the variable kernel in the NcmStatsDistVKDE class), one cannot compute the sample using this class alone. The function to be used as the kernel must also be provided; it is implemented in the children of the class NcmStatsDistKernel. When initializing the NcmStatsDistKDE or NcmStatsDistVKDE classes, the function to be used as the kernel is defined in the object initialization function.

- This class also needs a child object to compute the interpolation matrix $IM$ and the covariance matrices stored in cov_decomp to perform the interpolation. These are kernel dependent and are therefore also computed by the child objects.

- Regarding the kernel types based on the radial basis function, $\phi(|x-x_i|)$, and how the sample points in ncm_stats_dist_sample() are generated, see the different implementations of NcmStatsDistKernel, e.g., NcmStatsDistKernelGauss and NcmStatsDistKernelST.

- Regarding how the functions ncm_stats_dist_eval() and ncm_stats_dist_eval_m2lnp() are implemented, see the different implementations of NcmStatsDist, i.e., NcmStatsDistKDE and NcmStatsDistVKDE. These objects also compute the covariance matrix of each sample point and other objects needed for the least-squares problem when computing the weights matrix ($\lambda$).

Functions

ncm_stats_dist_ref ()

NcmStatsDist *
ncm_stats_dist_ref (NcmStatsDist *sd);

Increases the reference count of sd.

Parameters

sd

a NcmStatsDist

 

Returns

sd.

[transfer full]


ncm_stats_dist_free ()

void
ncm_stats_dist_free (NcmStatsDist *sd);

Decreases the reference count of sd.

Parameters

sd

a NcmStatsDist

 

ncm_stats_dist_clear ()

void
ncm_stats_dist_clear (NcmStatsDist **sd);

Decreases the reference count of *sd and sets the pointer *sd to NULL.

Parameters

sd

a NcmStatsDist

 

ncm_stats_dist_set_kernel ()

void
ncm_stats_dist_set_kernel (NcmStatsDist *sd,
                           NcmStatsDistKernel *sdk);

Sets the kernel to be used in the interpolation. The available kernel types are the Gaussian kernel and the Student's t kernel, implemented in ncm_stats_dist_kernel_gauss.c and ncm_stats_dist_kernel_st.c, respectively.

Parameters

sd

a NcmStatsDist

 

sdk

a NcmStatsDistKernel

 

ncm_stats_dist_peek_kernel ()

NcmStatsDistKernel *
ncm_stats_dist_peek_kernel (NcmStatsDist *sd);

Gets the kernel to be used in the interpolation.

Parameters

sd

a NcmStatsDist

 

Returns

current NcmStatsDistKernel used.

[transfer none]


ncm_stats_dist_get_kernel ()

NcmStatsDistKernel *
ncm_stats_dist_get_kernel (NcmStatsDist *sd);

Gets the kernel to be used in the interpolation.

Parameters

sd

a NcmStatsDist

 

Returns

current NcmStatsDistKernel used.

[transfer full]


ncm_stats_dist_get_dim ()

guint
ncm_stats_dist_get_dim (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

a guint d, the dimension of the sample space, which is the same as the dimension of the kernel in use.


ncm_stats_dist_get_sample_size ()

guint
ncm_stats_dist_get_sample_size (NcmStatsDist *sd);

After the prepare call, this function returns the size of the sample used in the interpolation.

Parameters

sd

a NcmStatsDist

 

Returns

the size of the sample used.


ncm_stats_dist_get_n_kernels ()

guint
ncm_stats_dist_get_n_kernels (NcmStatsDist *sd);

After the prepare call, this function returns the number of kernels used in the interpolation.

Parameters

sd

a NcmStatsDist

 

Returns

the number of kernels used.


ncm_stats_dist_get_href ()

gdouble
ncm_stats_dist_get_href (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

a double h, the href value currently in use. If the object was prepared with the VKDE class, the VKDE method is called. The href value is computed by the kernel object set through ncm_stats_dist_set_kernel().


ncm_stats_dist_set_over_smooth ()

void
ncm_stats_dist_set_over_smooth (NcmStatsDist *sd,
                                const gdouble over_smooth);

Sets the over-smooth factor to over_smooth.

Parameters

sd

a NcmStatsDist

 

over_smooth

the over-smooth factor

 

ncm_stats_dist_get_over_smooth ()

gdouble
ncm_stats_dist_get_over_smooth (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

a double os, the over-smooth factor.


ncm_stats_dist_set_split_frac ()

void
ncm_stats_dist_set_split_frac (NcmStatsDist *sd,
                               const gdouble split_frac);

Sets the cross-validation split fraction to split_frac. This method should be used when the cross-validation type is NCM_STATS_DIST_CV_SPLIT. The split fraction determines the fraction of sample points that are left out for use by the cross-validation method.

Parameters

sd

a NcmStatsDist

 

split_frac

the cross-validation split fraction

 

ncm_stats_dist_get_split_frac ()

gdouble
ncm_stats_dist_get_split_frac (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

a double split_frac, the cross-validation split fraction.


ncm_stats_dist_set_print_fit ()

void
ncm_stats_dist_set_print_fit (NcmStatsDist *sd,
                              const gboolean print_fit);

Sets whether to print steps during the fitting process.

Parameters

sd

a NcmStatsDist

 

print_fit

a boolean

 

ncm_stats_dist_get_print_fit ()

gboolean
ncm_stats_dist_get_print_fit (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

whether steps are printed during the fitting process.


ncm_stats_dist_set_cv_type ()

void
ncm_stats_dist_set_cv_type (NcmStatsDist *sd,
                            const NcmStatsDistCV cv_type);

Sets the cross-validation method to cv_type. If the selected method is NCM_STATS_DIST_CV_NONE, all the sample points are used to compute the interpolation. If cv_type is NCM_STATS_DIST_CV_SPLIT, a fraction of the points (the split fraction) is randomly excluded and the interpolation is fitted to the remaining sample points, which makes the interpolation less dependent on individual points.

Parameters

sd

a NcmStatsDist

 

cv_type

a NcmStatsDistCV

 

ncm_stats_dist_get_cv_type ()

NcmStatsDistCV
ncm_stats_dist_get_cv_type (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

a NcmStatsDistCV cv_type, the cross-validation method currently in use.


ncm_stats_dist_set_use_threads ()

void
ncm_stats_dist_set_use_threads (NcmStatsDist *sd,
                                const gboolean use_threads);

Sets whether to use OpenMP threads during the computation.

Parameters

sd

a NcmStatsDist

 

use_threads

whether to use threads

 

ncm_stats_dist_get_use_threads ()

gboolean
ncm_stats_dist_get_use_threads (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

whether to use OpenMP threads during the computation.


ncm_stats_dist_prepare_kernel ()

void
ncm_stats_dist_prepare_kernel (NcmStatsDist *sd,
                               GPtrArray *sample_array);

Prepares the object for computations of the individual kernels. This method is usually called as part of ncm_stats_dist_prepare() and should not be called directly.

This virtual method does not have a default implementation and must be defined by the descendants.

[virtual prepare_kernel]

Parameters

sd

a NcmStatsDist

 

sample_array

an array of NcmVector.

[element-type NcmVector]

ncm_stats_dist_prepare ()

void
ncm_stats_dist_prepare (NcmStatsDist *sd);

Prepares the object for calculations. This function prepares the weight matrix and sets all the weights to 1.0 / (sample size). It also calls the prepare_kernel virtual method, implemented by a child class, and calls the get_href function.

[virtual prepare]

Parameters

sd

a NcmStatsDist

 

ncm_stats_dist_prepare_interp ()

void
ncm_stats_dist_prepare_interp (NcmStatsDist *sd,
                               NcmVector *m2lnp);

Prepares the object for calculations using the distribution values at the sample points. This function calls the prepare function and sets up the objects needed to solve the least-squares problem. The interpolation matrix IM is prepared by a child object, which is called from this function. Then, depending on the cross-validation method, the function solves the least-squares problem using the ncm_nnls object.

[virtual prepare_interp]

Parameters

sd

a NcmStatsDist

 

m2lnp

a NcmVector containing the distribution values that will be used to compute the interpolation function.

 

ncm_stats_dist_eval ()

gdouble
ncm_stats_dist_eval (NcmStatsDist *sd,
                     NcmVector *x);

Evaluates the distribution at $\vec{x}=$x. The method ncm_stats_dist_eval_m2lnp() can be used to avoid underflow.

Parameters

sd

a NcmStatsDist

 

x

a NcmVector

 

Returns

$P(\vec{x})$.


ncm_stats_dist_eval_m2lnp ()

gdouble
ncm_stats_dist_eval_m2lnp (NcmStatsDist *sd,
                           NcmVector *x);

Evaluates $-2\ln P$ of the distribution at $\vec{x}=$x. This method is more stable than ncm_stats_dist_eval() since it avoids underflows and overflows.

Parameters

sd

a NcmStatsDist

 

x

a NcmVector

 

Returns

$-2\ln P(\vec{x})$.


ncm_stats_dist_kernel_choose ()

guint
ncm_stats_dist_kernel_choose (NcmStatsDist *sd,
                              NcmRNG *rng);

Using the pseudo-random number generator rng, chooses a random kernel based on the computed weights.

Parameters

sd

a NcmStatsDist

 

rng

a NcmRNG
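
Choosing a kernel according to its weight can be pictured as inverse-CDF sampling over the weight vector. The following is a minimal sketch under stated assumptions, not the library's implementation: choose_kernel is a hypothetical helper, the weights are assumed non-negative with a positive total, and a uniform deviate u in [0, 1) stands in for the NcmRNG.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of NumCosmo: returns the index i selected
 * with probability w[i] / sum (w), given a uniform deviate u in [0, 1). */
size_t
choose_kernel (const double *w, size_t n, double u)
{
  double total = 0.0;
  double cum   = 0.0;
  double target;
  size_t i;

  assert (n > 0);
  assert (u >= 0.0 && u < 1.0);

  /* Normalization: the weights need not sum exactly to one. */
  for (i = 0; i < n; i++)
    total += w[i];

  assert (total > 0.0);
  target = u * total;

  /* Walk the cumulative sum until it exceeds the target. */
  for (i = 0; i < n; i++)
  {
    cum += w[i];

    if (target < cum)
      return i;
  }

  /* Only reached through round-off in the cumulative sum. */
  return n - 1;
}
```

A kernel with zero weight is never returned by the loop, since the cumulative sum does not advance past the target on that entry.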

 

ncm_stats_dist_sample ()

void
ncm_stats_dist_sample (NcmStatsDist *sd,
                       NcmVector *x,
                       NcmRNG *rng);

Using the pseudo-random number generator rng, generates a point from the distribution and copies it to x.

Parameters

sd

a NcmStatsDist

 

x

a NcmVector

 

rng

a NcmRNG
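
Once a kernel has been chosen, drawing a point from it reduces, for a Gaussian kernel, to shifting and shearing a vector of standard normal deviates with a Cholesky factor of the kernel covariance. The sketch below is illustrative only: kernel_sample_point is a hypothetical helper, it assumes a lower-triangular factor $L$ with $\Sigma = L L^T$ stored row-major (an upper factor $U$ would be used as $L = U^T$), it ignores any bandwidth scaling, and the deviates z are passed in rather than drawn from an NcmRNG.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of NumCosmo: computes x = y + L z, where
 * y is the kernel location, L is a d x d lower-triangular Cholesky factor
 * (row-major) and z holds d independent standard normal deviates. */
void
kernel_sample_point (const double *y, const double *L, const double *z,
                     size_t d, double *x)
{
  size_t r, c;

  assert (d > 0);

  for (r = 0; r < d; r++)
  {
    double acc = y[r];

    /* Only the lower triangle (c <= r) of L contributes. */
    for (c = 0; c <= r; c++)
      acc += L[r * d + c] * z[c];

    x[r] = acc;
  }
}
```

Passing the deviates explicitly keeps the sketch deterministic; a Student's t kernel would need an additional radial rescaling that is not shown here.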

 

ncm_stats_dist_get_rnorm ()

gdouble
ncm_stats_dist_get_rnorm (NcmStatsDist *sd);

Gets the value of the last $\chi^2$ fit obtained when computing the interpolation through ncm_stats_dist_prepare_interp().

Parameters

sd

a NcmStatsDist

 

Returns

a double, the value of the $\chi^2$.


ncm_stats_dist_add_obs ()

void
ncm_stats_dist_add_obs (NcmStatsDist *sd,
                        NcmVector *y);

Adds a new point y to the sample with weight 1.0. This function must be called to insert an initial sample into the object, so the interpolation can be computed.

Parameters

sd

a NcmStatsDist

 

y

a NcmVector

 

ncm_stats_dist_peek_sample_array ()

GPtrArray *
ncm_stats_dist_peek_sample_array (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

current sample array.

[transfer none][element-type NcmVector]


ncm_stats_dist_peek_cov_decomp ()

NcmMatrix *
ncm_stats_dist_peek_cov_decomp (NcmStatsDist *sd,
                                guint i);

Gets the Cholesky decomposition of the covariance matrix associated with the i-th kernel.

[virtual peek_cov_decomp]

Parameters

sd

a NcmStatsDist

 

i

kernel index

 

Returns

Cholesky decomposition of the i-th covariance matrix.

[transfer none]


ncm_stats_dist_get_lnnorm ()

gdouble
ncm_stats_dist_get_lnnorm (NcmStatsDist *sd,
                           guint i);

Gets the logarithm of the i-th kernel normalization.

[virtual get_lnnorm]

Parameters

sd

a NcmStatsDist

 

i

kernel index

 

Returns

$\ln (N_i)$.


ncm_stats_dist_peek_weights ()

NcmVector *
ncm_stats_dist_peek_weights (NcmStatsDist *sd);

Parameters

sd

a NcmStatsDist

 

Returns

current kernel weights vector.

[transfer none]


ncm_stats_dist_get_Ki ()

void
ncm_stats_dist_get_Ki (NcmStatsDist *sd,
                       const guint i,
                       NcmVector **y_i,
                       NcmMatrix **cov_i,
                       gdouble *n_i,
                       gdouble *w_i);

Returns all information about the i-th kernel.

Parameters

sd

a NcmStatsDist

 

i

kernel index

 

y_i

kernel location.

[out callee-allocates][transfer full]

cov_i

kernel covariance U.

[out callee-allocates][transfer full]

n_i

kernel normalization.

[out]

w_i

kernel weight.

[out]

ncm_stats_dist_reset ()

void
ncm_stats_dist_reset (NcmStatsDist *sd);

Resets the object, discarding all added points.

[virtual reset]

Parameters

sd

a NcmStatsDist

 

Types and Values

NCM_TYPE_STATS_DIST

#define NCM_TYPE_STATS_DIST (ncm_stats_dist_get_type ())

struct NcmStatsDistClass

struct NcmStatsDistClass {
};

enum NcmStatsDistCV

Cross-validation method to be applied.

Members

NCM_STATS_DIST_CV_NONE

No cross validation

 

NCM_STATS_DIST_CV_SPLIT

Sample split cross validation

 

NCM_STATS_DIST_CV_SPLIT_NOFIT

Sample split cross validation without fitting

 

NCM_STATS_DIST_CV_LOO

Leave-one-out cross validation

 

NcmStatsDist

typedef struct _NcmStatsDist NcmStatsDist;

Property Details

The “CV-type” property

  “CV-type”                  NcmStatsDistCV

Cross-validation method.

Owner: NcmStatsDist

Flags: Read / Write / Construct

Default value: NCM_STATS_DIST_CV_NONE


The “N” property

  “N”                        guint

Sample size.

Owner: NcmStatsDist

Flags: Read

Default value: 0


The “kernel” property

  “kernel”                   NcmStatsDistKernel *

Interpolating kernel.

Owner: NcmStatsDist

Flags: Read / Write / Construct Only


The “over-smooth” property

  “over-smooth”              double

Over-smooth factor.

Owner: NcmStatsDist

Flags: Read / Write / Construct

Allowed values: >= 1e-05

Default value: 1


The “print-fit” property

  “print-fit”                gboolean

Whether to print the fitting process.

Owner: NcmStatsDist

Flags: Read / Write / Construct

Default value: FALSE


The “split-frac” property

  “split-frac”               double

Fraction to use in the split cross-validation.

Owner: NcmStatsDist

Flags: Read / Write / Construct

Allowed values: [0.1,0.95]

Default value: 0.5


The “use-threads” property

  “use-threads”              gboolean

Whether to use OpenMP threads during computation.

Owner: NcmStatsDist

Flags: Read / Write / Construct

Default value: FALSE