NcmStatsDistKernel: NumCosmo Reference Manual

NcmStatsDistKernel

NcmStatsDistKernel — An N-dimensional kernel used to compute the kernel density estimation function (KDE) in the NcmStatsDist class.

Functions

NcmStatsDistKernel *	ncm_stats_dist_kernel_ref ()
void	ncm_stats_dist_kernel_free ()
void	ncm_stats_dist_kernel_clear ()
guint	ncm_stats_dist_kernel_get_dim ()
gdouble	ncm_stats_dist_kernel_get_rot_bandwidth ()
gdouble	ncm_stats_dist_kernel_get_lnnorm ()
gdouble	ncm_stats_dist_kernel_eval_unnorm ()
void	ncm_stats_dist_kernel_eval_unnorm_vec ()
void	ncm_stats_dist_kernel_eval_sum0_gamma_lambda ()
void	ncm_stats_dist_kernel_eval_sum1_gamma_lambda ()
void	ncm_stats_dist_kernel_sample ()

Properties

guint

dimension

Read / Write / Construct Only

Types and Values

#define	NCM_TYPE_STATS_DIST_KERNEL
struct	NcmStatsDistKernelClass
	NcmStatsDistKernel

Object Hierarchy

    GObject
    ╰── NcmStatsDistKernel
        ├── NcmStatsDistKernelGauss
        ╰── NcmStatsDistKernelST

Description

An N-dimensional kernel used to compute the kernel density estimation function (KDE) in the NcmStatsDist class.

This class provides the tools to generate a kernel function to be used in a kernel density estimation method. Below is a quick review of the kernel density estimation method and some properties of the kernel function, which are generalized for multidimensional problems. For further information, check [Density Estimation for Statistics and Data Analysis, B.W. Silverman].

Starting with the one-dimensional case, let $x_1,...,x_n$ be independent and identically distributed (i.i.d.) samples drawn from a distribution $f(x)$. The kernel density estimate of $f$ is \begin{align} \tilde{f}(x) = \frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_i}{h}\right) ,\end{align} where $K$ is the kernel function and $h$ is the bandwidth parameter. The estimator $\tilde{f}(x)$ should be close to the true density $f(x)$. A standard measure of closeness is the mean square error (MSE) at the point $x$, \begin{align} \label{eqmse} MSE_x(\tilde{f}) = E\left[\left(\tilde{f}(x) - f(x)\right)^2\right] ,\end{align} where $E$ denotes the expected value; a good estimator keeps this quantity small. Its value depends on the choice of the kernel function, the data and the bandwidth. If the estimator $\tilde{f}(x)$ is close enough to the true function, it can be used to generate samples that are approximately distributed according to $f(x)$.

The kernel $K$ is a symmetric function that must satisfy \begin{align} &\int K(x)~dx = 1 .\end{align} Usually, the kernel is a symmetric probability density function that is easy to sample from, but the choice is left to the user. With a simple kernel such as the Gaussian, sampling from the kernel density estimator is a practical alternative to sampling directly from a complicated target distribution.

For the multidimensional case, given i.i.d. $d$-dimensional sample points $x_1,...,x_n$ distributed according to $f(x)$, the multivariate kernel density estimator function $\tilde{f}(x)$ is given by \begin{align} \tilde{f}(x) = \frac{1}{h^d} \sum_{i=1}^n w_i K\left(\frac{x-x_i}{h}, \Sigma_i\right) ,\end{align} where $\Sigma_i$ is the covariance matrix of the $i$-th point (the kernels used in this library depend on the covariance matrix), $d$ is the dimension and $w_i$ is the weight attached to the $i$-th kernel, chosen to minimize the error in equation \eqref{eqmse}.

The methods in this class define the type of kernel $K$, compute the bandwidth factor $h$, evaluate the kernel function at a given $d$-dimensional point $x$ or at a given vector of points $\vec{x}$, and, given the weights $w_i$, compute the kernel density estimation function $\tilde{f}(x)$.

Apart from ncm_stats_dist_kernel_get_dim(), this class only has virtual methods. Therefore, to use it, the user must instantiate one of the child objects (NcmStatsDistKernelGauss or NcmStatsDistKernelST). The child objects implement these methods, which must be defined for each specific type of kernel function. See the documentation of the child objects for more information. Further notes on how this class is meant to be used:

-This class is used by the NcmStatsDist class: the NcmStatsDistKernel object defines the type of kernel used in the interpolation function in NcmStatsDist and how to compute quantities such as the weighted sum of the kernels and the bandwidth. The user may also use these objects to perform other kernel calculations, although some of the methods are not implemented outside the NcmStatsDist class.

-This class does not provide a method to compute the weight of each kernel. That method is found in the NcmStatsDist class.

-Every child object of this class can be used either in the NcmStatsDistKDE class or in the NcmStatsDistVKDE class.

Functions

ncm_stats_dist_kernel_ref ()

NcmStatsDistKernel *
ncm_stats_dist_kernel_ref (NcmStatsDistKernel *sdk);

Increases the reference count of sdk by one.

Parameters

sdk

a NcmStatsDistKernel

Returns

sdk .

[transfer full]

ncm_stats_dist_kernel_free ()

void
ncm_stats_dist_kernel_free (NcmStatsDistKernel *sdk);

Decreases the reference count of sdk by one.

Parameters

sdk

a NcmStatsDistKernel

ncm_stats_dist_kernel_clear ()

void
ncm_stats_dist_kernel_clear (NcmStatsDistKernel **sdk);

Decreases the reference count of *sdk by one, and sets the pointer *sdk to NULL.

Parameters

sdk

a pointer to a NcmStatsDistKernel

ncm_stats_dist_kernel_get_dim ()

guint
ncm_stats_dist_kernel_get_dim (NcmStatsDistKernel *sdk);

Gets current kernel dimension.

[virtual get_dim]

Parameters

sdk

a NcmStatsDistKernel

Returns

current kernel dimension.

ncm_stats_dist_kernel_get_rot_bandwidth ()

gdouble
ncm_stats_dist_kernel_get_rot_bandwidth
                               (NcmStatsDistKernel *sdk,
                                const gdouble n);

Computes the rule-of-thumb bandwidth for an interpolation using n kernels.

[virtual get_rot_bandwidth]

Parameters

sdk	a NcmStatsDistKernel
n	number of kernels

Returns

the rule-of-thumb bandwidth.

ncm_stats_dist_kernel_get_lnnorm ()

gdouble
ncm_stats_dist_kernel_get_lnnorm (NcmStatsDistKernel *sdk,
                                  NcmMatrix *cov_decomp);

Computes the logarithm of the kernel normalization for the covariance whose Cholesky decomposition is cov_decomp .

[virtual get_lnnorm]

Parameters

sdk	a NcmStatsDistKernel
cov_decomp	Cholesky decomposition of the kernel covariance

Returns

the kernel normalization logarithm.

ncm_stats_dist_kernel_eval_unnorm ()

gdouble
ncm_stats_dist_kernel_eval_unnorm (NcmStatsDistKernel *sdk,
                                   const gdouble chi2);

Computes the unnormalized kernel at $\chi^2=$chi2 .

[virtual eval_unnorm]

Parameters

sdk	a NcmStatsDistKernel
chi2	a double

Returns

the unnormalized kernel at $\chi^2=$chi2 .

ncm_stats_dist_kernel_eval_unnorm_vec ()

void
ncm_stats_dist_kernel_eval_unnorm_vec (NcmStatsDistKernel *sdk,
                                       NcmVector *chi2,
                                       NcmVector *Ku);

Computes the unnormalized kernel at $\chi^2=$chi2 for all elements of chi2 and stores the results in Ku .

[virtual eval_unnorm_vec]

Parameters

sdk	a NcmStatsDistKernel
chi2	a NcmVector
Ku	a NcmVector

ncm_stats_dist_kernel_eval_sum0_gamma_lambda ()

void
ncm_stats_dist_kernel_eval_sum0_gamma_lambda
                               (NcmStatsDistKernel *sdk,
                                NcmVector *chi2,
                                NcmVector *weights,
                                NcmVector *lnnorms,
                                NcmVector *lnK,
                                gdouble *gamma,
                                gdouble *lambda);

Computes the weighted sum of kernels at $\chi^2=$chi2 (the density estimator function), $$ e^\gamma (1+\lambda) = \sum_i w_i\bar{K} (\chi^2_i) / u_i,$$ where $\gamma = \ln(w_a\bar{K} (\chi^2_a) / u_a)$ and $a$ labels the largest term of the sum. This function should be used when each kernel has a different normalization factor.

[virtual eval_sum0_gamma_lambda]

Parameters

sdk	a NcmStatsDistKernel
chi2	a NcmVector
weights	a NcmVector
lnnorms	a NcmVector
lnK	a NcmVector to store the logarithm of the kernels
gamma	$\gamma$.	[out]
lambda	$\lambda$.	[out]

ncm_stats_dist_kernel_eval_sum1_gamma_lambda ()

void
ncm_stats_dist_kernel_eval_sum1_gamma_lambda
                               (NcmStatsDistKernel *sdk,
                                NcmVector *chi2,
                                NcmVector *weights,
                                gdouble lnnorm,
                                NcmVector *lnK,
                                gdouble *gamma,
                                gdouble *lambda);

Computes the weighted sum of kernels at $\chi^2=$chi2 (the density estimator function), $$ e^\gamma (1+\lambda) = \sum_i w_i\bar{K} (\chi^2_i) / u,$$ where $\gamma = \ln(w_a\bar{K} (\chi^2_a) / u)$ and $a$ labels the largest term of the sum. This function should be used when all the kernels have the same normalization factor.

[virtual eval_sum1_gamma_lambda]

Parameters

sdk	a NcmStatsDistKernel
chi2	a NcmVector
weights	a NcmVector
lnnorm	a double
lnK	a NcmVector to store the logarithm of the kernels
gamma	$\gamma$.	[out]
lambda	$\lambda$.	[out]

ncm_stats_dist_kernel_sample ()

void
ncm_stats_dist_kernel_sample (NcmStatsDistKernel *sdk,
                              NcmMatrix *cov_decomp,
                              const gdouble href,
                              NcmVector *mu,
                              NcmVector *y,
                              NcmRNG *rng);

Generates a random vector from the kernel distribution using the covariance cov_decomp , bandwidth href and location vector mu . The result is stored in y .

[virtual sample]

Parameters

sdk	a NcmStatsDistKernel
cov_decomp	Cholesky decomposition of the kernel covariance
href	kernel bandwidth
mu	kernel location vector
y	output vector
rng	a NcmRNG
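For a Gaussian kernel, sampling reduces to an affine map of standard normal draws, $y = \mu + h L z$, where $L$ is the Cholesky factor of the covariance. The sketch below shows only that map in plain C and takes the draws $z$ as an input, so it stays deterministic; in the library the draws would come from the NcmRNG argument. The Gaussian form, the row-major storage of $L$ and the name `kernel_affine_sample` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Computes y = mu + href * L z, where L is a lower-triangular d x d
 * matrix stored row-major and z holds d independent standard normal
 * draws supplied by the caller.
 */
void
kernel_affine_sample (const double *L, double href, const double *mu,
                      const double *z, double *y, size_t d)
{
  size_t i, j;

  assert (href > 0.0);

  for (i = 0; i < d; i++)
  {
    double acc = 0.0;

    /* Only the lower triangle (j <= i) contributes. */
    for (j = 0; j <= i; j++)
      acc += L[i * d + j] * z[j];

    y[i] = mu[i] + href * acc;
  }
}
```

The resulting vector has mean $\mu$ and covariance $h^2 L L^T$, that is, the kernel covariance scaled by the squared bandwidth.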

Types and Values

NCM_TYPE_STATS_DIST_KERNEL

#define NCM_TYPE_STATS_DIST_KERNEL (ncm_stats_dist_kernel_get_type ())

struct NcmStatsDistKernelClass

struct NcmStatsDistKernelClass {
  GObjectClass parent_class;

  void (*set_dim) (NcmStatsDistKernel *sdk, const guint dim);
  guint (*get_dim) (NcmStatsDistKernel *sdk);
  gdouble (*get_rot_bandwidth) (NcmStatsDistKernel *sdk, const gdouble n);
  gdouble (*get_lnnorm) (NcmStatsDistKernel *sdk, NcmMatrix *cov_decomp);
  gdouble (*eval_unnorm) (NcmStatsDistKernel *sdk, const gdouble chi2);
  void (*eval_unnorm_vec) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *Ku);
  void (*eval_sum0_gamma_lambda) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *weights, NcmVector *lnnorms, NcmVector *lnK, gdouble *gamma, gdouble *lambda);
  void (*eval_sum1_gamma_lambda) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *weights, gdouble lnnorm, NcmVector *lnK, gdouble *gamma, gdouble *lambda);
  void (*sample) (NcmStatsDistKernel *sdk, NcmMatrix *cov_decomp, const gdouble href, NcmVector *mu, NcmVector *y, NcmRNG *rng);

  /* Padding to allow 18 virtual functions without breaking ABI. */
};

The virtual function table for NcmStatsDistKernel.

Members

`set_dim` ()	Sets the dimension of the kernel.
`get_dim` ()	Gets the dimension of the kernel.
`get_rot_bandwidth` ()	Gets the rule-of-thumb bandwidth of the kernel.
`get_lnnorm` ()	Gets the log of the normalization constant of the kernel.
`eval_unnorm` ()	Evaluates the unnormalized kernel at a given chi2.
`eval_unnorm_vec` ()	Evaluates the unnormalized kernel at a given chi2 vector.
`eval_sum0_gamma_lambda` ()	Evaluates the kernels sum0, gamma and lambda at a given chi2 vector.
`eval_sum1_gamma_lambda` ()	Evaluates the kernels sum1, gamma and lambda at a given chi2 vector.
`sample` ()	Samples the kernel.