NcmStatsDistKernel

NcmStatsDistKernel — An N-dimensional kernel used to compute the kernel density estimation function (KDE) in the NcmStatsDist class.

Functions

Properties

guint dimension Read / Write / Construct Only

Types and Values

Object Hierarchy

    GObject
    ╰── NcmStatsDistKernel
        ├── NcmStatsDistKernelGauss
        ╰── NcmStatsDistKernelST

Description

An N-dimensional kernel used to compute the kernel density estimation function (KDE) in the NcmStatsDist class.

This class provides the tools to generate a kernel function to be used in a kernel density estimation method. Below is a quick review of the kernel density estimation method and some properties of the kernel function, which are generalized for multidimensional problems. For further information, check [Density Estimation for Statistics and Data Analysis, B.W. Silverman].

Starting with the one-dimensional case, let $x_1,...,x_n$ be independent and identically distributed (iid) samples drawn from a distribution $f(x)$. The kernel density estimate of $f$ is \begin{align} \tilde{f}(x) = \frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_i}{h}\right) ,\end{align} where $K$ is the kernel function and $h$ is the bandwidth parameter. The estimator should be close to the true density function $f(x)$. A standard measure of this closeness is the mean square error (MSE) \begin{align} \label{eqmse} MSE_x(\tilde{f}) = E\left[\left(\tilde{f}(x) - f(x)\right)^2\right] ,\end{align} where $E$ denotes the expected value, and a good estimator keeps this quantity small. The MSE depends on the choice of the kernel function, the data and the bandwidth. If the estimator $\tilde{f}(x)$ is close enough to the true function, it can be used to generate samples distributed according to $f(x)$.

The kernel $K$ is a symmetric function that must satisfy \begin{align} \int K(x)~dx = 1 .\end{align} Usually, the kernel function is a symmetric probability density function that is easy to sample from, but the choice is entirely up to the user. With a simple kernel, such as the Gaussian kernel, the kernel density estimator becomes a convenient way to generate samples when the target distribution is a complicated function.

For the multidimensional case, given iid $d$-dimensional sample points $x_1,...,x_n$ distributed according to $f(x)$, the multivariate kernel density estimator function $\tilde{f}(x)$ is given by \begin{align} \tilde{f}(x) = \frac{1}{h^d} \sum_{i=1}^n w_i K\left(\frac{x-x_i}{h}, \Sigma_i\right) ,\end{align} where $\Sigma_i$ is the covariance matrix of the $i$-th point (the kernels used in this library depend on the covariance matrix), $d$ is the dimension and $w_i$ is the weight attached to the $i$-th kernel, chosen to minimize the error in equation \eqref{eqmse}.
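Several functions below take a chi2 argument and a Cholesky factor of the covariance. A plausible reading, stated here as an assumption rather than taken from the library source, is that chi2 is the quadratic form $(x-x_i)^T\Sigma_i^{-1}(x-x_i)/h^2$. The sketch below computes it in plain C from a lower-triangular factor $L$ with $\Sigma = LL^T$. The function name, the row-major storage and the lower-triangular convention are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* chi2 = (x - xi)^T Sigma^{-1} (x - xi) / h^2, with Sigma = L L^T.
 * L is a d x d lower-triangular matrix stored row-major (assumed layout);
 * work is a caller-provided scratch array with d entries. */
double
chi2_from_cholesky (const double *L, const double *x, const double *xi,
                    double h, size_t d, double *work)
{
  double chi2 = 0.0;
  size_t i, j;

  assert (d > 0 && h > 0.0);

  /* Forward substitution: solve L v = (x - xi) for v, so chi2 = |v|^2 / h^2. */
  for (i = 0; i < d; i++)
  {
    double r = x[i] - xi[i];

    for (j = 0; j < i; j++)
      r -= L[i * d + j] * work[j];

    assert (L[i * d + i] > 0.0);
    work[i] = r / L[i * d + i];
    chi2 += work[i] * work[i];
  }

  return chi2 / (h * h);
}
```

Working with the triangular factor avoids forming $\Sigma_i^{-1}$ explicitly, which is cheaper and numerically safer than a full inverse.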

The methods in this class define the type of kernel $K$, compute the bandwidth factor $h$, evaluate the kernel function at a given $d$-dimensional point $x$ or at a given vector of points $\vec{x}$, and, given the weights $w_i$, compute the kernel density estimation function $\tilde{f}(x)$.

Besides the function ncm_stats_dist_kernel_get_dim(), this class only has virtual methods. Therefore, to use it, the user must instantiate one of the child classes (NcmStatsDistKernelGauss or NcmStatsDistKernelST). The child classes implement the virtual methods, which must be defined for each specific type of kernel function. Check the documentation of the child classes for more information. More information about how the algorithm should be implemented is given below:

-This class is used by the NcmStatsDist class: NcmStatsDistKernel defines the type of kernel used in the interpolation function in NcmStatsDist and how to compute values such as the weighted sum of the kernels, the bandwidth, and so on. The user may also use these objects to perform other kernel calculations, although some of the required methods are only implemented in the NcmStatsDist class.

-This class does not possess the methods to compute the weights of each kernel. You may find this method in the NcmStatsDist class.

-Every child object of this class can be used either in the NcmStatsDistKDE class or in the NcmStatsDistVKDE class.

Functions

ncm_stats_dist_kernel_ref ()

NcmStatsDistKernel *
ncm_stats_dist_kernel_ref (NcmStatsDistKernel *sdk);

Increase the reference count of sdk by one.

Parameters

Returns

sdk .

[transfer full]


ncm_stats_dist_kernel_free ()

void
ncm_stats_dist_kernel_free (NcmStatsDistKernel *sdk);

Decrease the reference count of sdk by one.

Parameters


ncm_stats_dist_kernel_clear ()

void
ncm_stats_dist_kernel_clear (NcmStatsDistKernel **sdk);

Decrease the reference count of *sdk by one, and set the pointer *sdk to NULL.

Parameters


ncm_stats_dist_kernel_get_dim ()

guint
ncm_stats_dist_kernel_get_dim (NcmStatsDistKernel *sdk);

Gets current kernel dimension.

[virtual get_dim]

Parameters

Returns

current kernel dimension.


ncm_stats_dist_kernel_get_rot_bandwidth ()

gdouble
ncm_stats_dist_kernel_get_rot_bandwidth
                               (NcmStatsDistKernel *sdk,
                                const gdouble n);

Computes the rule-of-thumb bandwidth for an interpolation using n kernels.

[virtual get_rot_bandwidth]

Parameters

sdk

a NcmStatsDistKernel

 

n

number of kernels

 

Returns

the rule-of-thumb bandwidth.


ncm_stats_dist_kernel_get_lnnorm ()

gdouble
ncm_stats_dist_kernel_get_lnnorm (NcmStatsDistKernel *sdk,
                                  NcmMatrix *cov_decomp);

Computes the logarithm of the kernel normalization for a given covariance, provided through its Cholesky decomposition cov_decomp .

[virtual get_lnnorm]

Parameters

sdk

a NcmStatsDistKernel

 

cov_decomp

Cholesky decomposition of the kernel covariance

 

Returns

the kernel normalization logarithm.


ncm_stats_dist_kernel_eval_unnorm ()

gdouble
ncm_stats_dist_kernel_eval_unnorm (NcmStatsDistKernel *sdk,
                                   const gdouble chi2);

Computes the unnormalized kernel at $\chi^2=$chi2 .

[virtual eval_unnorm]

Parameters

sdk

a NcmStatsDistKernel

 

chi2

a double

 

Returns

the unnormalized kernel at $\chi^2=$chi2 .


ncm_stats_dist_kernel_eval_unnorm_vec ()

void
ncm_stats_dist_kernel_eval_unnorm_vec (NcmStatsDistKernel *sdk,
                                       NcmVector *chi2,
                                       NcmVector *Ku);

Computes the unnormalized kernel at $\chi^2=$chi2 for all elements of chi2 and stores the results in Ku .

[virtual eval_unnorm_vec]

Parameters

sdk

a NcmStatsDistKernel

 

chi2

a NcmVector

 

Ku

a NcmVector

 

ncm_stats_dist_kernel_eval_sum0_gamma_lambda ()

void
ncm_stats_dist_kernel_eval_sum0_gamma_lambda
                               (NcmStatsDistKernel *sdk,
                                NcmVector *chi2,
                                NcmVector *weights,
                                NcmVector *lnnorms,
                                NcmVector *lnK,
                                gdouble *gamma,
                                gdouble *lambda);

Computes the weighted sum of kernels at $\chi^2=$chi2 (the density estimator function), $$ e^\gamma (1+\lambda) = \sum_i w_i\bar{K} (\chi^2_i) / u_i,$$ where $\gamma = \ln(w_a\bar{K} (\chi^2_a) / u_a)$ and $a$ labels the largest term of the sum. This function shall be used when each kernel has a different normalization factor.

[virtual eval_sum0_gamma_lambda]

Parameters

sdk

a NcmStatsDistKernel

 

chi2

a NcmVector

 

weights

a NcmVector

 

lnnorms

a NcmVector

 

lnK

a NcmVector to store the logarithm of the kernels

 

gamma

$\gamma$.

[out]

lambda

$\lambda$.

[out]

ncm_stats_dist_kernel_eval_sum1_gamma_lambda ()

void
ncm_stats_dist_kernel_eval_sum1_gamma_lambda
                               (NcmStatsDistKernel *sdk,
                                NcmVector *chi2,
                                NcmVector *weights,
                                gdouble lnnorm,
                                NcmVector *lnK,
                                gdouble *gamma,
                                gdouble *lambda);

Computes the weighted sum of kernels at $\chi^2=$chi2 (the density estimator function), $$ e^\gamma (1+\lambda) = \sum_i w_i\bar{K} (\chi^2_i) / u,$$ where $\gamma = \ln(w_a\bar{K} (\chi^2_a) / u)$ and $a$ labels the largest term of the sum. This function shall be used when all the kernels have the same normalization factor.

[virtual eval_sum1_gamma_lambda]

Parameters

sdk

a NcmStatsDistKernel

 

chi2

a NcmVector

 

weights

a NcmVector

 

lnnorm

a double

 

lnK

a NcmVector to store the logarithm of the kernels

 

gamma

$\gamma$.

[out]

lambda

$\lambda$.

[out]

ncm_stats_dist_kernel_sample ()

void
ncm_stats_dist_kernel_sample (NcmStatsDistKernel *sdk,
                              NcmMatrix *cov_decomp,
                              const gdouble href,
                              NcmVector *mu,
                              NcmVector *y,
                              NcmRNG *rng);

Generates a random vector from the kernel distribution using the covariance cov_decomp , bandwidth href and location vector mu . The result is stored in y .

[virtual sample]

Parameters

sdk

a NcmStatsDistKernel

 

cov_decomp

Cholesky decomposition of the kernel covariance

 

href

kernel bandwidth

 

mu

kernel location vector

 

y

output vector

 

rng

a NcmRNG
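For a Gaussian kernel, a draw can be built from a vector $z$ of independent standard normal variates as $y = \mu + h\,L\,z$, where $L$ is the Cholesky factor of the covariance. The plain C sketch below performs only this deterministic step, with $z$ supplied by the caller. The function name, the lower-triangular row-major layout and the way the bandwidth enters are assumptions, not a description of the library's sample implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Maps standard normal variates z[] to y = mu + href * L z, where L is a
 * d x d lower-triangular Cholesky factor stored row-major (assumed layout). */
void
gauss_kernel_place (const double *L, const double *mu, const double *z,
                    double href, size_t d, double *y)
{
  size_t i, j;

  assert (d > 0 && href > 0.0);

  for (i = 0; i < d; i++)
  {
    double acc = 0.0;

    /* Only the lower triangle of L contributes. */
    for (j = 0; j <= i; j++)
      acc += L[i * d + j] * z[j];

    y[i] = mu[i] + href * acc;
  }
}
```

With $z$ drawn from a standard normal generator, the resulting $y$ has mean $\mu$ and covariance $h^2 LL^T$.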

 

Types and Values

NCM_TYPE_STATS_DIST_KERNEL

#define NCM_TYPE_STATS_DIST_KERNEL (ncm_stats_dist_kernel_get_type ())

struct NcmStatsDistKernelClass

struct NcmStatsDistKernelClass {
  GObjectClass parent_class;

  void (*set_dim) (NcmStatsDistKernel *sdk, const guint dim);
  guint (*get_dim) (NcmStatsDistKernel *sdk);
  gdouble (*get_rot_bandwidth) (NcmStatsDistKernel *sdk, const gdouble n);
  gdouble (*get_lnnorm) (NcmStatsDistKernel *sdk, NcmMatrix *cov_decomp);
  gdouble (*eval_unnorm) (NcmStatsDistKernel *sdk, const gdouble chi2);
  void (*eval_unnorm_vec) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *Ku);
  void (*eval_sum0_gamma_lambda) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *weights, NcmVector *lnnorms, NcmVector *lnK, gdouble *gamma, gdouble *lambda);
  void (*eval_sum1_gamma_lambda) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *weights, gdouble lnnorm, NcmVector *lnK, gdouble *gamma, gdouble *lambda);
  void (*sample) (NcmStatsDistKernel *sdk, NcmMatrix *cov_decomp, const gdouble href, NcmVector *mu, NcmVector *y, NcmRNG *rng);

  /* Padding to allow 18 virtual functions without breaking ABI. */
};

The virtual function table for NcmStatsDistKernel.

Members

set_dim ()

Sets the dimension of the kernel.

 

get_dim ()

Gets the dimension of the kernel.

 

get_rot_bandwidth ()

Gets the rule-of-thumb bandwidth of the kernel.

 

get_lnnorm ()

Gets the log of the normalization constant of the kernel.

 

eval_unnorm ()

Evaluates the unnormalized kernel at a given chi2.

 

eval_unnorm_vec ()

Evaluates the unnormalized kernel at a given chi2 vector.

 

eval_sum0_gamma_lambda ()

Evaluates the kernels sum0, gamma and lambda at a given chi2 vector.

 

eval_sum1_gamma_lambda ()

Evaluates the kernels sum1, gamma and lambda at a given chi2 vector.

 

sample ()

Samples the kernel.

 

NcmStatsDistKernel

typedef struct _NcmStatsDistKernel NcmStatsDistKernel;

Property Details

The “dimension” property

  “dimension”                guint

Kernel dimension.

Owner: NcmStatsDistKernel

Flags: Read / Write / Construct Only

Allowed values: >= 2

Default value: 2