NcmStatsDistKernel - An N-dimensional kernel used to compute the kernel density estimation function (KDE) in the NcmStatsDist class.
NcmStatsDistKernel * | ncm_stats_dist_kernel_ref () |
void | ncm_stats_dist_kernel_free () |
void | ncm_stats_dist_kernel_clear () |
guint | ncm_stats_dist_kernel_get_dim () |
gdouble | ncm_stats_dist_kernel_get_rot_bandwidth () |
gdouble | ncm_stats_dist_kernel_get_lnnorm () |
gdouble | ncm_stats_dist_kernel_eval_unnorm () |
void | ncm_stats_dist_kernel_eval_unnorm_vec () |
void | ncm_stats_dist_kernel_eval_sum0_gamma_lambda () |
void | ncm_stats_dist_kernel_eval_sum1_gamma_lambda () |
void | ncm_stats_dist_kernel_sample () |
#define | NCM_TYPE_STATS_DIST_KERNEL |
struct | NcmStatsDistKernelClass |
NcmStatsDistKernel |
GObject
╰── NcmStatsDistKernel
    ├── NcmStatsDistKernelGauss
    ╰── NcmStatsDistKernelST
An N-dimensional kernel used to compute the kernel density estimation function (KDE) in the NcmStatsDist class.
This class provides the tools to generate a kernel function to be used in a kernel density estimation method. Below is a quick review of the kernel density estimation method and some properties of the kernel function, which are generalized for multidimensional problems. For further information, check [Density Estimation for Statistics and Data Analysis, B.W. Silverman].
Starting with the one-dimensional case, let $x_1,\dots,x_n$ be independent and identically distributed (iid) samples drawn from a distribution $f(x)$. The kernel density estimate of $f$ is \begin{align} \tilde{f}(x) = \frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_i}{h}\right) ,\end{align} where $K$ is the kernel function and $h$ is the bandwidth parameter. The estimator $\tilde{f}(x)$ should be close to the true density $f(x)$. Closeness is measured by the mean square error (MSE) \begin{align} \label{eqmse} MSE_x(\tilde{f}) = E\left[\left(\tilde{f}(x) - f(x)\right)^2\right] ,\end{align} where $E$ denotes the expected value, and a good estimator keeps this quantity small. The MSE depends on the choice of the kernel function, the data and the bandwidth. If the estimator $\tilde{f}(x)$ is close enough to the true function, it can be used to generate samples distributed according to $f(x)$.
The kernel $K$ is a symmetric function that must satisfy \begin{align} \int K(x)~dx = 1 .\end{align} Usually, the kernel is a symmetric probability density function that is easy to sample from, but the choice is left to the user. With a simple kernel, such as the Gaussian kernel, the kernel density estimator offers a practical way to generate samples when the desired distribution is a complicated function that is hard to sample from directly.
For the multidimensional case, given iid $d$-dimensional sample points $x_1,\dots,x_n$ drawn from $f(x)$, the multivariate kernel density estimator function $\tilde{f}(x)$ is given by \begin{align} \tilde{f}(x) = \frac{1}{h^d} \sum_{i=1}^n w_i K\left(\frac{x-x_i}{h}, \Sigma_i\right) ,\end{align} where $\Sigma_i$ is the covariance matrix of the $i$-th point (the kernels used in this library depend on the covariance matrix), $d$ is the dimension and $w_i$ is the weight attached to the $i$-th kernel, chosen to minimize the error in equation \eqref{eqmse}.
The methods in this class define the type of kernel $K$, compute the bandwidth factor $h$, evaluate the kernel function at a given $d$-dimensional point $x$ or at a given vector of points $\vec{x}$, and, given the weights $w_i$, compute the kernel density estimation function $\tilde{f}(x)$.
Apart from the function ncm_stats_dist_kernel_get_dim(), this class has only virtual methods.
Therefore, to use it, the user must instantiate one of the child objects (NcmStatsDistKernelGauss or NcmStatsDistKernelST).
The child objects hold the implemented functions, which must be defined for each specific type of kernel function.
See the documentation of the child classes for more information. Further notes on how the algorithm is organized are given below:
- This class is used by the NcmStatsDist class: NcmStatsDistKernel defines the type of kernel used in the interpolation function in NcmStatsDist and how to compute values such as the weighted sum of the kernels, the bandwidth, and so on. The user may also use these objects to perform other kernel calculations, although some of the methods are not implemented outside the NcmStatsDist class.
- This class provides no method to compute the weights of each kernel. That method is found in the NcmStatsDist class.
- Every child object of this class can be used either in the NcmStatsDistKDE class or in the NcmStatsDistVKDE class.
NcmStatsDistKernel *
ncm_stats_dist_kernel_ref (NcmStatsDistKernel *sdk);
Increases the reference count of sdk by one.
void
ncm_stats_dist_kernel_free (NcmStatsDistKernel *sdk);
Decreases the reference count of sdk by one.
void
ncm_stats_dist_kernel_clear (NcmStatsDistKernel **sdk);
Decreases the reference count of *sdk by one and sets the pointer *sdk to NULL.
guint
ncm_stats_dist_kernel_get_dim (NcmStatsDistKernel *sdk);
Gets the current kernel dimension.
[virtual get_dim]
gdouble ncm_stats_dist_kernel_get_rot_bandwidth (NcmStatsDistKernel *sdk, const gdouble n);
Computes the rule-of-thumb bandwidth for an interpolation using n kernels.
[virtual get_rot_bandwidth]
gdouble ncm_stats_dist_kernel_get_lnnorm (NcmStatsDistKernel *sdk, NcmMatrix *cov_decomp);
Computes the logarithm of the kernel normalization for a given covariance decomposition cov_decomp.
[virtual get_lnnorm]
gdouble ncm_stats_dist_kernel_eval_unnorm (NcmStatsDistKernel *sdk, const gdouble chi2);
Computes the unnormalized kernel at $\chi^2=$chi2.
[virtual eval_unnorm]
void ncm_stats_dist_kernel_eval_unnorm_vec (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *Ku);
Computes the unnormalized kernel at $\chi^2=$chi2 for all elements of chi2 and stores the results in Ku.
[virtual eval_unnorm_vec]
void ncm_stats_dist_kernel_eval_sum0_gamma_lambda (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *weights, NcmVector *lnnorms, NcmVector *lnK, gdouble *gamma, gdouble *lambda);
Computes the weighted sum of kernels at $\chi^2=$chi2 (the density estimator function),
$$ e^\gamma (1+\lambda) = \sum_i w_i\bar{K} (\chi^2_i) / u_i,$$
where $\gamma = \ln(w_a\bar{K} (\chi^2_a) / u_a)$, $a$ labels the largest term of the sum and $u_i$ is the normalization factor of the $i$-th kernel. This function should be used when each kernel has a different normalization factor.
[virtual eval_sum0_gamma_lambda]
sdk | |
chi2 | |
weights | |
lnnorms | |
lnK | a NcmVector to store the logarithm of the kernels |
gamma | $\gamma$. | [out]
lambda | $\lambda$. | [out]
void ncm_stats_dist_kernel_eval_sum1_gamma_lambda (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *weights, gdouble lnnorm, NcmVector *lnK, gdouble *gamma, gdouble *lambda);
Computes the weighted sum of kernels at $\chi^2=$chi2 (the density estimator function),
$$ e^\gamma (1+\lambda) = \sum_i w_i\bar{K} (\chi^2_i) / u,$$
where $\gamma = \ln(w_a\bar{K} (\chi^2_a) / u)$, $a$ labels the largest term of the sum and $u$ is the normalization factor shared by all kernels. This function should be used when all the kernels have the same normalization factor.
[virtual eval_sum1_gamma_lambda]
sdk | |
chi2 | |
weights | |
lnnorm | a double |
lnK | a NcmVector to store the logarithm of the kernels |
gamma | $\gamma$. | [out]
lambda | $\lambda$. | [out]
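When the normalization is shared, it factors out of the sum. A short worked form, using only the definitions above and assuming that lnnorm holds $\ln u$ (an inference from the parameter name): \begin{align} \sum_i \frac{w_i\bar{K}(\chi^2_i)}{u} &= \frac{w_a\bar{K}(\chi^2_a)}{u}\left(1 + \sum_{i\neq a}\frac{w_i\bar{K}(\chi^2_i)}{w_a\bar{K}(\chi^2_a)}\right), \\ \gamma &= \ln\left(w_a\bar{K}(\chi^2_a)\right) - \ln u, \qquad \lambda = \sum_{i\neq a}\frac{w_i\bar{K}(\chi^2_i)}{w_a\bar{K}(\chi^2_a)} .\end{align} Hence $\lambda$ does not depend on $u$, and the normalization enters only as an additive shift in $\gamma$.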
void ncm_stats_dist_kernel_sample (NcmStatsDistKernel *sdk, NcmMatrix *cov_decomp, const gdouble href, NcmVector *mu, NcmVector *y, NcmRNG *rng);
Generates a random vector from the kernel distribution using the covariance cov_decomp, bandwidth href and location vector mu. The result is stored in y.
[virtual sample]
sdk | |
cov_decomp | Cholesky decomposition of the kernel covariance |
href | kernel bandwidth |
mu | kernel location vector |
y | output vector |
rng | a NcmRNG |
struct NcmStatsDistKernelClass {
  GObjectClass parent_class;

  void (*set_dim) (NcmStatsDistKernel *sdk, const guint dim);
  guint (*get_dim) (NcmStatsDistKernel *sdk);
  gdouble (*get_rot_bandwidth) (NcmStatsDistKernel *sdk, const gdouble n);
  gdouble (*get_lnnorm) (NcmStatsDistKernel *sdk, NcmMatrix *cov_decomp);
  gdouble (*eval_unnorm) (NcmStatsDistKernel *sdk, const gdouble chi2);
  void (*eval_unnorm_vec) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *Ku);
  void (*eval_sum0_gamma_lambda) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *weights, NcmVector *lnnorms, NcmVector *lnK, gdouble *gamma, gdouble *lambda);
  void (*eval_sum1_gamma_lambda) (NcmStatsDistKernel *sdk, NcmVector *chi2, NcmVector *weights, gdouble lnnorm, NcmVector *lnK, gdouble *gamma, gdouble *lambda);
  void (*sample) (NcmStatsDistKernel *sdk, NcmMatrix *cov_decomp, const gdouble href, NcmVector *mu, NcmVector *y, NcmRNG *rng);

  /* Padding to allow 18 virtual functions without breaking ABI. */
};
The virtual function table for NcmStatsDistKernel.
set_dim | Sets the dimension of the kernel. |
get_dim | Gets the dimension of the kernel. |
get_rot_bandwidth | Gets the rule-of-thumb bandwidth of the kernel. |
get_lnnorm | Gets the log of the normalization constant of the kernel. |
eval_unnorm | Evaluates the unnormalized kernel at a given chi2. |
eval_unnorm_vec | Evaluates the unnormalized kernel at a given chi2 vector. |
eval_sum0_gamma_lambda | Evaluates the kernels sum0, gamma and lambda at a given chi2 vector. |
eval_sum1_gamma_lambda | Evaluates the kernels sum1, gamma and lambda at a given chi2 vector. |
sample | Samples the kernel. |