NcmStatsDist: Abstract class for implementing N-dimensional probability distributions.
Properties
Type | Name | Flags |
NcmStatsDistCV | CV-type | Read / Write / Construct |
guint | N | Read |
NcmStatsDistKernel * | kernel | Read / Write / Construct Only |
double | over-smooth | Read / Write / Construct |
gboolean | print-fit | Read / Write / Construct |
double | split-frac | Read / Write / Construct |
gboolean | use-threads | Read / Write / Construct |
Types and Values
#define | NCM_TYPE_STATS_DIST |
struct | NcmStatsDistClass |
enum | NcmStatsDistCV |
 | NcmStatsDist |
Abstract class to reconstruct an arbitrary N-dimensional probability distribution. This class provides the tools to perform a radial basis interpolation of a multidimensional function using a radial basis function, and then to generate a new sample from the resulting interpolating function. The new sample follows (approximately) the original distribution, but it is simpler to generate, since the kernels used are easier to sample from. For more information about radial basis interpolation, see [Radial Basis Function Interpolation, Wilna du Toit]. A brief description of the radial basis interpolation method follows.
Given a $d$-dimensional function $g(x): \mathbf{R}^d \rightarrow \mathbf{R}$, a radial basis function $\phi(|x-x_i|, \Sigma_i)$ is used such that \begin{align} \label{Interpolation_eq} s(x) = \sum_{i=1}^n \lambda_i \phi(|x-x_i|, \Sigma_i), \quad x \in \mathbf{R}^d . \end{align} The weights $\lambda_i$ are chosen such that \begin{align} \label{eqnnls1} s(x_i) = g(x_i) , \end{align} where $x_i$ are the sample points. The kernel depends on the norm of the distance between points, $|x - x_i|$, and on the covariance matrix $\Sigma_i$ associated with each sample point. Evaluating it at all pairs of sample points gives the $n \times n$ matrix $\Phi$ with entries $\Phi_{ij} = \phi(|x_j-x_i|, \Sigma_i)$, which is symmetric when all $\Sigma_i$ are equal. Collecting the weights in the row vector $\lambda = (\lambda_1, \dots, \lambda_n)$ and the function values $g(x_j)$ in the row vector $G$, equation \eqref{eqnnls1} becomes \begin{align} \label{eqnnls} G = \lambda \Phi . \end{align} Once $\lambda$ is found, one may use $s(x)$ to sample values from $g(x)$; this is easier than sampling from $g(x)$ directly because $s(x)$ is a weighted sum of kernels that are simple to sample from.
We want $s(x)$ to be a probability distribution so that we can sample from it. Therefore the weights $\lambda_i$ are interpreted as the probabilities of choosing each kernel: they must be non-negative and sum to one. To this end, this algorithm solves equation \eqref{eqnnls} for $\lambda$ as a non-negative least-squares problem, using the NNLS method implemented in the nnls.c file. The algorithm can then randomly choose a kernel $\phi(|x-x_i|, \Sigma_i)$ with probability $\lambda_i$ and sample a point from it.
In this object, the radial basis interpolation function is not completely defined: one must choose one of the children of NcmStatsDistKernel, NcmStatsDistKernelST or NcmStatsDistKernelGauss, which use a multivariate Student's t function and a Gaussian function as the kernel, respectively. After initializing the desired kernel object, one may use the methods of this class to compute the interpolation and to sample from the interpolated function.
The user must provide the following input values:
- over_smooth: ncm_stats_dist_set_over_smooth()
- split_frac: ncm_stats_dist_set_split_frac()
- $v(x)$, the distribution values at the sample points: ncm_stats_dist_prepare_interp()
The other parameters must be set when the NcmStatsDistKDE or NcmStatsDistVKDE object is initialized. To perform a calculation with this class, one needs to instantiate one of its subclasses (NcmStatsDistKDE or NcmStatsDistVKDE), passing a child object of NcmStatsDistKernel (NcmStatsDistKernelGauss or NcmStatsDistKernelST). For more information about the algorithm, see the description below.
- Since this class does not define which type of kernel is used in the calculation (the fixed kernel of NcmStatsDistKDE or the variable kernel of NcmStatsDistVKDE), a sample cannot be computed with this class alone. The function used as the kernel must also be provided; it is implemented by the children of NcmStatsDistKernel and is set when the NcmStatsDistKDE or NcmStatsDistVKDE object is initialized.
- This class also needs a child object to compute the interpolation matrix $IM$ and the covariance matrices stored in cov_decomp, which are kernel dependent and are therefore computed by the child objects.
- For the kernel types based on the radial basis function $\phi(|x-x_i|)$, and for how the sample points in ncm_stats_dist_sample() are generated, see the implementations of NcmStatsDistKernel, e.g., NcmStatsDistKernelGauss and NcmStatsDistKernelST.
- For how ncm_stats_dist_eval() and ncm_stats_dist_eval_m2lnp() are implemented, see the implementations of NcmStatsDist, i.e., NcmStatsDistKDE and NcmStatsDistVKDE. These objects also compute the covariance matrix of each sample point and the other objects needed for the least-squares problem that determines the weights $\lambda$.
NcmStatsDist *
ncm_stats_dist_ref (NcmStatsDist *sd);
Increases the reference count of sd.
void
ncm_stats_dist_free (NcmStatsDist *sd);
Decreases the reference count of sd.
void
ncm_stats_dist_clear (NcmStatsDist **sd);
Decreases the reference count of *sd and sets the pointer *sd to NULL.
void ncm_stats_dist_set_kernel (NcmStatsDist *sd, NcmStatsDistKernel *sdk);
Sets the kernel to be used in the interpolation. The available kernels are the Gaussian kernel (NcmStatsDistKernelGauss) and the Student's t kernel (NcmStatsDistKernelST), implemented in ncm_stats_dist_kernel_gauss.c and ncm_stats_dist_kernel_st.c, respectively.
NcmStatsDistKernel *
ncm_stats_dist_peek_kernel (NcmStatsDist *sd);
Gets the kernel used in the interpolation without increasing its reference count.
NcmStatsDistKernel *
ncm_stats_dist_get_kernel (NcmStatsDist *sd);
Gets the kernel used in the interpolation and increases its reference count; the caller owns the returned reference.
guint
ncm_stats_dist_get_sample_size (NcmStatsDist *sd);
After ncm_stats_dist_prepare() has been called, returns the size of the sample used in the interpolation.
guint
ncm_stats_dist_get_n_kernels (NcmStatsDist *sd);
After ncm_stats_dist_prepare() has been called, returns the number of kernels used in the interpolation.
void ncm_stats_dist_set_over_smooth (NcmStatsDist *sd, const gdouble over_smooth);
Sets the over-smooth factor to over_smooth.
void ncm_stats_dist_set_split_frac (NcmStatsDist *sd, const gdouble split_frac);
Sets the cross-validation split fraction to split_frac.
This setting is used when cv_type is the split cross-validation (cv_split).
The split fraction determines the fraction of sample points
that is left out by the cross-validation method.
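The split cross-validation needs a random partition of the sample. A minimal sketch, assuming a uniform random permutation (Fisher-Yates) in which the first floor(split_frac · n) indices are the left-out points and the rest are kept: the xorshift64 generator is a hypothetical stand-in for NcmRNG, and split_indices is not part of the NcmStatsDist API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NcmRNG: Marsaglia's 64-bit xorshift (state must be nonzero). */
static uint64_t xorshift64 (uint64_t *state)
{
  uint64_t x = *state;

  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  return *state = x;
}

/* Randomly permute idx[0..n-1] with Fisher-Yates and return n_out = floor(split_frac * n):
 * idx[0..n_out-1] are the left-out points, idx[n_out..n-1] are kept for the interpolation. */
static size_t split_indices (size_t n, double split_frac, size_t *idx, uint64_t *rng_state)
{
  assert (split_frac >= 0.0 && split_frac <= 1.0);
  assert (*rng_state != 0);

  for (size_t i = 0; i < n; i++)
    idx[i] = i;
  for (size_t i = n; i > 1; i--)
  {
    /* 0 <= j < i; the small modulo bias is ignored in this sketch. */
    size_t j   = (size_t) (xorshift64 (rng_state) % i);
    size_t tmp = idx[i - 1];

    idx[i - 1] = idx[j];
    idx[j]     = tmp;
  }
  return (size_t) (split_frac * (double) n);
}
```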
void ncm_stats_dist_set_print_fit (NcmStatsDist *sd, const gboolean print_fit);
Sets whether to print the steps of the fitting process.
void ncm_stats_dist_set_cv_type (NcmStatsDist *sd, const NcmStatsDistCV cv_type);
Sets the cross-validation method to cv_type.
If the selected method is none, all the sample points
are used to compute the interpolation. If cv_type is cv_split,
a fraction split_frac of the points is randomly excluded and the interpolation
is computed as a best fit to the remaining sample points,
which makes the interpolation less dependent on the individual points.
void ncm_stats_dist_set_use_threads (NcmStatsDist *sd, const gboolean use_threads);
Sets whether to use OpenMP threads during the computation.
void ncm_stats_dist_prepare_kernel (NcmStatsDist *sd, GPtrArray *sample_array);
Prepares the object for the computation of the individual kernels.
It is usually called by ncm_stats_dist_prepare() and should not
be called directly.
This virtual method does not have a default implementation and must be defined by the descendants.
[virtual prepare_kernel]
void
ncm_stats_dist_prepare (NcmStatsDist *sd);
Prepares the object for calculations. This function prepares the weight vector and sets all the weights to 1/n, where n is the sample size. It also calls the prepare_kernel virtual method (see ncm_stats_dist_prepare_kernel()), implemented by a child, and the get_href function.
[virtual prepare]
void ncm_stats_dist_prepare_interp (NcmStatsDist *sd, NcmVector *m2lnp);
Prepares the object for calculations using the distribution values at the sample points. This function calls ncm_stats_dist_prepare() and prepares the objects needed to solve the least-squares problem. The interpolation matrix IM is prepared by a child object, whose method is called from this function. Then, depending on the cross-validation method, the function solves the least-squares problem using the ncm_nnls object.
[virtual prepare_interp]
Parameters
sd | a NcmStatsDist |
m2lnp | a NcmVector containing the distribution values at the sample points (as $-2\ln p$), used to compute the interpolation function. |
gdouble ncm_stats_dist_eval (NcmStatsDist *sd, NcmVector *x);
Evaluates the distribution at $\vec{x}$ = x. Use ncm_stats_dist_eval_m2lnp()
to avoid underflow.
gdouble ncm_stats_dist_eval_m2lnp (NcmStatsDist *sd, NcmVector *x);
Evaluates $-2\ln p(\vec{x})$ at $\vec{x}$ = x. This method is more
stable than ncm_stats_dist_eval() since it avoids underflows
and overflows.
guint ncm_stats_dist_kernel_choose (NcmStatsDist *sd, NcmRNG *rng);
Uses the pseudo-random number generator rng to choose
a random kernel, with probabilities given by the computed weights, and returns its index.
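Choosing a kernel amounts to drawing an index with probability proportional to its weight. A minimal sketch, assuming inverse-CDF sampling on the weight vector (the actual implementation may use a different scheme): it takes the uniform deviate u as an argument instead of an NcmRNG so that it is deterministic, and choose_kernel is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index i selected by a uniform draw u in [0, 1),
 * with P(i) = w[i] / sum_j w[j] (inverse-CDF sampling on the weights). */
static size_t choose_kernel (size_t n, const double *w, double u)
{
  double total = 0.0, cum = 0.0;

  assert (n > 0 && u >= 0.0 && u < 1.0);
  for (size_t i = 0; i < n; i++)
    total += w[i];
  assert (total > 0.0);

  for (size_t i = 0; i < n; i++)
  {
    cum += w[i];
    if (u * total < cum)
      return i;
  }
  /* Rounding guard (u * total rounded up to total): return the last kernel with w > 0. */
  for (size_t i = n; i-- > 0;)
    if (w[i] > 0.0)
      return i;
  return n - 1;
}
```

A kernel with zero weight is never selected, since the cumulative sum does not grow across it.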
void ncm_stats_dist_sample (NcmStatsDist *sd, NcmVector *x, NcmRNG *rng);
Uses the pseudo-random number generator rng to generate a
point from the distribution and copies it to x.
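For the Gaussian kernel, drawing a point amounts to choosing a kernel $i$ and then drawing from $N(x_i, \Sigma_i)$. A standard way to do the second step, assuming a lower-triangular Cholesky factor $L_i$ with $\Sigma_i = L_i L_i^T$, is $x = x_i + L_i z$, with $z$ a vector of independent standard normal deviates. The sketch takes $z$ as input so that it stays deterministic; generating $z$ (for instance with NcmRNG), any over-smoothing of $\Sigma_i$, and the Student's t case are left out, and gauss_kernel_draw is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Draw from a Gaussian kernel N(y, Sigma) given the lower-triangular Cholesky
 * factor L of Sigma (row-major d x d, Sigma = L L^T) and a vector z of
 * independent standard normal deviates: x = y + L z. */
static void gauss_kernel_draw (size_t d, const double *y, const double *L,
                               const double *z, double *x)
{
  assert (d > 0);
  for (size_t a = 0; a < d; a++)
  {
    x[a] = y[a];
    for (size_t b = 0; b <= a; b++) /* only the lower triangle of L is used */
      x[a] += L[a * d + b] * z[b];
  }
}
```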
gdouble
ncm_stats_dist_get_rnorm (NcmStatsDist *sd);
Gets the value of the last $\chi^2$ fit obtained
when computing the interpolation through
ncm_stats_dist_prepare_interp().
void ncm_stats_dist_add_obs (NcmStatsDist *sd, NcmVector *y);
Adds a new point y to the sample with weight 1.0.
This function must be used to insert the initial sample into the object, so that the interpolation can be computed.
GPtrArray *
ncm_stats_dist_peek_sample_array (NcmStatsDist *sd);
NcmMatrix * ncm_stats_dist_peek_cov_decomp (NcmStatsDist *sd, guint i);
Gets the covariance matrix associated with the i-th kernel.
[virtual peek_cov_decomp]
gdouble ncm_stats_dist_get_lnnorm (NcmStatsDist *sd, guint i);
Gets the logarithm of the normalization of the i-th kernel.
[virtual get_lnnorm]
void ncm_stats_dist_get_Ki (NcmStatsDist *sd, const guint i, NcmVector **y_i, NcmMatrix **cov_i, gdouble *n_i, gdouble *w_i);
Returns all information about the i-th kernel through the output arguments y_i, cov_i, n_i and w_i.
void
ncm_stats_dist_reset (NcmStatsDist *sd);
Resets the object, discarding all added points.
[virtual reset]
“CV-type” property
“CV-type” NcmStatsDistCV
Cross-validation method.
Owner: NcmStatsDist
Flags: Read / Write / Construct
Default value: NCM_STATS_DIST_CV_NONE
“kernel” property
“kernel” NcmStatsDistKernel *
Interpolating kernel.
Owner: NcmStatsDist
Flags: Read / Write / Construct Only
“over-smooth” property
“over-smooth” double
Over-smooth distribution.
Owner: NcmStatsDist
Flags: Read / Write / Construct
Allowed values: >= 1e-05
Default value: 1
“print-fit” property
“print-fit” gboolean
Whether to print the fitting process.
Owner: NcmStatsDist
Flags: Read / Write / Construct
Default value: FALSE
“split-frac” property
“split-frac” double
Fraction to use in the split cross-validation.
Owner: NcmStatsDist
Flags: Read / Write / Construct
Allowed values: [0.1,0.95]
Default value: 0.5