Provided by: mlpack-bin_4.3.0-2build1_amd64

NAME

       mlpack_gmm_train - Gaussian mixture model (GMM) training

SYNOPSIS

        mlpack_gmm_train -g int -i unknown [-d bool] [-m unknown] [-k int] [-n int] [-P bool] [-N double] [-p double] [-r bool] [-S int] [-s int] [-T double] [-t int] [-V bool] [-M unknown] [-h -v]

DESCRIPTION

       This program computes a parametric estimate of a Gaussian mixture model (GMM) using the EM algorithm to find
       the maximum likelihood estimate. The model may be saved and reused by other mlpack GMM tools.

       The  input  data  to train on must be specified with the '--input_file (-i)' parameter, and the number of
       Gaussians in the model must be specified with the '--gaussians (-g)' parameter. Optionally, many trials
       with  different  random  initializations  may  be  run, and the result with highest log-likelihood on the
       training data will be taken. The number of trials to run is specified with the '--trials (-t)' parameter.
       By default, only one trial is run.

       The tolerance for convergence and maximum number of iterations of the EM algorithm are specified with the
       '--tolerance (-T)' and '--max_iterations (-n)' parameters, respectively. The GMM may be  initialized  for
       training  with  another  model,  specified  with the '--input_model_file (-m)' parameter.  Otherwise, the
       model is initialized by running k-means on  the  data.  The  k-means  clustering  initialization  can  be
       controlled with the '--kmeans_max_iterations (-k)', '--refined_start (-r)', '--samplings (-S)', and
       '--percentage (-p)' parameters. If '--refined_start (-r)' is specified, then the  Bradley-Fayyad  refined
       start initialization will be used. This can often lead to better clustering results.
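
       For instance, to train a 4-Gaussian GMM using the refined start procedure with 200 samplings of 5% of the
       dataset each, a convergence tolerance of 1e-6, and a limit of 500 EM iterations, a command along the
       following lines could be used (the file name and parameter values are only illustrative):

       $ mlpack_gmm_train --input_file data.csv --gaussians 4 --tolerance 1e-6 --max_iterations 500
       --refined_start --samplings 200 --percentage 0.05 --output_model_file gmm.bin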

       The '--diagonal_covariance (-d)' flag will cause the learned covariances to be diagonal matrices. This
       significantly simplifies the model itself and causes training to be faster, but restricts the ability  to
       fit more complex GMMs.
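
       For example, a 10-Gaussian diagonal-covariance GMM could be trained on 'data.csv' (the file name is only
       illustrative) with a command like the following:

       $ mlpack_gmm_train --input_file data.csv --gaussians 10 --diagonal_covariance --output_model_file dgmm.bin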

       If  GMM training fails with an error indicating that a covariance matrix could not be inverted, make sure
       that the '--no_force_positive (-P)' parameter is not specified. Alternatively, adding a small amount of
       Gaussian noise (using the '--noise (-N)' parameter) to the entire dataset may help prevent Gaussians with
       zero  variance  in  a  particular  dimension,  which  is  usually  the cause of non-invertible covariance
       matrices.
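
       For example, if training fails due to a non-invertible covariance matrix, adding zero-mean Gaussian noise
       with a small variance such as 0.01 (the value is only a suggestion) may help:

       $ mlpack_gmm_train --input_file data.csv --gaussians 3 --noise 0.01 --output_model_file gmm.bin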

       The '--no_force_positive (-P)' parameter, if set, will avoid the checks after each iteration  of  the  EM
       algorithm  which ensure that the covariance matrices are positive definite. Specifying the flag can cause
       faster runtime, but may also cause non-positive  definite  covariance  matrices,  which  will  cause  the
       program to crash.
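
       For example, if the data are known to be well-conditioned and faster training is desired, the checks could
       be skipped with a command such as:

       $ mlpack_gmm_train --input_file data.csv --gaussians 3 --no_force_positive --output_model_file gmm.bin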

       As an example, to train a 6-Gaussian GMM on the data in 'data.csv' with a maximum of 100 iterations of EM
       and 3 trials, saving the trained GMM to 'gmm.bin', the following command can be used:

       $ mlpack_gmm_train --input_file data.csv --gaussians 6 --trials 3 --output_model_file gmm.bin

       To re-train that GMM on another set of data 'data2.csv', the following command may be used:

       $ mlpack_gmm_train --input_model_file gmm.bin --input_file data2.csv --gaussians 6 --output_model_file
       new_gmm.bin

REQUIRED INPUT OPTIONS

       --gaussians (-g) [int]
              Number of Gaussians in the GMM.

       --input_file (-i) [unknown]
              The training data on which the model will be fit.

OPTIONAL INPUT OPTIONS

       --diagonal_covariance (-d) [bool]
              Force the covariance  of  the  Gaussians  to  be  diagonal.  This  can  accelerate  training  time
              significantly.

       --help (-h) [bool]
              Print help information and exit.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Initial input GMM model to start training with.

       --kmeans_max_iterations (-k) [int]
              Maximum  number  of  iterations  for  the k-means algorithm (used to initialize EM). Default value
              1000.

       --max_iterations (-n) [int]
              Maximum number of iterations of EM algorithm (passing 0 will run until convergence). Default value
              250.

       --no_force_positive (-P) [bool]
              Do not force the covariance matrices to be positive definite.

       --noise (-N) [double]
              Variance of zero-mean Gaussian noise to add to data. Default value 0.

       --percentage (-p) [double]
              If using --refined_start, specify the percentage of the dataset used for each sampling (should  be
              between 0.0 and 1.0). Default value 0.02.

       --refined_start (-r) [bool]
              During  the  initialization,  use  refined  initial  positions for k-means clustering (Bradley and
              Fayyad, 1998).

       --samplings (-S) [int]
              If using --refined_start, specify the number of samplings used for initial points.  Default  value
              100.

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --tolerance (-T) [double]
              Tolerance for convergence of EM. Default value 1e-10.

       --trials (-t) [int]
              Number of trials to perform in training GMM.  Default value 1.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output for trained GMM model.

ADDITIONAL INFORMATION

       For  further  information,  including  relevant  papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack-4.3.0                                     19 January 2024                             mlpack_gmm_train(1)