Provided by: mlpack-bin_4.3.0-2build1_amd64

NAME

       mlpack_random_forest - random forests

SYNOPSIS

        mlpack_random_forest [-m unknown] [-l unknown] [-D int] [-g double] [-n int] [-N int] [-a bool] [-s int] [-d int] [-T unknown] [-L unknown] [-t unknown] [-V bool] [-w bool] [-M unknown] [-p unknown] [-P unknown] [-h -v]

DESCRIPTION

       This  program is an implementation of the standard random forest classification algorithm by Leo Breiman.
       A random forest can be trained and saved for later use, or a random forest may be loaded and  predictions
       or class probabilities for points may be generated.

       The training set and associated labels are specified with the '--training_file (-t)' and '--labels_file
       (-l)' parameters, respectively. The labels should be in the range [0, num_classes - 1]. If
       '--labels_file (-l)' is not specified, the labels are assumed to be the last dimension of the training
       dataset.
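
       Labels that are not already integers in [0, num_classes - 1] have to be converted before training. A
       minimal Python sketch of one way to do that follows; the helper name remap_labels is hypothetical and
       is not part of mlpack.

```python
def remap_labels(raw_labels):
    """Map arbitrary label values to integers 0 .. num_classes - 1.

    Returns (mapped, classes), where classes[i] is the original value
    that was assigned the integer label i. Hypothetical helper, not mlpack.
    """
    classes = sorted(set(raw_labels))
    index = {value: i for i, value in enumerate(classes)}
    return [index[value] for value in raw_labels], classes


mapped, classes = remap_labels(["cat", "dog", "cat", "bird"])
print(mapped)   # [1, 2, 1, 0]
print(classes)  # ['bird', 'cat', 'dog']
```

       Keeping the returned classes list makes it possible to translate predicted integer labels back to
       the original values later.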

       When a model is trained, the '--output_model_file (-M)' output parameter may be used to save the trained
       model. A model may be loaded for predictions with the '--input_model_file (-m)' parameter. The
       '--input_model_file (-m)' parameter may not be specified when the '--training_file (-t)' parameter is
       specified, unless '--warm_start (-w)' is also given, in which case more trees are trained on top of the
       loaded model.

       The '--minimum_leaf_size (-n)' parameter specifies the minimum number of training points in each leaf
       node. The '--num_trees (-N)' parameter controls the number of trees in the random forest. The
       '--minimum_gain_split (-g)' parameter controls the minimum required gain for a decision tree node to
       split. Larger values will force higher-confidence splits. The '--maximum_depth (-D)' parameter
       specifies the maximum depth of the tree. The '--subspace_dim (-d)' parameter is used to control the
       number of random dimensions chosen for an individual node's split. If '--print_training_accuracy (-a)'
       is specified together with '--verbose (-v)', the calculated accuracy on the training set will be
       printed.
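
       When the program is driven from a script, the rules above can be checked before the command is run.
       The Python sketch below assembles an argument list from options documented on this page. The helper
       build_train_args is hypothetical, and the check that '--input_model_file (-m)' is combined with
       '--training_file (-t)' only for a warm start is an assumption based on the description of
       '--warm_start (-w)'.

```python
def build_train_args(training_file, output_model_file, labels_file=None,
                     input_model_file=None, warm_start=False,
                     minimum_leaf_size=1, num_trees=10):
    """Assemble the argument list for a training run (hypothetical helper)."""
    if warm_start and input_model_file is None:
        raise ValueError("--warm_start needs --input_model_file")
    if input_model_file is not None and not warm_start:
        # Assumption: a model and a training set are combined only for warm starts.
        raise ValueError("--input_model_file with --training_file needs --warm_start")

    args = ["mlpack_random_forest", "--training_file", training_file]
    if labels_file is not None:
        args += ["--labels_file", labels_file]
    if input_model_file is not None:
        args += ["--input_model_file", input_model_file, "--warm_start"]
    args += ["--minimum_leaf_size", str(minimum_leaf_size),
             "--num_trees", str(num_trees),
             "--output_model_file", output_model_file]
    return args


args = build_train_args("data.csv", "rf_model.bin", labels_file="labels.csv",
                        minimum_leaf_size=20, num_trees=10)
print(" ".join(args))
```

       The resulting list can be handed to subprocess.run(args, check=True) on a system where
       mlpack_random_forest is installed.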

       Test data may be specified with the '--test_file (-T)' parameter, and if performance measures are desired
       for that test set, labels for the test points may be specified with the '--test_labels_file (-L)'
       parameter. Predictions for each test point may be saved via the '--predictions_file (-p)' output
       parameter. Class probabilities for each prediction may be saved with the '--probabilities_file (-P)'
       output parameter.

       For example, to train a random forest with a minimum leaf size of 20 using 10 trees on the dataset
       contained in 'data.csv' with labels 'labels.csv', saving the output random forest to 'rf_model.bin' and
       printing the training accuracy, one could call

       $ mlpack_random_forest --training_file data.csv --labels_file labels.csv --minimum_leaf_size 20
       --num_trees 10 --output_model_file rf_model.bin --print_training_accuracy

       Then, to use that model to classify points in 'test_set.csv' and print the test accuracy given the
       labels 'test_labels.csv', while saving the predictions for each point to 'predictions.csv', one
       could call

       $ mlpack_random_forest --input_model_file rf_model.bin --test_file test_set.csv --test_labels_file
       test_labels.csv --predictions_file predictions.csv
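
       The saved predictions can also be compared with the test labels outside the program. The Python
       sketch below computes the fraction of matching labels. It assumes that both files hold one integer
       label per line, which should be verified against the actual output; read_labels and accuracy are
       hypothetical helper names.

```python
import csv


def read_labels(path):
    """Read one integer label per non-empty line (assumed file layout)."""
    with open(path, newline="") as handle:
        return [int(float(row[0])) for row in csv.reader(handle) if row]


def accuracy(predicted, expected):
    """Return the fraction of positions where the two label lists agree."""
    if len(predicted) != len(expected):
        raise ValueError("label lists differ in length")
    if not expected:
        raise ValueError("no labels given")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# In-memory example; with real files one would call
# accuracy(read_labels("predictions.csv"), read_labels("test_labels.csv")).
print(accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.75
```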

OPTIONAL INPUT OPTIONS

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Pre-trained random forest to use for classification.

       --labels_file (-l) [unknown]
              Labels for training dataset.

       --maximum_depth (-D) [int]
              Maximum depth of the tree (0 means no limit).  Default value 0.

       --minimum_gain_split (-g) [double]
              Minimum gain needed to make a split when building a tree. Default value 0.

       --minimum_leaf_size (-n) [int]
              Minimum number of points in each leaf node.  Default value 1.

       --num_trees (-N) [int]
              Number of trees in the random forest. Default value 10.

       --print_training_accuracy (-a) [bool]
              If set, then the accuracy of the model on the training set will be printed (verbose must also be
              specified).

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --subspace_dim (-d) [int]
              Dimensionality  of  random  subspace to use for each split. '0' will autoselect the square root of
              data dimensionality. Default value 0.

       --test_file (-T) [unknown]
              Test dataset to produce predictions for.

       --test_labels_file (-L) [unknown]
              Test dataset labels, if accuracy calculation is desired.

       --training_file (-t) [unknown]
              Training dataset.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

       --warm_start (-w) [bool]
              If set and passed along with '--training_file (-t)' and '--input_model_file (-m)', trains more
              trees on top of the existing model.

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Model to save trained random forest to.

       --predictions_file (-p) [unknown]
              Predicted classes for each point in the test set.

       --probabilities_file (-P) [unknown]
              Predicted class probabilities for each point in the test set.

ADDITIONAL INFORMATION

       For  further  information,  including  relevant  papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack-4.3.0                                     19 January 2024                         mlpack_random_forest(1)