Provided by: vowpal-wabbit_8.6.1.dfsg1-1build3_amd64 

NAME
vw - Vowpal Wabbit -- fast online learning tool
DESCRIPTION
VW options:
--ring_size arg
size of example ring
--onethread
Disable parse thread
Update options:
-l [ --learning_rate ] arg
Set learning rate
--power_t arg
t power value
--decay_learning_rate arg
Set decay factor for learning_rate between passes
--initial_t arg
initial t value
--feature_mask arg
Use existing regressor to determine which parameters may be updated. If no initial_regressor
given, also used for initial weights.
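For example, a run that overrides the default learning-rate schedule might look like this (file names are illustrative):
     vw -d train.dat -l 0.5 --power_t 0.5 --decay_learning_rate 0.97 --passes 3 -c -f model.vw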
Weight options:
-i [ --initial_regressor ] arg
Initial regressor(s)
--initial_weight arg
Set all weights to an initial value of arg.
--random_weights arg
make initial weights random
--normal_weights arg
make initial weights normal
--truncated_normal_weights arg
make initial weights truncated normal
--sparse_weights
Use a sparse datastructure for weights
--input_feature_regularizer arg
Per feature regularization input file
Parallelization options:
--span_server arg
Location of server for setting up spanning tree
--threads
Enable multi-threading
--unique_id arg (=0)
unique id used for cluster parallel jobs
--total arg (=1)
total number of nodes used in cluster parallel job
--node arg (=0)
node number in cluster parallel job
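A sketch of a cluster run, assuming a spanning-tree server is already running on host 'stserver' and the data is split into one shard per node:
     vw -d part0.dat --span_server stserver --total 4 --node 0 --unique_id 1234 -f model.vw
Each of the 4 nodes runs the same command with its own --node number and data shard.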
Diagnostic options:
--version
Version information
-a [ --audit ]
print weights of features
-P [ --progress ] arg
Progress update frequency. int: additive, float: multiplicative
--quiet
Don't output diagnostics and progress updates
-h [ --help ]
Look here: http://hunch.net/~vw/ and click on Tutorial.
Random Seed option:
--random_seed arg
seed random number generator
Feature options:
--hash arg
how to hash the features. Available options: strings, all
--hash_seed arg (=0)
seed for hash function
--ignore arg
ignore namespaces beginning with character <arg>
--ignore_linear arg
ignore namespaces beginning with character <arg> for linear terms only
--keep arg
keep namespaces beginning with character <arg>
--redefine arg
redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in the form
'N:=S' where ':=' is the operator. An empty N or S is treated as the default namespace. Use ':' as a
wildcard in S.
-b [ --bit_precision ] arg
number of bits in the feature table
--noconstant
Don't add a constant feature
-C [ --constant ] arg
Set initial value of constant
--ngram arg
Generate N grams. To generate N grams for a single namespace 'foo', arg should be fN.
--skips arg
Generate skips in N grams. This in conjunction with the ngram tag can be used to generate
generalized n-skip-k-gram. To generate n-skips for a single namespace 'foo', arg should be fN.
--feature_limit arg
limit to N features. To apply to a single namespace 'foo', arg should be fN
--affix arg
generate prefixes/suffixes of features; argument '+2a,-3b,+1' means generate 2-char prefixes for
namespace a, 3-char suffixes for b and 1 char prefixes for default namespace
--spelling arg
compute spelling features for a given namespace (use '_' for default namespace)
--dictionary arg
read a dictionary for additional features (arg either 'x:file' or just 'file')
--dictionary_path arg
look in this directory for dictionaries; defaults to current directory or env{PATH}
--interactions arg
Create feature interactions of any level between namespaces.
--permutations
Use permutations instead of combinations for feature interactions of same namespace.
--leave_duplicate_interactions
Don't remove interactions with duplicate combinations of namespaces. For example, '-q ab -q ba'
contains a duplicate, and '-q ::' contains many more.
-q [ --quadratic ] arg
Create and use quadratic features
--q: arg
: corresponds to a wildcard for all printable characters
--cubic arg
Create and use cubic features
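For example, given input lines that place features in namespaces a and b (the line below is illustrative), quadratic interactions between the two namespaces and bigrams within namespace a can be requested as follows:
     1 |a the quick brown fox |b jumped
     vw -d train.dat -q ab --ngram a2 -b 24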
Example options:
-t [ --testonly ]
Ignore label information and just test
--holdout_off
no holdout data in multiple passes
--holdout_period arg (=10)
holdout period for test only
--holdout_after arg
holdout after n training examples, default off (disables holdout_period)
--early_terminate arg (=3)
Specify the number of passes tolerated when holdout loss doesn't decrease before early termination
--passes arg
Number of Training Passes
--initial_pass_length arg
initial number of examples per pass
--examples arg
number of examples to parse
--min_prediction arg
Smallest prediction to output
--max_prediction arg
Largest prediction to output
--sort_features
turn this on to disregard order in which features have been defined. This will lead to smaller
cache sizes
--loss_function arg (=squared)
Specify the loss function to be used, uses squared by default. Currently available ones are
squared, classic, hinge, logistic, quantile and poisson.
--quantile_tau arg (=0.5)
Parameter \tau associated with Quantile loss. Defaults to 0.5
--l1 arg
l_1 lambda
--l2 arg
l_2 lambda
--no_bias_regularization arg
no bias in regularization
--named_labels arg
use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible
labels, comma-separated, e.g. "--named_labels Noun,Verb,Adj,Punc"
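For example, training with logistic loss (labels in {-1,1}) over several passes and then testing the saved model might look like this (file names are placeholders):
     vw -d train.dat --loss_function logistic --passes 10 -c -f model.vw
     vw -d test.dat -t -i model.vw -p preds.txt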
Output model:
-f [ --final_regressor ] arg
Final regressor
--readable_model arg
Output human-readable final regressor with numeric features
--invert_hash arg
Output human-readable final regressor with feature names. Computationally expensive.
--save_resume
save extra state so learning can be resumed later with new data
--preserve_performance_counters
do not reset performance counters when warmstarting
--save_per_pass
Save the model after every pass over data
--output_feature_regularizer_binary arg
Per feature regularization output file
--output_feature_regularizer_text arg
Per feature regularization output file, in text
--id arg
User supplied ID embedded into the final regressor
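For example, a model can be trained incrementally across separate invocations by saving resumable state (file names are placeholders):
     vw -d day1.dat --save_resume -f model.vw
     vw -d day2.dat -i model.vw --save_resume -f model.vw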
Output options:
-p [ --predictions ] arg
File to output predictions to
-r [ --raw_predictions ] arg
File to output unnormalized predictions to
Audit Regressor:
--audit_regressor arg
stores feature names and their regressor values. Same dataset must be used for both regressor
training and this mode.
Search options:
--search arg
Use learning to search, argument=maximum action id or 0 for LDF
--search_task arg
the search task (use "--search_task list" to get a list of available tasks)
--search_metatask arg
the search metatask (use "--search_metatask list" to get a list of available metatasks)
--search_interpolation arg
at what level should interpolation happen? [*data|policy]
--search_rollout arg
how should rollouts be executed? [policy|oracle|*mix_per_state|mix_per_roll|none]
--search_rollin arg
how should past trajectories be generated? [policy|oracle|*mix_per_state|mix_per_roll]
--search_passes_per_policy arg (=1)
number of passes per policy (only valid for search_interpolation=policy)
--search_beta arg (=0.5)
interpolation rate for policies (only valid for search_interpolation=policy)
--search_alpha arg (=1.00000001e-10)
annealed beta = 1-(1-alpha)^t (only valid for search_interpolation=data)
--search_total_nb_policies arg
if we are going to train the policies through multiple separate calls to vw, we need to specify
this parameter and tell vw how many policies are eventually going to be trained
--search_trained_nb_policies arg
the number of trained policies in a file
--search_allowed_transitions arg
read file of allowed transitions [def: all transitions are allowed]
--search_subsample_time arg
instead of training at all timesteps, use a subset. If the value v is in (0,1), train on a random
fraction v of timesteps; if v>=1, train on precisely v steps per example; if v<=-1, use active learning
--search_neighbor_features arg
copy features from neighboring lines. The argument looks like '-1:a,+2', meaning copy namespace a from
the previous line and the unnamed namespace from the line two ahead; ',' separates the items
--search_rollout_num_steps arg
how many calls of "loss" before we stop really predicting on rollouts and switch to oracle
(default means "infinite")
--search_history_length arg (=1)
some tasks allow you to specify how much history they depend on; specify that here
--search_no_caching
turn off the built-in caching ability (makes things slower, but technically safer)
--search_xv
train two separate policies, alternating prediction/learning
--search_perturb_oracle arg (=0)
perturb the oracle on rollin with this probability
--search_linear_ordering
insist on generating examples in linear order (def: hoopla permutation)
--search_active_verify arg
verify that active learning is doing the right thing (arg = multiplier, should be = cost_range *
range_c)
--search_save_every_k_runs arg
save model every k runs
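A sketch of a learning-to-search run; the task name below assumes the built-in sequence-labeling task reported by "--search_task list", and 45 is an illustrative maximum action id:
     vw -d pos.dat -c --passes 10 --search 45 --search_task sequence -f model.vw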
Experience Replay:
--replay_c arg
use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost
sensitive] with specified buffer size
--replay_c_count arg (=1)
how many times (in expectation) should each example be played (default: 1 = permuting)
Explore evaluation:
--explore_eval
Evaluate explore_eval adf policies
--multiplier arg
Multiplier used to make all rejection sample probabilities <= 1
Make Multiclass into Contextual Bandit:
--cbify arg
Convert multiclass on <k> classes into a contextual bandit problem
--cbify_cs
consume cost-sensitive classification examples instead of multiclass
--loss0 arg (=0)
loss for correct label
--loss1 arg (=1)
loss for incorrect label
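For example, a 10-class multiclass dataset might be run as a simulated contextual bandit problem with epsilon-greedy exploration (--epsilon is described in the exploration sections below):
     vw -d multiclass.dat --cbify 10 --epsilon 0.05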
Contextual Bandit Exploration with Action Dependent Features:
--cb_explore_adf
Online explore-exploit for a contextual bandit problem with multiline action dependent features
--first arg
tau-first exploration
--epsilon arg
epsilon-greedy exploration
--bag arg
bagging-based exploration
--cover arg
Online cover based exploration
--psi arg (=1)
disagreement parameter for cover
--nounif
do not explore uniformly on zero-probability actions in cover
--softmax
softmax exploration
--regcb
RegCB-elim exploration
--regcbopt
RegCB optimistic exploration
--mellowness arg (=0.100000001)
RegCB mellowness parameter c_0. Default 0.1
--greedify
always update first policy once in bagging
--cb_min_cost arg (=0)
lower bound on cost
--cb_max_cost arg (=1)
upper bound on cost
--first_only
Only explore the first action in a tie-breaking event
--lambda arg (=-1)
parameter for softmax
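For example, epsilon-greedy exploration over multiline action-dependent-feature data might be run as:
     vw -d cb_adf.dat --cb_explore_adf --epsilon 0.1 -p actions.txt
Each multiline example in cb_adf.dat is expected to consist of an optional shared context line followed by one line per action, with the chosen action carrying an action:cost:probability label.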
Contextual Bandit Exploration:
--cb_explore arg
Online explore-exploit for a <k> action contextual bandit problem
--first arg
tau-first exploration
--epsilon arg (=0.0500000007)
epsilon-greedy exploration
--bag arg
bagging-based exploration
--cover arg
Online cover based exploration
--psi arg (=1)
disagreement parameter for cover
Multiworld Testing Options:
--multiworld_test arg
Evaluate features as policies
--learn arg
Do Contextual Bandit learning on <n> classes.
--exclude_eval
Discard mwt policy features before learning
Contextual Bandit with Action Dependent Features:
--cb_adf
Do Contextual Bandit learning with multiline action dependent features.
--rank_all
Return actions sorted by score order
--no_predict
Do not do a prediction when training
--cb_type arg (=ips)
contextual bandit method to use in {ips, dm, dr, mtr}
Contextual Bandit Options:
--cb arg
Use contextual bandit learning with <k> costs
--cb_type arg (=dr)
contextual bandit method to use in {ips,dm,dr}
--eval Evaluate a policy rather than optimizing.
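For example, learning over a 4-action contextual bandit dataset with the doubly-robust estimator might look like this, where each line of cb.dat labels the chosen action as action:cost:probability followed by the context features:
     vw -d cb.dat --cb 4 --cb_type dr -f model.vw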
Cost Sensitive One Against All with Label Dependent Features:
--csoaa_ldf arg
Use one-against-all multiclass learning with label dependent features.
--ldf_override arg
Override singleline or multiline from csoaa_ldf or wap_ldf, eg if stored in file
--csoaa_rank
Return actions sorted by score order
--probabilities
predict probabilities of all classes
--wap_ldf arg
Use weighted all-pairs multiclass learning with label dependent features.
Specify singleline or multiline.
Interact via elementwise multiplication:
--interact arg
Put weights on feature products from namespaces <n1> and <n2>
Cost Sensitive One Against All:
--csoaa arg
One-against-all multiclass with <k> costs
Cost-sensitive Active Learning:
--cs_active arg
Cost-sensitive active learning with <k> costs
--simulation
cost-sensitive active learning simulation mode
--baseline
cost-sensitive active learning baseline
--domination
use domination in cost-sensitive active learning. Default 1
--mellowness arg (=0.100000001)
mellowness parameter c_0. Default 0.1.
--range_c arg (=0.5)
parameter controlling the threshold for per-label cost uncertainty. Default 0.5.
--max_labels arg (=18446744073709551615)
maximum number of label queries.
--min_labels arg (=18446744073709551615)
minimum number of label queries.
--cost_max arg (=1)
cost upper bound. Default 1.
--cost_min arg (=0)
cost lower bound. Default 0.
--csa_debug
print debug stuff for cs_active
Multilabel One Against All:
--multilabel_oaa arg
One-against-all multilabel with <k> labels
importance weight classes:
--classweight arg
importance weight multiplier for class
Recall Tree:
--recall_tree arg
Use online tree for multiclass
--max_candidates arg
maximum number of labels per leaf in the tree
--bern_hyper arg (=1)
recall tree depth penalty
--max_depth arg
maximum depth of the tree, default log_2 (#classes)
--node_only arg (=0)
only use node features, not full path features
--randomized_routing arg (=0)
randomized routing
Logarithmic Time Multiclass Tree:
--log_multi arg
Use online tree for multiclass
--no_progress
disable progressive validation
--swap_resistance arg (=4)
higher = more resistance to swap, default=4
Error Correcting Tournament Options:
--ect arg
Error correcting tournament with <k> labels
--error arg (=0)
errors allowed by ECT
Boosting:
--boosting arg
Online boosting with <N> weak learners
--gamma arg (=0.100000001)
weak learner's edge (=0.1), used only by online BBM
--alg arg (=BBM)
specify the boosting algorithm: BBM (default), logistic (AdaBoost.OL.W), adaptive (AdaBoost.OL)
One Against All Options:
--oaa arg
One-against-all multiclass with <k> labels
--oaa_subsample arg
subsample this number of negative examples when learning
--probabilities
predict probabilities of all classes
--scores
output raw scores per class
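For example, 10-class one-against-all training with per-class probability output (typically combined with logistic loss) might look like:
     vw -d multiclass.dat --oaa 10 --probabilities --loss_function logistic -f model.vw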
Top K:
--top arg
top k recommendation
Experience Replay:
--replay_m arg
use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost
sensitive] with specified buffer size
--replay_m_count arg (=1)
how many times (in expectation) should each example be played (default: 1 = permuting)
Binary loss:
--binary
report loss as binary classification on -1,1
Bootstrap:
--bootstrap arg
k-way bootstrap by online importance resampling
--bs_type arg
prediction type {mean,vote}
scorer options:
--link arg (=identity)
Specify the link function: identity, logistic, glf1 or poisson
Stagewise polynomial options:
--stage_poly
use stagewise polynomial feature learning
--sched_exponent arg (=1)
exponent controlling quantity of included features
--batch_sz arg (=1000)
multiplier on batch size before including more features
--batch_sz_no_doubling
batch_sz does not double
Low Rank Quadratics FA:
--lrqfa arg
use low rank quadratic features with field aware weights
Low Rank Quadratics:
--lrq arg
use low rank quadratic features
--lrqdropout
use dropout training for low rank quadratic features
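For example, rank-5 low-rank quadratic interactions between namespaces u and i (say, users and items in a recommender setting) might be requested as:
     vw -d ratings.dat --lrq ui5 --lrqdropout -f model.vw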
Autolink:
--autolink arg
create link function with polynomial d
Marginal:
--marginal arg
substitute marginal label estimates for ids
--initial_denominator arg (=1)
initial denominator
--initial_numerator arg (=0.5)
initial numerator
--compete
enable competition with marginal features
--update_before_learn arg (=0)
update marginal values before learning
--unweighted_marginals arg (=0)
ignore importance weights when computing marginals
--decay arg (=0)
decay multiplier per event (1e-3 for example)
Matrix Factorization Reduction:
--new_mf arg
rank for reduction-based matrix factorization
Neural Network:
--nn arg
Sigmoidal feedforward network with <k> hidden units
--inpass
Train or test sigmoidal feedforward network with input passthrough.
--multitask
Share hidden layer across all reduced tasks.
--dropout
Train or test sigmoidal feedforward network using dropout.
--meanfield
Train or test sigmoidal feedforward network using mean field.
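For example, a network with 10 hidden units trained with dropout might be specified as:
     vw -d train.dat --nn 10 --dropout -f model.vw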
Confidence:
--confidence
Get confidence for binary predictions
--confidence_after_training
Confidence after training
Active Learning with Cover:
--active_cover
enable active learning with cover
--mellowness arg (=8)
active learning mellowness parameter c_0. Default 8.
--alpha arg (=1)
active learning variance upper bound parameter alpha. Default 1.
--beta_scale arg (=3.1622777)
active learning variance upper bound parameter beta_scale. Default sqrt(10).
--cover arg (=12)
cover size. Default 12.
--oracular
Use Oracular-CAL style query or not. Default false.
Active Learning:
--active
enable active learning
--simulation
active learning simulation mode
--mellowness arg (=8)
active learning mellowness parameter c_0. Default 8
Experience Replay:
--replay_b arg
use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost
sensitive] with specified buffer size
--replay_b_count arg (=1)
how many times (in expectation) should each example be played (default: 1 = permuting)
Baseline options:
--baseline
Learn an additive baseline (from constant features) and a residual separately in regression.
--lr_multiplier arg
learning rate multiplier for baseline model
--global_only
use separate example with only global constant for baseline predictions
--check_enabled
only use baseline when the example contains enabled flag
OjaNewton options:
--OjaNewton
Online Newton with Oja's Sketch
--sketch_size arg (=10)
size of sketch
--epoch_size arg (=1)
size of epoch
--alpha arg (=1)
multiplicative constant for identity
--alpha_inverse arg
one over alpha, similar to learning rate
--learning_rate_cnt arg (=2)
constant for the learning rate 1/t
--normalize arg (=1)
normalize the features or not
--random_init arg (=1)
randomize initialization of Oja or not
LBFGS and Conjugate Gradient options:
--conjugate_gradient
use conjugate gradient based optimization
--bfgs use bfgs optimization
--hessian_on
use second derivative in line search
--mem arg (=15)
memory in bfgs
--termination arg (=0.00100000005)
Termination threshold
Latent Dirichlet Allocation:
--lda arg
Run lda with <int> topics
--lda_alpha arg (=0.100000001)
Prior on sparsity of per-document topic weights
--lda_rho arg (=0.100000001)
Prior on sparsity of topic distributions
--lda_D arg (=10000)
Number of documents
--lda_epsilon arg (=0.00100000005)
Loop convergence threshold
--minibatch arg (=1)
Minibatch size, for LDA
--math-mode arg (=0)
Math mode: simd, accuracy, fast-approx
--metrics arg (=0)
Compute metrics
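A sketch of an LDA run; documents are unlabeled lines of word features, and the topic count, corpus size and minibatch size below are illustrative:
     vw -d docs.dat --lda 20 --lda_D 50000 --minibatch 256 --passes 2 -c --readable_model topics.txt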
Noop Learner:
--noop do no learning
Print pseudolearner:
--print
print examples
Gradient Descent Matrix Factorization:
--rank arg
rank for matrix factorization.
Network sending:
--sendto arg
send examples to <host>
Stochastic Variance Reduced Gradient:
--svrg Streaming Stochastic Variance Reduced Gradient
--stage_size arg (=1)
Number of passes per SVRG stage
Follow the Regularized Leader:
--ftrl FTRL: Follow the Proximal Regularized Leader
--ftrl_alpha arg (=0.00499999989)
Learning rate for FTRL optimization
--ftrl_beta arg (=0.100000001)
FTRL beta parameter
--pistol
FTRL: Parameter-free Stochastic Learning
--ftrl_alpha arg (=1)
Learning rate for FTRL optimization
--ftrl_beta arg (=0.5)
FTRL beta parameter
Kernel SVM:
--ksvm kernel svm
--reprocess arg (=1)
number of reprocess steps for LASVM
--pool_greedy
use greedy selection on mini pools
--para_active
do parallel active learning
--pool_size arg (=1)
size of pools for active learning
--subsample arg (=1)
number of items to subsample from the pool
--kernel arg (=linear)
type of kernel (rbf or linear (default))
--bandwidth arg (=1)
bandwidth of rbf kernel
--degree arg (=2)
degree of poly kernel
--lambda arg
saving regularization for test time
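For example, an RBF-kernel SVM might be trained as follows (the bandwidth and reprocess values are illustrative):
     vw -d train.dat --ksvm --kernel rbf --bandwidth 1.0 --reprocess 5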
Gradient Descent options:
--sgd use regular stochastic gradient descent update.
--adaptive
use adaptive, individual learning rates.
--adax use adaptive learning rates with x^2 instead of g^2x^2
--invariant
use safe/importance aware updates.
--normalized
use per feature normalized updates
--sparse_l2 arg (=0)
degree of l2 regularization applied to activated sparse parameters
--l1_state arg (=0)
amount of accumulated implicit l1 regularization
--l2_state arg (=1)
amount of accumulated implicit l2 regularization
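For example, the adaptive, normalized, invariant updates used by default can be replaced with a plain stochastic gradient update and a fixed learning rate:
     vw -d train.dat --sgd -l 0.1 -f model.vw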
Input options:
-d [ --data ] arg
Example Set
--daemon
persistent daemon mode on port 26542
--foreground
in persistent daemon mode, do not run in the background
--port arg
port to listen on; use 0 to pick unused port
--num_children arg
number of children for persistent daemon mode
--pid_file arg
Write pid file in persistent daemon mode
--port_file arg
Write port used in persistent daemon mode
-c [ --cache ]
Use a cache. The default is <data>.cache
--cache_file arg
The location(s) of cache_file.
--json Enable JSON parsing.
--dsjson
Enable Decision Service JSON parsing.
-k [ --kill_cache ]
do not reuse existing cache: create a new one always
--compressed
use gzip format whenever possible. If a cache file is being created, this option creates a
compressed cache file. A mixture of raw-text & compressed inputs are supported with
autodetection.
--no_stdin
do not default to reading from stdin
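For example, a trained model can be served from a persistent daemon and queried over TCP (the port and child count are illustrative):
     vw --daemon --port 26542 --num_children 4 -i model.vw -t
Test examples written to port 26542 receive one prediction per line in response.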
vw 8.6.1 December 2020 VW(1)