Provided by: vowpal-wabbit_8.6.1.dfsg1-1build3_amd64 

NAME
vw - Vowpal Wabbit -- fast online learning tool
DESCRIPTION
VW options:
--ring_size arg
size of example ring
--onethread
Disable parse thread
Update options:
-l [ --learning_rate ] arg
Set learning rate
--power_t arg
t power value
--decay_learning_rate arg
Set decay factor for learning_rate between passes
--initial_t arg
initial t value
--feature_mask arg
Use existing regressor to determine which parameters may be updated. If no initial_regressor
given, also used for initial weights.
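For example, a run that overrides the default learning-rate schedule might look like this (file names are illustrative):
     vw -d train.dat -l 0.5 --power_t 0.5 --decay_learning_rate 0.97 --passes 3 -c -f model.vw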
Weight options:
-i [ --initial_regressor ] arg
Initial regressor(s)
--initial_weight arg
Set all weights to an initial value of arg.
--random_weights arg
make initial weights random
--normal_weights arg
make initial weights normal
--truncated_normal_weights arg
make initial weights truncated normal
--sparse_weights
Use a sparse datastructure for weights
--input_feature_regularizer arg
Per feature regularization input file
Parallelization options:
--span_server arg
Location of server for setting up spanning tree
--threads
Enable multi-threading
--unique_id arg (=0)
unique id used for cluster parallel jobs
--total arg (=1)
total number of nodes used in cluster parallel job
--node arg (=0)
node number in cluster parallel job
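A sketch of a cluster run, assuming a spanning-tree server is already running on host 'stserver' and the data is split into one shard per node:
     vw -d part0.dat --span_server stserver --total 4 --node 0 --unique_id 1234 -f model.vw
Each of the 4 nodes runs the same command with its own --node number and data shard.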
Diagnostic options:
--version
Version information
-a [ --audit ]
print weights of features
-P [ --progress ] arg
Progress update frequency. int: additive, float: multiplicative
--quiet
Don't output diagnostics and progress updates
-h [ --help ]
Look here: http://hunch.net/~vw/ and click on Tutorial.
Random Seed option:
--random_seed arg
seed random number generator
Feature options:
--hash arg
how to hash the features. Available options: strings, all
--hash_seed arg (=0)
seed for hash function
--ignore arg
ignore namespaces beginning with character <arg>
--ignore_linear arg
ignore namespaces beginning with character <arg> for linear terms only
--keep arg
keep namespaces beginning with character <arg>
--redefine arg
redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in the form
'N:=S' where ':=' is the operator. An empty N or S is treated as the default namespace. Use ':' as a
wildcard in S.
-b [ --bit_precision ] arg
number of bits in the feature table
--noconstant
Don't add a constant feature
-C [ --constant ] arg
Set initial value of constant
--ngram arg
Generate N grams. To generate N grams for a single namespace 'foo', arg should be fN.
--skips arg
Generate skips in N grams. This in conjunction with the ngram tag can be used to generate
generalized n-skip-k-gram. To generate n-skips for a single namespace 'foo', arg should be fN.
--feature_limit arg
limit to N features. To apply to a single namespace 'foo', arg should be fN
--affix arg
generate prefixes/suffixes of features; argument '+2a,-3b,+1' means generate 2-char prefixes for
namespace a, 3-char suffixes for b and 1 char prefixes for default namespace
--spelling arg
compute spelling features for a given namespace (use '_' for default namespace)
--dictionary arg
read a dictionary for additional features (arg either 'x:file' or just 'file')
--dictionary_path arg
look in this directory for dictionaries; defaults to current directory or env{PATH}
--interactions arg
Create feature interactions of any level between namespaces.
--permutations
Use permutations instead of combinations for feature interactions of same namespace.
--leave_duplicate_interactions
Don't remove interactions with duplicate combinations of namespaces. For example, '-q ab -q ba'
contains a duplicate, and '-q ::' contains many more.
-q [ --quadratic ] arg
Create and use quadratic features
--q: arg
: corresponds to a wildcard for all printable characters
--cubic arg
Create and use cubic features
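For example, given input lines that place features in namespaces a and b (the line below is illustrative), quadratic interactions between the two namespaces and bigrams within namespace a can be requested as follows:
     1 |a the quick brown fox |b jumped
     vw -d train.dat -q ab --ngram a2 -b 24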
Example options:
-t [ --testonly ]
Ignore label information and just test
--holdout_off
no holdout data in multiple passes
--holdout_period arg (=10)
holdout period for test only
--holdout_after arg
holdout after n training examples, default off (disables holdout_period)
--early_terminate arg (=3)
Specify the number of passes tolerated when holdout loss doesn't decrease before early termination
--passes arg
Number of Training Passes
--initial_pass_length arg
initial number of examples per pass
--examples arg
number of examples to parse
--min_prediction arg
Smallest prediction to output
--max_prediction arg
Largest prediction to output
--sort_features
turn this on to disregard order in which features have been defined. This will lead to smaller
cache sizes
--loss_function arg (=squared)
Specify the loss function to be used, uses squared by default. Currently available ones are
squared, classic, hinge, logistic, quantile and poisson.
--quantile_tau arg (=0.5)
Parameter \tau associated with Quantile loss. Defaults to 0.5
--l1 arg
l_1 lambda
--l2 arg
l_2 lambda
--no_bias_regularization arg
no bias in regularization
--named_labels arg
use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible
labels, comma-separated, e.g. "--named_labels Noun,Verb,Adj,Punc"
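For example, training with logistic loss (labels in {-1,1}) over several passes and then testing the saved model might look like this (file names are placeholders):
     vw -d train.dat --loss_function logistic --passes 10 -c -f model.vw
     vw -d test.dat -t -i model.vw -p preds.txt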
Output model:
-f [ --final_regressor ] arg
Final regressor
--readable_model arg
Output human-readable final regressor with numeric features
--invert_hash arg
Output human-readable final regressor with feature names. Computationally expensive.
--save_resume
save extra state so learning can be resumed later with new data
--preserve_performance_counters
do not reset performance counters when warmstarting
--save_per_pass
Save the model after every pass over data
--output_feature_regularizer_binary arg
Per feature regularization output file
--output_feature_regularizer_text arg
Per feature regularization output file, in text
--id arg
User supplied ID embedded into the final regressor
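For example, a model can be trained incrementally across separate invocations by saving resumable state (file names are placeholders):
     vw -d day1.dat --save_resume -f model.vw
     vw -d day2.dat -i model.vw --save_resume -f model.vw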
Output options:
-p [ --predictions ] arg
File to output predictions to
-r [ --raw_predictions ] arg
File to output unnormalized predictions to
Audit Regressor:
--audit_regressor arg
stores feature names and their regressor values. Same dataset must be used for both regressor
training and this mode.
Search options:
--search arg
Use learning to search, argument=maximum action id or 0 for LDF
--search_task arg
the search task (use "--search_task list" to get a list of available tasks)
--search_metatask arg
the search metatask (use "--search_metatask list" to get a list of available metatasks)
--search_interpolation arg
at what level should interpolation happen? [*data|policy]
--search_rollout arg
how should rollouts be executed? [policy|oracle|*mix_per_state|mix_per_roll|none]
--search_rollin arg
how should past trajectories be generated? [policy|oracle|*mix_per_state|mix_per_roll]
--search_passes_per_policy arg (=1)
number of passes per policy (only valid for search_interpolation=policy)
--search_beta arg (=0.5)
interpolation rate for policies (only valid for search_interpolation=policy)
--search_alpha arg (=1.00000001e-10)
annealed beta = 1-(1-alpha)^t (only valid for search_interpolation=data)
--search_total_nb_policies arg
if we are going to train the policies through multiple separate calls to vw, we need to specify
this parameter and tell vw how many policies are eventually going to be trained
--search_trained_nb_policies arg
the number of trained policies in a file
--search_allowed_transitions arg
read file of allowed transitions [def: all transitions are allowed]
--search_subsample_time arg
instead of training at all timesteps, use a subset. If the value v is in (0,1), train on a random
fraction v of timesteps; if v>=1, train on precisely v steps per example; if v<=-1, use active learning
--search_neighbor_features arg
copy features from neighboring lines. The argument looks like '-1:a,+2', meaning copy namespace a from
the previous line and the unnamed namespace from the line two ahead; ',' separates the items
--search_rollout_num_steps arg
how many calls of "loss" before we stop really predicting on rollouts and switch to oracle
(default means "infinite")
--search_history_length arg (=1)
some tasks allow you to specify how much history they depend on; specify that here
--search_no_caching
turn off the built-in caching ability (makes things slower, but technically safer)
--search_xv
train two separate policies, alternating prediction/learning
--search_perturb_oracle arg (=0)
perturb the oracle on rollin with this probability
--search_linear_ordering
insist on generating examples in linear order (def: hoopla permutation)
--search_active_verify arg
verify that active learning is doing the right thing (arg = multiplier, should be = cost_range *
range_c)
--search_save_every_k_runs arg
save model every k runs
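A sketch of a learning-to-search run; the task name below assumes the built-in sequence-labeling task reported by "--search_task list", and 45 is an illustrative maximum action id:
     vw -d pos.dat -c --passes 10 --search 45 --search_task sequence -f model.vw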
Experience Replay:
--replay_c arg
use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost
sensitive] with specified buffer size
--replay_c_count arg (=1)
how many times (in expectation) should each example be played (default: 1 = permuting)
Explore evaluation:
--explore_eval
Evaluate explore_eval adf policies
--multiplier arg
Multiplier used to make all rejection sample probabilities <= 1
Make Multiclass into Contextual Bandit:
--cbify arg
Convert multiclass on <k> classes into a contextual bandit problem
--cbify_cs
consume cost-sensitive classification examples instead of multiclass
--loss0 arg (=0)
loss for correct label
--loss1 arg (=1)
loss for incorrect label
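For example, a 10-class multiclass dataset might be run as a simulated contextual bandit problem with epsilon-greedy exploration (--epsilon is described in the exploration sections below):
     vw -d multiclass.dat --cbify 10 --epsilon 0.05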
Contextual Bandit Exploration with Action Dependent Features:
--cb_explore_adf
Online explore-exploit for a contextual bandit problem with multiline action dependent features
--first arg
tau-first exploration
--epsilon arg
epsilon-greedy exploration
--bag arg
bagging-based exploration
--cover arg
Online cover based exploration
--psi arg (=1)
disagreement parameter for cover
--nounif
do not explore uniformly on zero-probability actions in cover
--softmax
softmax exploration
--regcb
RegCB-elim exploration
--regcbopt
RegCB optimistic exploration
--mellowness arg (=0.100000001)
RegCB mellowness parameter c_0. Default 0.1
--greedify
always update first policy once in bagging
--cb_min_cost arg (=0)
lower bound on cost
--cb_max_cost arg (=1)
upper bound on cost
--first_only
Only explore the first action in a tie-breaking event
--lambda arg (=-1)
parameter for softmax
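For example, epsilon-greedy exploration over multiline action-dependent-feature data might be run as:
     vw -d cb_adf.dat --cb_explore_adf --epsilon 0.1 -p actions.txt
Each multiline example in cb_adf.dat is expected to consist of an optional shared context line followed by one line per action, with the chosen action carrying an action:cost:probability label.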
Contextual Bandit Exploration:
--cb_explore arg
Online explore-exploit for a <k> action contextual bandit problem
--first arg
tau-first exploration
--epsilon arg (=0.0500000007)
epsilon-greedy exploration
--bag arg
bagging-based exploration
--cover arg
Online cover based exploration
--psi arg (=1)
disagreement parameter for cover
Multiworld Testing Options:
--multiworld_test arg
Evaluate features as policies
--learn arg
Do Contextual Bandit learning on <n> classes.
--exclude_eval
Discard mwt policy features before learning
Contextual Bandit with Action Dependent Features:
--cb_adf
Do Contextual Bandit learning with multiline action dependent features.
--rank_all
Return actions sorted by score order
--no_predict
Do not do a prediction when training
--cb_type arg (=ips)
contextual bandit method to use in {ips, dm, dr, mtr}
Contextual Bandit Options:
--cb arg
Use contextual bandit learning with <k> costs
--cb_type arg (=dr)
contextual bandit method to use in {ips,dm,dr}
--eval Evaluate a policy rather than optimizing.
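For example, learning over a 4-action contextual bandit dataset with the doubly-robust estimator might look like this, where each line of cb.dat labels the chosen action as action:cost:probability followed by the context features:
     vw -d cb.dat --cb 4 --cb_type dr -f model.vw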
Cost Sensitive One Against All with Label Dependent Features:
--csoaa_ldf arg
Use one-against-all multiclass learning with label dependent features.
--ldf_override arg
Override singleline or multiline from csoaa_ldf or wap_ldf, eg if stored in file
--csoaa_rank
Return actions sorted by score order
--probabilities
predict probabilities of all classes
--wap_ldf arg
Use weighted all-pairs multiclass learning with label dependent features.
Specify singleline or multiline.
Interact via elementwise multiplication:
--interact arg
Put weights on feature products from namespaces <n1> and <n2>
Cost Sensitive One Against All:
--csoaa arg
One-against-all multiclass with <k> costs
Cost-sensitive Active Learning:
--cs_active arg
Cost-sensitive active learning with <k> costs
--simulation
cost-sensitive active learning simulation mode
--baseline
cost-sensitive active learning baseline
--domination
use domination in cost-sensitive active learning. Default 1
--mellowness arg (=0.100000001)
mellowness parameter c_0. Default 0.1.
--range_c arg (=0.5)
parameter controlling the threshold for per-label cost uncertainty. Default 0.5.
--max_labels arg (=18446744073709551615)
maximum number of label queries.
--min_labels arg (=18446744073709551615)
minimum number of label queries.
--cost_max arg (=1)
cost upper bound. Default 1.
--cost_min arg (=0)
cost lower bound. Default 0.
--csa_debug
print debug stuff for cs_active
Multilabel One Against All:
--multilabel_oaa arg
One-against-all multilabel with <k> labels
importance weight classes:
--classweight arg
importance weight multiplier for class
Recall Tree:
--recall_tree arg
Use online tree for multiclass
--max_candidates arg
maximum number of labels per leaf in the tree
--bern_hyper arg (=1)
recall tree depth penalty
--max_depth arg
maximum depth of the tree, default log_2 (#classes)
--node_only arg (=0)
only use node features, not full path features
--randomized_routing arg (=0)
randomized routing
Logarithmic Time Multiclass Tree:
--log_multi arg
Use online tree for multiclass
--no_progress
disable progressive validation
--swap_resistance arg (=4)
higher = more resistance to swap, default=4
Error Correcting Tournament Options:
--ect arg
Error correcting tournament with <k> labels
--error arg (=0)
errors allowed by ECT
Boosting:
--boosting arg
Online boosting with <N> weak learners
--gamma arg (=0.100000001)
weak learner's edge (=0.1), used only by online BBM
--alg arg (=BBM)
specify the boosting algorithm: BBM (default), logistic (AdaBoost.OL.W), adaptive (AdaBoost.OL)
One Against All Options:
--oaa arg
One-against-all multiclass with <k> labels
--oaa_subsample arg
subsample this number of negative examples when learning
--probabilities
predict probabilities of all classes
--scores
output raw scores per class
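For example, 10-class one-against-all training with per-class probability output (typically combined with logistic loss) might look like:
     vw -d multiclass.dat --oaa 10 --probabilities --loss_function logistic -f model.vw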
Top K:
--top arg
top k recommendation
Experience Replay:
--replay_m arg
use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost
sensitive] with specified buffer size
--replay_m_count arg (=1)
how many times (in expectation) should each example be played (default: 1 = permuting)
Binary loss:
--binary
report loss as binary classification on -1,1
Bootstrap:
--bootstrap arg
k-way bootstrap by online importance resampling
--bs_type arg
prediction type {mean,vote}
scorer options:
--link arg (=identity)
Specify the link function: identity, logistic, glf1 or poisson
Stagewise polynomial options:
--stage_poly
use stagewise polynomial feature learning
--sched_exponent arg (=1)
exponent controlling quantity of included features
--batch_sz arg (=1000)
multiplier on batch size before including more features
--batch_sz_no_doubling
batch_sz does not double
Low Rank Quadratics FA:
--lrqfa arg
use low rank quadratic features with field aware weights
Low Rank Quadratics:
--lrq arg
use low rank quadratic features
--lrqdropout
use dropout training for low rank quadratic features
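For example, rank-5 low-rank quadratic interactions between namespaces u and i (say, users and items in a recommender setting) might be requested as:
     vw -d ratings.dat --lrq ui5 --lrqdropout -f model.vw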
Autolink:
--autolink arg
create link function with polynomial d
Marginal:
--marginal arg
substitute marginal label estimates for ids
--initial_denominator arg (=1)
initial denominator
--initial_numerator arg (=0.5)
initial numerator
--compete
enable competition with marginal features
--update_before_learn arg (=0)
update marginal values before learning
--unweighted_marginals arg (=0)
ignore importance weights when computing marginals
--decay arg (=0)
decay multiplier per event (1e-3 for example)
Matrix Factorization Reduction:
--new_mf arg
rank for reduction-based matrix factorization
Neural Network:
--nn arg
Sigmoidal feedforward network with <k> hidden units
--inpass
Train or test sigmoidal feedforward network with input passthrough.
--multitask
Share hidden layer across all reduced tasks.
--dropout
Train or test sigmoidal feedforward network using dropout.
--meanfield
Train or test sigmoidal feedforward network using mean field.
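For example, a network with 10 hidden units trained with dropout might be specified as:
     vw -d train.dat --nn 10 --dropout -f model.vw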
Confidence:
--confidence
Get confidence for binary predictions
--confidence_after_training
Confidence after training
Active Learning with Cover:
--active_cover
enable active learning with cover
--mellowness arg (=8)
active learning mellowness parameter c_0. Default 8.
--alpha arg (=1)
active learning variance upper bound parameter alpha. Default 1.
--beta_scale arg (=3.1622777)
active learning variance upper bound parameter beta_scale. Default sqrt(10).
--cover arg (=12)
cover size. Default 12.
--oracular
Use Oracular-CAL style query or not. Default false.
Active Learning:
--active
enable active learning
--simulation
active learning simulation mode
--mellowness arg (=8)
active learning mellowness parameter c_0. Default 8
Experience Replay:
--replay_b arg
use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost
sensitive] with specified buffer size
--replay_b_count arg (=1)
how many times (in expectation) should each example be played (default: 1 = permuting)
Baseline options:
--baseline
Learn an additive baseline (from constant features) and a residual separately in regression.
--lr_multiplier arg
learning rate multiplier for baseline model
--global_only
use separate example with only global constant for baseline predictions
--check_enabled
only use baseline when the example contains enabled flag
OjaNewton options:
--OjaNewton
Online Newton with Oja's Sketch
--sketch_size arg (=10)
size of sketch
--epoch_size arg (=1)
size of epoch
--alpha arg (=1)
multiplicative constant for identity
--alpha_inverse arg
one over alpha, similar to learning rate
--learning_rate_cnt arg (=2)
constant for the learning rate 1/t
--normalize arg (=1)
normalize the features or not
--random_init arg (=1)
randomize initialization of Oja or not
LBFGS and Conjugate Gradient options:
--conjugate_gradient
use conjugate gradient based optimization
--bfgs use bfgs optimization
--hessian_on
use second derivative in line search
--mem arg (=15)
memory in bfgs
--termination arg (=0.00100000005)
Termination threshold
Latent Dirichlet Allocation:
--lda arg
Run lda with <int> topics
--lda_alpha arg (=0.100000001)
Prior on sparsity of per-document topic weights
--lda_rho arg (=0.100000001)
Prior on sparsity of topic distributions
--lda_D arg (=10000)
Number of documents
--lda_epsilon arg (=0.00100000005)
Loop convergence threshold
--minibatch arg (=1)
Minibatch size, for LDA
--math-mode arg (=0)
Math mode: simd, accuracy, fast-approx
--metrics arg (=0)
Compute metrics
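A sketch of an LDA run; documents are unlabeled lines of word features, and the topic count, corpus size and minibatch size below are illustrative:
     vw -d docs.dat --lda 20 --lda_D 50000 --minibatch 256 --passes 2 -c --readable_model topics.txt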
Noop Learner:
--noop do no learning
Print pseudolearner:
--print
print examples
Gradient Descent Matrix Factorization:
--rank arg
rank for matrix factorization.
Network sending:
--sendto arg
send examples to <host>
Stochastic Variance Reduced Gradient:
--svrg Streaming Stochastic Variance Reduced Gradient
--stage_size arg (=1)
Number of passes per SVRG stage
Follow the Regularized Leader:
--ftrl FTRL: Follow the Proximal Regularized Leader
--ftrl_alpha arg (=0.00499999989)
Learning rate for FTRL optimization
--ftrl_beta arg (=0.100000001)
FTRL beta parameter
--pistol
FTRL: Parameter-free Stochastic Learning
--ftrl_alpha arg (=1)
Learning rate for FTRL optimization
--ftrl_beta arg (=0.5)
FTRL beta parameter
Kernel SVM:
--ksvm kernel svm
--reprocess arg (=1)
number of reprocess steps for LASVM
--pool_greedy
use greedy selection on mini pools
--para_active
do parallel active learning
--pool_size arg (=1)
size of pools for active learning
--subsample arg (=1)
number of items to subsample from the pool
--kernel arg (=linear)
type of kernel (rbf or linear (default))
--bandwidth arg (=1)
bandwidth of rbf kernel
--degree arg (=2)
degree of poly kernel
--lambda arg
saving regularization for test time
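For example, an RBF-kernel SVM might be trained as follows (the bandwidth and reprocess values are illustrative):
     vw -d train.dat --ksvm --kernel rbf --bandwidth 1.0 --reprocess 5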
Gradient Descent options:
--sgd use regular stochastic gradient descent update.
--adaptive
use adaptive, individual learning rates.
--adax use adaptive learning rates with x^2 instead of g^2x^2
--invariant
use safe/importance aware updates.
--normalized
use per feature normalized updates
--sparse_l2 arg (=0)
degree of l2 regularization applied to activated sparse parameters
--l1_state arg (=0)
amount of accumulated implicit l1 regularization
--l2_state arg (=1)
amount of accumulated implicit l2 regularization
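For example, the adaptive, normalized, invariant updates used by default can be replaced with a plain stochastic gradient update and a fixed learning rate:
     vw -d train.dat --sgd -l 0.1 -f model.vw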
Input options:
-d [ --data ] arg
Example Set
--daemon
persistent daemon mode on port 26542
--foreground
in persistent daemon mode, do not run in the background
--port arg
port to listen on; use 0 to pick unused port
--num_children arg
number of children for persistent daemon mode
--pid_file arg
Write pid file in persistent daemon mode
--port_file arg
Write port used in persistent daemon mode
-c [ --cache ]
Use a cache. The default is <data>.cache
--cache_file arg
The location(s) of cache_file.
--json Enable JSON parsing.
--dsjson
Enable Decision Service JSON parsing.
-k [ --kill_cache ]
do not reuse existing cache: create a new one always
--compressed
use gzip format whenever possible. If a cache file is being created, this option creates a
compressed cache file. A mixture of raw-text & compressed inputs are supported with
autodetection.
--no_stdin
do not default to reading from stdin
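For example, a trained model can be served from a persistent daemon and queried over TCP (the port and child count are illustrative):
     vw --daemon --port 26542 --num_children 4 -i model.vw -t
Test examples written to port 26542 receive one prediction per line in response.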
vw 8.6.1 December 2020 VW(1)