Marian: Fast Neural Machine Translation in C++
Version: v1.11.3 b8bf086 2022-02-11 06:04:38 -0800
Usage: ./marian-decoder [OPTIONS]
-h,--help Print this help message and exit
--version Print the version number and exit
--authors Print list of authors and exit
--cite Print citation and exit
--build-info TEXT Print CMake build options and exit. Set to 'all' to print
advanced options
-c,--config VECTOR ... Configuration file(s). If multiple are given, later files override earlier ones
-w,--workspace UINT=512 Preallocate arg MB of work space
--log TEXT Log training process information to file given by arg
--log-level TEXT=info Set verbosity level of logging: trace, debug, info, warn,
err(or), critical, off
--log-time-zone TEXT Set time zone for the date shown on logging
--quiet Suppress all logging to stderr. Logging to files still works
--quiet-translation Suppress logging for translation
--seed UINT Seed for all random number generators. 0 means initialize
randomly
--check-nan Check for NaNs or Infs in forward and backward pass. Will
abort when found. This is a diagnostic option that will
slow down computation significantly
--interpolate-env-vars Allow the use of environment variables in paths, of the form
${VAR_NAME}
--relative-paths All paths are relative to the config file location
--dump-config TEXT Dump current (modified) configuration to stdout and exit.
Possible values: full, minimal, expand
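
For illustration, the general options above might be combined as in the following sketch; config.yml and decode.log are placeholder file names, not files shipped with Marian:

    # decode with settings from a config file, a larger workspace and quieter logging
    ./marian-decoder -c config.yml -w 1024 \
        --log decode.log --log-level warn --quiet-translation

    # print the fully expanded configuration and exit
    ./marian-decoder -c config.yml --dump-config expand
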
-m,--models VECTOR ... Paths to model(s) to be loaded. Supported file extensions:
.npz, .bin
--model-mmap Use memory-mapping when loading model (CPU only)
--ignore-model-config Ignore the model configuration saved in the npz file
--type TEXT=amun Model type: amun, nematus, s2s, multi-s2s, transformer
--dim-vocabs VECTOR=0,0 ... Maximum items in vocabulary ordered by rank, 0 uses all
items in the provided/created vocabulary file
--dim-emb INT=512 Size of embedding vector
--factors-dim-emb INT Embedding dimension of the factors. Only used if concat is
selected as the factor combining method
--factors-combine TEXT=sum How to combine the factors and lemma embeddings. Options
available: sum, concat
--lemma-dependency TEXT Lemma dependency method to use when predicting target
factors. Options: soft-transformer-layer,
hard-transformer-layer, lemma-dependent-bias, re-embedding
--lemma-dim-emb INT=0 Re-embedding dimension of lemma in factors
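
As a sketch of the model options, an explicit model specification could look as follows; model.npz and the vocabulary sizes are placeholders, and in practice most of these values are read from the configuration stored with the model unless --ignore-model-config is given:

    ./marian-decoder -m model.npz --type transformer \
        --dim-emb 512 --dim-vocabs 32000 32000
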
--dim-rnn INT=1024 Size of RNN hidden state
--enc-type TEXT=bidirectional Type of encoder RNN: bidirectional, bi-unidirectional,
alternating (s2s)
--enc-cell TEXT=gru Type of RNN cell: gru, lstm, tanh (s2s)
--enc-cell-depth INT=1 Number of transitional cells in encoder layers (s2s)
--enc-depth INT=1 Number of encoder layers (s2s)
--dec-cell TEXT=gru Type of RNN cell: gru, lstm, tanh (s2s)
--dec-cell-base-depth INT=2 Number of transitional cells in first decoder layer (s2s)
--dec-cell-high-depth INT=1 Number of transitional cells in next decoder layers (s2s)
--dec-depth INT=1 Number of decoder layers (s2s)
--skip Use skip connections (s2s)
--layer-normalization Enable layer normalization
--right-left Train right-to-left model
--input-types VECTOR ... Provide type of input data if different than 'sequence'.
Possible values: sequence, class, alignment, weight. You
need to provide one type per input file (if --train-sets)
or per TSV field (if --tsv).
--best-deep Use Edinburgh deep RNN configuration (s2s)
--tied-embeddings Tie target embeddings and output embeddings in output layer
--tied-embeddings-src Tie source and target embeddings
--tied-embeddings-all Tie all embedding layers and output layer
--output-omit-bias Do not use a bias vector in decoder output layer
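
The RNN options above describe the architecture of s2s models and are normally restored from the saved model configuration; purely as an illustration (model.npz is a placeholder), a deep s2s setup spelled out by hand might look like:

    ./marian-decoder -m model.npz --type s2s \
        --enc-type alternating --enc-depth 4 --enc-cell lstm \
        --dec-depth 4 --dec-cell lstm \
        --skip --layer-normalization --tied-embeddings
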
--transformer-heads INT=8 Number of heads in multi-head attention (transformer)
--transformer-no-projection Omit linear projection after multi-head attention
(transformer)
--transformer-pool Pool encoder states instead of using cross attention
(selects first encoder state, best used with special token)
--transformer-dim-ffn INT=2048 Size of position-wise feed-forward network (transformer)
--transformer-decoder-dim-ffn INT=0 Size of position-wise feed-forward network in decoder
(transformer). Uses --transformer-dim-ffn if 0.
--transformer-ffn-depth INT=2 Depth of filters (transformer)
--transformer-decoder-ffn-depth INT=0 Depth of filters in decoder (transformer). Uses
--transformer-ffn-depth if 0
--transformer-ffn-activation TEXT=swish
Activation between filters: swish or relu (transformer)
--transformer-dim-aan INT=2048 Size of position-wise feed-forward network in AAN
(transformer)
--transformer-aan-depth INT=2 Depth of filter for AAN (transformer)
--transformer-aan-activation TEXT=swish
Activation between filters in AAN: swish or relu (transformer)
--transformer-aan-nogate Omit gate in AAN (transformer)
--transformer-decoder-autoreg TEXT=self-attention
Type of autoregressive layer in transformer decoder:
self-attention, average-attention (transformer)
--transformer-tied-layers VECTOR ... List of tied decoder layers (transformer)
--transformer-guided-alignment-layer TEXT=last
Layer to use for guided alignment training in the transformer:
'last' or a layer number
--transformer-preprocess TEXT Operation before each transformer layer: d = dropout, a =
add, n = normalize
--transformer-postprocess-emb TEXT=d Operation after transformer embedding layer: d = dropout, a
= add, n = normalize
--transformer-postprocess TEXT=dan Operation after each transformer layer: d = dropout, a =
add, n = normalize
--transformer-postprocess-top TEXT Final operation after a full transformer stack: d = dropout,
a = add, n = normalize. The optional skip connection with
'a' bypasses the entire stack.
--transformer-train-position-embeddings
Train positional embeddings instead of using static
sinusoidal embeddings
--transformer-depth-scaling Scale down weight initialization in transformer layers by 1
/ sqrt(depth)
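
Likewise, the transformer options are usually taken from the model's stored configuration; spelled out explicitly, a base-sized transformer roughly matching the defaults above might be described as (model.npz is a placeholder):

    ./marian-decoder -m model.npz --type transformer \
        --enc-depth 6 --dec-depth 6 \
        --transformer-heads 8 --transformer-dim-ffn 2048 \
        --transformer-postprocess-emb d --transformer-postprocess dan
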
--bert-mask-symbol TEXT=[MASK] Masking symbol for BERT masked-LM training
--bert-sep-symbol TEXT=[SEP] Sentence separator symbol for BERT next sentence prediction
training
--bert-class-symbol TEXT=[CLS] Class symbol for BERT classifier training
--bert-masking-fraction FLOAT=0.15 Fraction of masked out tokens during training
--bert-train-type-embeddings=true Train BERT type embeddings; set to false to use static
sinusoidal embeddings
--bert-type-vocab-size INT=2 Size of BERT type vocab (sentence A and B)
-i,--input VECTOR=stdin ... Paths to input file(s), stdin by default
-o,--output TEXT=stdout Path to output file, stdout by default
-v,--vocabs VECTOR ... Paths to vocabulary files; they have to correspond to --input
-b,--beam-size UINT=12 Beam size used during search with validating translator
-n,--normalize FLOAT=0 Divide translation score by pow(translation length, arg)
--max-length-factor FLOAT=3 Maximum target length as source length times factor
--word-penalty FLOAT Subtract (arg * translation length) from translation score
--allow-unk Allow unknown words to appear in output
--allow-special Allow special symbols to appear in output, e.g. for
SentencePiece with byte-fallback do not suppress the
newline symbol
--n-best Generate n-best list
--alignment TEXT Return word alignment. Possible values: 0.0-1.0, hard, soft
--word-scores Print word-level scores. One score per subword unit, not
normalized even if --normalize
--stat-freq TEXT=0 Display speed information every arg mini-batches. Disabled
by default with 0, set to value larger than 0 to activate
--no-spm-decode Keep the output segmented into SentencePiece subwords
--max-length UINT=1000 Maximum length of a sentence in a training sentence pair
--max-length-crop Crop a sentence to max-length instead of omitting it if
longer than max-length
--tsv Tab-separated input
--tsv-fields UINT Number of fields in the TSV input. By default, it is guessed
based on the model type
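
Putting the translator options together, a typical batch-translation call might look like the sketch below; model.npz, vocab.src.spm, vocab.trg.spm, input.txt and output.txt are placeholder file names:

    ./marian-decoder -m model.npz \
        -v vocab.src.spm vocab.trg.spm \
        -i input.txt -o output.txt \
        -b 6 -n 0.6 \
        --max-length 512 --max-length-crop \
        --alignment hard --n-best

With --n-best the output file contains the full n-best list, one line per hypothesis, rather than a single translation per input sentence.
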
-d,--devices VECTOR=0 ... Specifies GPU ID(s) to use for training. Defaults to
0..num-devices-1
--num-devices UINT Number of GPUs to use for this process. Defaults to
length(devices) or 1
--cpu-threads UINT=0 Use CPU-based computation with this many independent
threads, 0 means GPU-based computation
--mini-batch INT=1 Size of mini-batch used during batched translation
--mini-batch-words INT Set mini-batch size based on words instead of sentences
--maxi-batch INT=1 Number of batches to preload for length-based sorting
--maxi-batch-sort TEXT=none Sorting strategy for maxi-batch: none, src, trg ('trg' is not
available for the decoder)
--data-threads UINT=8 Number of concurrent threads to use during data reading and
processing
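
For throughput, the device and batching options are typically combined; a sketch for a multi-GPU run and a CPU-only run (config.yml is a placeholder):

    # four GPUs, batched translation with length-based pre-sorting
    ./marian-decoder -c config.yml -d 0 1 2 3 \
        --mini-batch 64 --maxi-batch 100 --maxi-batch-sort src

    # CPU-only decoding with 16 worker threads
    ./marian-decoder -c config.yml --cpu-threads 16 \
        --mini-batch 32 --maxi-batch 100
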
--fp16 Shortcut for mixed precision inference with float16,
corresponds to: --precision float16
--precision VECTOR=float32 ... Mixed precision for inference, set parameter type in
expression graph
--skip-cost Ignore model cost during translation, not recommended for
beam-size > 1
--shortlist VECTOR ... Use softmax shortlist: path first best prune
--weights VECTOR ... Scorer weights
--output-sampling VECTOR ... Noise output layer with Gumbel noise. Implicit default is
'full' for sampling from the full distribution. Also accepts
'topk num' (e.g. topk 100) for top-100 sampling.
--output-approx-knn VECTOR ... Use approximate knn search in output layer (currently only
in transformer)
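
As an illustration of the options above, sampled output and a lexical shortlist might be requested as follows; lex.s2t is a placeholder shortlist file, and the two numbers follow the 'first best' part of the 'path first best prune' pattern listed above:

    # sample from the top-100 output tokens instead of pure beam search
    ./marian-decoder -c config.yml -b 1 --output-sampling topk 100

    # restrict the output softmax to a lexical shortlist
    ./marian-decoder -c config.yml --shortlist lex.s2t 100 100
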
--optimize=false Optimize the graph on-the-fly
-g,--gemm-type TEXT=float32 GEMM Type to be used for on-line quantization/packing:
float32, packed16, packed8
--quantize-range FLOAT=0 Range for the on-line quantization of the weight matrix, as a
multiple of this range and the standard deviation; 0.0 means
min/max quantization
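
Finally, the precision and packing options can speed up inference; as a sketch (config.yml is a placeholder), GPU decoding in float16 and CPU decoding with on-line 8-bit packing might be invoked as:

    # GPU: mixed-precision inference
    ./marian-decoder -c config.yml --fp16

    # CPU: packed 8-bit GEMM with on-line quantization
    ./marian-decoder -c config.yml --cpu-threads 8 --gemm-type packed8
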