📄️ How can I create custom on-the-fly data transforms?
The code is easily extendable with custom transforms inheriting from the Transform base class.
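As an illustration only, here is a minimal sketch of how a custom transform, once registered under a name, could be enabled in a training configuration; the transform name my_custom_filter and the data paths are hypothetical, and the built-in filtertoolong transform is shown only for context.

```yaml
# Sketch: enable a registered custom transform in the training config.
# "my_custom_filter" is a hypothetical registration name.
data:
    corpus_1:
        path_src: data/train.src          # placeholder paths
        path_tgt: data/train.tgt
        transforms: [my_custom_filter, filtertoolong]
    valid:
        path_src: data/valid.src
        path_tgt: data/valid.tgt
        transforms: [my_custom_filter]
```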
📄️ Do you support multi-gpu?
First you need to make sure you export CUDA_VISIBLE_DEVICES=0,1,2,3.
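As a hedged sketch, these are the usual companion settings in the training config for four GPUs on a single machine; values are illustrative.

```yaml
# Sketch: single-node training on 4 GPUs
# (after exporting CUDA_VISIBLE_DEVICES=0,1,2,3 in the shell).
world_size: 4
gpu_ranks: [0, 1, 2, 3]
```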
📄️ How can I ensemble Models at inference?
You can specify several models in the onmt_translate command line: -model model1_seed1 model2_seed2
📄️ How to use gradient checkpointing when dealing with a big model?
* use_ckpting: ["ffn", "mha", "lora"]
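For illustration, the corresponding config fragment simply lists the modules to checkpoint; recomputing their activations in the backward pass trades extra compute for lower GPU memory.

```yaml
# Recompute activations of these modules during the backward pass
# instead of storing them (slower, but saves activation memory).
use_ckpting: ["ffn", "mha", "lora"]
```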
📄️ How to use LoRA and 8-bit loading to finetune a big model?
Cf. paper: LoRA
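Below is a hedged sketch of a finetuning config combining LoRA adapters with quantized base weights; the option names (lora_layers, lora_rank, lora_alpha, lora_dropout, quant_layers, quant_type) and all values are assumptions meant to illustrate the idea, so check the FAQ entry for the exact spelling.

```yaml
# Sketch only: LoRA adapters on attention projections, with the frozen
# base weights of those layers loaded in 8-bit. Option names are assumed.
train_from: "big_model_step_XXX.pt"      # placeholder checkpoint
lora_layers: ["linear_values", "linear_query"]
lora_rank: 2
lora_alpha: 8
lora_dropout: 0.05
quant_layers: ["linear_values", "linear_query"]
quant_type: "bnb_8bit"                   # assumed bitsandbytes 8-bit mode
```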
📄️ How to switch from OpenNMT-py to EOLE?
Configuration conversion
📄️ Performance tips
* use fp16
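As a sketch, here are a few of the usual throughput-oriented settings gathered in one config fragment; values are examples, not a recipe, and the fused optimizer assumes NVIDIA Apex is installed.

```yaml
# Illustrative performance settings (values are examples).
model_dtype: "fp16"      # mixed-precision training
batch_type: "tokens"
batch_size: 4096
accum_count: [4]         # gradient accumulation for a larger effective batch
optim: "fusedadam"       # assumes NVIDIA Apex is available
```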
📄️ Position encoding: Absolute vs Relative vs Rotary Embeddings vs Alibi
The basic feature is absolute position encoding, stemming from the original Transformer paper.
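Here is a hedged illustration of how the different schemes are typically selected in the model config; the max_relative_positions convention shown (positive = relative, -1 = rotary, -2 = ALiBi) is an assumption to verify against the FAQ entry.

```yaml
# Pick ONE of the following (assumed convention, verify in the FAQ):
position_encoding: true        # absolute sinusoidal positions
# max_relative_positions: 20   # relative positions (Shaw et al.)
# max_relative_positions: -1   # rotary embeddings (RoPE)
# max_relative_positions: -2   # ALiBi
```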
📄️ Compute dtype (precision) and storage dtype
Various compute precisions are supported. Below is a quick recap of the current cases.
📄️ How do I use Pretrained embeddings (e.g. GloVe)?
This is handled in the initial steps of the onmt_train execution.
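A minimal sketch of the config fragment typically used for this, assuming GloVe vectors in text format; the path is a placeholder and the option names (embeddings_type, both_embeddings, word_vec_size, freeze_word_vecs_enc) reflect my understanding of the relevant keys.

```yaml
# Sketch: initialize source and target embeddings from GloVe vectors.
embeddings_type: "GloVe"
both_embeddings: "glove_dir/glove.6B.100d.txt"   # placeholder path
word_vec_size: 100
freeze_word_vecs_enc: false    # set to true to keep the vectors fixed
```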
📄️ What special tokens are used?
There are 4 main special tokens:
📄️ How can I apply on-the-fly tokenization and subword regularization when training?
This is part of the transforms paradigm, which allows various processing steps to be applied to inputs before building the batches used to train models (or predict). transforms is basically a list of functions that will be applied sequentially to the examples as they are read from a file (or an input list).
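For illustration, a sketch of a corpus-level transform setup applying SentencePiece with subword sampling on the fly; paths are placeholders and the subword options shown (src_subword_model, src_subword_nbest, src_subword_alpha) are the usual parameters, to be checked against the transform's documentation.

```yaml
# Sketch: on-the-fly SentencePiece tokenization with subword regularization.
data:
    corpus_1:
        path_src: data/train.src           # placeholder paths
        path_tgt: data/train.tgt
        transforms: [sentencepiece, filtertoolong]
src_subword_model: "spm/src.model"
tgt_subword_model: "spm/tgt.model"
src_subword_nbest: 20      # sample among the 20-best segmentations
src_subword_alpha: 0.1     # smoothing for subword sampling
```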
📄️ How can I update a checkpoint's vocabulary?
New vocabulary can be used to continue training from a checkpoint. Existing vocabulary embeddings will be mapped to the new vocabulary, and new vocabulary tokens will be initialized as usual.
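A minimal sketch of continuing training from a checkpoint with a new vocabulary; paths are placeholders, and resetting the optimizer state is shown as one common choice rather than a requirement.

```yaml
# Sketch: resume from a checkpoint while switching to a new vocabulary.
train_from: "model_step_10000.pt"   # placeholder checkpoint
update_vocab: true                  # remap known embeddings, init new tokens
reset_optim: "states"               # one common choice when the vocab changes
src_vocab: "new_vocab.src"
tgt_vocab: "new_vocab.tgt"
```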
📄️ Can I get word alignments while translating?
Raw alignments from averaging Transformer attention heads
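Below is a hedged sketch of the training options commonly associated with guided alignment, plus the flag used to request alignments at translation time; names like lambda_align, alignment_layer, alignment_heads and report_align are assumptions drawn from the usual setup.

```yaml
# Sketch: supervise one decoder attention head for alignment during training
# (option names assumed; raw alignments can also be extracted without this).
lambda_align: 0.05             # weight of the alignment loss
alignment_layer: 3             # decoder layer whose attention is supervised
alignment_heads: 1             # number of heads averaged for alignment
full_context_alignment: true
# At inference, enable report_align on the translate command to print alignments.
```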