Framework
Model
class eole.models.model.VisionEncoderDecoderModel(**kwargs)
Bases: BaseModel
VisionEncoderDecoderModel Class
See BaseModel for options.
classmethod build_blocks(model_config, vocabs, running_config=None)
Where the blocks (encoder/decoder/etc.) are instantiated, depending on the concrete subclass.
build_hunyuan_position_ids(src, image_locations, image_sizes)
- Parameters:
- src – (B, L) token id tensor
- image_locations – bool mask of same shape as src
- image_sizes – (num_images, 2) tensor with (height_px, width_px) per image
This logic is specific to HunyuanOCR for several reasons:
- it detects contiguous runs of image_token, so it does not work with llava (IMG_BREAK)
- it assumes one image_token per row as a newline separator (W + 1 tokens per row)
- it adds 2 image_token at the end: H * (W + 1) + 2
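The last point implies a fixed number of image-token positions per image. A minimal sketch of that count (the helper name is hypothetical, assuming H and W count merged patches):

```python
def hunyuan_image_token_span(h: int, w: int) -> int:
    """Number of image-token positions for an H x W patch grid:
    each of the H rows is followed by one newline-separator token
    (W + 1 tokens per row), plus 2 trailing image tokens."""
    return h * (w + 1) + 2

# e.g. a 3 x 4 patch grid occupies 3 * (4 + 1) + 2 = 17 positions
```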
build_qwen_vl_position_ids(src, image_locations, image_sizes)
Build mRoPE position IDs (3 sections: temporal, height, width) for Qwen3 VL / Qwen3.5 VL.
Follows the HuggingFace get_rope_index logic for mrope_section = [t, h, w]:
- Text tokens at sequential position p: (p, p, p)
- Image tokens in an H×W merged-patch grid starting at position p:
- temporal: p (constant; still images have a single frame)
- height: row + p (row ∈ 0..H-1, each row repeated W times)
- width: col + p (col ∈ 0..W-1, repeated H times)
- After an image block, the position counter advances by max(H, W) (not H*W).
- Parameters:
- src – (B, L) token id tensor
- image_locations – bool mask of same shape as src (True for image_pad tokens)
- image_sizes – (N_images, 2) tensor with (height_px, width_px) per image
- Returns: position_ids of shape (B, L, 3)
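The scheme can be sketched in plain Python for a single unbatched sequence containing one image block (the helper name is hypothetical; the real method operates on batched tensors):

```python
def mrope_positions(n_text_before, h, w, n_text_after):
    """Build (temporal, height, width) position triples for a sequence
    of text tokens, one H x W merged-patch image grid, then more text."""
    pos, p = [], 0
    # text tokens: sequential, identical in all three sections
    for _ in range(n_text_before):
        pos.append((p, p, p))
        p += 1
    start = p
    # image tokens: temporal is constant, height/width index the grid
    for row in range(h):
        for col in range(w):
            pos.append((start, start + row, start + col))
    # after the image, the counter advances by max(H, W), not H * W
    p = start + max(h, w)
    for _ in range(n_text_after):
        pos.append((p, p, p))
        p += 1
    return pos
```

For 2 text tokens, a 2×3 image, and 1 trailing text token, the first image token sits at (2, 2, 2) and the trailing text token at (5, 5, 5), since the counter advances by max(2, 3) = 3.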
forward(src, tgt, src_len, with_align=False, **kwargs)
A DecoderModel forwards the src side to the decoder along with the source lengths vector. It is a decoder-only LM (cf. GPT-2).
class eole.models.model.AudioEncoderDecoderModel(**kwargs)
Bases: BaseModel
AudioEncoderDecoderModel for Whisper-style speech-to-text models.
Audio encoder (no src_emb) + text decoder with cross-attention.
See BaseModel for options.
classmethod build_blocks(model_config, vocabs, running_config=None)
Where the blocks (encoder/decoder/etc.) are instantiated, depending on the concrete subclass.
forward(src, tgt, src_len, with_align=False, **kwargs)
Forward pass: encode mel features, decode with cross-attention.
Trainer
class eole.trainer.Trainer(model, train_loss, valid_loss, scoring_preparator, valid_scorers, optim, config: TrainerConfig, report_manager=None, model_saver=None, earlystopper=None)
Bases: object
Refactored trainer with improved separation of concerns.
train(train_iter, train_steps: int, save_checkpoint_steps: int = 5000, valid_iter=None, valid_steps: int = 10000)
Main training loop.
validate(valid_iter, moving_average=None)
Validate model.
class eole.utils.Statistics(loss=0, auxloss=0, n_batchs=0, n_sents=0, n_tokens=0, n_correct=0, computed_metrics=None, data_stats=None, attention_entropy=0, n_attention_samples=0)
Bases: object
Accumulator for loss statistics. Currently calculates:
- accuracy
- perplexity
- elapsed time
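The derived metrics follow the standard definitions; a sketch of how they relate to the accumulated counters (standalone functions here for illustration; the class computes them from its attributes):

```python
import math

def accuracy(n_correct, n_tokens):
    # percentage of correctly predicted tokens
    return 100.0 * n_correct / n_tokens

def xent(loss, n_tokens):
    # per-token cross entropy
    return loss / n_tokens

def ppl(loss, n_tokens):
    # perplexity = exp(per-token cross entropy)
    return math.exp(loss / n_tokens)
```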
accuracy()
compute accuracy
static all_gather_stats(stat, max_size=4096)
Gather a Statistics object across multiple processes/nodes
- Parameters:
- stat (Statistics) – the statistics object to gather across all processes/nodes
- max_size (int) – max buffer size to use
- Returns: Statistics, the updated stats object
static all_gather_stats_list(stat_list, max_size=4096)
Gather a Statistics list across all processes/nodes
- Parameters:
- stat_list (list([Statistics])) – list of statistics objects to gather across all processes/nodes
- max_size (int) – max buffer size to use
- Returns: list of updated stats
- Return type: list([Statistics])
avg_attention_entropy()
compute average attention entropy
computed_metric(metric)
check if metric(TER/BLEU) is computed and return it
elapsed_time()
compute elapsed time
log_tensorboard(prefix, writer, learning_rate, patience, step)
display statistics to tensorboard
output(step, num_steps, learning_rate, start)
Write out statistics to stdout.
- Parameters:
- step (int) – current step
- num_steps (int) – total number of steps
- learning_rate (float) – current learning rate
- start (int) – start time of step
ppl()
compute perplexity
update(stat, update_n_src_tokens=False)
Update statistics by summing values with another Statistics object
- Parameters:
- stat – another statistic object
- update_n_src_tokens (bool) – whether to update (sum) n_src_tokens or not
xent()
compute cross entropy
Loss
class eole.utils.loss.LossCompute(criterion, generator, lambda_coverage=0.0, lambda_align=0.0, tgt_shift_index=1, vocabs=None, lm_generator=None, lm_prior_lambda=None, lm_prior_tau=None, lm_prior_model=None)
Bases: Module
Class for managing efficient loss computation. Handles accumulating multiple loss computations.
- Parameters:
- criterion (nn loss function) – NLLLoss or custom loss
- generator (nn.Module) – module that maps the output of the decoder to a distribution over the target vocabulary
- lambda_coverage – Hyper-param to apply coverage attention if any
- lambda_align – Hyper-param for alignment loss
- tgt_shift_index (int) – 1 for NMT, 0 for LM
- vocabs – full vocabs with specials
- lm_generator (ctranslate2.Generator) – LM Generator
- lm_prior_lambda (float) – weight of LM model in loss
- lm_prior_tau (float) – scaler for LM loss
forward(batch, output, attns, estim=None)
Compute the forward loss
- Parameters:
- batch (batch) – batch of labeled examples
- output (FloatTensor) – output of decoder model (batch, tgt_len, hidden)
- attns (dict) – dictionary of attention weights (batch, tgt_len, src_len)
- Returns: A tuple with the loss and an eole.utils.Statistics instance.
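The effect of the tgt_shift_index parameter on target alignment can be sketched on plain lists (a simplified sketch; the class shifts tensors along the time dimension):

```python
def shift_targets(tgt, tgt_shift_index):
    """Drop the first tgt_shift_index tokens so that the decoder
    output at step t is scored against the token it should predict.
    NMT uses shift=1 (predict the token after BOS); LM uses shift=0
    (targets arrive pre-shifted from the dataloader)."""
    return tgt[tgt_shift_index:]

# with tgt = ["<s>", "a", "b", "</s>"] and shift=1,
# the labels become ["a", "b", "</s>"]
```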
classmethod from_config(config, model, vocabs, train=True)
Returns a subclass which wraps around an nn.Module subclass (such as nn.NLLLoss) which defines the loss criterion. The LossCompute object passes relevant data to a Statistics object which handles training/validation logging. The Criterion and LossCompute options are triggered by opt settings.
ignore_prompt(batch)
Mask the prompt in the target side of the batch examples in order to set the loss of the prompt to zero.
For finetuning on specific tasks. The end of the prompt must be indicated by the DefaultTokens.MASK_BEFORE placeholder.
The masks are expected to be properly handled by the loss criterion (e.g. nn.CrossEntropyLoss).
- Parameters: batch – The current batch.
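The masking can be sketched as follows. Both the -100 ignore index and the integer placeholder id are assumptions for illustration; the real method works with the criterion's configured ignore index and the DefaultTokens.MASK_BEFORE token:

```python
IGNORE_INDEX = -100  # assumed ignore_index of the loss criterion

def mask_prompt(labels, mask_before_id):
    """Set every label up to and including the last MASK_BEFORE
    placeholder to IGNORE_INDEX so the prompt contributes no loss."""
    out = list(labels)
    if mask_before_id in out:
        # index of the last placeholder occurrence
        cut = len(out) - 1 - out[::-1].index(mask_before_id)
        for i in range(cut + 1):
            out[i] = IGNORE_INDEX
    return out

# [5, 6, 99, 7, 8] with mask_before_id=99 -> [-100, -100, -100, 7, 8]
```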
Optimizer
class eole.utils.Optimizer(optimizer: TorchOptimizer, learning_rate: float, learning_rate_decay_fn: Callable[[int], float] | None = None, max_grad_norm: float | None = None, use_amp: bool = True)
Bases: object
Optimizer wrapper with learning rate scheduling and gradient scaling.
Wraps a torch.optim.Optimizer with additional functionality:
- Learning rate scheduling
- Gradient clipping
- Automatic mixed precision (AMP) support with gradient scaling
- Parameters:
- optimizer – A torch.optim.Optimizer instance.
- learning_rate – The initial learning rate.
- learning_rate_decay_fn – Optional callable for LR scheduling.
- max_grad_norm – Clip gradients to this global norm (0 = no clipping).
- use_amp – Whether to use automatic mixed precision.
property amp : bool
Whether using automatic mixed precision.
backward(loss: Tensor) → None
Perform backward pass with optional gradient scaling.
- Parameters: loss – The loss tensor to backpropagate.
classmethod from_config(model: Module, config: Any, metadata: dict | None = None) → Optimizer
Build optimizer from configuration.
- Parameters:
- model – The model to optimize.
- config – The configuration object.
- metadata – Optional checkpoint metadata to load states from.
- Returns: An Optimizer instance.
learning_rate(step: int | None = None) → float
Calculate current learning rate.
- Parameters: step – Step to calculate LR for (defaults to current decay_step).
- Returns: The learning rate value.
load_state_dict(state_dict: dict[str, Any]) → None
Load optimizer state from checkpoint.
state_dict() → dict[str, Any]
Get optimizer state for checkpointing.
step() → None
Update model parameters based on gradients.
Handles learning rate updates, gradient clipping, and AMP scaling.
property training_step : int
The current training step.
zero_grad(set_to_none: bool = True) → None
Zero the gradients of optimized parameters.
- Parameters: set_to_none – Set gradients to None instead of zero for memory efficiency.
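A learning_rate_decay_fn is any callable mapping a step to a multiplier on the base learning rate. A purely illustrative sketch using an inverse-square-root (Noam-style) schedule, not necessarily one of the schedules Eole ships:

```python
def noam_decay(warmup_steps):
    """Linear warmup followed by inverse-sqrt decay; the returned
    value multiplies the Optimizer's base learning rate."""
    def decay_fn(step):
        step = max(step, 1)
        return (warmup_steps ** 0.5) * min(step ** -0.5,
                                           step * warmup_steps ** -1.5)
    return decay_fn
```

The multiplier peaks at 1.0 when step == warmup_steps and decays as 1/sqrt(step) afterwards.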
Inference Engine
class eole.inference_engine.InferenceEngine(config)
Bases: object
Wrapper Class to run Inference.
- Parameters: config – inference options
infer_file() → Tuple[List[List[float]], List[List[float]] | None, List[List[str]]]
File inference. Source file must be the config.src argument.
infer_file_parallel(settings: Dict[str, Any] | None = None)
File inference in multiprocessing with partitioned models.
infer_list(src: List[str], settings: Dict[str, Any] | None = None) → Tuple[List[List[float]], List[List[float]] | None, List[List[str]]]
List of strings inference.
infer_list_parallel(src: List[str], settings: Dict[str, Any] | None = None)
List inference in multiprocessing with partitioned models.
infer_list_stream(src: str, settings: Dict[str, Any] | None = None)
Stream inference results for a single input string.
This is a generator that yields decoded text chunks as they are produced by the model, enabling a chatbot-style streaming interface instead of waiting for the full response.
Only supported for decoder-only (LM) models (GeneratorLM).
Streaming with encoder-decoder or encoder-only models falls back to
returning the complete prediction as a single chunk at the end.
- Parameters:
- src (str) – Single input string to run inference on.
- settings (dict, optional) – Override inference settings (e.g. temperature, max_length).
- Yields: str – Decoded text chunks, one per generated token (or slightly larger chunks when the detokenizer defers output to avoid partial multi-byte / subword-piece artefacts).
- Raises: NotImplementedError – If world_size > 1 (parallel mode).
Example:
engine = InferenceEnginePY(config)
for chunk in engine.infer_list_stream("Tell me a joke"):
    print(chunk, end="", flush=True)
predict_batch(batch)
Predict a single batch. To be implemented by subclasses.
score_file(settings: Dict[str, Any] | None = None)
File scoring. Source file must be the config.src argument.
score_file_parallel(settings: Dict[str, Any] | None = None)
File scoring in parallel. To be implemented by subclasses.
score_list(src: List[str], settings: Dict[str, Any] | None = None)
List of strings scoring.
score_list_parallel(src: List[str], settings: Dict[str, Any] | None = None)
List scoring in parallel. To be implemented by subclasses.
terminate()
Terminate the inference engine and cleanup resources.
class eole.inference_engine.InferenceEnginePY(config)
Bases: InferenceEngine
Inference engine subclass to run inference with predict.py.
- Parameters: config – inference options
infer_file_parallel(settings: Dict[str, Any] | None = None)
Infer from file in parallel.
infer_list_parallel(src: List[str], settings: Dict[str, Any] | None = None)
Infer from list in parallel.
infer_list_stream(src: str, settings: Dict[str, Any] | None = None)
Stream inference results for a single input string.
Runs inference in a background thread and yields decoded text chunks as they are produced token by token. This is the recommended API for interactive / chatbot-style use cases.
Only supported for single-process mode (world_size <= 1) and
decoder-only (LM) models. Encoder-decoder models are not supported
for streaming.
- Parameters:
- src (str) – A single input string.
- settings (dict, optional) – Override inference settings such as temperature, max_length, top_k, top_p.
- Yields: str – Decoded text chunks produced by the model.
- Raises:
NotImplementedError – If called when
world_size > 1.
Example:
engine = InferenceEnginePY(config)
for chunk in engine.infer_list_stream("Tell me a joke"):
    print(chunk, end="", flush=True)
print()
score_file_parallel(settings: Dict[str, Any] | None = None)
Score a file in parallel.
score_list_parallel(src: List[str], settings: Dict[str, Any] | None = None)
Score a list of strings in parallel.
terminate()
Terminate all worker processes.
class eole.inference_engine.InferenceEngineCT2(config, model_type=None)
Bases: InferenceEngine
Inference engine subclass to run inference with ctranslate2.
- Parameters:
- config – inference options
- model_type – Type of model (DECODER or ENCODER_DECODER)
property ct2_model_path : str
Get the ctranslate2 model path.
predict_batch(batch) → Tuple
Predict a single batch using CT2.