Framework

Model

class eole.models.model.VisionEncoderDecoderModel(**kwargs)

Bases: BaseModel

VisionEncoderDecoderModel class. See BaseModel for options.

classmethod build_blocks(model_config, vocabs, running_config=None)

Where the blocks (encoder/decoder/etc) are actually instantiated, depending on the actual subclass.

build_hunyuan_position_ids(src, image_locations, image_sizes)

  • Parameters:
    • src – [B, L] token id tensor
    • image_locations – bool mask of same shape as src
    • image_sizes – [num_images, 2] tensor with (height_px, width_px) per image

This part is specific to HunyuanOCR for several reasons:

  • it detects contiguous runs of image_token, so it does not work with llava (IMG_BREAK)
  • it assumes one image_token as a newline separator (W+1 tokens per row)
  • it adds 2 image_token at the end: H * (W+1) + 2
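As a concrete check of the layout above, the expected number of image_token placeholders for one H×W patch grid can be computed directly (hypothetical helper for illustration, not part of the eole API):

```python
# Hypothetical helper illustrating the HunyuanOCR placeholder layout
# described above: W patch tokens plus 1 newline-separator token per row,
# plus 2 trailing image_token at the end.
def hunyuan_image_token_count(H, W):
    return H * (W + 1) + 2
```

For a 3×4 patch grid this gives 3 * 5 + 2 = 17 placeholder tokens.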

build_qwen_vl_position_ids(src, image_locations, image_sizes)

Build mRoPE position IDs (3 sections: temporal, height, width) for Qwen3 VL / Qwen3.5 VL.

Follows the HuggingFace get_rope_index logic for mrope_section = [t, h, w]:

  • Text tokens at sequential position p: (p, p, p)
  • Image tokens in an H×W merged-patch grid starting at position p:
    • temporal: p (constant; still images have a single frame)
    • height: row + p (row ∈ 0..H-1, each row repeated W times)
    • width: col + p (col ∈ 0..W-1, repeated H times)
  • After an image block, the position counter advances by max(H, W) (not H*W).
  • Parameters:
    • src – (B, L) token id tensor
    • image_locations – bool mask of same shape as src (True for image_pad tokens)
    • image_sizes – (N_images, 2) tensor with (height_px, width_px) per image
  • Returns: position_ids of shape (B, L, 3)
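The rules above can be sketched in plain Python for a single sequence. This is a simplified stand-in for the tensor implementation; mrope_positions and its arguments are illustrative, not the eole API:

```python
# Sketch of mRoPE (temporal, height, width) position assignment for one
# sequence, following the rules described above.
def mrope_positions(tokens, image_token, grids):
    """tokens: list of token ids; grids: list of (H, W) per image block."""
    pos = []   # list of (t, h, w) triples, one per token
    p = 0      # running position counter
    i = 0
    img = 0
    while i < len(tokens):
        if tokens[i] != image_token:
            pos.append((p, p, p))   # text token: (p, p, p)
            p += 1
            i += 1
        else:
            H, W = grids[img]
            img += 1
            for row in range(H):
                for col in range(W):
                    # temporal constant at p, height row + p, width col + p
                    pos.append((p, row + p, col + p))
            i += H * W
            p += max(H, W)          # counter advances by max(H, W)
    return pos
```

For a text token, a 2×2 image block, then a text token, this yields (0,0,0), four image triples starting at p=1, then (3,3,3) since the counter advanced by max(2,2)=2.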

forward(src, tgt, src_len, with_align=False, **kwargs)

A DecoderModel forwards the src side to the decoder along with the source lengths vector. It is a decoder-only LM (cf. GPT-2).

class eole.models.model.AudioEncoderDecoderModel(**kwargs)

Bases: BaseModel

AudioEncoderDecoderModel for Whisper-style speech-to-text models.

Audio encoder (no src_emb) + text decoder with cross-attention. See BaseModel for options.

classmethod build_blocks(model_config, vocabs, running_config=None)

Where the blocks (encoder/decoder/etc) are actually instantiated, depending on the actual subclass.

forward(src, tgt, src_len, with_align=False, **kwargs)

Forward pass: encode mel features, decode with cross-attention.

Trainer

class eole.trainer.Trainer(model, train_loss, valid_loss, scoring_preparator, valid_scorers, optim, config: TrainerConfig, report_manager=None, model_saver=None, earlystopper=None)

Bases: object

Refactored trainer with improved separation of concerns.

train(train_iter, train_steps: int, save_checkpoint_steps: int = 5000, valid_iter=None, valid_steps: int = 10000)

Main training loop.

validate(valid_iter, moving_average=None)

Validate model.

class eole.utils.Statistics(loss=0, auxloss=0, n_batchs=0, n_sents=0, n_tokens=0, n_correct=0, computed_metrics=None, data_stats=None, attention_entropy=0, n_attention_samples=0)

Bases: object

Accumulator for loss statistics. Currently calculates:

  • accuracy
  • perplexity
  • elapsed time

accuracy()

compute accuracy

static all_gather_stats(stat, max_size=4096)

Gather a Statistics object across multiple processes/nodes

  • Parameters:
    • stat (Statistics) – the statistics object to gather across all processes/nodes
    • max_size (int) – max buffer size to use
  • Returns: Statistics, the updated stats object

static all_gather_stats_list(stat_list, max_size=4096)

Gather a list of Statistics objects across all processes/nodes

  • Parameters:
    • stat_list (list([Statistics])) – list of statistics objects to gather across all processes/nodes
    • max_size (int) – max buffer size to use
  • Returns: list of updated stats
  • Return type: list([Statistics])

avg_attention_entropy()

compute average attention entropy

computed_metric(metric)

check if the metric (TER/BLEU) has been computed and return it

elapsed_time()

compute elapsed time

log_tensorboard(prefix, writer, learning_rate, patience, step)

display statistics to tensorboard

output(step, num_steps, learning_rate, start)

Write out statistics to stdout.

  • Parameters:
    • step (int) – current step
    • num_steps (int) – total number of steps
    • learning_rate (float) – current learning rate
    • start (float) – start time of the step

ppl()

compute perplexity

update(stat, update_n_src_tokens=False)

Update statistics by summing values with another Statistics object

  • Parameters:
    • stat – another statistic object
    • update_n_src_tokens (bool) – whether to update (sum) n_src_tokens or not

xent()

compute cross entropy
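The metric methods above typically reduce the accumulated counters as follows. These formulas are an assumption based on common practice, written as standalone functions for illustration, not the eole methods themselves:

```python
import math

# Illustrative stand-ins for how accumulated loss / token counters
# usually map to the reported metrics.
def accuracy(n_correct, n_tokens):
    # fraction of correctly predicted tokens, as a percentage
    return 100.0 * n_correct / n_tokens

def xent(loss, n_tokens):
    # mean cross-entropy per target token
    return loss / n_tokens

def ppl(loss, n_tokens):
    # perplexity = exp(mean cross-entropy), capped to avoid overflow
    return math.exp(min(loss / n_tokens, 100))
```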

Loss

class eole.utils.loss.LossCompute(criterion, generator, lambda_coverage=0.0, lambda_align=0.0, tgt_shift_index=1, vocabs=None, lm_generator=None, lm_prior_lambda=None, lm_prior_tau=None, lm_prior_model=None)

Bases: Module

Class for managing efficient loss computation. Handles accumulating multiple loss computations.

  • Parameters:
    • criterion (nn loss function) – NLLLoss or custom loss
    • generator (nn.Module) – module that maps the output of the decoder to a distribution over the target vocabulary
    • lambda_coverage – Hyper-param to apply coverage attention if any
    • lambda_align – Hyper-param for alignment loss
    • tgt_shift_index (int) – 1 for NMT, 0 for LM
    • vocabs – full vocabs with specials
    • lm_generator (ctranslate2.Generator) – LM Generator
    • lm_prior_lambda (float) – weight of LM model in loss
    • lm_prior_tau (float) – scaler for LM loss

forward(batch, output, attns, estim=None)

Compute the forward loss

  • Parameters:
    • batch (batch) – batch of labeled examples
    • output (FloatTensor) – output of decoder model (batch, tgt_len, hidden)
    • attns (dict) – dictionary of attention weights (batch, tgt_len, src_len)
  • Returns: A tuple with the loss and an eole.utils.Statistics instance.

classmethod from_config(config, model, vocabs, train=True)

Returns a subclass which wraps around an nn.Module subclass (such as nn.NLLLoss) which defines the loss criterion. The LossCompute object passes relevant data to a Statistics object which handles training/validation logging. The Criterion and LossCompute options are triggered by opt settings.

ignore_prompt(batch)

Mask the prompt in the target side of the batch examples in order to set the loss of the prompt to zero.

For finetuning on specific tasks. The end of the prompt must be indicated by the DefaultTokens.MASK_BEFORE placeholder.

The masks are expected to be properly handled by the loss criterion (e.g. nn.CrossEntropyLoss).

  • Parameters: batch – The current batch.
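A minimal sketch of that masking on a plain label list. The IGNORE and MASK_BEFORE values here are hypothetical; the real implementation works on batch tensors and uses the DefaultTokens.MASK_BEFORE placeholder id:

```python
IGNORE = -100      # e.g. the ignore_index of nn.CrossEntropyLoss
MASK_BEFORE = 7    # hypothetical id for the DefaultTokens.MASK_BEFORE placeholder

def mask_prompt(labels):
    # zero out the loss on the prompt: everything up to and including the
    # MASK_BEFORE placeholder is replaced by the criterion's ignore value
    if MASK_BEFORE in labels:
        cut = labels.index(MASK_BEFORE)
        return [IGNORE] * (cut + 1) + labels[cut + 1:]
    return labels
```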

Optimizer

class eole.utils.Optimizer(optimizer: TorchOptimizer, learning_rate: float, learning_rate_decay_fn: Callable[[int], float] | None = None, max_grad_norm: float | None = None, use_amp: bool = True)

Bases: object

Optimizer wrapper with learning rate scheduling and gradient scaling.

Wraps a torch.optim.Optimizer with additional functionality:

  • Learning rate scheduling
  • Gradient clipping
  • Automatic mixed precision (AMP) support with gradient scaling
  • Parameters:
    • optimizer – A torch.optim.Optimizer instance.
    • learning_rate – The initial learning rate.
    • learning_rate_decay_fn – Optional callable for LR scheduling.
    • max_grad_norm – Clip gradients to this global norm (0 = no clipping).
    • use_amp – Whether to use automatic mixed precision.

property amp : bool

Whether using automatic mixed precision.

backward(loss: Tensor) → None

Perform backward pass with optional gradient scaling.

  • Parameters: loss – The loss tensor to backpropagate.

classmethod from_config(model: Module, config: Any, metadata: dict | None = None) → Optimizer

Build optimizer from configuration.

  • Parameters:
    • model – The model to optimize.
    • config – The configuration object.
    • metadata – Optional checkpoint metadata to load states from.
  • Returns: An Optimizer instance.

learning_rate(step: int | None = None) → float

Calculate current learning rate.

  • Parameters: step – Step to calculate LR for (defaults to current decay_step).
  • Returns: The learning rate value.
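For illustration, the effective rate is the base rate scaled by the decay callable. Noam-style inverse-square-root decay is one common choice of learning_rate_decay_fn, shown here as an assumption rather than an eole default:

```python
# Sketch of a learning_rate_decay_fn and how the wrapper would combine it
# with the base rate (noam_decay shown purely as an example schedule).
def noam_decay(step, warmup=4000, model_dim=512):
    # linear warmup, then inverse-square-root decay
    step = max(step, 1)
    return model_dim ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

def effective_lr(base_lr, step, decay_fn=noam_decay):
    return base_lr * decay_fn(step)
```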

load_state_dict(state_dict: dict[str, Any]) → None

Load optimizer state from checkpoint.

state_dict() → dict[str, Any]

Get optimizer state for checkpointing.

step() → None

Update model parameters based on gradients.

Handles learning rate updates, gradient clipping, and AMP scaling.
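The gradient-clipping part of that step can be sketched as a global-norm rescale. This is a pure-Python stand-in for torch.nn.utils.clip_grad_norm_, operating on a flat list of gradient values:

```python
def clip_global_norm(grads, max_norm):
    # rescale all gradients so their global L2 norm does not exceed max_norm
    # (max_norm = 0 disables clipping, matching the max_grad_norm parameter)
    total = sum(g * g for g in grads) ** 0.5
    if max_norm > 0 and total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return list(grads)
```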

property training_step : int

The current training step.

zero_grad(set_to_none: bool = True) → None

Zero the gradients of optimized parameters.

  • Parameters: set_to_none – Set gradients to None instead of zero for memory efficiency.

Inference Engine

class eole.inference_engine.InferenceEngine(config)

Bases: object

Wrapper Class to run Inference.

  • Parameters: config – inference options

infer_file() → Tuple[List[List[float]], List[List[float]] | None, List[List[str]]]

File inference. Source file must be the config.src argument.

infer_file_parallel(settings: Dict[str, Any] | None = None)

File inference in multiprocessing with partitioned models.

infer_list(src: List[str], settings: Dict[str, Any] | None = None) → Tuple[List[List[float]], List[List[float]] | None, List[List[str]]]

List of strings inference.

infer_list_parallel(src: List[str], settings: Dict[str, Any] | None = None)

List inference in multiprocessing with partitioned models.

infer_list_stream(src: str, settings: Dict[str, Any] | None = None)

Stream inference results for a single input string.

This is a generator that yields decoded text chunks as they are produced by the model, enabling a chatbot-style streaming interface instead of waiting for the full response.

Only supported for decoder-only (LM) models (GeneratorLM). Streaming with encoder-decoder or encoder-only models falls back to returning the complete prediction as a single chunk at the end.

  • Parameters:
    • src (str) – Single input string to run inference on.
    • settings (dict , optional) – Override inference settings (e.g. temperature, max_length).
  • Yields: str – Decoded text chunks, one per generated token (or slightly larger chunks when the detokenizer defers output to avoid partial multi-byte / subword-piece artefacts).
  • Raises: NotImplementedError – If world_size > 1 (parallel mode).

Example:

engine = InferenceEnginePY(config)
for chunk in engine.infer_list_stream("Tell me a joke"):
    print(chunk, end="", flush=True)

predict_batch(batch)

Predict a single batch. To be implemented by subclasses.

score_file(settings: Dict[str, Any] | None = None)

File scoring. Source file must be the config.src argument.

score_file_parallel(settings: Dict[str, Any] | None = None)

File scoring in parallel. To be implemented by subclasses.

score_list(src: List[str], settings: Dict[str, Any] | None = None)

List of strings scoring.

score_list_parallel(src: List[str], settings: Dict[str, Any] | None = None)

List scoring in parallel. To be implemented by subclasses.

terminate()

Terminate the inference engine and cleanup resources.

class eole.inference_engine.InferenceEnginePY(config)

Bases: InferenceEngine

Inference engine subclass to run inference with predict.py.

  • Parameters: config – inference options

infer_file_parallel(settings: Dict[str, Any] | None = None)

Infer from file in parallel.

infer_list_parallel(src: List[str], settings: Dict[str, Any] | None = None)

Infer from list in parallel.

infer_list_stream(src: str, settings: Dict[str, Any] | None = None)

Stream inference results for a single input string.

Runs inference in a background thread and yields decoded text chunks as they are produced token by token. This is the recommended API for interactive / chatbot-style use cases.

Only supported for single-process mode (world_size <= 1) and decoder-only (LM) models. Encoder-decoder models are not supported for streaming.

  • Parameters:
    • src (str) – A single input string.
    • settings (dict , optional) – Override inference settings such as temperature, max_length, top_k, top_p.
  • Yields: str – Decoded text chunks produced by the model.
  • Raises: NotImplementedError – If called when world_size > 1.

Example:

engine = InferenceEnginePY(config)
for chunk in engine.infer_list_stream("Tell me a joke"):
    print(chunk, end="", flush=True)
print()

score_file_parallel(settings: Dict[str, Any] | None = None)

Score a file in parallel.

score_list_parallel(src: List[str], settings: Dict[str, Any] | None = None)

Score a list of strings in parallel.

terminate()

Terminate all worker processes.

class eole.inference_engine.InferenceEngineCT2(config, model_type=None)

Bases: InferenceEngine

Inference engine subclass to run inference with ctranslate2.

  • Parameters:
    • config – inference options
    • model_type – Type of model (DECODER or ENCODER_DECODER)

property ct2_model_path : str

Get the ctranslate2 model path.

predict_batch(batch) → Tuple

Predict a single batch using CT2.