Modules
Transformer MLP
class eole.modules.transformer_mlp.MLP(model_config, running_config=None, moe_transformer_ff=None)
Bases: Module
A two- or three-layer feed-forward network (two projections on the simple path, three on the gated path).
- Parameters:
- model_config – eole.config.models.ModelConfig object
- running_config – TrainingConfig or InferenceConfig derived from RunningConfig
gated_forward(x)
Layer definition for the legacy gated path (no fused implementation).
- Parameters:
  - x – (batch_size, input_len, model_dim)
- Returns:
  - Output (batch_size, input_len, model_dim)
- Return type: FloatTensor
simple_forward(x)
Layer definition.
- Parameters:
  - x – (batch_size, input_len, model_dim)
- Returns:
  - Output (batch_size, input_len, model_dim)
- Return type: FloatTensor
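Both paths preserve the input shape. A minimal sketch of the two computations in plain PyTorch (projection names, activations, and dimensions here are illustrative assumptions, not eole's internals):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, d_ff = 512, 2048                       # illustrative sizes
gate_proj = nn.Linear(d_model, d_ff, bias=False)
up_proj = nn.Linear(d_model, d_ff, bias=False)
down_proj = nn.Linear(d_ff, d_model, bias=False)

x = torch.randn(2, 16, d_model)                 # (batch_size, input_len, model_dim)

# Gated path: three projections, SwiGLU-style gating
gated = down_proj(F.silu(gate_proj(x)) * up_proj(x))
# Simple path: two projections with a plain activation
simple = down_proj(F.relu(up_proj(x)))

assert gated.shape == simple.shape == x.shape   # shape is preserved
```

The third projection is what makes the gated variant "three-layer".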
Encoders
class eole.encoders.encoder.EncoderBase(*args: Any, **kwargs: Any)
Bases: Module, ABC
Abstract base encoder class defining the interface for all encoders.
Used by:
- eole.Models.EncoderDecoderModel
- eole.Models.EncoderModel
abstractmethod forward(emb: Tensor | list, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, Any | None]
Encode input embeddings or images.
- Parameters:
- emb – Input embeddings (batch, src_len, dim) for text encoders, or list of images for vision encoders
- pad_mask – Padding mask (batch, src_len) for text encoders. False for actual values, True for padding. May be None for vision encoders.
- **kwargs – Additional encoder-specific arguments
- Returns: Tuple of
  - enc_out: Encoder output for attention (batch, src_len, hidden_size)
  - enc_final_hs: Final hidden state or None.
    - For RNN: (num_layers * directions, batch, hidden_size)
    - For LSTM: tuple of (hidden, cell)
    - For Transformer/CNN/Vision: None
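A minimal sketch of building the pad_mask this interface expects (True marks padding); the padding id of 0 is an assumption, in practice it comes from the vocabulary:

```python
import torch

src = torch.tensor([[5, 8, 3, 0, 0],
                    [7, 2, 0, 0, 0]])  # toy id batch; 0 = assumed padding id
pad_mask = src.eq(0)                   # (batch, src_len), True where padded
```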
update_dropout(dropout: float, attention_dropout: float | None = None) → None
Update dropout rates dynamically.
- Parameters:
- dropout – General dropout rate
- attention_dropout – Attention-specific dropout rate (if applicable)
class eole.encoders.TransformerEncoder(encoder_config, running_config=None)
Bases: EncoderBase
Transformer encoder from ‘Attention is All You Need’.
Reference: Vaswani et al. (2017), https://arxiv.org/abs/1706.03762
- Parameters:
- encoder_config – Complete encoder configuration
- running_config – Runtime configuration (optional)
forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, None]
Encode input embeddings.
- Parameters:
- emb – Input embeddings with positional encodings Shape: (batch_size, src_len, model_dim)
- pad_mask – Padding mask (batch, src_len) False for values, True for padding
- **kwargs – Additional arguments (ignored)
- Returns: Tuple of
  - Encoded output (batch_size, src_len, model_dim)
  - None (transformers don’t return final state)
- Raises: ValueError – If pad_mask is not provided
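A hedged usage sketch; `encoder_config`, `emb`, and `pad_mask` are assumed to be built elsewhere, and only the call pattern follows the documented signature:

```python
from eole.encoders import TransformerEncoder

# enc = TransformerEncoder(encoder_config)            # encoder_config: assumed
# enc_out, enc_final_hs = enc(emb, pad_mask=pad_mask)
# # omitting pad_mask raises ValueError (see above)
# assert enc_final_hs is None                         # no final state returned
```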
update_dropout(dropout: float, attention_dropout: float) → None
Update dropout rates for all transformer layers.
class eole.encoders.RNNEncoder(encoder_config, running_config=None)
Bases: EncoderBase
Generic recurrent neural network encoder supporting LSTM, GRU, and RNN.
- Parameters:
- encoder_config – Encoder configuration
- running_config – Runtime configuration (optional)
forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, Tensor | Tuple[Tensor, Tensor]]
Encode input embeddings through RNN.
- Parameters:
- emb – Input embeddings (batch, src_len, dim)
- pad_mask – Padding mask (optional, not used by base RNN)
- **kwargs – Additional arguments
- Returns: Tuple of
  - RNN outputs (batch, src_len, hidden_size)
  - Final hidden state(s)
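Because only LSTMs return a (hidden, cell) tuple, downstream code has to branch on the final-state type. The distinction comes from plain PyTorch, as this standalone illustration shows (not eole code):

```python
import torch
import torch.nn as nn

emb = torch.randn(2, 7, 32)               # (batch, src_len, dim)
lstm = nn.LSTM(32, 64, batch_first=True)
gru = nn.GRU(32, 64, batch_first=True)

_, lstm_state = lstm(emb)
_, gru_state = gru(emb)
assert isinstance(lstm_state, tuple)       # LSTM: (hidden, cell)
assert isinstance(gru_state, torch.Tensor) # GRU / vanilla RNN: hidden only
```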
update_dropout(dropout: float, attention_dropout: float | None = None) → None
Update RNN dropout rate.
class eole.encoders.CNNEncoder(encoder_config, running_config=None)
Bases: EncoderBase
Convolutional sequence-to-sequence encoder.
Based on “Convolutional Sequence to Sequence Learning” (Gehring et al., 2017).
Reference: https://arxiv.org/abs/1705.03122
- Parameters:
- encoder_config – Encoder configuration
- running_config – Runtime configuration (optional)
forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, Tensor]
Encode input embeddings through CNN layers.
- Parameters:
- emb – Input embeddings (batch, src_len, dim)
- pad_mask – Padding mask (optional, not used)
- **kwargs – Additional arguments
- Returns: Tuple of
  - CNN output (batch, src_len, hidden_size)
  - Projected embeddings (batch, src_len, hidden_size)
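For intuition, the building block behind this encoder is a gated 1-D convolution with a scaled residual connection; a single-layer sketch in plain PyTorch following Gehring et al. (2017), not eole's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, width = 64, 3
conv = nn.Conv1d(hidden, 2 * hidden, width, padding=width // 2)

x = torch.randn(2, 10, hidden)                 # (batch, src_len, hidden)
h = conv(x.transpose(1, 2))                    # (batch, 2*hidden, src_len)
h = F.glu(h, dim=1)                            # gated linear unit -> hidden
out = (h.transpose(1, 2) + x) * (0.5 ** 0.5)   # residual, scaled as in ConvS2S
```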
update_dropout(dropout: float, attention_dropout: float | None = None) → None
Update CNN dropout rate.
class eole.encoders.MeanEncoder(encoder_config, running_config=None)
Bases: EncoderBase
Minimal encoder that applies mean pooling over the sequence.
Returns the input embeddings unchanged as encoder output, and provides mean-pooled representations as the final hidden state.
- Parameters:
- encoder_config – Encoder configuration
- running_config – Runtime configuration (optional, unused)
forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, Tuple[Tensor, Tensor]]
Apply mean pooling over sequence dimension.
- Parameters:
- emb – Input embeddings (batch, seq_len, emb_dim)
- pad_mask – Padding mask (batch, seq_len) False for values, True for padding
- **kwargs – Additional arguments (ignored)
- Returns: Tuple of
  - Encoder output: unchanged input embeddings (batch, seq_len, emb_dim)
  - Final hidden state: tuple of (mean, mean), where mean has shape (num_layers, batch, emb_dim)
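For intuition, a padding-aware mean over the sequence dimension looks like the following standalone sketch (the expansion to (num_layers, batch, emb_dim) is not shown, and this is not necessarily eole's exact implementation):

```python
import torch

emb = torch.randn(2, 5, 8)                     # (batch, seq_len, emb_dim)
pad_mask = torch.tensor([[False, False, False, True, True],
                         [False, False, True, True, True]])
valid = (~pad_mask).unsqueeze(-1).float()      # 1.0 for real tokens
mean = (emb * valid).sum(1) / valid.sum(1)     # (batch, emb_dim)
```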
class eole.encoders.VisionEncoder(encoder_config, running_config=None)
Bases: EncoderBase
Vision encoder for processing images into token representations.
Supports various vision architectures:
- CLIP-style with learned positional embeddings
- Pixtral with RoPE 2D embeddings
- SAM (Segment Anything Model) preprocessing
- Parameters:
- encoder_config – Vision encoder configuration
- running_config – Runtime configuration (optional)
forward(emb: List[Tensor], pad_mask: Tensor | None = None, sam_patches: Tensor | None = None, **kwargs) → Tuple[Tensor, None]
Encode images into token representations.
- Parameters:
- emb – List of N images of variable sizes, each (C, H, W)
- pad_mask – Not used for vision encoder (uses block diagonal masks)
- sam_patches – Pre-computed SAM patches (optional)
- **kwargs – Additional arguments
- Returns: Tuple of
  - Encoded image features (N_img, total_tokens, hidden_size)
  - None (vision encoders don’t return hidden states)
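A hedged sketch of the input contract: a Python list of variable-size (C, H, W) tensors rather than one padded batch; `vision_encoder` is a hypothetical instance and the image sizes are illustrative:

```python
import torch

images = [torch.randn(3, 224, 336),   # one (C, H, W) tensor per image
          torch.randn(3, 448, 224)]   # sizes may differ across the list
# feats, _ = vision_encoder(images)   # vision_encoder: hypothetical instance
# feats: (N_img, total_tokens, hidden_size)
```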
update_dropout(dropout: float, attention_dropout: float) → None
Update dropout rates for all transformer layers.
class eole.encoders.AudioEncoder(encoder_config, running_config=None)
Bases: EncoderBase
Audio encoder: Conv1d stem + learned positional embeddings + transformer layers.
Processes mel spectrograms into encoder hidden states for cross-attention with the decoder.
Input: mel spectrogram (batch, num_mels, time)
Output: (batch, time // 2, hidden_size)
- Parameters:
- encoder_config – Audio encoder configuration
- running_config – Runtime configuration (optional)
forward(emb: Tensor, pad_mask: Tensor | None = None, **kwargs) → Tuple[Tensor, None]
Encode mel spectrogram features.
- Parameters:
- emb – Mel spectrogram tensor (batch, num_mels, time)
- pad_mask – Not used (fixed-length input)
- **kwargs – Additional arguments (ignored)
- Returns: Tuple of
  - Encoded output (batch, time // 2, hidden_size)
  - None (transformers don’t return final state)
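A hedged shape sketch; `audio_encoder` is a hypothetical instance, and num_mels = 80 is an illustrative value (the real one comes from encoder_config):

```python
import torch

batch, num_mels, n_frames = 1, 80, 3000        # illustrative values
mel = torch.randn(batch, num_mels, n_frames)   # (batch, num_mels, time)
# enc_out, _ = audio_encoder(mel)              # audio_encoder: hypothetical
# enc_out: (batch, n_frames // 2, hidden_size) -- conv stem halves time
```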
update_dropout(dropout: float, attention_dropout: float) → None
Update dropout rates for all transformer layers.