
Main Entrypoints

Common Base Config​

pydantic model eole.config.common.DistributedConfig[source]​

Bases: Config

Show JSON schema
{
"title": "DistributedConfig",
"type": "object",
"properties": {
"gpu_ranks": {
"default": [],
"description": "List of ranks for each process.",
"items": {
"type": "integer"
},
"title": "Gpu Ranks",
"type": "array"
},
"world_size": {
"default": 1,
"description": "Total number of distributed processes.",
"title": "World Size",
"type": "integer"
},
"parallel_mode": {
"default": "data_parallel",
"description": "Distributed mode.",
"enum": [
"data_parallel",
"tensor_parallel"
],
"title": "Parallel Mode",
"type": "string"
},
"gpu_backend": {
"default": "nccl",
"description": "Type of torch distributed backend.",
"title": "Gpu Backend",
"type": "string"
},
"gpu_verbose_level": {
"default": 0,
"description": "Gives more info on each process per GPU.",
"title": "Gpu Verbose Level",
"type": "integer"
},
"master_ip": {
"default": "localhost",
"description": "IP of master for torch.distributed training.",
"title": "Master Ip",
"type": "string"
},
"master_port": {
"default": 10000,
"description": "Port of master for torch.distributed training.",
"title": "Master Port",
"type": "integer"
},
"timeout": {
"default": 60,
"description": "Timeout for one GPU to wait for the others.",
"title": "Timeout",
"type": "integer"
}
},
"additionalProperties": false
}

field gpu_backend : str = 'nccl'​

Type of torch distributed backend.

field gpu_ranks : List[int] = []​

List of ranks for each process.

field gpu_verbose_level : int = 0​

Gives more info on each process per GPU.

field master_ip : str = 'localhost'​

IP of master for torch.distributed training.

field master_port : int = 10000​

Port of master for torch.distributed training.

field parallel_mode : Literal['data_parallel', 'tensor_parallel'] = 'data_parallel'​

Distributed mode.

field timeout : int = 60​

Timeout for one GPU to wait for the others.

field world_size : int = 1​

Total number of distributed processes.

property parallel_gpu : int[source]​
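
As a quick illustration (a minimal sketch using only the fields documented above; the values are placeholders), a two-process setup could be declared directly in Python:

from eole.config.common import DistributedConfig

# Hypothetical two-process setup; field names mirror the schema above.
dist_cfg = DistributedConfig(
    world_size=2,                      # total number of distributed processes
    gpu_ranks=[0, 1],                  # one rank per process
    parallel_mode="tensor_parallel",
    master_ip="localhost",
    master_port=10000,
)

# parallel_gpu is the derived property documented above.
print(dist_cfg.parallel_gpu)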

pydantic model eole.config.common.LoggingConfig[source]​

Bases: Config

Show JSON schema
{
"title": "LoggingConfig",
"type": "object",
"properties": {
"log_file": {
"default": "",
"description": "Output logs to a file under this path.",
"title": "Log File",
"type": "string"
},
"report_every": {
"default": 50,
"description": "Print stats at this interval (in steps).",
"title": "Report Every",
"type": "integer"
},
"valid_metrics": {
"default": [],
"description": "List of names of additional validation metrics.",
"items": {
"type": "string"
},
"title": "Valid Metrics",
"type": "array"
},
"scoring_debug": {
"default": false,
"description": "Dump src/ref/pred of the current batch.",
"title": "Scoring Debug",
"type": "boolean"
},
"dump_preds": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Folder to dump predictions to.",
"title": "Dump Preds"
},
"tensorboard": {
"default": false,
"description": "Use tensorboard for visualization during training.",
"title": "Tensorboard",
"type": "boolean"
},
"tensorboard_log_dir": {
"default": "runs/eole",
"description": "Log directory for tensorboard (also the name of the run).",
"title": "Tensorboard Log Dir",
"type": "string"
},
"tensorboard_log_dir_dated": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tensorboard Log Dir Dated"
}
},
"additionalProperties": false
}

field dump_preds : str | None = None​

Folder to dump predictions to.

field log_file : str = ''​

Output logs to a file under this path.

field report_every : int = 50​

Print stats at this interval (in steps).

field scoring_debug : bool = False​

Dump src/ref/pred of the current batch.

field tensorboard : bool = False​

Use tensorboard for visualization during training.

field tensorboard_log_dir : str = 'runs/eole'​

Log directory for tensorboard (also the name of the run).

field tensorboard_log_dir_dated : str | None = None​

field valid_metrics : List[str] = []​

List of names of additional validation metrics.
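
For example (an illustrative sketch, not an official recipe; "BLEU" is only a placeholder metric name), enabling TensorBoard and an extra validation metric could look like:

from eole.config.common import LoggingConfig

log_cfg = LoggingConfig(
    log_file="train.log",              # write logs to this file
    report_every=100,                  # print stats every 100 steps
    tensorboard=True,
    tensorboard_log_dir="runs/my_experiment",
    valid_metrics=["BLEU"],            # placeholder extra validation metric
)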

pydantic model eole.config.common.LoRaConfig[source]​

Bases: Config

Show JSON schema
{
"title": "LoRaConfig",
"type": "object",
"properties": {
"lora_layers": {
"default": [],
"description": "List of layers to be replaced by LoRa layers. E.g. ['linear_values', 'linear_query'] (\u00a74.2 in https://arxiv.org/abs/2106.09685)",
"items": {
"type": "string"
},
"title": "Lora Layers",
"type": "array"
},
"lora_embedding": {
"default": false,
"description": "Replace embeddings with LoRa Embeddings (\u00a75.1)",
"title": "Lora Embedding",
"type": "boolean"
},
"lora_rank": {
"default": 2,
"description": "r=2 successfully tested with NLLB-200 3.3B",
"title": "Lora Rank",
"type": "integer"
},
"lora_alpha": {
"default": 1,
"description": "\u00a74.1 https://arxiv.org/abs/2106.09685",
"title": "Lora Alpha",
"type": "integer"
},
"lora_dropout": {
"default": 0.0,
"description": "Rule of thumb: same value as in main model.",
"title": "Lora Dropout",
"type": "number"
}
},
"additionalProperties": false
}

field lora_alpha : int = 1​

§4.1 https://arxiv.org/abs/2106.09685

field lora_dropout : float = 0.0​

Rule of thumb: same value as in main model.

field lora_embedding : bool = False​

Replace embeddings with LoRa Embeddings (§5.1)

field lora_layers : List[str] = []​

List of layers to be replaced by LoRa layers. E.g. ['linear_values', 'linear_query'] (§4.2 in https://arxiv.org/abs/2106.09685)

field lora_rank : int = 2​

r=2 successfully tested with NLLB-200 3.3B
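
A minimal LoRa setup (a sketch reusing the layer names given as an example in lora_layers above) could be written as:

from eole.config.common import LoRaConfig

lora_cfg = LoRaConfig(
    lora_layers=["linear_values", "linear_query"],  # layers replaced by LoRa layers
    lora_rank=2,          # r=2, as tested with NLLB-200 3.3B
    lora_alpha=1,
    lora_dropout=0.0,
    lora_embedding=False,
)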

pydantic model eole.config.common.QuantizeConfig[source]​

Bases: Config

Show JSON schema
{
"title": "QuantizeConfig",
"type": "object",
"properties": {
"quant_layers": {
"default": [],
"description": "List of layers to be compressed in 4/8bit.",
"items": {
"type": "string"
},
"title": "Quant Layers",
"type": "array"
},
"quant_type": {
"default": "",
"description": "Type of compression.",
"enum": [
"",
"bnb_9bit",
"bnb_FP4",
"bnb_NF4",
"awq_gemm",
"awq_gemv"
],
"title": "Quant Type",
"type": "string"
},
"w_bit": {
"default": 4,
"description": "W_bit quantization",
"title": "W Bit",
"type": "integer"
},
"group_size": {
"default": 128,
"description": "Group size quantization.",
"title": "Group Size",
"type": "integer"
}
},
"additionalProperties": false
}

field group_size : int = 128​

Group size quantization.

field quant_layers : List[str] = []​

List of layers to be compressed in 4/8bit.

field quant_type : Literal['', 'bnb_8bit', 'bnb_FP4', 'bnb_NF4', 'awq_gemm', 'awq_gemv'] = ''

Type of compression.

field w_bit : int = 4​

W_bit quantization
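
As an illustration (the layer names are placeholders and depend on the model; w_bit and group_size are typically relevant for the awq_* types), a 4-bit NF4 setup might be:

from eole.config.common import QuantizeConfig

quant_cfg = QuantizeConfig(
    quant_layers=["linear_values", "linear_query", "up_proj", "down_proj"],  # placeholder names
    quant_type="bnb_NF4",
    w_bit=4,
    group_size=128,
)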

pydantic model eole.config.common.MiscConfig[source]​

Bases: Config

Show JSON schema
{
"title": "MiscConfig",
"type": "object",
"properties": {
"seed": {
"default": -1,
"description": "Set random seed used for better reproducibility between experiments.",
"title": "Seed",
"type": "integer"
}
},
"additionalProperties": false
}

  • Config:
    • validate_assignment: bool = True
    • validate_default: bool = True
    • use_enum_values: bool = True
    • extra: str = forbid
    • protected_namespaces: tuple = ()
  • Fields:
    • seed (int)

field seed : int = -1​

Set random seed used for better reproducibility between experiments.

Run Config​

pydantic model eole.config.run.TrainConfig[source]​

Bases: LoggingConfig, MiscConfig, DataConfig, VocabConfig
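
TrainConfig aggregates the vocabulary, data, logging and misc options above together with the model and training sub-configs. In practice it is usually expressed as a YAML file passed to the training entrypoint (see the examples/*.yaml referenced in the data field); the dict below is only a sketch of that structure, with placeholder paths, and is not meant as a validated example:

# Hypothetical shape of a training config; keys mirror the schema below.
train_config = {
    "src_vocab": "data/vocab.src",
    "tgt_vocab": "data/vocab.tgt",
    "save_data": "data/prepared",
    "data": {
        "corpus_1": {"path_src": "data/train.src", "path_tgt": "data/train.tgt"},
        "valid": {"path_src": "data/valid.src", "path_tgt": "data/valid.tgt"},
    },
    "transforms": ["filtertoolong"],
    "model": {"architecture": "transformer"},
    "training": {},   # TrainingConfig options (quantization, LoRa, ...) go here
}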

Show JSON schema
{
"title": "TrainConfig",
"type": "object",
"properties": {
"src_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"description": "Path to src (or shared) vocabulary file. Format: one <word> or <word>\t<count> per line.",
"title": "Src Vocab"
},
"tgt_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to tgt vocabulary file. Format: one <word> or <word>\t<count> per line.",
"title": "Tgt Vocab"
},
"share_vocab": {
"default": false,
"description": "Share source and target vocabulary.",
"title": "Share Vocab",
"type": "boolean"
},
"decoder_start_token": {
"default": "&lt;s&gt;",
"description": "Default decoder start token. For most models it is &lt;s&gt; = BOS. Some fairseq models require &lt;/s&gt;.",
"title": "Decoder Start Token",
"type": "string"
},
"default_specials": {
"default": [
"<unk>",
"<blank>",
"&lt;s&gt;",
"&lt;/s&gt;"
],
"description": "Default specials used for vocab initialization. UNK, PAD, BOS, EOS will take IDs 0, 1, 2, 3.",
"items": {},
"title": "Default Specials",
"type": "array"
},
"both_embeddings": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to the embeddings file to use for both source and target tokens.",
"title": "Both Embeddings"
},
"src_embeddings": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to the embeddings file to use for source tokens.",
"title": "Src Embeddings"
},
"tgt_embeddings": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to the embeddings file to use for target tokens.",
"title": "Tgt Embeddings"
},
"embeddings_type": {
"anyOf": [
{
"enum": [
"GloVe",
"word2vec"
],
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Type of embeddings file.",
"title": "Embeddings Type"
},
"src_vocab_size": {
"default": 32758,
"description": "Maximum size of the source vocabulary.",
"title": "Src Vocab Size",
"type": "integer"
},
"tgt_vocab_size": {
"default": 32768,
"description": "Maximum size of the target vocabulary.",
"title": "Tgt Vocab Size",
"type": "integer"
},
"vocab_size_multiple": {
"default": 8,
"description": "Make the vocabulary size a multiple of this value. (Adds dummy tokens if needed.)",
"title": "Vocab Size Multiple",
"type": "integer"
},
"src_words_min_frequency": {
"default": 0,
"description": "Discard source words with lower frequency.",
"title": "Src Words Min Frequency",
"type": "integer"
},
"tgt_words_min_frequency": {
"default": 0,
"description": "Discard target words with lower frequency.",
"title": "Tgt Words Min Frequency",
"type": "integer"
},
"data": {
"anyOf": [
{
"additionalProperties": {
"$ref": "#/$defs/Dataset"
},
"type": "object"
},
{
"type": "null"
}
],
"description": "All datasets and their specifications. See examples/*.yaml for further details.",
"title": "Data"
},
"transforms": {
"default": [],
"description": "Default transform pipeline to apply to data. Can be specified in each corpus of data to override.",
"items": {
"type": "string"
},
"title": "Transforms",
"type": "array"
},
"transforms_configs": {
"anyOf": [
{
"$ref": "#/$defs/NestedAllTransformsConfig"
},
{
"type": "null"
}
]
},
"skip_empty_level": {
"default": "warning",
"description": "Logging level when encoutering empty examples. (silent: silently ignore/skip empty examples, warning: warn when ignoring/skipping empty examples, error: raise an error and stop execution when any empty example)",
"enum": [
"silent",
"warning",
"error"
],
"title": "Skip Empty Level",
"type": "string"
},
"n_sample": {
"default": 0,
"description": "Number of transformed samples per corpus to use to build the vocabulary. Set to -1 to use the full corpora.",
"title": "N Sample",
"type": "integer"
},
"save_data": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Output base path for objects that will be saved (vocab, transforms, embeddings, ...)",
"title": "Save Data"
},
"overwrite": {
"default": false,
"description": "Overwrite existing objects if any.",
"title": "Overwrite",
"type": "boolean"
},
"seed": {
"default": -1,
"description": "Set random seed used for better reproducibility between experiments.",
"title": "Seed",
"type": "integer"
},
"log_file": {
"default": "",
"description": "Output logs to a file under this path.",
"title": "Log File",
"type": "string"
},
"report_every": {
"default": 50,
"description": "Print stats at this interval (in steps).",
"title": "Report Every",
"type": "integer"
},
"valid_metrics": {
"default": [],
"description": "List of names of additional validation metrics.",
"items": {
"type": "string"
},
"title": "Valid Metrics",
"type": "array"
},
"scoring_debug": {
"default": false,
"description": "Dump src/ref/pred of the current batch.",
"title": "Scoring Debug",
"type": "boolean"
},
"dump_preds": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Folder to dump predictions to.",
"title": "Dump Preds"
},
"tensorboard": {
"default": false,
"description": "Use tensorboard for visualization during training.",
"title": "Tensorboard",
"type": "boolean"
},
"tensorboard_log_dir": {
"default": "runs/eole",
"description": "Log directory for tensorboard (also the name of the run).",
"title": "Tensorboard Log Dir",
"type": "string"
},
"tensorboard_log_dir_dated": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tensorboard Log Dir Dated"
},
"verbose": {
"default": false,
"description": "Print data loading and statistics for all process (default only logs the first process shard).",
"title": "Verbose",
"type": "boolean"
},
"model": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnModelConfig",
"custom": "#/$defs/CustomModelConfig",
"rnn": "#/$defs/RnnModelConfig",
"transformer": "#/$defs/TransformerModelConfig",
"transformer_encoder": "#/$defs/TransformerEncoderModelConfig",
"transformer_lm": "#/$defs/TransformerLMModelConfig"
},
"propertyName": "architecture"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerModelConfig"
},
{
"$ref": "#/$defs/TransformerLMModelConfig"
},
{
"$ref": "#/$defs/TransformerEncoderModelConfig"
},
{
"$ref": "#/$defs/RnnModelConfig"
},
{
"$ref": "#/$defs/CnnModelConfig"
},
{
"$ref": "#/$defs/CustomModelConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"title": "Model"
},
"training": {
"anyOf": [
{
"$ref": "#/$defs/TrainingConfig"
},
{
"type": "null"
}
]
}
},
"$defs": {
"ActivationFunction": {
"enum": [
"relu",
"gelu",
"silu",
"gated-gelu",
"gated-silu"
],
"title": "ActivationFunction",
"type": "string"
},
"BARTNoiseConfig": {
"additionalProperties": false,
"properties": {
"permute_sent_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Permute this proportion of sentences (boundaries defined by ['.', '?', '!']) in all inputs.",
"title": "Permute Sent Ratio"
},
"rotate_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Rotate this proportion of inputs.",
"title": "Rotate Ratio"
},
"insert_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Insert this percentage of additional random tokens.",
"title": "Insert Ratio"
},
"random_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Instead of using <mask>, use random token this often.",
"title": "Random Ratio"
},
"mask_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Fraction of words/subwords that will be masked.",
"title": "Mask Ratio"
},
"mask_length": {
"anyOf": [
{
"enum": [
"subword",
"word",
"span-poisson"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "subword",
"description": "Length of masking window to apply.",
"title": "Mask Length"
},
"poisson_lambda": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 3.0,
"description": "Lambda for Poisson distribution to sample span length if `-mask_length` set to span-poisson.",
"title": "Poisson Lambda"
},
"replace_length": {
"anyOf": [
{
"maximum": 1,
"minimum": -1,
"type": "integer"
},
{
"type": "null"
}
],
"default": -1,
"description": "When masking N tokens, replace with 0, 1, or N tokens. (use -1 for N)",
"title": "Replace Length"
}
},
"title": "BARTNoiseConfig",
"type": "object"
},
"BaseTokenizerConfig": {
"additionalProperties": false,
"properties": {
"src_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for src (or shared).",
"title": "Src Subword Model"
},
"tgt_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for tgt.",
"title": "Tgt Subword Model"
},
"src_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (source side)",
"title": "Src Subword Nbest"
},
"tgt_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (target side)",
"title": "Tgt Subword Nbest"
},
"src_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (source side)",
"title": "Src Subword Alpha"
},
"tgt_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (target side)",
"title": "Tgt Subword Alpha"
},
"src_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for src subword. Format: <word>\\t<count> per line.",
"title": "Src Subword Vocab"
},
"tgt_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for tgt subword. Format: <word>\\t<count> per line.",
"title": "Tgt Subword Vocab"
},
"src_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce src subword in src_subword_vocab with frequency >= src_vocab_threshold.",
"title": "Src Vocab Threshold"
},
"tgt_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce tgt subword in tgt_subword_vocab with frequency >= tgt_vocab_threshold.",
"title": "Tgt Vocab Threshold"
}
},
"title": "BaseTokenizerConfig",
"type": "object"
},
"CleanConfig": {
"additionalProperties": false,
"properties": {
"src_eq_tgt": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex src==tgt",
"title": "Src Eq Tgt"
},
"same_char": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex with same char more than 4 times",
"title": "Same Char"
},
"same_word": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex with same word more than 3 times",
"title": "Same Word"
},
"scripts_ok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"Latin",
"Common"
],
"description": "list of unicodata scripts accepted",
"title": "Scripts Ok"
},
"scripts_nok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"description": "list of unicodata scripts not accepted",
"title": "Scripts Nok"
},
"src_tgt_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 2.0,
"description": "ratio between src and tgt",
"title": "Src Tgt Ratio"
},
"avg_tok_min": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 3.0,
"description": "average length of tokens min",
"title": "Avg Tok Min"
},
"avg_tok_max": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 20.0,
"description": "average length of tokens max",
"title": "Avg Tok Max"
},
"langid": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"description": "list of languages accepted",
"title": "Langid"
}
},
"title": "CleanConfig",
"type": "object"
},
"CnnDecoderConfig": {
"additionalProperties": false,
"properties": {
"decoder_type": {
"const": "cnn",
"default": "cnn",
"enum": [
"cnn"
],
"title": "Decoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the decoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of decoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"coverage_attn": {
"default": false,
"description": "Train a coverage attention layer.",
"title": "Coverage Attn",
"type": "boolean"
},
"lambda_coverage": {
"default": 0.0,
"description": "Lambda value for coverage loss of See et al (2017)",
"title": "Lambda Coverage",
"type": "number"
},
"global_attention": {
"default": "general",
"description": "The attention type to use. (Luong=general, Bahdanau=MLP)",
"enum": [
"dot",
"general",
"mlp",
null
],
"title": "Global Attention"
},
"global_attention_function": {
"default": "softmax",
"description": "Global attention function to use.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Global Attention Function",
"type": "string"
},
"cnn_kernel_width": {
"default": 3,
"description": "Size of windows in the cnn, the kernel_size is (cnn_kernel_width, 1) in convolution layers.",
"title": "Cnn Kernel Width",
"type": "integer"
}
},
"title": "CnnDecoderConfig",
"type": "object"
},
"CnnEncoderConfig": {
"additionalProperties": false,
"properties": {
"encoder_type": {
"const": "cnn",
"default": "cnn",
"enum": [
"cnn"
],
"title": "Encoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the encoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of encoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
},
"cnn_kernel_width": {
"default": 3,
"description": "Size of windows in the cnn, the kernel_size is (cnn_kernel_width, 1) in convolution layers.",
"title": "Cnn Kernel Width",
"type": "integer"
}
},
"title": "CnnEncoderConfig",
"type": "object"
},
"CnnModelConfig": {
"additionalProperties": false,
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": -1,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "cnn",
"default": "cnn",
"enum": [
"cnn"
],
"title": "Architecture",
"type": "string"
},
"cnn_kernel_width": {
"default": 3,
"description": "Size of windows in the cnn, the kernel_size is (cnn_kernel_width, 1) in convolution layers.",
"title": "Cnn Kernel Width",
"type": "integer"
}
},
"title": "CnnModelConfig",
"type": "object"
},
"CustomModelConfig": {
"additionalProperties": false,
"description": "Wrap anything that does not fit a set common architecture.",
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": -1,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "custom",
"default": "custom",
"enum": [
"custom"
],
"title": "Architecture",
"type": "string"
}
},
"title": "CustomModelConfig",
"type": "object"
},
"Dataset": {
"additionalProperties": false,
"properties": {
"name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Name"
},
"weight": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"title": "Weight"
},
"transforms": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"title": "Transforms"
},
"path_src": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Src"
},
"path_tgt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Tgt"
},
"path_sco": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Sco"
},
"path_txt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Txt"
},
"path_align": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Align"
},
"src_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Src Prefix"
},
"tgt_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tgt Prefix"
},
"src_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Src Suffix"
},
"tgt_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tgt Suffix"
},
"src_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Src Lang"
},
"tgt_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tgt Lang"
},
"penn": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Penn"
},
"norm_quote_commas": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Norm Quote Commas"
},
"norm_numbers": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Norm Numbers"
},
"pre_replace_unicode_punct": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"title": "Pre Replace Unicode Punct"
},
"post_remove_control_chars": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"title": "Post Remove Control Chars"
},
"src_eq_tgt": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Src Eq Tgt"
},
"same_char": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Same Char"
},
"same_word": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Same Word"
},
"scripts_ok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"Latin",
"Common"
],
"title": "Scripts Ok"
},
"scripts_nok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"title": "Scripts Nok"
},
"src_tgt_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 2,
"title": "Src Tgt Ratio"
},
"avg_tok_min": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 3,
"title": "Avg Tok Min"
},
"avg_tok_max": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 20,
"title": "Avg Tok Max"
},
"lang_id": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"en",
"fr"
],
"title": "Lang Id"
}
},
"title": "Dataset",
"type": "object"
},
"DocifyConfig": {
"additionalProperties": false,
"properties": {
"doc_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 200,
"description": "Number of tokens per doc.",
"title": "Doc Length"
},
"max_context": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Max context segments.",
"title": "Max Context"
}
},
"title": "DocifyConfig",
"type": "object"
},
"EmbeddingsConfig": {
"additionalProperties": false,
"properties": {
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"freeze_word_vecs_enc": {
"default": false,
"description": "Freeze word embeddings on the encoder side.",
"title": "Freeze Word Vecs Enc",
"type": "boolean"
},
"freeze_word_vecs_dec": {
"default": false,
"description": "Freeze word embeddings on the encoder side.",
"title": "Freeze Word Vecs Dec",
"type": "boolean"
},
"position_encoding": {
"default": false,
"description": "Absolute position encoding, see position_encoding_type. Necessary for non-RNN style models.",
"title": "Position Encoding",
"type": "boolean"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"position_shift": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Positions IDS shift before making position embed dirty patch to cover for xlm-roberta-xl",
"title": "Position Shift"
}
},
"title": "EmbeddingsConfig",
"type": "object"
},
"FilterTooLongConfig": {
"additionalProperties": false,
"properties": {
"src_seq_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 192,
"description": "Maximum source sequence length.",
"title": "Src Seq Length"
},
"tgt_seq_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 192,
"description": "Maximum target sequence length.",
"title": "Tgt Seq Length"
}
},
"title": "FilterTooLongConfig",
"type": "object"
},
"InlineTagsConfig": {
"additionalProperties": false,
"properties": {
"tags_dictionary_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to a flat term dictionary.",
"title": "Tags Dictionary Path"
},
"tags_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.1,
"description": "Ratio of corpus to augment with tags.",
"title": "Tags Corpus Ratio"
},
"max_tags": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 12,
"description": "Maximum number of tags that can be added to a single sentence.",
"title": "Max Tags"
},
"paired_stag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_beg\uff60",
"description": "The format of an opening paired inline tag. Must include the character #.",
"title": "Paired Stag"
},
"paired_etag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_end\uff60",
"description": "The format of a closing paired inline tag. Must include the character #.",
"title": "Paired Etag"
},
"isolated_tag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_std\uff60",
"description": "The format of an isolated inline tag. Must include the character #.",
"title": "Isolated Tag"
},
"src_delimiter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ffuzzy\uff60",
"description": "Any special token used for augmented src sentences. The default is the fuzzy token used in the FuzzyMatch transform.",
"title": "Src Delimiter"
}
},
"title": "InlineTagsConfig",
"type": "object"
},
"InsertMaskBeforePlaceholderConfig": {
"additionalProperties": false,
"properties": {
"response_patterns": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"Response : \uff5fnewline\uff60"
],
"description": "Response pattern to locate the end of the prompt.",
"title": "Response Patterns"
}
},
"title": "InsertMaskBeforePlaceholderConfig",
"type": "object"
},
"MeanEncoderConfig": {
"additionalProperties": false,
"properties": {
"encoder_type": {
"const": "mean",
"default": "mean",
"enum": [
"mean"
],
"title": "Encoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the encoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of encoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
}
},
"title": "MeanEncoderConfig",
"type": "object"
},
"NestedAllTransformsConfig": {
"additionalProperties": false,
"properties": {
"docify": {
"$ref": "#/$defs/DocifyConfig",
"default": {
"doc_length": 200,
"max_context": 1
}
},
"inlinetags": {
"$ref": "#/$defs/InlineTagsConfig",
"default": {
"tags_dictionary_path": null,
"tags_corpus_ratio": 0.1,
"max_tags": 12,
"paired_stag": "\uff5fph_#_beg\uff60",
"paired_etag": "\uff5fph_#_end\uff60",
"isolated_tag": "\uff5fph_#_std\uff60",
"src_delimiter": "\uff5ffuzzy\uff60"
}
},
"terminology": {
"$ref": "#/$defs/TerminologyConfig",
"default": {
"termbase_path": null,
"src_spacy_language_model": null,
"tgt_spacy_language_model": null,
"term_corpus_ratio": 0.3,
"term_example_ratio": 0.2,
"src_term_stoken": "\uff5fsrc_term_start\uff60",
"tgt_term_stoken": "\uff5ftgt_term_start\uff60",
"tgt_term_etoken": "\uff5ftgt_term_end\uff60",
"term_source_delimiter": "\uff5ffuzzy\uff60"
}
},
"bart": {
"$ref": "#/$defs/BARTNoiseConfig",
"default": {
"permute_sent_ratio": 0.0,
"rotate_ratio": 0.0,
"insert_ratio": 0.0,
"random_ratio": 0.0,
"mask_ratio": 0.0,
"mask_length": "subword",
"poisson_lambda": 3.0,
"replace_length": -1
}
},
"uppercase": {
"$ref": "#/$defs/UpperCaseConfig",
"default": {
"upper_corpus_ratio": 0.01
}
},
"clean": {
"$ref": "#/$defs/CleanConfig",
"default": {
"src_eq_tgt": false,
"same_char": false,
"same_word": false,
"scripts_ok": [
"Latin",
"Common"
],
"scripts_nok": [],
"src_tgt_ratio": 2.0,
"avg_tok_min": 3.0,
"avg_tok_max": 20.0,
"langid": []
}
},
"switchout": {
"$ref": "#/$defs/SwitchOutConfig",
"default": {
"switchout_temperature": 1.0
}
},
"tokendrop": {
"$ref": "#/$defs/TokenDropConfig",
"default": {
"tokendrop_temperature": 1.0
}
},
"tokenmask": {
"$ref": "#/$defs/TokenMaskConfig",
"default": {
"tokenmask_temperature": 1.0
}
},
"insert_mask_before_placeholder": {
"$ref": "#/$defs/InsertMaskBeforePlaceholderConfig",
"default": {
"response_patterns": [
"Response : \uff5fnewline\uff60"
]
}
},
"filtertoolong": {
"$ref": "#/$defs/FilterTooLongConfig",
"default": {
"src_seq_length": 192,
"tgt_seq_length": 192
}
},
"prefix": {
"$ref": "#/$defs/PrefixConfig",
"default": {
"src_prefix": "",
"tgt_prefix": ""
}
},
"suffix": {
"$ref": "#/$defs/SuffixConfig",
"default": {
"src_suffix": "",
"tgt_suffix": ""
}
},
"sentencepiece": {
"$ref": "#/$defs/BaseTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0
}
},
"bpe": {
"$ref": "#/$defs/BaseTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0
}
},
"onmt_tokenize": {
"$ref": "#/$defs/ONMTTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0,
"src_subword_type": "none",
"tgt_subword_type": "none",
"src_onmttok_kwargs": {
"mode": "none"
},
"tgt_onmttok_kwargs": {
"mode": "none"
},
"gpt2_pretok": false,
"mapped_tokens": null
}
},
"normalize": {
"$ref": "#/$defs/NormalizeConfig",
"default": {
"src_lang": "",
"tgt_lang": "",
"penn": true,
"norm_quote_commas": true,
"norm_numbers": true,
"pre_replace_unicode_punct": false,
"post_remove_control_chars": false
}
}
},
"title": "NestedAllTransformsConfig",
"type": "object"
},
"NormalizeConfig": {
"additionalProperties": false,
"properties": {
"src_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Source language code",
"title": "Src Lang"
},
"tgt_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Target language code",
"title": "Tgt Lang"
},
"penn": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Penn substitution",
"title": "Penn"
},
"norm_quote_commas": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Normalize quotations and commas",
"title": "Norm Quote Commas"
},
"norm_numbers": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Normalize numbers",
"title": "Norm Numbers"
},
"pre_replace_unicode_punct": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Replace unicode punct",
"title": "Pre Replace Unicode Punct"
},
"post_remove_control_chars": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove control chars",
"title": "Post Remove Control Chars"
}
},
"title": "NormalizeConfig",
"type": "object"
},
"ONMTTokenizerConfig": {
"additionalProperties": false,
"properties": {
"src_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for src (or shared).",
"title": "Src Subword Model"
},
"tgt_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for tgt.",
"title": "Tgt Subword Model"
},
"src_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (source side)",
"title": "Src Subword Nbest"
},
"tgt_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (target side)",
"title": "Tgt Subword Nbest"
},
"src_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (source side)",
"title": "Src Subword Alpha"
},
"tgt_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (target side)",
"title": "Tgt Subword Alpha"
},
"src_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for src subword. Format: <word>\\t<count> per line.",
"title": "Src Subword Vocab"
},
"tgt_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for tgt subword. Format: <word>\\t<count> per line.",
"title": "Tgt Subword Vocab"
},
"src_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce src subword in src_subword_vocab with frequency >= src_vocab_threshold.",
"title": "Src Vocab Threshold"
},
"tgt_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce tgt subword in tgt_subword_vocab with frequency >= tgt_vocab_threshold.",
"title": "Tgt Vocab Threshold"
},
"src_subword_type": {
"anyOf": [
{
"enum": [
"none",
"sentencepiece",
"bpe"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "none",
"description": "Type of subword model for src (or shared) in pyonmttok.",
"title": "Src Subword Type"
},
"tgt_subword_type": {
"anyOf": [
{
"enum": [
"none",
"sentencepiece",
"bpe"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "none",
"description": "Type of subword model for tgt in pyonmttok.",
"title": "Tgt Subword Type"
},
"src_onmttok_kwargs": {
"anyOf": [
{
"type": "object"
},
{
"type": "null"
}
],
"default": {
"mode": "none"
},
"description": "Other pyonmttok options for src in dict string, except subword related options listed earlier.",
"title": "Src Onmttok Kwargs"
},
"tgt_onmttok_kwargs": {
"anyOf": [
{
"type": "object"
},
{
"type": "null"
}
],
"default": {
"mode": "none"
},
"description": "Other pyonmttok options for tgt in dict string, except subword related options listed earlier.",
"title": "Tgt Onmttok Kwargs"
},
"gpt2_pretok": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Preprocess sentence with byte-level mapping.",
"title": "Gpt2 Pretok"
},
"mapped_tokens": {
"anyOf": [
{
"items": {
"maxItems": 2,
"minItems": 2,
"prefixItems": [
{
"type": "string"
},
{
"type": "string"
}
],
"type": "array"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"description": "Mapped tokens for placeholders preservation",
"title": "Mapped Tokens"
}
},
"title": "ONMTTokenizerConfig",
"type": "object"
},
"PositionEncodingType": {
"enum": [
"SinusoidalInterleaved",
"SinusoidalConcat",
"Learned",
"Relative",
"Rotary",
"Alibi"
],
"title": "PositionEncodingType",
"type": "string"
},
"PrefixConfig": {
"additionalProperties": false,
"properties": {
"src_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to prepend to all source examples.",
"title": "Src Prefix"
},
"tgt_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to prepend to all target examples.",
"title": "Tgt Prefix"
}
},
"title": "PrefixConfig",
"type": "object"
},
"RnnDecoderConfig": {
"additionalProperties": false,
"properties": {
"decoder_type": {
"const": "rnn",
"default": "rnn",
"enum": [
"rnn"
],
"title": "Decoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the decoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of decoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"coverage_attn": {
"default": false,
"description": "Train a coverage attention layer.",
"title": "Coverage Attn",
"type": "boolean"
},
"lambda_coverage": {
"default": 0.0,
"description": "Lambda value for coverage loss of See et al (2017)",
"title": "Lambda Coverage",
"type": "number"
},
"global_attention": {
"default": "general",
"description": "The attention type to use. (Luong=general, Bahdanau=MLP)",
"enum": [
"dot",
"general",
"mlp",
null
],
"title": "Global Attention"
},
"global_attention_function": {
"default": "softmax",
"description": "Global attention function to use.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Global Attention Function",
"type": "string"
},
"bridge": {
"default": false,
"description": "Have an additional layer between the last encoder state and the first decoder state (RNN specific).",
"title": "Bridge",
"type": "boolean"
},
"rnn_type": {
"default": "LSTM",
"description": "The gate type to use in the RNNs.",
"enum": [
"LSTM",
"GRU"
],
"title": "Rnn Type",
"type": "string"
},
"context_gate": {
"default": null,
"description": "Type of context gate to use.",
"enum": [
"source",
"target",
"both",
null
],
"title": "Context Gate"
},
"bidirectional_encoder": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"title": "Bidirectional Encoder"
}
},
"title": "RnnDecoderConfig",
"type": "object"
},
"RnnEncoderConfig": {
"additionalProperties": false,
"properties": {
"encoder_type": {
"default": "rnn",
"enum": [
"rnn",
"brnn"
],
"title": "Encoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the encoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of encoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
},
"bridge": {
"default": false,
"description": "Have an additional layer between the last encoder state and the first decoder state (RNN specific).",
"title": "Bridge",
"type": "boolean"
},
"rnn_type": {
"default": "LSTM",
"description": "The gate type to use in the RNNs.",
"enum": [
"LSTM",
"GRU"
],
"title": "Rnn Type",
"type": "string"
}
},
"title": "RnnEncoderConfig",
"type": "object"
},
"RnnModelConfig": {
"additionalProperties": false,
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": -1,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "rnn",
"default": "rnn",
"enum": [
"rnn"
],
"title": "Architecture",
"type": "string"
},
"bridge": {
"default": false,
"description": "Have an additional layer between the last encoder state and the first decoder state (RNN specific).",
"title": "Bridge",
"type": "boolean"
},
"rnn_type": {
"default": "LSTM",
"description": "The gate type to use in the RNNs.",
"enum": [
"LSTM",
"GRU"
],
"title": "Rnn Type",
"type": "string"
}
},
"title": "RnnModelConfig",
"type": "object"
},
"RotaryPositionConfig": {
"additionalProperties": false,
"description": "Configuration for rotary position embeddings used in transformer models.",
"properties": {
"rotary_interleave": {
"default": true,
"description": "Interleave the head dimensions when rotary embeddings are applied. Otherwise the head dimensions are sliced in half. (True=default Llama from Meta (original), False= used by all HuggingFace models)",
"title": "Rotary Interleave",
"type": "boolean"
},
"rotary_theta": {
"default": 10000,
"description": "Rotary theta base length, 1e4 for Llama2.Mistral, 1e6 for Mixtral",
"title": "Rotary Theta",
"type": "integer"
},
"rotary_dim": {
"default": 0,
"description": "Rotary dim when model requires it to be different to head dim.",
"title": "Rotary Dim",
"type": "integer"
},
"scaling_type": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Specifies the type of RoPE scaling to be applied, if any.",
"title": "Scaling Type"
},
"scaling_factor": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 8.0,
"description": "Factor by which to scale RoPE embeddings.",
"title": "Scaling Factor"
},
"low_freq_factor": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Scaling factor applied to the lower frequency components of RoPE.",
"title": "Low Freq Factor"
},
"high_freq_factor": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 4.0,
"description": "Scaling factor applied to the higher frequency components of RoPE.",
"title": "High Freq Factor"
},
"original_max_position_embeddings": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 8192,
"description": "Original maximum position embeddings for RoPE scaling.",
"title": "Original Max Position Embeddings"
}
},
"title": "RotaryPositionConfig",
"type": "object"
},
"SuffixConfig": {
"additionalProperties": false,
"properties": {
"src_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to append to all source examples.",
"title": "Src Suffix"
},
"tgt_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to append to all target examples.",
"title": "Tgt Suffix"
}
},
"title": "SuffixConfig",
"type": "object"
},
"SwitchOutConfig": {
"additionalProperties": false,
"properties": {
"switchout_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for SwitchOut. :math:`\\tau^{-1}` in :cite:`DBLP:journals/corr/abs-1808-07512`. Smaller value makes data more diverse.",
"title": "Switchout Temperature"
}
},
"title": "SwitchOutConfig",
"type": "object"
},
"TerminologyConfig": {
"additionalProperties": false,
"properties": {
"termbase_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to a dictionary file with terms.",
"title": "Termbase Path"
},
"src_spacy_language_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Name of the spaCy language model for the source corpus.",
"title": "Src Spacy Language Model"
},
"tgt_spacy_language_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Name of the spaCy language model for the target corpus.",
"title": "Tgt Spacy Language Model"
},
"term_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.3,
"description": "Ratio of corpus to augment with terms.",
"title": "Term Corpus Ratio"
},
"term_example_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.2,
"description": "Maximum terms allowed in an example.",
"title": "Term Example Ratio"
},
"src_term_stoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fsrc_term_start\uff60",
"description": "The source term start token.",
"title": "Src Term Stoken"
},
"tgt_term_stoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ftgt_term_start\uff60",
"description": "The target term start token.",
"title": "Tgt Term Stoken"
},
"tgt_term_etoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ftgt_term_end\uff60",
"description": "The target term end token.",
"title": "Tgt Term Etoken"
},
"term_source_delimiter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ffuzzy\uff60",
"description": "Any special token used for augmented source sentences. The default is the fuzzy token used in the FuzzyMatch transform.",
"title": "Term Source Delimiter"
}
},
"title": "TerminologyConfig",
"type": "object"
},
"TokenDropConfig": {
"additionalProperties": false,
"properties": {
"tokendrop_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for token deletion.",
"title": "Tokendrop Temperature"
}
},
"title": "TokenDropConfig",
"type": "object"
},
"TokenMaskConfig": {
"additionalProperties": false,
"properties": {
"tokenmask_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for token masking.",
"title": "Tokenmask Temperature"
}
},
"title": "TokenMaskConfig",
"type": "object"
},
"TrainingConfig": {
"additionalProperties": false,
"properties": {
"quant_layers": {
"default": [],
"description": "List of layers to be compressed in 4/8bit.",
"items": {
"type": "string"
},
"title": "Quant Layers",
"type": "array"
},
"quant_type": {
"default": "",
"description": "Type of compression.",
"enum": [
"",
"bnb_9bit",
"bnb_FP4",
"bnb_NF4",
"awq_gemm",
"awq_gemv"
],
"title": "Quant Type",
"type": "string"
},
"w_bit": {
"default": 4,
"description": "W_bit quantization",
"title": "W Bit",
"type": "integer"
},
"group_size": {
"default": 128,
"description": "Group size quantization.",
"title": "Group Size",
"type": "integer"
},
"lora_layers": {
"default": [],
"description": "List of layers to be replaced by LoRa layers. E.g. ['linear_values', 'linear_query'] (\u00a74.2 in https://arxiv.org/abs/2106.09685)",
"items": {
"type": "string"
},
"title": "Lora Layers",
"type": "array"
},
"lora_embedding": {
"default": false,
"description": "Replace embeddings with LoRa Embeddings (\u00a75.1)",
"title": "Lora Embedding",
"type": "boolean"
},
"lora_rank": {
"default": 2,
"description": "r=2 successfully tested with NLLB-200 3.3B",
"title": "Lora Rank",
"type": "integer"
},
"lora_alpha": {
"default": 1,
"description": "\u00a74.1 https://arxiv.org/abs/2106.09685",
"title": "Lora Alpha",
"type": "integer"
},
"lora_dropout": {
"default": 0.0,
"description": "Rule of thumb: same value as in main model.",
"title": "Lora Dropout",
"type": "number"
},
"optim": {
"default": "sgd",
"description": "Optimization method.",
"enum": [
"sgd",
"adagrad",
"adadelta",
"adam",
"sparseadam",
"adafactor",
"fusedadam",
"adamw8bit",
"pagedadamw8bit",
"pagedadamw32bit"
],
"title": "Optim",
"type": "string"
},
"adagrad_accumulator_init": {
"default": 0,
"description": "Initialize the accumulator values in adagrad. Mirrors initial_accumulator_value flag from tensorflow adagrad implementation (default 0.1 there).",
"title": "Adagrad Accumulator Init",
"type": "number"
},
"adam_beta1": {
"default": 0.9,
"description": "Beta1 parameter used by Adam. Almost without exception a value of 0.9 is used in the literature, seemingly giving good results, so we would discourage changing this value from the default without due consideration.",
"title": "Adam Beta1",
"type": "number"
},
"adam_beta2": {
"default": 0.999,
"description": "Beta2 parameter used by Adam. Typically a value of 0.999 is recommended, as this is the value suggested by the original paper describing Adam, and is also the value adopted in other frameworks such as Tensorflow (https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) and Keras (https://keras.io/optimizers/). Whereas recently the paper Attention is All You Need suggested a value of 0.98 for beta2, this parameter may not work well for normal models / default baselines.",
"title": "Adam Beta2",
"type": "number"
},
"learning_rate": {
"default": 1.0,
"description": "Starting learning rate. Recommended settings: sgd=1, adagrad=0.1, adadelta=1, adam=0.001.",
"title": "Learning Rate",
"type": "number"
},
"learning_rate_decay": {
"default": 0.5,
"description": "Decay learning rate by this much if steps have gone past start_decay_steps.",
"title": "Learning Rate Decay",
"type": "number"
},
"start_decay_steps": {
"default": 50000,
"description": "Start decaying every decay_steps after this many steps.",
"title": "Start Decay Steps",
"type": "integer"
},
"decay_steps": {
"default": 10000,
"description": "Frequency for learning rate decay, in steps.",
"title": "Decay Steps",
"type": "integer"
},
"decay_method": {
"default": "none",
"description": "Custom decay method to use.",
"enum": [
"noam",
"noamwd",
"rsqrt",
"none"
],
"title": "Decay Method",
"type": "string"
},
"warmup_steps": {
"default": 4000,
"description": "Number of warmup steps for custom decay.",
"title": "Warmup Steps",
"type": "integer"
},
"reset_optim": {
"default": "none",
"description": "Optimization resetter when using train_from.",
"enum": [
"none",
"all",
"states",
"keep_states"
],
"title": "Reset Optim",
"type": "string"
},
"gpu_ranks": {
"default": [],
"description": "List of ranks for each process.",
"items": {
"type": "integer"
},
"title": "Gpu Ranks",
"type": "array"
},
"world_size": {
"default": 1,
"description": "Total number of distributed processes.",
"title": "World Size",
"type": "integer"
},
"parallel_mode": {
"default": "data_parallel",
"description": "Distributed mode.",
"enum": [
"data_parallel",
"tensor_parallel"
],
"title": "Parallel Mode",
"type": "string"
},
"gpu_backend": {
"default": "nccl",
"description": "Type of torch distributed backend.",
"title": "Gpu Backend",
"type": "string"
},
"gpu_verbose_level": {
"default": 0,
"description": "Gives more info on each process per GPU.",
"title": "Gpu Verbose Level",
"type": "integer"
},
"master_ip": {
"default": "localhost",
"description": "IP of master for torch.distributed training.",
"title": "Master Ip",
"type": "string"
},
"master_port": {
"default": 10000,
"description": "Port of master for torch.distributed training.",
"title": "Master Port",
"type": "integer"
},
"timeout": {
"default": 60,
"description": "Timeout for one GPU to wait for the others.",
"title": "Timeout",
"type": "integer"
},
"model_path": {
"default": "model",
"description": "Path to directory containing all model components.",
"title": "Model Path",
"type": "string"
},
"self_attn_backend": {
"default": "flash",
"description": "Self-attention backend.",
"enum": [
"flash",
"pytorch"
],
"title": "Self Attn Backend",
"type": "string"
},
"compute_dtype": {
"description": "Compute dtype (precision) to use for main compute. Some parameters might have other dtypes for specific cases (e.g. torch.amp -- See eole.config.training.TrainingConfig.storage_dtype) fp32 to force slow fp16 model on gtx1080, int8 to enable pytorch native 8-bit quantization (cpu only).",
"enum": [
"fp32",
"fp16",
"int8",
"bf16"
],
"title": "Compute Dtype",
"type": "string"
},
"param_init": {
"default": 0.1,
"description": "Support value for uniform distribution parameters initialization. Set to 0 not to use initialization.",
"title": "Param Init",
"type": "number"
},
"param_init_glorot": {
"default": false,
"description": "Initialize parameters with xavier_uniform. Required for transformer.",
"title": "Param Init Glorot",
"type": "boolean"
},
"freeze_encoder": {
"default": false,
"description": "Freeze parameters in encoder.",
"title": "Freeze Encoder",
"type": "boolean"
},
"freeze_decoder": {
"default": false,
"description": "Freeze parameters in decoder.",
"title": "Freeze Decoder",
"type": "boolean"
},
"pre_word_vecs_enc": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "If a valid path is specified, will load pretrained word embeddings on the encoder side.",
"title": "Pre Word Vecs Enc"
},
"pre_word_vecs_dec": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "If a valid path is specified, will load pretrained word embeddings on the decoder side.",
"title": "Pre Word Vecs Dec"
},
"data_type": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "text",
"title": "Data Type"
},
"bucket_size": {
"default": 262144,
"description": "A bucket is a buffer of bucket_size examples to pick from the various corpora. The dynamic iterator batches batch_size items from the bucket and shuffle them.",
"title": "Bucket Size",
"type": "integer"
},
"bucket_size_init": {
"default": -1,
"description": "Bucket size is initialized with this amount of examples (see bucket_size_increment).",
"title": "Bucket Size Init",
"type": "integer"
},
"bucket_size_increment": {
"default": 0,
"description": "Bucket size incremented with this amount of examples at each new bucket (up to bucket_size).",
"title": "Bucket Size Increment",
"type": "integer"
},
"prefetch_factor": {
"default": 200,
"description": "Number of mini-batches loaded in advance to avoid the GPU waiting during processing of next bucket.",
"title": "Prefetch Factor",
"type": "integer"
},
"save_format": {
"default": "pytorch",
"description": "Format to save the model weights.",
"enum": [
"pytorch",
"safetensors"
],
"title": "Save Format",
"type": "string"
},
"save_checkpoint_steps": {
"default": 5000,
"description": "Frequency of checkpoint saving (in steps).",
"title": "Save Checkpoint Steps",
"type": "integer"
},
"keep_checkpoint": {
"default": -1,
"description": "Number of checkpoints to retain. (-1 retains all)",
"title": "Keep Checkpoint",
"type": "integer"
},
"train_from": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Pretrained model/checkpoint weights to continue training from.",
"title": "Train From"
},
"num_workers": {
"default": 2,
"description": "Number of workers for pytorch.DataLoader objects.",
"title": "Num Workers",
"type": "integer"
},
"batch_size": {
"default": 64,
"description": "Maximum batch size for training.",
"title": "Batch Size",
"type": "integer"
},
"batch_size_multiple": {
"default": 1,
"description": "Batch size multiple for token batches.",
"title": "Batch Size Multiple",
"type": "integer"
},
"batch_type": {
"default": "sents",
"description": "Batch grouping for batch_size.",
"enum": [
"sents",
"tokens"
],
"title": "Batch Type",
"type": "string"
},
"normalization": {
"default": "sents",
"description": "Normalization method of the gradient.",
"enum": [
"sents",
"tokens"
],
"title": "Normalization",
"type": "string"
},
"accum_count": {
"default": [
1
],
"description": "Accumulate gradient this many times. Approximately equivalent to updating batch_size * accum_count batches at once. Recommended for transformer.",
"items": {
"type": "integer"
},
"title": "Accum Count",
"type": "array"
},
"accum_steps": {
"default": [
0
],
"description": "Steps at which accum_count values change.",
"items": {
"type": "integer"
},
"title": "Accum Steps",
"type": "array"
},
"valid_steps": {
"default": 10000,
"description": "Frequency of validation, in steps.",
"title": "Valid Steps",
"type": "integer"
},
"valid_batch_size": {
"default": 32,
"description": "Maximum batch size for validation.",
"title": "Valid Batch Size",
"type": "integer"
},
"train_steps": {
"default": 100000,
"description": "Number of training steps.",
"title": "Train Steps",
"type": "integer"
},
"single_pass": {
"default": false,
"description": "Make a single pass over the training dataset.",
"title": "Single Pass",
"type": "boolean"
},
"early_stopping": {
"default": 0,
"description": "Number of validation steps without improving that will trigger early stop of training.",
"title": "Early Stopping",
"type": "integer"
},
"early_stopping_criteria": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Criteria to use for early stopping.",
"title": "Early Stopping Criteria"
},
"max_grad_norm": {
"default": 5,
"description": "If the norm of the gradient vector exceeds this value, renormalize it to have the norm equal to max_grad_norm.",
"title": "Max Grad Norm",
"type": "number"
},
"dropout": {
"default": [
0.3
],
"description": "Dropout probability.",
"items": {
"type": "number"
},
"title": "Dropout",
"type": "array"
},
"attention_dropout": {
"default": [
0.1
],
"description": "Attention dropout probability.",
"items": {
"type": "number"
},
"title": "Attention Dropout",
"type": "array"
},
"dropout_steps": {
"default": [
0
],
"description": "Steps at which dropout changes.",
"items": {
"type": "integer"
},
"title": "Dropout Steps",
"type": "array"
},
"truncated_decoder": {
"default": 0,
"description": "Truncated bptt.",
"title": "Truncated Decoder",
"type": "integer"
},
"label_smoothing": {
"default": 0.0,
"description": "Label smoothing value epsilon. Probability of all non-true labels will be smoothed by epsilon/(vocab_size-1). Set to 0 to turn off label smoothing. (https://arxiv.org/abs/1512.00567)",
"title": "Label Smoothing",
"type": "number"
},
"average_decay": {
"default": 0.0,
"description": "Exponential moving average decay (https://en.wikipedia.org/wiki/Moving_average). Set to other than 0 (e.g. 1e-4) to activate. Similar to Marian NMT implementation (http://www.aclweb.org/anthology/P18-4020).",
"title": "Average Decay",
"type": "number"
},
"average_every": {
"default": 1,
"description": "Step for moving average. Default is every update if average_decay is set.",
"title": "Average Every",
"type": "integer"
},
"loss_scale": {
"default": 0.0,
"description": "For FP16 training, the static loss scale to use. If not set, the loss scale is dynamically computed.",
"title": "Loss Scale",
"type": "number"
},
"apex_opt_level": {
"default": "",
"description": "For FP16 training, the opt_level to use. See https://nvidia.github.io/apex/amp.html#opt-levels.",
"enum": [
"",
"O0",
"O1",
"O2",
"O3"
],
"title": "Apex Opt Level",
"type": "string"
},
"zero_out_prompt_loss": {
"default": false,
"description": "Set the prompt loss to zero. Mostly for LLM finetuning. Will be enabled only if the `insert_mask_before_placeholder` transform is applied.",
"title": "Zero Out Prompt Loss",
"type": "boolean"
},
"use_ckpting": {
"default": [],
"description": "Use gradient checkpointing for those modules.",
"items": {
"type": "string"
},
"title": "Use Ckpting",
"type": "array"
},
"update_vocab": {
"default": false,
"description": "Update source and target existing vocabularies.",
"title": "Update Vocab",
"type": "boolean"
},
"lm_prior_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "LM model to use to train the TM.",
"title": "Lm Prior Model"
},
"lm_prior_lambda": {
"default": 0.0,
"description": "LM Prior Lambda",
"title": "Lm Prior Lambda",
"type": "number"
},
"lm_prior_tau": {
"default": 1.0,
"description": "LM Prior Tau",
"title": "Lm Prior Tau",
"type": "number"
},
"estim_loss_lambda": {
"default": [
1.0
],
"description": "Weight applied to estimator loss",
"items": {
"type": "number"
},
"title": "Estim Loss Lambda",
"type": "array"
},
"estim_loss_lambda_steps": {
"default": [
0
],
"description": "Steps at which estimator loss lambda changes",
"items": {
"type": "integer"
},
"title": "Estim Loss Lambda Steps",
"type": "array"
},
"score_threshold": {
"default": 0.68,
"description": "Threshold to filterout data",
"title": "Score Threshold",
"type": "number"
}
},
"title": "TrainingConfig",
"type": "object"
},
"TransformerDecoderConfig": {
"additionalProperties": false,
"properties": {
"decoder_type": {
"const": "transformer",
"default": "transformer",
"enum": [
"transformer"
],
"title": "Decoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the decoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of decoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"coverage_attn": {
"default": false,
"description": "Train a coverage attention layer.",
"title": "Coverage Attn",
"type": "boolean"
},
"lambda_coverage": {
"default": 0.0,
"description": "Lambda value for coverage loss of See et al (2017)",
"title": "Lambda Coverage",
"type": "number"
},
"global_attention": {
"default": "general",
"description": "The attention type to use. (Luong=general, Bahdanau=MLP)",
"enum": [
"dot",
"general",
"mlp",
null
],
"title": "Global Attention"
},
"global_attention_function": {
"default": "softmax",
"description": "Global attention function to use.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Global Attention Function",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
},
"aan_useffn": {
"default": false,
"description": "Turn on the FFN layer in the AAN decoder.",
"title": "Aan Useffn",
"type": "boolean"
},
"alignment_layer": {
"default": -2,
"description": "Layer number which has to be supervised.",
"title": "Alignment Layer",
"type": "integer"
},
"alignment_heads": {
"default": 0,
"description": "Number of cross attention heads per layer to supervise with.",
"title": "Alignment Heads",
"type": "integer"
},
"full_context_alignment": {
"default": false,
"description": "Whether alignment is conditioned on full target context.",
"title": "Full Context Alignment",
"type": "boolean"
},
"lambda_align": {
"default": 0.0,
"description": "Lambda value for alignement loss of Garg et al, 2019 (https://arxiv.org/abs/1909.02074)",
"title": "Lambda Align",
"type": "number"
}
},
"title": "TransformerDecoderConfig",
"type": "object"
},
"TransformerEncoderConfig": {
"additionalProperties": false,
"properties": {
"encoder_type": {
"const": "transformer",
"default": "transformer",
"enum": [
"transformer"
],
"title": "Encoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the encoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of encoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
}
},
"title": "TransformerEncoderConfig",
"type": "object"
},
"TransformerEncoderModelConfig": {
"additionalProperties": false,
"description": "Facilitate setting some transformer specific params at model level.",
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder",
"type": "null"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "transformer_encoder",
"default": "transformer_encoder",
"enum": [
"transformer_encoder"
],
"title": "Architecture",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
}
},
"title": "TransformerEncoderModelConfig",
"type": "object"
},
"TransformerLMDecoderConfig": {
"additionalProperties": false,
"description": "Right now just wraps TransformerDecoderConfig for simplicity.\nMight merge in a single class later once TransformerLM path is clarified.",
"properties": {
"decoder_type": {
"const": "transformer_lm",
"default": "transformer_lm",
"enum": [
"transformer_lm"
],
"title": "Decoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the decoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of decoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"coverage_attn": {
"default": false,
"description": "Train a coverage attention layer.",
"title": "Coverage Attn",
"type": "boolean"
},
"lambda_coverage": {
"default": 0.0,
"description": "Lambda value for coverage loss of See et al (2017)",
"title": "Lambda Coverage",
"type": "number"
},
"global_attention": {
"default": "general",
"description": "The attention type to use. (Luong=general, Bahdanau=MLP)",
"enum": [
"dot",
"general",
"mlp",
null
],
"title": "Global Attention"
},
"global_attention_function": {
"default": "softmax",
"description": "Global attention function to use.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Global Attention Function",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
},
"aan_useffn": {
"default": false,
"description": "Turn on the FFN layer in the AAN decoder.",
"title": "Aan Useffn",
"type": "boolean"
},
"alignment_layer": {
"default": -2,
"description": "Layer number which has to be supervised.",
"title": "Alignment Layer",
"type": "integer"
},
"alignment_heads": {
"default": 0,
"description": "Number of cross attention heads per layer to supervise with.",
"title": "Alignment Heads",
"type": "integer"
},
"full_context_alignment": {
"default": false,
"description": "Whether alignment is conditioned on full target context.",
"title": "Full Context Alignment",
"type": "boolean"
},
"lambda_align": {
"default": 0.0,
"description": "Lambda value for alignement loss of Garg et al, 2019 (https://arxiv.org/abs/1909.02074)",
"title": "Lambda Align",
"type": "number"
}
},
"title": "TransformerLMDecoderConfig",
"type": "object"
},
"TransformerLMModelConfig": {
"additionalProperties": false,
"description": "Facilitate setting some transformer specific params at model level.",
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder",
"type": "null"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "transformer_lm",
"default": "transformer_lm",
"enum": [
"transformer_lm"
],
"title": "Architecture",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
}
},
"title": "TransformerLMModelConfig",
"type": "object"
},
"TransformerModelConfig": {
"additionalProperties": false,
"description": "Facilitate setting some transformer specific params at model level.",
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "transformer",
"default": "transformer",
"enum": [
"transformer"
],
"title": "Architecture",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
}
},
"title": "TransformerModelConfig",
"type": "object"
},
"UpperCaseConfig": {
"additionalProperties": false,
"properties": {
"upper_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.01,
"description": "Corpus ratio to apply uppercasing.",
"title": "Upper Corpus Ratio"
}
},
"title": "UpperCaseConfig",
"type": "object"
}
},
"additionalProperties": false,
"required": [
"src_vocab",
"data"
]
}
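
A minimal sketch of a plain Python dict that satisfies this schema: only src_vocab and data are required, while model and training use the nested sub-configs defined above. The corpus layout inside data (path_src/path_tgt keys) and all paths are placeholders assumed for illustration, since the dataset sub-schema is not reproduced in this excerpt.

# Illustrative config dict; keys mirror properties documented in the schema above.
config = {
    "src_vocab": "data/vocab.src",  # required
    "data": {  # required; corpus layout assumed from typical eole recipes
        "corpus_1": {"path_src": "data/train.src", "path_tgt": "data/train.tgt"},
        "valid": {"path_src": "data/valid.src", "path_tgt": "data/valid.tgt"},
    },
    "model": {  # TransformerModelConfig
        "architecture": "transformer",
        "hidden_size": 512,
        "layers": 6,
        "heads": 8,
        "transformer_ff": 2048,
        "mlp_activation_fn": "relu",
        "layer_norm": "standard",
    },
    "training": {  # TrainingConfig
        "optim": "adam",
        "learning_rate": 0.001,
        "decay_method": "noam",
        "warmup_steps": 4000,
        "batch_size": 4096,
        "batch_type": "tokens",
        "train_steps": 100000,
        "world_size": 1,
        "gpu_ranks": [0],
    },
}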

field model : TransformerModelConfig | TransformerLMModelConfig | TransformerEncoderModelConfig | RnnModelConfig | CnnModelConfig | CustomModelConfig | None = None​

field n_sample : int = 0​

Number of transformed samples per corpus to use to build the vocabulary. Set to -1 to use the full corpora.

field training : TrainingConfig | None [Optional]​

field verbose : bool = False​

Print data loading and statistics for all processes (by default, only the first process shard is logged).

validator default_architecture » all fields[source]​

classmethod get_defaults(architecture)[source]​

get_model_path()[source]​

model_post_init(context: Any, /)​

We need to both initialize private attributes and call the user-defined model_post_init method.

validator str_to_dict » model, training[source]​
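
A minimal sketch of how these fields and validators might be exercised, assuming the enclosing model is eole.config.run.TrainConfig (the class name is not shown in this excerpt) and that the str_to_dict validator converts JSON strings passed for model and training into dicts before validation. Paths are placeholders; if the JSON-string form is not accepted in a given version, plain dicts (as in the previous example) remain the safe fallback.

# Hedged sketch, not an authoritative API example.
from eole.config.run import TrainConfig  # assumed import path

cfg = TrainConfig(
    src_vocab="data/vocab.src",  # placeholder paths
    data={"corpus_1": {"path_src": "data/train.src", "path_tgt": "data/train.tgt"}},
    # str_to_dict runs on `model` and `training`, presumably so these nested
    # sections can also be passed as JSON strings (e.g. from the command line):
    model='{"architecture": "transformer", "layers": 6, "hidden_size": 512}',
    training='{"optim": "adam", "learning_rate": 0.001, "warmup_steps": 4000}',
)

# Methods listed above; the exact return values are assumptions.
defaults = TrainConfig.get_defaults("transformer")  # per-architecture defaults
model_dir = cfg.get_model_path()  # configured model directory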

pydantic model eole.config.run.PredictConfig[source]​

Bases: InferenceConfig, LoggingConfig, MiscConfig
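
Before the full schema, a small example of the prediction side: the dict below uses only fields that appear in the schema shown next (paths are placeholders), and is assumed to be usable with PredictConfig in the same way as the training config above.

# Minimal prediction settings; keys mirror properties of the schema below.
predict_config = {
    "model_path": "experiments/model_step_100000",  # placeholder checkpoint/dir
    "src": "data/test.src",  # source file to decode, one sequence per line
    "beam_size": 5,
    "top_p": 0.0,
    "max_length": 250,
    "with_score": True,  # add a tab-separated score to each output
    "gpu_ranks": [0],
    "world_size": 1,
}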

Show JSON schema
{
"title": "PredictConfig",
"type": "object",
"properties": {
"seed": {
"default": -1,
"description": "Set random seed used for better reproducibility between experiments.",
"title": "Seed",
"type": "integer"
},
"log_file": {
"default": "",
"description": "Output logs to a file under this path.",
"title": "Log File",
"type": "string"
},
"report_every": {
"default": 50,
"description": "Print stats at this interval (in steps).",
"title": "Report Every",
"type": "integer"
},
"valid_metrics": {
"default": [],
"description": "List of names of additional validation metrics.",
"items": {
"type": "string"
},
"title": "Valid Metrics",
"type": "array"
},
"scoring_debug": {
"default": false,
"description": "Dump src/ref/pred of the current batch.",
"title": "Scoring Debug",
"type": "boolean"
},
"dump_preds": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Folder to dump predictions to.",
"title": "Dump Preds"
},
"tensorboard": {
"default": false,
"description": "Use tensorboard for visualization during training.",
"title": "Tensorboard",
"type": "boolean"
},
"tensorboard_log_dir": {
"default": "runs/eole",
"description": "Log directory for tensorboard (also the name of the run).",
"title": "Tensorboard Log Dir",
"type": "string"
},
"tensorboard_log_dir_dated": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tensorboard Log Dir Dated"
},
"quant_layers": {
"default": [],
"description": "List of layers to be compressed in 4/8bit.",
"items": {
"type": "string"
},
"title": "Quant Layers",
"type": "array"
},
"quant_type": {
"default": "",
"description": "Type of compression.",
"enum": [
"",
"bnb_9bit",
"bnb_FP4",
"bnb_NF4",
"awq_gemm",
"awq_gemv"
],
"title": "Quant Type",
"type": "string"
},
"w_bit": {
"default": 4,
"description": "W_bit quantization",
"title": "W Bit",
"type": "integer"
},
"group_size": {
"default": 128,
"description": "Group size quantization.",
"title": "Group Size",
"type": "integer"
},
"lora_layers": {
"default": [],
"description": "List of layers to be replaced by LoRa layers. E.g. ['linear_values', 'linear_query'] (\u00a74.2 in https://arxiv.org/abs/2106.09685)",
"items": {
"type": "string"
},
"title": "Lora Layers",
"type": "array"
},
"lora_embedding": {
"default": false,
"description": "Replace embeddings with LoRa Embeddings (\u00a75.1)",
"title": "Lora Embedding",
"type": "boolean"
},
"lora_rank": {
"default": 2,
"description": "r=2 successfully tested with NLLB-200 3.3B",
"title": "Lora Rank",
"type": "integer"
},
"lora_alpha": {
"default": 1,
"description": "\u00a74.1 https://arxiv.org/abs/2106.09685",
"title": "Lora Alpha",
"type": "integer"
},
"lora_dropout": {
"default": 0.0,
"description": "Rule of thumb: same value as in main model.",
"title": "Lora Dropout",
"type": "number"
},
"beam_size": {
"default": 5,
"description": "Beam size.",
"title": "Beam Size",
"type": "integer"
},
"ratio": {
"default": -0.0,
"description": "Ratio based beam stop condition.",
"title": "Ratio",
"type": "number"
},
"top_k": {
"default": 0,
"description": "Set this to -1 to do random sampling from full distribution. Set this to value k>1 to do random sampling restricted to the k most likely next tokens. Set this to 1 to use argmax.",
"title": "Top K",
"type": "integer"
},
"top_p": {
"default": 0.0,
"description": "Probability for top-p/nucleus sampling. Restrict tokens to the most likely until the cumulated probability is over p. In range [0,1]. (https://arxiv.org/abs/1904.09751)",
"lte": 1.0,
"minimum": 0.0,
"title": "Top P",
"type": "number"
},
"temperature": {
"default": 1.0,
"description": "If doing random sampling, divide the logits by this before computing softmax during decoding.",
"title": "Temperature",
"type": "number"
},
"length_penalty": {
"default": "avg",
"description": "Length penalty to use.",
"enum": [
"avg",
"wu",
"none"
],
"title": "Length Penalty",
"type": "string"
},
"alpha": {
"default": 1.0,
"description": "Length penalty parameter (higher = longer generation)",
"title": "Alpha",
"type": "number"
},
"coverage_penalty": {
"default": "none",
"description": "Coverage penalty to use. Only available in beam search.",
"enum": [
"none",
"wu",
"summary"
],
"title": "Coverage Penalty",
"type": "string"
},
"beta": {
"default": -0.0,
"description": "Coverage penalty parameter.",
"title": "Beta",
"type": "number"
},
"stepwise_penalty": {
"default": false,
"description": "Apply coverage penalty at every decoding step. Helpful for summary penalty.",
"title": "Stepwise Penalty",
"type": "boolean"
},
"min_length": {
"default": 0,
"description": "Minimum prediction length.",
"minimum": 0,
"title": "Min Length",
"type": "integer"
},
"max_length": {
"default": 250,
"description": "Maximum prediction length.",
"title": "Max Length",
"type": "integer"
},
"max_length_ratio": {
"default": 2,
"description": "Maximum prediction length ratio. For European languages, 2 is large enough, for target Asian charageters, need to increase to 2-3, for special languages (Burmese, Amharic) to 10.",
"minimum": 1.0,
"title": "Max Length Ratio",
"type": "number"
},
"block_ngram_repeat": {
"default": 0,
"description": "Block repetition of ngrams during decoding.",
"title": "Block Ngram Repeat",
"type": "integer"
},
"ignore_when_blocking": {
"default": [],
"description": "Ignore these strings when blocking repeats. You want to block sentence delimiters.",
"items": {
"type": "string"
},
"title": "Ignore When Blocking",
"type": "array"
},
"replace_unk": {
"default": false,
"description": "Replace the generated UNK tokens with the source token that had the highest attention weight. If phrase_table is provided, it will lok up the identified source token and give the corresponding target token. If it is not provided (or the identified source token does not exist in the table), then it will copy the source token.",
"title": "Replace Unk",
"type": "boolean"
},
"ban_unk_token": {
"default": false,
"description": "Prevent unk token generation by setting unk probability to 0.",
"title": "Ban Unk Token",
"type": "boolean"
},
"phrase_table": {
"default": "",
"description": "If phrase_table is provided (with replace_unk), it will look up the identified source token and give the corresponding target token.",
"title": "Phrase Table",
"type": "string"
},
"n_best": {
"default": 1,
"description": "Output the n_best decoded sentences.",
"title": "N Best",
"type": "integer"
},
"dump_beam": {
"default": "",
"description": "File to dump beam information to.",
"title": "Dump Beam",
"type": "string"
},
"verbose": {
"default": false,
"description": "Print scores and predictions for each input.",
"title": "Verbose",
"type": "boolean"
},
"with_score": {
"default": false,
"description": "Add a tab separated score to each output.",
"title": "With Score",
"type": "boolean"
},
"attn_debug": {
"default": false,
"description": "Print best attn for each word.",
"title": "Attn Debug",
"type": "boolean"
},
"align_debug": {
"default": false,
"description": "Print best align for each word.",
"title": "Align Debug",
"type": "boolean"
},
"gpu_ranks": {
"default": [],
"description": "List of ranks for each process.",
"items": {
"type": "integer"
},
"title": "Gpu Ranks",
"type": "array"
},
"world_size": {
"default": 1,
"description": "Total number of distributed processes.",
"title": "World Size",
"type": "integer"
},
"parallel_mode": {
"default": "data_parallel",
"description": "Distributed mode.",
"enum": [
"data_parallel",
"tensor_parallel"
],
"title": "Parallel Mode",
"type": "string"
},
"gpu_backend": {
"default": "nccl",
"description": "Type of torch distributed backend.",
"title": "Gpu Backend",
"type": "string"
},
"gpu_verbose_level": {
"default": 0,
"description": "Gives more info on each process per GPU.",
"title": "Gpu Verbose Level",
"type": "integer"
},
"master_ip": {
"default": "localhost",
"description": "IP of master for torch.distributed training.",
"title": "Master Ip",
"type": "string"
},
"master_port": {
"default": 10000,
"description": "Port of master for torch.distributed training.",
"title": "Master Port",
"type": "integer"
},
"timeout": {
"default": 60,
"description": "Timeout for one GPU to wait for the others.",
"title": "Timeout",
"type": "integer"
},
"model_path": {
"anyOf": [
{
"type": "string"
},
{
"items": {
"type": "string"
},
"type": "array"
}
],
"description": "Path to model .pt file(s). Multiple models can be specified for ensemble decoding.",
"title": "Model Path"
},
"self_attn_backend": {
"default": "flash",
"description": "Self-attention backend.",
"enum": [
"flash",
"pytorch"
],
"title": "Self Attn Backend",
"type": "string"
},
"compute_dtype": {
"description": "Compute dtype (precision) to use for main compute. Some parameters might have other dtypes for specific cases (e.g. torch.amp -- See eole.config.training.TrainingConfig.storage_dtype) fp32 to force slow fp16 model on gtx1080, int8 to enable pytorch native 8-bit quantization (cpu only).",
"enum": [
"fp32",
"fp16",
"int8",
"bf16"
],
"title": "Compute Dtype",
"type": "string"
},
"src": {
"description": "Source file to decode (one line per sequence).",
"title": "Src",
"type": "string"
},
"tgt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "True target sequences, useful for scoring or prefix decoding.",
"title": "Tgt"
},
"tgt_file_prefix": {
"default": false,
"description": "Generate predictions using provided tgt as prefix.",
"title": "Tgt File Prefix",
"type": "boolean"
},
"output": {
"default": "pred.txt",
"description": "Path to output the predictions (each line will be the decoded sequence).",
"title": "Output",
"type": "string"
},
"report_align": {
"default": false,
"description": "Report alignment for each translation.",
"title": "Report Align",
"type": "boolean"
},
"gold_align": {
"default": false,
"description": "Report alignment between source and gold target. Useful to test the performance of learnt alignments.",
"title": "Gold Align",
"type": "boolean"
},
"report_time": {
"default": false,
"description": "Report some translation time metrics.",
"title": "Report Time",
"type": "boolean"
},
"profile": {
"default": false,
"description": "Report pytorch profiling stats.",
"title": "Profile",
"type": "boolean"
},
"batch_size": {
"default": 30,
"description": "Batch size.",
"title": "Batch Size",
"type": "integer"
},
"batch_type": {
"default": "sents",
"description": "Batch grouping for batch size.",
"enum": [
"sents",
"tokens"
],
"title": "Batch Type",
"type": "string"
},
"avg_raw_probs": {
"default": false,
"description": "If set, during ensembling scores from different models will be combined by averaging their raw probabilities and then taking the log. Otherwise, the log probabilities will be averaged directly. Necessary for models whose output layers can assign zero probability.",
"title": "Avg Raw Probs",
"type": "boolean"
},
"data_type": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "text",
"title": "Data Type"
},
"transforms": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"title": "Transforms"
},
"transforms_configs": {
"anyOf": [
{
"$ref": "#/$defs/NestedAllTransformsConfig"
},
{
"type": "null"
}
]
},
"share_vocab": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"title": "Share Vocab"
},
"src_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Src Subword Vocab"
},
"model": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnModelConfig",
"custom": "#/$defs/CustomModelConfig",
"rnn": "#/$defs/RnnModelConfig",
"transformer": "#/$defs/TransformerModelConfig",
"transformer_encoder": "#/$defs/TransformerEncoderModelConfig",
"transformer_lm": "#/$defs/TransformerLMModelConfig"
},
"propertyName": "architecture"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerModelConfig"
},
{
"$ref": "#/$defs/TransformerLMModelConfig"
},
{
"$ref": "#/$defs/TransformerEncoderModelConfig"
},
{
"$ref": "#/$defs/RnnModelConfig"
},
{
"$ref": "#/$defs/CnnModelConfig"
},
{
"$ref": "#/$defs/CustomModelConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"title": "Model"
},
"chat_template": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Chat Template"
},
"optional_eos": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"description": "Optional EOS tokens that would stop generation, e.g. <|eot_id|> for Llama3",
"title": "Optional Eos"
}
},
"$defs": {
"ActivationFunction": {
"enum": [
"relu",
"gelu",
"silu",
"gated-gelu",
"gated-silu"
],
"title": "ActivationFunction",
"type": "string"
},
"BARTNoiseConfig": {
"additionalProperties": false,
"properties": {
"permute_sent_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Permute this proportion of sentences (boundaries defined by ['.', '?', '!']) in all inputs.",
"title": "Permute Sent Ratio"
},
"rotate_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Rotate this proportion of inputs.",
"title": "Rotate Ratio"
},
"insert_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Insert this percentage of additional random tokens.",
"title": "Insert Ratio"
},
"random_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Instead of using <mask>, use random token this often.",
"title": "Random Ratio"
},
"mask_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Fraction of words/subwords that will be masked.",
"title": "Mask Ratio"
},
"mask_length": {
"anyOf": [
{
"enum": [
"subword",
"word",
"span-poisson"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "subword",
"description": "Length of masking window to apply.",
"title": "Mask Length"
},
"poisson_lambda": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 3.0,
"description": "Lambda for Poisson distribution to sample span length if `-mask_length` set to span-poisson.",
"title": "Poisson Lambda"
},
"replace_length": {
"anyOf": [
{
"maximum": 1,
"minimum": -1,
"type": "integer"
},
{
"type": "null"
}
],
"default": -1,
"description": "When masking N tokens, replace with 0, 1, or N tokens. (use -1 for N)",
"title": "Replace Length"
}
},
"title": "BARTNoiseConfig",
"type": "object"
},
"BaseTokenizerConfig": {
"additionalProperties": false,
"properties": {
"src_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for src (or shared).",
"title": "Src Subword Model"
},
"tgt_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for tgt.",
"title": "Tgt Subword Model"
},
"src_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (source side)",
"title": "Src Subword Nbest"
},
"tgt_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (target side)",
"title": "Tgt Subword Nbest"
},
"src_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (source side)",
"title": "Src Subword Alpha"
},
"tgt_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (target side)",
"title": "Tgt Subword Alpha"
},
"src_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for src subword. Format: <word>\\t<count> per line.",
"title": "Src Subword Vocab"
},
"tgt_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for tgt subword. Format: <word>\\t<count> per line.",
"title": "Tgt Subword Vocab"
},
"src_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce src subword in src_subword_vocab with frequency >= src_vocab_threshold.",
"title": "Src Vocab Threshold"
},
"tgt_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce tgt subword in tgt_subword_vocab with frequency >= tgt_vocab_threshold.",
"title": "Tgt Vocab Threshold"
}
},
"title": "BaseTokenizerConfig",
"type": "object"
},
"CleanConfig": {
"additionalProperties": false,
"properties": {
"src_eq_tgt": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex src==tgt",
"title": "Src Eq Tgt"
},
"same_char": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex with same char more than 4 times",
"title": "Same Char"
},
"same_word": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex with same word more than 3 times",
"title": "Same Word"
},
"scripts_ok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"Latin",
"Common"
],
"description": "list of unicodata scripts accepted",
"title": "Scripts Ok"
},
"scripts_nok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"description": "list of unicodata scripts not accepted",
"title": "Scripts Nok"
},
"src_tgt_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 2.0,
"description": "ratio between src and tgt",
"title": "Src Tgt Ratio"
},
"avg_tok_min": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 3.0,
"description": "average length of tokens min",
"title": "Avg Tok Min"
},
"avg_tok_max": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 20.0,
"description": "average length of tokens max",
"title": "Avg Tok Max"
},
"langid": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"description": "list of languages accepted",
"title": "Langid"
}
},
"title": "CleanConfig",
"type": "object"
},
"CnnDecoderConfig": {
"additionalProperties": false,
"properties": {
"decoder_type": {
"const": "cnn",
"default": "cnn",
"enum": [
"cnn"
],
"title": "Decoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the decoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of decoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"coverage_attn": {
"default": false,
"description": "Train a coverage attention layer.",
"title": "Coverage Attn",
"type": "boolean"
},
"lambda_coverage": {
"default": 0.0,
"description": "Lambda value for coverage loss of See et al (2017)",
"title": "Lambda Coverage",
"type": "number"
},
"global_attention": {
"default": "general",
"description": "The attention type to use. (Luong=general, Bahdanau=MLP)",
"enum": [
"dot",
"general",
"mlp",
null
],
"title": "Global Attention"
},
"global_attention_function": {
"default": "softmax",
"description": "Global attention function to use.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Global Attention Function",
"type": "string"
},
"cnn_kernel_width": {
"default": 3,
"description": "Size of windows in the cnn, the kernel_size is (cnn_kernel_width, 1) in convolution layers.",
"title": "Cnn Kernel Width",
"type": "integer"
}
},
"title": "CnnDecoderConfig",
"type": "object"
},
"CnnEncoderConfig": {
"additionalProperties": false,
"properties": {
"encoder_type": {
"const": "cnn",
"default": "cnn",
"enum": [
"cnn"
],
"title": "Encoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the encoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of encoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
},
"cnn_kernel_width": {
"default": 3,
"description": "Size of windows in the cnn, the kernel_size is (cnn_kernel_width, 1) in convolution layers.",
"title": "Cnn Kernel Width",
"type": "integer"
}
},
"title": "CnnEncoderConfig",
"type": "object"
},
"CnnModelConfig": {
"additionalProperties": false,
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": -1,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "cnn",
"default": "cnn",
"enum": [
"cnn"
],
"title": "Architecture",
"type": "string"
},
"cnn_kernel_width": {
"default": 3,
"description": "Size of windows in the cnn, the kernel_size is (cnn_kernel_width, 1) in convolution layers.",
"title": "Cnn Kernel Width",
"type": "integer"
}
},
"title": "CnnModelConfig",
"type": "object"
},
"CustomModelConfig": {
"additionalProperties": false,
"description": "Wrap anything that does not fit a set common architecture.",
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": -1,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "custom",
"default": "custom",
"enum": [
"custom"
],
"title": "Architecture",
"type": "string"
}
},
"title": "CustomModelConfig",
"type": "object"
},
"DocifyConfig": {
"additionalProperties": false,
"properties": {
"doc_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 200,
"description": "Number of tokens per doc.",
"title": "Doc Length"
},
"max_context": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Max context segments.",
"title": "Max Context"
}
},
"title": "DocifyConfig",
"type": "object"
},
"EmbeddingsConfig": {
"additionalProperties": false,
"properties": {
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"freeze_word_vecs_enc": {
"default": false,
"description": "Freeze word embeddings on the encoder side.",
"title": "Freeze Word Vecs Enc",
"type": "boolean"
},
"freeze_word_vecs_dec": {
"default": false,
"description": "Freeze word embeddings on the encoder side.",
"title": "Freeze Word Vecs Dec",
"type": "boolean"
},
"position_encoding": {
"default": false,
"description": "Absolute position encoding, see position_encoding_type. Necessary for non-RNN style models.",
"title": "Position Encoding",
"type": "boolean"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"position_shift": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Positions IDS shift before making position embed dirty patch to cover for xlm-roberta-xl",
"title": "Position Shift"
}
},
"title": "EmbeddingsConfig",
"type": "object"
},
"FilterTooLongConfig": {
"additionalProperties": false,
"properties": {
"src_seq_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 192,
"description": "Maximum source sequence length.",
"title": "Src Seq Length"
},
"tgt_seq_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 192,
"description": "Maximum target sequence length.",
"title": "Tgt Seq Length"
}
},
"title": "FilterTooLongConfig",
"type": "object"
},
"InlineTagsConfig": {
"additionalProperties": false,
"properties": {
"tags_dictionary_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to a flat term dictionary.",
"title": "Tags Dictionary Path"
},
"tags_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.1,
"description": "Ratio of corpus to augment with tags.",
"title": "Tags Corpus Ratio"
},
"max_tags": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 12,
"description": "Maximum number of tags that can be added to a single sentence.",
"title": "Max Tags"
},
"paired_stag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_beg\uff60",
"description": "The format of an opening paired inline tag. Must include the character #.",
"title": "Paired Stag"
},
"paired_etag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_end\uff60",
"description": "The format of a closing paired inline tag. Must include the character #.",
"title": "Paired Etag"
},
"isolated_tag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_std\uff60",
"description": "The format of an isolated inline tag. Must include the character #.",
"title": "Isolated Tag"
},
"src_delimiter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ffuzzy\uff60",
"description": "Any special token used for augmented src sentences. The default is the fuzzy token used in the FuzzyMatch transform.",
"title": "Src Delimiter"
}
},
"title": "InlineTagsConfig",
"type": "object"
},
"InsertMaskBeforePlaceholderConfig": {
"additionalProperties": false,
"properties": {
"response_patterns": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"Response : \uff5fnewline\uff60"
],
"description": "Response pattern to locate the end of the prompt.",
"title": "Response Patterns"
}
},
"title": "InsertMaskBeforePlaceholderConfig",
"type": "object"
},
"MeanEncoderConfig": {
"additionalProperties": false,
"properties": {
"encoder_type": {
"const": "mean",
"default": "mean",
"enum": [
"mean"
],
"title": "Encoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the encoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of encoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
}
},
"title": "MeanEncoderConfig",
"type": "object"
},
"NestedAllTransformsConfig": {
"additionalProperties": false,
"properties": {
"docify": {
"$ref": "#/$defs/DocifyConfig",
"default": {
"doc_length": 200,
"max_context": 1
}
},
"inlinetags": {
"$ref": "#/$defs/InlineTagsConfig",
"default": {
"tags_dictionary_path": null,
"tags_corpus_ratio": 0.1,
"max_tags": 12,
"paired_stag": "\uff5fph_#_beg\uff60",
"paired_etag": "\uff5fph_#_end\uff60",
"isolated_tag": "\uff5fph_#_std\uff60",
"src_delimiter": "\uff5ffuzzy\uff60"
}
},
"terminology": {
"$ref": "#/$defs/TerminologyConfig",
"default": {
"termbase_path": null,
"src_spacy_language_model": null,
"tgt_spacy_language_model": null,
"term_corpus_ratio": 0.3,
"term_example_ratio": 0.2,
"src_term_stoken": "\uff5fsrc_term_start\uff60",
"tgt_term_stoken": "\uff5ftgt_term_start\uff60",
"tgt_term_etoken": "\uff5ftgt_term_end\uff60",
"term_source_delimiter": "\uff5ffuzzy\uff60"
}
},
"bart": {
"$ref": "#/$defs/BARTNoiseConfig",
"default": {
"permute_sent_ratio": 0.0,
"rotate_ratio": 0.0,
"insert_ratio": 0.0,
"random_ratio": 0.0,
"mask_ratio": 0.0,
"mask_length": "subword",
"poisson_lambda": 3.0,
"replace_length": -1
}
},
"uppercase": {
"$ref": "#/$defs/UpperCaseConfig",
"default": {
"upper_corpus_ratio": 0.01
}
},
"clean": {
"$ref": "#/$defs/CleanConfig",
"default": {
"src_eq_tgt": false,
"same_char": false,
"same_word": false,
"scripts_ok": [
"Latin",
"Common"
],
"scripts_nok": [],
"src_tgt_ratio": 2.0,
"avg_tok_min": 3.0,
"avg_tok_max": 20.0,
"langid": []
}
},
"switchout": {
"$ref": "#/$defs/SwitchOutConfig",
"default": {
"switchout_temperature": 1.0
}
},
"tokendrop": {
"$ref": "#/$defs/TokenDropConfig",
"default": {
"tokendrop_temperature": 1.0
}
},
"tokenmask": {
"$ref": "#/$defs/TokenMaskConfig",
"default": {
"tokenmask_temperature": 1.0
}
},
"insert_mask_before_placeholder": {
"$ref": "#/$defs/InsertMaskBeforePlaceholderConfig",
"default": {
"response_patterns": [
"Response : \uff5fnewline\uff60"
]
}
},
"filtertoolong": {
"$ref": "#/$defs/FilterTooLongConfig",
"default": {
"src_seq_length": 192,
"tgt_seq_length": 192
}
},
"prefix": {
"$ref": "#/$defs/PrefixConfig",
"default": {
"src_prefix": "",
"tgt_prefix": ""
}
},
"suffix": {
"$ref": "#/$defs/SuffixConfig",
"default": {
"src_suffix": "",
"tgt_suffix": ""
}
},
"sentencepiece": {
"$ref": "#/$defs/BaseTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0
}
},
"bpe": {
"$ref": "#/$defs/BaseTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0
}
},
"onmt_tokenize": {
"$ref": "#/$defs/ONMTTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0,
"src_subword_type": "none",
"tgt_subword_type": "none",
"src_onmttok_kwargs": {
"mode": "none"
},
"tgt_onmttok_kwargs": {
"mode": "none"
},
"gpt2_pretok": false,
"mapped_tokens": null
}
},
"normalize": {
"$ref": "#/$defs/NormalizeConfig",
"default": {
"src_lang": "",
"tgt_lang": "",
"penn": true,
"norm_quote_commas": true,
"norm_numbers": true,
"pre_replace_unicode_punct": false,
"post_remove_control_chars": false
}
}
},
"title": "NestedAllTransformsConfig",
"type": "object"
},
"NormalizeConfig": {
"additionalProperties": false,
"properties": {
"src_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Source language code",
"title": "Src Lang"
},
"tgt_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Target language code",
"title": "Tgt Lang"
},
"penn": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Penn substitution",
"title": "Penn"
},
"norm_quote_commas": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Normalize quotations and commas",
"title": "Norm Quote Commas"
},
"norm_numbers": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Normalize numbers",
"title": "Norm Numbers"
},
"pre_replace_unicode_punct": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Replace unicode punct",
"title": "Pre Replace Unicode Punct"
},
"post_remove_control_chars": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove control chars",
"title": "Post Remove Control Chars"
}
},
"title": "NormalizeConfig",
"type": "object"
},
"ONMTTokenizerConfig": {
"additionalProperties": false,
"properties": {
"src_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for src (or shared).",
"title": "Src Subword Model"
},
"tgt_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for tgt.",
"title": "Tgt Subword Model"
},
"src_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (source side)",
"title": "Src Subword Nbest"
},
"tgt_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (target side)",
"title": "Tgt Subword Nbest"
},
"src_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (source side)",
"title": "Src Subword Alpha"
},
"tgt_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (target side)",
"title": "Tgt Subword Alpha"
},
"src_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for src subword. Format: <word>\\t<count> per line.",
"title": "Src Subword Vocab"
},
"tgt_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for tgt subword. Format: <word>\\t<count> per line.",
"title": "Tgt Subword Vocab"
},
"src_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce src subword in src_subword_vocab with frequency >= src_vocab_threshold.",
"title": "Src Vocab Threshold"
},
"tgt_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce tgt subword in tgt_subword_vocab with frequency >= tgt_vocab_threshold.",
"title": "Tgt Vocab Threshold"
},
"src_subword_type": {
"anyOf": [
{
"enum": [
"none",
"sentencepiece",
"bpe"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "none",
"description": "Type of subword model for src (or shared) in pyonmttok.",
"title": "Src Subword Type"
},
"tgt_subword_type": {
"anyOf": [
{
"enum": [
"none",
"sentencepiece",
"bpe"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "none",
"description": "Type of subword model for tgt in pyonmttok.",
"title": "Tgt Subword Type"
},
"src_onmttok_kwargs": {
"anyOf": [
{
"type": "object"
},
{
"type": "null"
}
],
"default": {
"mode": "none"
},
"description": "Other pyonmttok options for src in dict string, except subword related options listed earlier.",
"title": "Src Onmttok Kwargs"
},
"tgt_onmttok_kwargs": {
"anyOf": [
{
"type": "object"
},
{
"type": "null"
}
],
"default": {
"mode": "none"
},
"description": "Other pyonmttok options for tgt in dict string, except subword related options listed earlier.",
"title": "Tgt Onmttok Kwargs"
},
"gpt2_pretok": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Preprocess sentence with byte-level mapping.",
"title": "Gpt2 Pretok"
},
"mapped_tokens": {
"anyOf": [
{
"items": {
"maxItems": 2,
"minItems": 2,
"prefixItems": [
{
"type": "string"
},
{
"type": "string"
}
],
"type": "array"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"description": "Mapped tokens for placeholders preservation",
"title": "Mapped Tokens"
}
},
"title": "ONMTTokenizerConfig",
"type": "object"
},
"PositionEncodingType": {
"enum": [
"SinusoidalInterleaved",
"SinusoidalConcat",
"Learned",
"Relative",
"Rotary",
"Alibi"
],
"title": "PositionEncodingType",
"type": "string"
},
"PrefixConfig": {
"additionalProperties": false,
"properties": {
"src_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to prepend to all source examples.",
"title": "Src Prefix"
},
"tgt_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to prepend to all target examples.",
"title": "Tgt Prefix"
}
},
"title": "PrefixConfig",
"type": "object"
},
"RnnDecoderConfig": {
"additionalProperties": false,
"properties": {
"decoder_type": {
"const": "rnn",
"default": "rnn",
"enum": [
"rnn"
],
"title": "Decoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the decoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of decoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"coverage_attn": {
"default": false,
"description": "Train a coverage attention layer.",
"title": "Coverage Attn",
"type": "boolean"
},
"lambda_coverage": {
"default": 0.0,
"description": "Lambda value for coverage loss of See et al (2017)",
"title": "Lambda Coverage",
"type": "number"
},
"global_attention": {
"default": "general",
"description": "The attention type to use. (Luong=general, Bahdanau=MLP)",
"enum": [
"dot",
"general",
"mlp",
null
],
"title": "Global Attention"
},
"global_attention_function": {
"default": "softmax",
"description": "Global attention function to use.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Global Attention Function",
"type": "string"
},
"bridge": {
"default": false,
"description": "Have an additional layer between the last encoder state and the first decoder state (RNN specific).",
"title": "Bridge",
"type": "boolean"
},
"rnn_type": {
"default": "LSTM",
"description": "The gate type to use in the RNNs.",
"enum": [
"LSTM",
"GRU"
],
"title": "Rnn Type",
"type": "string"
},
"context_gate": {
"default": null,
"description": "Type of context gate to use.",
"enum": [
"source",
"target",
"both",
null
],
"title": "Context Gate"
},
"bidirectional_encoder": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"title": "Bidirectional Encoder"
}
},
"title": "RnnDecoderConfig",
"type": "object"
},
"RnnEncoderConfig": {
"additionalProperties": false,
"properties": {
"encoder_type": {
"default": "rnn",
"enum": [
"rnn",
"brnn"
],
"title": "Encoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the encoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of encoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
},
"bridge": {
"default": false,
"description": "Have an additional layer between the last encoder state and the first decoder state (RNN specific).",
"title": "Bridge",
"type": "boolean"
},
"rnn_type": {
"default": "LSTM",
"description": "The gate type to use in the RNNs.",
"enum": [
"LSTM",
"GRU"
],
"title": "Rnn Type",
"type": "string"
}
},
"title": "RnnEncoderConfig",
"type": "object"
},
"RnnModelConfig": {
"additionalProperties": false,
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": -1,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "rnn",
"default": "rnn",
"enum": [
"rnn"
],
"title": "Architecture",
"type": "string"
},
"bridge": {
"default": false,
"description": "Have an additional layer between the last encoder state and the first decoder state (RNN specific).",
"title": "Bridge",
"type": "boolean"
},
"rnn_type": {
"default": "LSTM",
"description": "The gate type to use in the RNNs.",
"enum": [
"LSTM",
"GRU"
],
"title": "Rnn Type",
"type": "string"
}
},
"title": "RnnModelConfig",
"type": "object"
},
"RotaryPositionConfig": {
"additionalProperties": false,
"description": "Configuration for rotary position embeddings used in transformer models.",
"properties": {
"rotary_interleave": {
"default": true,
"description": "Interleave the head dimensions when rotary embeddings are applied. Otherwise the head dimensions are sliced in half. (True=default Llama from Meta (original), False= used by all HuggingFace models)",
"title": "Rotary Interleave",
"type": "boolean"
},
"rotary_theta": {
"default": 10000,
"description": "Rotary theta base length, 1e4 for Llama2.Mistral, 1e6 for Mixtral",
"title": "Rotary Theta",
"type": "integer"
},
"rotary_dim": {
"default": 0,
"description": "Rotary dim when model requires it to be different to head dim.",
"title": "Rotary Dim",
"type": "integer"
},
"scaling_type": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Specifies the type of RoPE scaling to be applied, if any.",
"title": "Scaling Type"
},
"scaling_factor": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 8.0,
"description": "Factor by which to scale RoPE embeddings.",
"title": "Scaling Factor"
},
"low_freq_factor": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Scaling factor applied to the lower frequency components of RoPE.",
"title": "Low Freq Factor"
},
"high_freq_factor": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 4.0,
"description": "Scaling factor applied to the higher frequency components of RoPE.",
"title": "High Freq Factor"
},
"original_max_position_embeddings": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 8192,
"description": "Original maximum position embeddings for RoPE scaling.",
"title": "Original Max Position Embeddings"
}
},
"title": "RotaryPositionConfig",
"type": "object"
},
"SuffixConfig": {
"additionalProperties": false,
"properties": {
"src_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to append to all source examples.",
"title": "Src Suffix"
},
"tgt_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to append to all target examples.",
"title": "Tgt Suffix"
}
},
"title": "SuffixConfig",
"type": "object"
},
"SwitchOutConfig": {
"additionalProperties": false,
"properties": {
"switchout_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for SwitchOut. :math:`\\tau^{-1}` in :cite:`DBLP:journals/corr/abs-1808-07512`. Smaller value makes data more diverse.",
"title": "Switchout Temperature"
}
},
"title": "SwitchOutConfig",
"type": "object"
},
"TerminologyConfig": {
"additionalProperties": false,
"properties": {
"termbase_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to a dictionary file with terms.",
"title": "Termbase Path"
},
"src_spacy_language_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Name of the spaCy language model for the source corpus.",
"title": "Src Spacy Language Model"
},
"tgt_spacy_language_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Name of the spaCy language model for the target corpus.",
"title": "Tgt Spacy Language Model"
},
"term_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.3,
"description": "Ratio of corpus to augment with terms.",
"title": "Term Corpus Ratio"
},
"term_example_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.2,
"description": "Maximum terms allowed in an example.",
"title": "Term Example Ratio"
},
"src_term_stoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fsrc_term_start\uff60",
"description": "The source term start token.",
"title": "Src Term Stoken"
},
"tgt_term_stoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ftgt_term_start\uff60",
"description": "The target term start token.",
"title": "Tgt Term Stoken"
},
"tgt_term_etoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ftgt_term_end\uff60",
"description": "The target term end token.",
"title": "Tgt Term Etoken"
},
"term_source_delimiter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ffuzzy\uff60",
"description": "Any special token used for augmented source sentences. The default is the fuzzy token used in the FuzzyMatch transform.",
"title": "Term Source Delimiter"
}
},
"title": "TerminologyConfig",
"type": "object"
},
"TokenDropConfig": {
"additionalProperties": false,
"properties": {
"tokendrop_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for token deletion.",
"title": "Tokendrop Temperature"
}
},
"title": "TokenDropConfig",
"type": "object"
},
"TokenMaskConfig": {
"additionalProperties": false,
"properties": {
"tokenmask_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for token masking.",
"title": "Tokenmask Temperature"
}
},
"title": "TokenMaskConfig",
"type": "object"
},
"TransformerDecoderConfig": {
"additionalProperties": false,
"properties": {
"decoder_type": {
"const": "transformer",
"default": "transformer",
"enum": [
"transformer"
],
"title": "Decoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the decoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of decoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"coverage_attn": {
"default": false,
"description": "Train a coverage attention layer.",
"title": "Coverage Attn",
"type": "boolean"
},
"lambda_coverage": {
"default": 0.0,
"description": "Lambda value for coverage loss of See et al (2017)",
"title": "Lambda Coverage",
"type": "number"
},
"global_attention": {
"default": "general",
"description": "The attention type to use. (Luong=general, Bahdanau=MLP)",
"enum": [
"dot",
"general",
"mlp",
null
],
"title": "Global Attention"
},
"global_attention_function": {
"default": "softmax",
"description": "Global attention function to use.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Global Attention Function",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
},
"aan_useffn": {
"default": false,
"description": "Turn on the FFN layer in the AAN decoder.",
"title": "Aan Useffn",
"type": "boolean"
},
"alignment_layer": {
"default": -2,
"description": "Layer number which has to be supervised.",
"title": "Alignment Layer",
"type": "integer"
},
"alignment_heads": {
"default": 0,
"description": "Number of cross attention heads per layer to supervise with.",
"title": "Alignment Heads",
"type": "integer"
},
"full_context_alignment": {
"default": false,
"description": "Whether alignment is conditioned on full target context.",
"title": "Full Context Alignment",
"type": "boolean"
},
"lambda_align": {
"default": 0.0,
"description": "Lambda value for alignement loss of Garg et al, 2019 (https://arxiv.org/abs/1909.02074)",
"title": "Lambda Align",
"type": "number"
}
},
"title": "TransformerDecoderConfig",
"type": "object"
},
"TransformerEncoderConfig": {
"additionalProperties": false,
"properties": {
"encoder_type": {
"const": "transformer",
"default": "transformer",
"enum": [
"transformer"
],
"title": "Encoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the encoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of encoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"src_word_vec_size": {
"default": 512,
"description": "Word embedding size for src.",
"title": "Src Word Vec Size",
"type": "integer"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
}
},
"title": "TransformerEncoderConfig",
"type": "object"
},
"TransformerEncoderModelConfig": {
"additionalProperties": false,
"description": "Facilitate setting some transformer specific params at model level.",
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder",
"type": "null"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "transformer_encoder",
"default": "transformer_encoder",
"enum": [
"transformer_encoder"
],
"title": "Architecture",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
}
},
"title": "TransformerEncoderModelConfig",
"type": "object"
},
"TransformerLMDecoderConfig": {
"additionalProperties": false,
"description": "Right now just wraps TransformerDecoderConfig for simplicity.\nMight merge in a single class later once TransformerLM path is clarified.",
"properties": {
"decoder_type": {
"const": "transformer_lm",
"default": "transformer_lm",
"enum": [
"transformer_lm"
],
"title": "Decoder Type",
"type": "string"
},
"layers": {
"default": 2,
"description": "Number of layers in the decoder.",
"title": "Layers",
"type": "integer"
},
"hidden_size": {
"default": 512,
"description": "Size of decoder hidden states.",
"title": "Hidden Size",
"type": "integer"
},
"tgt_word_vec_size": {
"default": 512,
"description": "Word embedding size for tgt.",
"title": "Tgt Word Vec Size",
"type": "integer"
},
"coverage_attn": {
"default": false,
"description": "Train a coverage attention layer.",
"title": "Coverage Attn",
"type": "boolean"
},
"lambda_coverage": {
"default": 0.0,
"description": "Lambda value for coverage loss of See et al (2017)",
"title": "Lambda Coverage",
"type": "number"
},
"global_attention": {
"default": "general",
"description": "The attention type to use. (Luong=general, Bahdanau=MLP)",
"enum": [
"dot",
"general",
"mlp",
null
],
"title": "Global Attention"
},
"global_attention_function": {
"default": "softmax",
"description": "Global attention function to use.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Global Attention Function",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
},
"aan_useffn": {
"default": false,
"description": "Turn on the FFN layer in the AAN decoder.",
"title": "Aan Useffn",
"type": "boolean"
},
"alignment_layer": {
"default": -2,
"description": "Layer number which has to be supervised.",
"title": "Alignment Layer",
"type": "integer"
},
"alignment_heads": {
"default": 0,
"description": "Number of cross attention heads per layer to supervise with.",
"title": "Alignment Heads",
"type": "integer"
},
"full_context_alignment": {
"default": false,
"description": "Whether alignment is conditioned on full target context.",
"title": "Full Context Alignment",
"type": "boolean"
},
"lambda_align": {
"default": 0.0,
"description": "Lambda value for alignement loss of Garg et al, 2019 (https://arxiv.org/abs/1909.02074)",
"title": "Lambda Align",
"type": "number"
}
},
"title": "TransformerLMDecoderConfig",
"type": "object"
},
"TransformerLMModelConfig": {
"additionalProperties": false,
"description": "Facilitate setting some transformer specific params at model level.",
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder",
"type": "null"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "transformer_lm",
"default": "transformer_lm",
"enum": [
"transformer_lm"
],
"title": "Architecture",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
}
},
"title": "TransformerLMModelConfig",
"type": "object"
},
"TransformerModelConfig": {
"additionalProperties": false,
"description": "Facilitate setting some transformer specific params at model level.",
"properties": {
"embeddings": {
"$ref": "#/$defs/EmbeddingsConfig",
"description": "Contains most of the args useful to build the Embeddings module."
},
"encoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"brnn": "#/$defs/RnnEncoderConfig",
"cnn": "#/$defs/CnnEncoderConfig",
"mean": "#/$defs/MeanEncoderConfig",
"rnn": "#/$defs/RnnEncoderConfig",
"transformer": "#/$defs/TransformerEncoderConfig"
},
"propertyName": "encoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerEncoderConfig"
},
{
"$ref": "#/$defs/RnnEncoderConfig"
},
{
"$ref": "#/$defs/CnnEncoderConfig"
},
{
"$ref": "#/$defs/MeanEncoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of an encoder.",
"title": "Encoder"
},
"decoder": {
"anyOf": [
{
"discriminator": {
"mapping": {
"cnn": "#/$defs/CnnDecoderConfig",
"rnn": "#/$defs/RnnDecoderConfig",
"transformer": "#/$defs/TransformerDecoderConfig",
"transformer_lm": "#/$defs/TransformerLMDecoderConfig"
},
"propertyName": "decoder_type"
},
"oneOf": [
{
"$ref": "#/$defs/TransformerDecoderConfig"
},
{
"$ref": "#/$defs/TransformerLMDecoderConfig"
},
{
"$ref": "#/$defs/RnnDecoderConfig"
},
{
"$ref": "#/$defs/CnnDecoderConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Major parameters of a decoder.",
"title": "Decoder"
},
"hidden_size": {
"default": -1,
"description": "Size of hidden states. Overwrites [encoder/decoder].hidden_size if set.",
"title": "Hidden Size",
"type": "integer"
},
"word_vec_size": {
"default": -1,
"description": "Word embedding size for src and tgt.",
"title": "Word Vec Size",
"type": "integer"
},
"layers": {
"default": -1,
"description": "Number of layers in both encoder and decoder (will overwrite enc_layers/dec_layers).",
"title": "Layers",
"type": "integer"
},
"transformer_ff": {
"default": 2048,
"description": "Size of hidden transformer feed-forward.",
"title": "Transformer Ff",
"type": "integer"
},
"share_decoder_embeddings": {
"default": false,
"description": "Use a share weight matrix for the input and output word embeddings in the decoder.",
"title": "Share Decoder Embeddings",
"type": "boolean"
},
"share_embeddings": {
"default": false,
"description": "Share the word embeddings between encoder and decoder. Need to use shared vocabulary for this option.",
"title": "Share Embeddings",
"type": "boolean"
},
"input_feed": {
"default": 1,
"description": "Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.",
"title": "Input Feed",
"type": "integer"
},
"generator_function": {
"default": "softmax",
"description": "Which function to use for generating probabilities over the target vocabulary.",
"enum": [
"softmax",
"sparsemax"
],
"title": "Generator Function",
"type": "string"
},
"add_estimator": {
"default": false,
"description": "Add estimator layer",
"title": "Add Estimator",
"type": "boolean"
},
"left_pad": {
"default": false,
"description": "Enable left-padding, useful for some LLMs.",
"title": "Left Pad",
"type": "boolean"
},
"architecture": {
"const": "transformer",
"default": "transformer",
"enum": [
"transformer"
],
"title": "Architecture",
"type": "string"
},
"sliding_window": {
"default": 0,
"description": "Sliding window for transformer self-attention.",
"title": "Sliding Window",
"type": "integer"
},
"heads": {
"default": 8,
"description": "Number of heads for transformer self-attention.",
"title": "Heads",
"type": "integer"
},
"relative_positions_buckets": {
"default": 0,
"description": "Enable relative position bias (https://github.com/google-research/text-to-text-transfer-transformer).",
"title": "Relative Positions Buckets",
"type": "integer"
},
"mlp_activation_fn": {
"$ref": "#/$defs/ActivationFunction",
"default": "relu",
"description": "The activation function to use in MLP layer."
},
"layer_norm": {
"default": "standard",
"description": "Type of layer normalization in transformer architecture.",
"enum": [
"standard",
"rms"
],
"title": "Layer Norm",
"type": "string"
},
"norm_eps": {
"default": 1e-06,
"description": "Layer norm epsilon.",
"title": "Norm Eps",
"type": "number"
},
"shared_layer_norm": {
"default": false,
"description": "Use a shared layer_norm in parallel residual attention. Note: must be True for Falcon 7B, False for Falcon 40B, same for GPT-J and GPT-NeoX models.",
"title": "Shared Layer Norm",
"type": "boolean"
},
"add_qkvbias": {
"default": false,
"description": "Add bias to nn.Linear of Query/Key/Value in MHA. Note: this will add bias to output projection layer too.",
"title": "Add Qkvbias",
"type": "boolean"
},
"heads_kv": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Number of heads for KV. heads_kv=heads if None, else number of heads for KV(e.g. Falcon 40B)",
"title": "Heads Kv"
},
"head_dim": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Head dimension when this needs to be different vs hidden_size // heads",
"title": "Head Dim"
},
"add_ffnbias": {
"default": false,
"description": "Add bias to nn.Linear of MLP FFN.",
"title": "Add Ffnbias",
"type": "boolean"
},
"parallel_residual": {
"default": false,
"description": "Use parallel residual in decoder layer. Note: this is used by GPT-J / Falcon Architecture.",
"title": "Parallel Residual",
"type": "boolean"
},
"num_experts": {
"default": 0,
"description": "Number of experts for MoE models.",
"title": "Num Experts",
"type": "integer"
},
"num_experts_per_tok": {
"default": 2,
"description": "Number of experts per token.",
"title": "Num Experts Per Tok",
"type": "integer"
},
"position_encoding_type": {
"anyOf": [
{
"$ref": "#/$defs/PositionEncodingType"
},
{
"type": "null"
}
],
"default": "SinusoidalInterleaved",
"description": "Type of positional encoding."
},
"n_positions": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Two casesCase 1: Absolute number of positions to learn position embeddings on (position_encoding_type: Learned)Case 2: Max Relative PositionsIn the case of position_encoding_type: Relative",
"title": "N Positions"
},
"rope_config": {
"anyOf": [
{
"$ref": "#/$defs/RotaryPositionConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "Rotary position config, if relevant."
}
},
"title": "TransformerModelConfig",
"type": "object"
},
"UpperCaseConfig": {
"additionalProperties": false,
"properties": {
"upper_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.01,
"description": "Corpus ratio to apply uppercasing.",
"title": "Upper Corpus Ratio"
}
},
"title": "UpperCaseConfig",
"type": "object"
}
},
"additionalProperties": false,
"required": [
"model_path",
"src"
]
}

field chat_template : str | None = None​

  • Validated by:
    • _validate_predict_config
    • _validate_running_config

field model : TransformerModelConfig | TransformerLMModelConfig | TransformerEncoderModelConfig | RnnModelConfig | CnnModelConfig | CustomModelConfig | None = None​

  • Validated by:
    • _validate_predict_config
    • _validate_running_config

field optional_eos : List[str] | None = []​

Optional EOS tokens that would stop generation, e.g. <|eot_id|> for Llama3 (see the sketch below).

  • Validated by:
    • _validate_predict_config
    • _validate_running_config
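
A minimal illustration of the value this field expects (a plain Python fragment; the surrounding configuration is omitted here):

# Only the optional_eos entry matters in this fragment; it lists extra tokens that stop generation.
settings = {
    "optional_eos": ["<|eot_id|>"],  # e.g. the Llama3 end-of-turn token mentioned above
}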

field share_vocab : bool | None = False​

  • Validated by:
    • _validate_predict_config
    • _validate_running_config

field src_subword_vocab : str | None = None​

  • Validated by:
    • _validate_predict_config
    • _validate_running_config

field transforms : List[str] | None = []​

  • Validated by:
    • _validate_predict_config
    • _validate_running_config

field transforms_configs : NestedAllTransformsConfig | None [Optional]​

  • Validated by:
    • _validate_predict_config
    • _validate_running_config

model_post_init(context: Any, /)​

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

  • Parameters:
    • self – The BaseModel instance.
    • context – The context.
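
Putting the fields above together, the sketch below shows one way such a predict-time config could be instantiated programmatically. It is a sketch under assumptions: the schema above marks model_path and src as required, but the class name and import path (PredictConfig in eole.config.run) are inferred from the _validate_predict_config validators rather than stated on this page, and all paths are placeholders.

# Assumed import path; the actual class documented above may differ.
from eole.config.run import PredictConfig

cfg = PredictConfig(
    model_path="models/my_model",      # required by the schema above
    src="data/test.src",               # required by the schema above
    transforms=["onmt_tokenize"],      # optional transform pipeline
    optional_eos=["<|eot_id|>"],       # optional extra stop tokens (see above)
)
print(cfg.model_dump(exclude_unset=True))  # pydantic v2: dump only explicitly set fields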

pydantic model eole.config.run.BuildVocabConfig[source]​

Bases: DataConfig, MiscConfig, BaseVocabConfig

Show JSON schema
{
"title": "BuildVocabConfig",
"type": "object",
"properties": {
"src_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"description": "Path to src (or shared) vocabulary file. Format: one <word> or <word>\t<count> per line.",
"title": "Src Vocab"
},
"tgt_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to tgt vocabulary file. Format: one <word> or <word>\t<count> per line.",
"title": "Tgt Vocab"
},
"share_vocab": {
"default": false,
"description": "Share source and target vocabulary.",
"title": "Share Vocab",
"type": "boolean"
},
"decoder_start_token": {
"default": "&lt;s&gt;",
"description": "Default decoder start token. For most models it is &lt;s&gt; = BOS. Some fairseq models require &lt;/s&gt;.",
"title": "Decoder Start Token",
"type": "string"
},
"default_specials": {
"default": [
"<unk>",
"<blank>",
"&lt;s&gt;",
"&lt;/s&gt;"
],
"description": "Default specials used for vocab initialization. UNK, PAD, BOS, EOS will take IDs 0, 1, 2, 3.",
"items": {},
"title": "Default Specials",
"type": "array"
},
"both_embeddings": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to the embeddings file to use for both source and target tokens.",
"title": "Both Embeddings"
},
"src_embeddings": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to the embeddings file to use for source tokens.",
"title": "Src Embeddings"
},
"tgt_embeddings": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to the embeddings file to use for target tokens.",
"title": "Tgt Embeddings"
},
"embeddings_type": {
"anyOf": [
{
"enum": [
"GloVe",
"word2vec"
],
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Type of embeddings file.",
"title": "Embeddings Type"
},
"seed": {
"default": -1,
"description": "Set random seed used for better reproducibility between experiments.",
"title": "Seed",
"type": "integer"
},
"src_vocab_size": {
"default": 32758,
"description": "Maximum size of the source vocabulary.",
"title": "Src Vocab Size",
"type": "integer"
},
"tgt_vocab_size": {
"default": 32768,
"description": "Maximum size of the target vocabulary.",
"title": "Tgt Vocab Size",
"type": "integer"
},
"vocab_size_multiple": {
"default": 8,
"description": "Make the vocabulary size a multiple of this value. (Adds dummy tokens if needed.)",
"title": "Vocab Size Multiple",
"type": "integer"
},
"src_words_min_frequency": {
"default": 0,
"description": "Discard source words with lower frequency.",
"title": "Src Words Min Frequency",
"type": "integer"
},
"tgt_words_min_frequency": {
"default": 0,
"description": "Discard target words with lower frequency.",
"title": "Tgt Words Min Frequency",
"type": "integer"
},
"data": {
"anyOf": [
{
"additionalProperties": {
"$ref": "#/$defs/Dataset"
},
"type": "object"
},
{
"type": "null"
}
],
"description": "All datasets and their specifications. See examples/*.yaml for further details.",
"title": "Data"
},
"transforms": {
"default": [],
"description": "Default transform pipeline to apply to data. Can be specified in each corpus of data to override.",
"items": {
"type": "string"
},
"title": "Transforms",
"type": "array"
},
"transforms_configs": {
"anyOf": [
{
"$ref": "#/$defs/NestedAllTransformsConfig"
},
{
"type": "null"
}
]
},
"skip_empty_level": {
"default": "warning",
"description": "Logging level when encoutering empty examples. (silent: silently ignore/skip empty examples, warning: warn when ignoring/skipping empty examples, error: raise an error and stop execution when any empty example)",
"enum": [
"silent",
"warning",
"error"
],
"title": "Skip Empty Level",
"type": "string"
},
"n_sample": {
"default": 5000,
"description": "Number of transformed samples per corpus to use to build the vocabulary. Set to -1 to use the full corpora.",
"title": "N Sample",
"type": "integer"
},
"save_data": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Output base path for objects that will be saved (vocab, transforms, embeddings, ...)",
"title": "Save Data"
},
"overwrite": {
"default": false,
"description": "Overwrite existing objects if any.",
"title": "Overwrite",
"type": "boolean"
},
"dump_samples": {
"default": false,
"description": "Dump samples when building vocabulary. Warning: this may slow down the process.",
"title": "Dump Samples",
"type": "boolean"
},
"num_threads": {
"default": 1,
"description": "Number of parallel threads to build the vocabulary.",
"title": "Num Threads",
"type": "integer"
},
"learn_subwords": {
"default": false,
"description": "Learn subwords (based on defined transforms) prior to building vocabulary.",
"title": "Learn Subwords",
"type": "boolean"
},
"learn_subwords_size": {
"default": 32000,
"description": "Number of subwords operations to learn.",
"title": "Learn Subwords Size",
"type": "integer"
},
"vocab_sample_queue_size": {
"default": 20,
"description": "Size of queues used for dumping samples.",
"title": "Vocab Sample Queue Size",
"type": "integer"
}
},
"$defs": {
"BARTNoiseConfig": {
"additionalProperties": false,
"properties": {
"permute_sent_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Permute this proportion of sentences (boundaries defined by ['.', '?', '!']) in all inputs.",
"title": "Permute Sent Ratio"
},
"rotate_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Rotate this proportion of inputs.",
"title": "Rotate Ratio"
},
"insert_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Insert this percentage of additional random tokens.",
"title": "Insert Ratio"
},
"random_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Instead of using <mask>, use random token this often.",
"title": "Random Ratio"
},
"mask_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.0,
"description": "Fraction of words/subwords that will be masked.",
"title": "Mask Ratio"
},
"mask_length": {
"anyOf": [
{
"enum": [
"subword",
"word",
"span-poisson"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "subword",
"description": "Length of masking window to apply.",
"title": "Mask Length"
},
"poisson_lambda": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 3.0,
"description": "Lambda for Poisson distribution to sample span length if `-mask_length` set to span-poisson.",
"title": "Poisson Lambda"
},
"replace_length": {
"anyOf": [
{
"maximum": 1,
"minimum": -1,
"type": "integer"
},
{
"type": "null"
}
],
"default": -1,
"description": "When masking N tokens, replace with 0, 1, or N tokens. (use -1 for N)",
"title": "Replace Length"
}
},
"title": "BARTNoiseConfig",
"type": "object"
},
"BaseTokenizerConfig": {
"additionalProperties": false,
"properties": {
"src_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for src (or shared).",
"title": "Src Subword Model"
},
"tgt_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for tgt.",
"title": "Tgt Subword Model"
},
"src_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (source side)",
"title": "Src Subword Nbest"
},
"tgt_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (target side)",
"title": "Tgt Subword Nbest"
},
"src_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (source side)",
"title": "Src Subword Alpha"
},
"tgt_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (target side)",
"title": "Tgt Subword Alpha"
},
"src_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for src subword. Format: <word>\\t<count> per line.",
"title": "Src Subword Vocab"
},
"tgt_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for tgt subword. Format: <word>\\t<count> per line.",
"title": "Tgt Subword Vocab"
},
"src_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce src subword in src_subword_vocab with frequency >= src_vocab_threshold.",
"title": "Src Vocab Threshold"
},
"tgt_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce tgt subword in tgt_subword_vocab with frequency >= tgt_vocab_threshold.",
"title": "Tgt Vocab Threshold"
}
},
"title": "BaseTokenizerConfig",
"type": "object"
},
"CleanConfig": {
"additionalProperties": false,
"properties": {
"src_eq_tgt": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex src==tgt",
"title": "Src Eq Tgt"
},
"same_char": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex with same char more than 4 times",
"title": "Same Char"
},
"same_word": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove ex with same word more than 3 times",
"title": "Same Word"
},
"scripts_ok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"Latin",
"Common"
],
"description": "list of unicodata scripts accepted",
"title": "Scripts Ok"
},
"scripts_nok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"description": "list of unicodata scripts not accepted",
"title": "Scripts Nok"
},
"src_tgt_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 2.0,
"description": "ratio between src and tgt",
"title": "Src Tgt Ratio"
},
"avg_tok_min": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 3.0,
"description": "average length of tokens min",
"title": "Avg Tok Min"
},
"avg_tok_max": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 20.0,
"description": "average length of tokens max",
"title": "Avg Tok Max"
},
"langid": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"description": "list of languages accepted",
"title": "Langid"
}
},
"title": "CleanConfig",
"type": "object"
},
"Dataset": {
"additionalProperties": false,
"properties": {
"name": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Name"
},
"weight": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"title": "Weight"
},
"transforms": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"title": "Transforms"
},
"path_src": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Src"
},
"path_tgt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Tgt"
},
"path_sco": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Sco"
},
"path_txt": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Txt"
},
"path_align": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Path Align"
},
"src_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Src Prefix"
},
"tgt_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tgt Prefix"
},
"src_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Src Suffix"
},
"tgt_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tgt Suffix"
},
"src_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Src Lang"
},
"tgt_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Tgt Lang"
},
"penn": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Penn"
},
"norm_quote_commas": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Norm Quote Commas"
},
"norm_numbers": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Norm Numbers"
},
"pre_replace_unicode_punct": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"title": "Pre Replace Unicode Punct"
},
"post_remove_control_chars": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"title": "Post Remove Control Chars"
},
"src_eq_tgt": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Src Eq Tgt"
},
"same_char": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Same Char"
},
"same_word": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"title": "Same Word"
},
"scripts_ok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"Latin",
"Common"
],
"title": "Scripts Ok"
},
"scripts_nok": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [],
"title": "Scripts Nok"
},
"src_tgt_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 2,
"title": "Src Tgt Ratio"
},
"avg_tok_min": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 3,
"title": "Avg Tok Min"
},
"avg_tok_max": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 20,
"title": "Avg Tok Max"
},
"lang_id": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"en",
"fr"
],
"title": "Lang Id"
}
},
"title": "Dataset",
"type": "object"
},
"DocifyConfig": {
"additionalProperties": false,
"properties": {
"doc_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 200,
"description": "Number of tokens per doc.",
"title": "Doc Length"
},
"max_context": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Max context segments.",
"title": "Max Context"
}
},
"title": "DocifyConfig",
"type": "object"
},
"FilterTooLongConfig": {
"additionalProperties": false,
"properties": {
"src_seq_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 192,
"description": "Maximum source sequence length.",
"title": "Src Seq Length"
},
"tgt_seq_length": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 192,
"description": "Maximum target sequence length.",
"title": "Tgt Seq Length"
}
},
"title": "FilterTooLongConfig",
"type": "object"
},
"InlineTagsConfig": {
"additionalProperties": false,
"properties": {
"tags_dictionary_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to a flat term dictionary.",
"title": "Tags Dictionary Path"
},
"tags_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.1,
"description": "Ratio of corpus to augment with tags.",
"title": "Tags Corpus Ratio"
},
"max_tags": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 12,
"description": "Maximum number of tags that can be added to a single sentence.",
"title": "Max Tags"
},
"paired_stag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_beg\uff60",
"description": "The format of an opening paired inline tag. Must include the character #.",
"title": "Paired Stag"
},
"paired_etag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_end\uff60",
"description": "The format of a closing paired inline tag. Must include the character #.",
"title": "Paired Etag"
},
"isolated_tag": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fph_#_std\uff60",
"description": "The format of an isolated inline tag. Must include the character #.",
"title": "Isolated Tag"
},
"src_delimiter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ffuzzy\uff60",
"description": "Any special token used for augmented src sentences. The default is the fuzzy token used in the FuzzyMatch transform.",
"title": "Src Delimiter"
}
},
"title": "InlineTagsConfig",
"type": "object"
},
"InsertMaskBeforePlaceholderConfig": {
"additionalProperties": false,
"properties": {
"response_patterns": {
"anyOf": [
{
"items": {
"type": "string"
},
"type": "array"
},
{
"type": "null"
}
],
"default": [
"Response : \uff5fnewline\uff60"
],
"description": "Response pattern to locate the end of the prompt.",
"title": "Response Patterns"
}
},
"title": "InsertMaskBeforePlaceholderConfig",
"type": "object"
},
"NestedAllTransformsConfig": {
"additionalProperties": false,
"properties": {
"docify": {
"$ref": "#/$defs/DocifyConfig",
"default": {
"doc_length": 200,
"max_context": 1
}
},
"inlinetags": {
"$ref": "#/$defs/InlineTagsConfig",
"default": {
"tags_dictionary_path": null,
"tags_corpus_ratio": 0.1,
"max_tags": 12,
"paired_stag": "\uff5fph_#_beg\uff60",
"paired_etag": "\uff5fph_#_end\uff60",
"isolated_tag": "\uff5fph_#_std\uff60",
"src_delimiter": "\uff5ffuzzy\uff60"
}
},
"terminology": {
"$ref": "#/$defs/TerminologyConfig",
"default": {
"termbase_path": null,
"src_spacy_language_model": null,
"tgt_spacy_language_model": null,
"term_corpus_ratio": 0.3,
"term_example_ratio": 0.2,
"src_term_stoken": "\uff5fsrc_term_start\uff60",
"tgt_term_stoken": "\uff5ftgt_term_start\uff60",
"tgt_term_etoken": "\uff5ftgt_term_end\uff60",
"term_source_delimiter": "\uff5ffuzzy\uff60"
}
},
"bart": {
"$ref": "#/$defs/BARTNoiseConfig",
"default": {
"permute_sent_ratio": 0.0,
"rotate_ratio": 0.0,
"insert_ratio": 0.0,
"random_ratio": 0.0,
"mask_ratio": 0.0,
"mask_length": "subword",
"poisson_lambda": 3.0,
"replace_length": -1
}
},
"uppercase": {
"$ref": "#/$defs/UpperCaseConfig",
"default": {
"upper_corpus_ratio": 0.01
}
},
"clean": {
"$ref": "#/$defs/CleanConfig",
"default": {
"src_eq_tgt": false,
"same_char": false,
"same_word": false,
"scripts_ok": [
"Latin",
"Common"
],
"scripts_nok": [],
"src_tgt_ratio": 2.0,
"avg_tok_min": 3.0,
"avg_tok_max": 20.0,
"langid": []
}
},
"switchout": {
"$ref": "#/$defs/SwitchOutConfig",
"default": {
"switchout_temperature": 1.0
}
},
"tokendrop": {
"$ref": "#/$defs/TokenDropConfig",
"default": {
"tokendrop_temperature": 1.0
}
},
"tokenmask": {
"$ref": "#/$defs/TokenMaskConfig",
"default": {
"tokenmask_temperature": 1.0
}
},
"insert_mask_before_placeholder": {
"$ref": "#/$defs/InsertMaskBeforePlaceholderConfig",
"default": {
"response_patterns": [
"Response : \uff5fnewline\uff60"
]
}
},
"filtertoolong": {
"$ref": "#/$defs/FilterTooLongConfig",
"default": {
"src_seq_length": 192,
"tgt_seq_length": 192
}
},
"prefix": {
"$ref": "#/$defs/PrefixConfig",
"default": {
"src_prefix": "",
"tgt_prefix": ""
}
},
"suffix": {
"$ref": "#/$defs/SuffixConfig",
"default": {
"src_suffix": "",
"tgt_suffix": ""
}
},
"sentencepiece": {
"$ref": "#/$defs/BaseTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0
}
},
"bpe": {
"$ref": "#/$defs/BaseTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0
}
},
"onmt_tokenize": {
"$ref": "#/$defs/ONMTTokenizerConfig",
"default": {
"src_subword_model": null,
"tgt_subword_model": null,
"src_subword_nbest": 1,
"tgt_subword_nbest": 1,
"src_subword_alpha": 0.0,
"tgt_subword_alpha": 0.0,
"src_subword_vocab": "",
"tgt_subword_vocab": "",
"src_vocab_threshold": 0,
"tgt_vocab_threshold": 0,
"src_subword_type": "none",
"tgt_subword_type": "none",
"src_onmttok_kwargs": {
"mode": "none"
},
"tgt_onmttok_kwargs": {
"mode": "none"
},
"gpt2_pretok": false,
"mapped_tokens": null
}
},
"normalize": {
"$ref": "#/$defs/NormalizeConfig",
"default": {
"src_lang": "",
"tgt_lang": "",
"penn": true,
"norm_quote_commas": true,
"norm_numbers": true,
"pre_replace_unicode_punct": false,
"post_remove_control_chars": false
}
}
},
"title": "NestedAllTransformsConfig",
"type": "object"
},
"NormalizeConfig": {
"additionalProperties": false,
"properties": {
"src_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Source language code",
"title": "Src Lang"
},
"tgt_lang": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Target language code",
"title": "Tgt Lang"
},
"penn": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Penn substitution",
"title": "Penn"
},
"norm_quote_commas": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Normalize quotations and commas",
"title": "Norm Quote Commas"
},
"norm_numbers": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": true,
"description": "Normalize numbers",
"title": "Norm Numbers"
},
"pre_replace_unicode_punct": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Replace unicode punct",
"title": "Pre Replace Unicode Punct"
},
"post_remove_control_chars": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Remove control chars",
"title": "Post Remove Control Chars"
}
},
"title": "NormalizeConfig",
"type": "object"
},
"ONMTTokenizerConfig": {
"additionalProperties": false,
"properties": {
"src_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for src (or shared).",
"title": "Src Subword Model"
},
"tgt_subword_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path of subword model for tgt.",
"title": "Tgt Subword Model"
},
"src_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (source side)",
"title": "Src Subword Nbest"
},
"tgt_subword_nbest": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 1,
"description": "Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (target side)",
"title": "Tgt Subword Nbest"
},
"src_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (source side)",
"title": "Src Subword Alpha"
},
"tgt_subword_alpha": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0,
"description": "Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (target side)",
"title": "Tgt Subword Alpha"
},
"src_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for src subword. Format: <word>\\t<count> per line.",
"title": "Src Subword Vocab"
},
"tgt_subword_vocab": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "Path to the vocabulary file for tgt subword. Format: <word>\\t<count> per line.",
"title": "Tgt Subword Vocab"
},
"src_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce src subword in src_subword_vocab with frequency >= src_vocab_threshold.",
"title": "Src Vocab Threshold"
},
"tgt_vocab_threshold": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": 0,
"description": "Only produce tgt subword in tgt_subword_vocab with frequency >= tgt_vocab_threshold.",
"title": "Tgt Vocab Threshold"
},
"src_subword_type": {
"anyOf": [
{
"enum": [
"none",
"sentencepiece",
"bpe"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "none",
"description": "Type of subword model for src (or shared) in pyonmttok.",
"title": "Src Subword Type"
},
"tgt_subword_type": {
"anyOf": [
{
"enum": [
"none",
"sentencepiece",
"bpe"
],
"type": "string"
},
{
"type": "null"
}
],
"default": "none",
"description": "Type of subword model for tgt in pyonmttok.",
"title": "Tgt Subword Type"
},
"src_onmttok_kwargs": {
"anyOf": [
{
"type": "object"
},
{
"type": "null"
}
],
"default": {
"mode": "none"
},
"description": "Other pyonmttok options for src in dict string, except subword related options listed earlier.",
"title": "Src Onmttok Kwargs"
},
"tgt_onmttok_kwargs": {
"anyOf": [
{
"type": "object"
},
{
"type": "null"
}
],
"default": {
"mode": "none"
},
"description": "Other pyonmttok options for tgt in dict string, except subword related options listed earlier.",
"title": "Tgt Onmttok Kwargs"
},
"gpt2_pretok": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": false,
"description": "Preprocess sentence with byte-level mapping.",
"title": "Gpt2 Pretok"
},
"mapped_tokens": {
"anyOf": [
{
"items": {
"maxItems": 2,
"minItems": 2,
"prefixItems": [
{
"type": "string"
},
{
"type": "string"
}
],
"type": "array"
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"description": "Mapped tokens for placeholders preservation",
"title": "Mapped Tokens"
}
},
"title": "ONMTTokenizerConfig",
"type": "object"
},
"PrefixConfig": {
"additionalProperties": false,
"properties": {
"src_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to prepend to all source examples.",
"title": "Src Prefix"
},
"tgt_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to prepend to all target examples.",
"title": "Tgt Prefix"
}
},
"title": "PrefixConfig",
"type": "object"
},
"SuffixConfig": {
"additionalProperties": false,
"properties": {
"src_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to append to all source examples.",
"title": "Src Suffix"
},
"tgt_suffix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "",
"description": "String to append to all target examples.",
"title": "Tgt Suffix"
}
},
"title": "SuffixConfig",
"type": "object"
},
"SwitchOutConfig": {
"additionalProperties": false,
"properties": {
"switchout_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for SwitchOut. :math:`\\tau^{-1}` in :cite:`DBLP:journals/corr/abs-1808-07512`. Smaller value makes data more diverse.",
"title": "Switchout Temperature"
}
},
"title": "SwitchOutConfig",
"type": "object"
},
"TerminologyConfig": {
"additionalProperties": false,
"properties": {
"termbase_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Path to a dictionary file with terms.",
"title": "Termbase Path"
},
"src_spacy_language_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Name of the spaCy language model for the source corpus.",
"title": "Src Spacy Language Model"
},
"tgt_spacy_language_model": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Name of the spaCy language model for the target corpus.",
"title": "Tgt Spacy Language Model"
},
"term_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.3,
"description": "Ratio of corpus to augment with terms.",
"title": "Term Corpus Ratio"
},
"term_example_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.2,
"description": "Maximum terms allowed in an example.",
"title": "Term Example Ratio"
},
"src_term_stoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5fsrc_term_start\uff60",
"description": "The source term start token.",
"title": "Src Term Stoken"
},
"tgt_term_stoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ftgt_term_start\uff60",
"description": "The target term start token.",
"title": "Tgt Term Stoken"
},
"tgt_term_etoken": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ftgt_term_end\uff60",
"description": "The target term end token.",
"title": "Tgt Term Etoken"
},
"term_source_delimiter": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": "\uff5ffuzzy\uff60",
"description": "Any special token used for augmented source sentences. The default is the fuzzy token used in the FuzzyMatch transform.",
"title": "Term Source Delimiter"
}
},
"title": "TerminologyConfig",
"type": "object"
},
"TokenDropConfig": {
"additionalProperties": false,
"properties": {
"tokendrop_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for token deletion.",
"title": "Tokendrop Temperature"
}
},
"title": "TokenDropConfig",
"type": "object"
},
"TokenMaskConfig": {
"additionalProperties": false,
"properties": {
"tokenmask_temperature": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 1.0,
"description": "Sampling temperature for token masking.",
"title": "Tokenmask Temperature"
}
},
"title": "TokenMaskConfig",
"type": "object"
},
"UpperCaseConfig": {
"additionalProperties": false,
"properties": {
"upper_corpus_ratio": {
"anyOf": [
{
"type": "number"
},
{
"type": "null"
}
],
"default": 0.01,
"description": "Corpus ratio to apply uppercasing.",
"title": "Upper Corpus Ratio"
}
},
"title": "UpperCaseConfig",
"type": "object"
}
},
"required": [
"src_vocab",
"data"
]
}

field dump_samples : bool = False​

Dump samples when building vocabulary. Warning: this may slow down the process.

  • Validated by:
    • _validate_build_vocab_config
    • _validate_data_config

field learn_subwords : bool = False​

Learn subwords (based on defined transforms) prior to building vocabulary.

  • Validated by:
    • _validate_build_vocab_config
    • _validate_data_config

field learn_subwords_size : int = 32000​

Number of subwords operations to learn.

  • Validated by:
    • _validate_build_vocab_config
    • _validate_data_config

field n_sample : int = 5000​

Number of transformed samples per corpus to use to build the vocabulary. Set to -1 to use the full corpora.

  • Validated by:
    • _validate_build_vocab_config
    • _validate_data_config

field num_threads : int = 1​

Number of parallel threads to build the vocabulary.

  • Validated by:
    • _validate_build_vocab_config
    • _validate_data_config

field vocab_sample_queue_size : int = 20​

Size of queues used for dumping samples.

  • Validated by:
    • _validate_build_vocab_config
    • _validate_data_config
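
As a worked illustration of the fields documented above, the following is a hedged sketch of a vocabulary-building setup: src_vocab and data are required by the schema, dataset entries follow the Dataset definition (path_src, path_tgt, per-corpus transforms), and n_sample / learn_subwords behave as described above. File paths and corpus names are placeholders, and the runtime validators (_validate_build_vocab_config, _validate_data_config) may require additional fields such as save_data depending on the options chosen.

from eole.config.run import BuildVocabConfig  # class documented above

cfg = BuildVocabConfig(
    src_vocab="vocab/shared.vocab",          # required: path of the (shared) vocabulary
    share_vocab=True,                        # share source and target vocabulary
    data={                                   # required: datasets, per the Dataset schema
        "corpus_1": {
            "path_src": "data/train.src",
            "path_tgt": "data/train.tgt",
            "transforms": ["onmt_tokenize", "filtertoolong"],
        },
    },
    n_sample=-1,                             # -1 = use the full corpora (see n_sample above)
    learn_subwords=True,                     # learn subwords before building the vocabulary
    learn_subwords_size=32000,
)
print(cfg.n_sample, cfg.learn_subwords)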

model_post_init(context: Any, /)​

We need to both initialize private attributes and call the user-defined model_post_init method.