Llama2


NOTE: Run these commands from the recipe directory (here recipes/llama2), since the configuration files used below are referenced by relative path.

Retrieve and convert model

Set environment variables

export EOLE_MODEL_DIR=<where_to_store_models>
export HF_TOKEN=<your_hf_token>
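Both variables are used by the conversion command, so it can help to confirm they are set before starting a long download. The following is a minimal sketch, not part of the recipe; the helper name require_var is hypothetical.

```shell
# Hypothetical helper: succeed only if the named variable is set and non-empty.
require_var() {
  # $1 is the NAME of the variable to check, e.g. HF_TOKEN
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}

# Report whether both variables from this section are available.
if require_var EOLE_MODEL_DIR && require_var HF_TOKEN; then
  echo "environment ready"
else
  echo "environment incomplete"
fi
```

The check only tests for presence and never prints the token itself.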

Download and convert model

eole convert HF --model_dir meta-llama/Llama-2-7b-chat-hf --output $EOLE_MODEL_DIR/llama2-7b-chat-hf --token $HF_TOKEN

Inference

Write test prompt to text file

echo -e "<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
What are some nice places to visit in France? [/INST]" | sed ':a;N;$!ba;s/\n/⦅newline⦆/g' > test_prompt.txt
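The sed expression in the command above joins all lines of the prompt into one, replacing each line break with the ⦅newline⦆ placeholder, presumably so that the whole prompt occupies a single line of the source file. A small sketch of the same transformation on a throwaway sample (the file name sample_prompt.txt is illustrative, and the one-liner assumes GNU sed):

```shell
# Three input lines are collapsed into one line with placeholders.
printf 'line one\nline two\nline three\n' \
  | sed ':a;N;$!ba;s/\n/⦅newline⦆/g' > sample_prompt.txt

cat sample_prompt.txt
# line one⦅newline⦆line two⦅newline⦆line three

# The result is a single line.
wc -l < sample_prompt.txt
```

Running the same wc -l check on test_prompt.txt is a quick way to confirm the prompt was flattened as intended.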

Run inference

Single GPU

eole predict -c llama-inference.yaml -src test_prompt.txt -output test_output.txt

Dual GPU (tensor parallelism)

eole predict -c llama-inference-tp-2gpu.yaml -src test_prompt.txt -output test_output.txt
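To read the result more comfortably, the placeholder can be turned back into real line breaks. This sketch assumes the generated text in test_output.txt uses the same ⦅newline⦆ placeholder as the prompt, which is not confirmed here; it runs on a stand-in file with made-up content and relies on GNU sed interpreting \n in the replacement.

```shell
# Stand-in for one line of test_output.txt (hypothetical content for illustration).
printf 'Paris⦅newline⦆Lyon⦅newline⦆Nice\n' > sample_output.txt

# Replace every placeholder with an actual line break.
sed 's/⦅newline⦆/\n/g' sample_output.txt
```

Under that assumption, the same sed command applied to test_output.txt would print the model's answer with its original line structure.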

Finetuning

Retrieve data

mkdir -p ./data
# Alpaca
wget -P ./data https://opennmt-models.s3.amazonaws.com/llama/alpaca_clean.txt

# Vicuna
wget -P ./data https://opennmt-models.s3.amazonaws.com/llama/sharegpt.txt

# Open Assistant
wget -P ./data https://opennmt-models.s3.amazonaws.com/llama/osst1.flattened.txt
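Since an interrupted download can leave a missing or empty file, it may be worth checking the three datasets before starting a training run. A minimal sketch, with the hypothetical helper name check_data_files:

```shell
# Hypothetical helper: report which of the named files exist and are non-empty.
check_data_files() {
  dir=$1
  shift
  status=0
  for name in "$@"; do
    if [ -s "$dir/$name" ]; then
      echo "ok: $dir/$name"
    else
      echo "missing or empty: $dir/$name" >&2
      status=1
    fi
  done
  return "$status"
}

# File names taken from the wget commands above.
check_data_files ./data alpaca_clean.txt sharegpt.txt osst1.flattened.txt \
  || echo "re-run the wget commands for the files reported above"
```

The check only looks at presence and size; it does not validate the contents of the files.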

Finetune

eole train -c llama-finetune.yaml

Merge LoRA weights

eole model lora --action merge --base_model ${EOLE_MODEL_DIR}/llama2-7b-chat-hf --lora_weights ./finetune/llama2-7b-chat-hf-finetune --output ./finetune/merged

To use the finetuned model, update your inference configuration to point at the merged model directory (./finetune/merged).