Dataset
Fine Tuning

dataset_text_field="text"
: Specifies the field in the dataset containing the text.
max_seq_length=512
: Maximum sequence length for input texts.
per_device_train_batch_size=2
: Batch size per GPU.
gradient_accumulation_steps=4
: Number of steps to accumulate gradients before updating.
optim="paged_adamw_32bit"
: Specifies the optimizer.
save_steps=50
: Save a checkpoint every 50 steps.
logging_steps=5
: Log training metrics every 5 steps.
learning_rate=2e-4
: The learning rate for training.
fp16=True
: Use mixed precision training.
max_grad_norm=0.3
: Maximum gradient norm for gradient clipping.
max_steps=200
: Total number of training steps.
warmup_ratio=0.03
: Portion of training steps used for learning rate warmup.
lr_scheduler_type="linear"
: Type of learning rate scheduler.
gradient_checkpointing=True
: Use gradient checkpointing to save memory.
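
Put together, these arguments map onto a standard TRL setup. Below is a minimal sketch, assuming `model`, `tokenizer`, `dataset`, and a LoRA `peft_config` have already been created, and that `output_dir` is a placeholder; in newer TRL versions `dataset_text_field` and `max_seq_length` move into `SFTConfig` instead of being passed to `SFTTrainer` directly.

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# model, tokenizer, dataset, and peft_config are assumed to exist already
# (a 4-bit base model, its tokenizer, a dataset with a "text" column, a LoRA config).
training_args = TrainingArguments(
    output_dir="./results",            # placeholder output directory
    per_device_train_batch_size=2,     # batch size per GPU
    gradient_accumulation_steps=4,     # effective batch size = 2 * 4 per GPU
    optim="paged_adamw_32bit",         # paged 32-bit AdamW, memory-friendly for QLoRA
    save_steps=50,                     # checkpoint every 50 steps
    logging_steps=5,                   # log metrics every 5 steps
    learning_rate=2e-4,
    fp16=True,                         # mixed-precision training
    max_grad_norm=0.3,                 # gradient clipping
    max_steps=200,                     # total optimizer steps
    warmup_ratio=0.03,                 # 3% of steps used for LR warmup
    lr_scheduler_type="linear",
    gradient_checkpointing=True,       # trade extra compute for lower memory
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",         # column holding the raw training text
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
```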
Merging with llama.cpp
Because the LoRA weights were trained in fp16, while during fine-tuning the base model was quantized to NF4 (after normalization, each weight is mapped to the nearest of 16 levels, shrinking it from 16 bits to 4 bits per parameter), we choose to first merge the two fp16 pieces — the original fp16 base model and the fp16 LoRA weights — and only then quantize the result for inference.
This way, a model of a size we originally could not even run can now not only be fine-tuned but also used for inference.
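
For context, the 4-bit NF4 loading referred to above looks roughly like this with bitsandbytes. This is only a sketch: the checkpoint id is a placeholder, and `bnb_4bit_compute_dtype=torch.float16` matches the fp16 training above.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store base weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NormalFloat4: 16 quantization levels per block
    bnb_4bit_compute_dtype=torch.float16,  # dequantize to fp16 for the actual matmuls
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)

# Placeholder model id; the LoRA adapters themselves stay in fp16 during training.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```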
1. Convert the original model (fp16) to GGUF (convert_hf_to_gguf).
2. Convert the LoRA weights to GGUF (convert_lora_to_gguf).
3. Merge the two (for this step, llama.cpp/bin/ needs to be added to PATH).
4. Quantize the merged model.
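
Here is a sketch of those four steps driven from Python. Every path and filename is a placeholder, and it assumes llama.cpp has been cloned and built so that its convert scripts plus the llama-export-lora and llama-quantize binaries are available; the quantization type Q4_K_M is just one common choice.

```python
import subprocess

# All paths below are placeholders; adjust to your local checkout and model folders.
LLAMA_CPP = "path/to/llama.cpp"          # assumes llama.cpp is cloned and built
HF_MODEL  = "path/to/base-model-hf"      # original fp16 Hugging Face model
LORA_DIR  = "path/to/lora-adapter"       # fine-tuned LoRA adapter

def run(cmd):
    """Run one pipeline step and fail loudly if it errors."""
    print(">>", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Base model (fp16) -> GGUF
run(["python", f"{LLAMA_CPP}/convert_hf_to_gguf.py", HF_MODEL,
     "--outtype", "f16", "--outfile", "base-f16.gguf"])

# 2. LoRA weights -> GGUF
run(["python", f"{LLAMA_CPP}/convert_lora_to_gguf.py", LORA_DIR,
     "--base", HF_MODEL, "--outfile", "lora-f16.gguf"])

# 3. Merge the two fp16 GGUF files (llama-export-lora must be on PATH, e.g. from llama.cpp/bin/)
run(["llama-export-lora", "-m", "base-f16.gguf",
     "--lora", "lora-f16.gguf", "-o", "merged-f16.gguf"])

# 4. Quantize the merged model for inference
run(["llama-quantize", "merged-f16.gguf", "merged-Q4_K_M.gguf", "Q4_K_M"])
```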

Why left padding?
The article below explains this well.
Reference:
‣
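
In brief: decoder-only models generate by extending the last position of the input, so in a batched prompt the padding has to go on the left, keeping each sequence's real tokens flush against the right edge; with right padding the model would be asked to continue from pad tokens. A minimal sketch with a Hugging Face tokenizer ("gpt2" is just a freely downloadable decoder-only example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # many causal-LM tokenizers ship without a pad token
tokenizer.padding_side = "left"             # pad on the left so the last token is always a real one

batch = tokenizer(
    ["Hi", "A much longer prompt that needs no padding at all"],
    padding=True,
    return_tensors="pt",
)
# With left padding the short prompt becomes [PAD, PAD, ..., "Hi"], so
# model.generate() continues from "Hi" instead of from padding tokens.
print(batch["input_ids"])
print(batch["attention_mask"])
```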
- Author: ran2323
- URL: https://www.blueif.me//article/17771a79-6e22-802b-9865-fe07a9d9e1cb
- Copyright: Unless otherwise stated, all articles on this blog are licensed under the BY-NC-SA agreement. Please credit the source when reposting!