How to Fine-tune LLM Successfully in Bakery
This comprehensive guide will walk you through the process of fine-tuning Large Language Models (LLMs) in Bakery. We'll cover all the parameters available in the FineTunePayload and provide best practices for achieving optimal results.
Base Model Selection
You must provide base_model, which accepts either of the following:
The Asset ID of a model already in Bakery
A Hugging Face model identifier (e.g., "gpt2", "facebook/opt-350m")
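As a minimal sketch of this choice, the payload can point at either source. Only base_model appears in this guide; the dictionary form below is illustrative rather than the exact FineTunePayload constructor.

```python
# Minimal sketch of the base-model portion of a fine-tuning payload.
# Only "base_model" is documented in this guide; the dict form is illustrative.

# Option 1: a Hugging Face model identifier
payload = {"base_model": "facebook/opt-350m"}

# Option 2: the Asset ID of a model already uploaded to Bakery
payload = {"base_model": "your-bakery-asset-id"}  # placeholder ID
```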
Dataset Selection
Select an available dataset under "My Datasets", choose the appropriate file format (CSV, JSON, TXT, or Parquet), and specify the input and output columns.
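For example, a CSV dataset with one input column and one output column can be prepared like this; the column names "input" and "output" are placeholders for whichever columns you select in Bakery.

```python
import pandas as pd

# Build a small instruction-style dataset with explicit input and output
# columns. The column names "input" and "output" are placeholders; match
# them to the columns you select in "My Datasets".
rows = [
    {"input": "Summarize: Bakery lets you fine-tune LLMs.", "output": "Bakery supports LLM fine-tuning."},
    {"input": "Translate to French: hello", "output": "bonjour"},
]
df = pd.DataFrame(rows)
df.to_csv("my_dataset.csv", index=False)  # CSV is one of the supported formats
```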
Learning Rate
Recommended ranges by model size:
Small models (< 3B parameters): 2e-5 to 5e-5
Medium models (3B-7B parameters): 1e-5 to 3e-5
Large models (> 7B parameters): 5e-6 to 1e-5
Number of Epochs
Guidelines by dataset size:
Small datasets (< 1000 examples): 5-10 epochs
Medium datasets (1000-10000 examples): 3-5 epochs
Large datasets (> 10000 examples): 1-3 epochs
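Putting the two recommendations together, a hyperparameter sketch might look like the following; the field names learning_rate and num_epochs are assumptions rather than the confirmed FineTunePayload schema.

```python
# Hypothetical field names ("learning_rate", "num_epochs"); check the
# FineTunePayload schema for the exact names. Values follow the
# recommendations above for a small model on a medium-sized dataset.
payload = {
    "base_model": "facebook/opt-350m",
    "learning_rate": 2e-5,   # 2e-5 to 5e-5 for models under 3B parameters
    "num_epochs": 4,         # 3-5 epochs for 1,000-10,000 examples
}
```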
GPU Selection
Choose a GPU based on:
Model size
To fine-tune a small model such as GPT-2 or BERT, NVIDIA_TESLA_V100 is sufficient.
To fine-tune a medium or large model such as LLaMA, StableCode, Qwen, or SmolLM2, use NVIDIA_TESLA_A100.
Dataset size
For large or complex datasets, use NVIDIA_TESLA_A100.
Training budget
Required training speed
For faster training, use NVIDIA_TESLA_A100.
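Continuing the payload sketch, GPU choice might be expressed as below; gpu_type is an assumed field name, while the accelerator identifiers come from the guidance above.

```python
# Hypothetical field name ("gpu_type"); the accelerator identifiers follow
# the guidance above.
payload["gpu_type"] = "NVIDIA_TESLA_V100"    # small models such as GPT-2 or BERT
# payload["gpu_type"] = "NVIDIA_TESLA_A100"  # medium/large models, complex datasets, faster runs
```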
Ensure your dataset meets these criteria:
Clean, high-quality data
Proper format (CSV, JSON, TXT, or Parquet)
Appropriate size for your model
Correct column names
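A quick pre-submission check along these lines catches most formatting problems; the column names are placeholders for the ones you actually use.

```python
import pandas as pd

# Basic pre-submission checks on a CSV dataset: expected columns exist,
# no empty rows, no exact duplicates. Column names are placeholders.
df = pd.read_csv("my_dataset.csv")

expected = {"input", "output"}
missing = expected - set(df.columns)
assert not missing, f"Missing columns: {missing}"

df = df.dropna(subset=list(expected))  # drop rows with empty cells
df = df.drop_duplicates()              # remove exact duplicates
print(f"{len(df)} clean examples")
df.to_csv("my_dataset_clean.csv", index=False)
```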
The system provides real-time metrics:
Loss values
Learning rate
Training time
Training logs
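If you copy the reported loss values out of the training logs, a quick plot makes divergence or plateaus easy to spot; the numbers below are illustrative only.

```python
import matplotlib.pyplot as plt

# Illustrative loss values copied from the training logs; replace them
# with the real values reported for your job.
steps = [100, 200, 300, 400, 500]
loss = [2.91, 2.43, 2.10, 1.95, 1.88]

plt.plot(steps, loss, marker="o")
plt.xlabel("Training step")
plt.ylabel("Loss")
plt.title("Fine-tuning loss curve")
plt.show()
```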
Best Practices
Data Preparation
Clean your data thoroughly
Remove duplicates
Balance your dataset
Validate data format
Model Selection
Start with smaller models for testing
Consider computational resources
Match model size to dataset size
Check model licenses
Training Configuration
Start with recommended defaults
Adjust based on monitoring
Use early stopping when needed
Save checkpoints regularly
Resource Management
Monitor GPU memory usage
Schedule large jobs appropriately
Implement proper error handling
Back up important checkpoints
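When you control the GPU directly (for example, a local test run before submitting to Bakery), a quick memory check with PyTorch looks like this; Bakery-managed jobs report usage through the platform instead.

```python
import torch

# Quick GPU memory check for a local test run.
if torch.cuda.is_available():
    used = torch.cuda.memory_allocated() / 1e9
    total = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU memory: {used:.1f} GB used / {total:.1f} GB total")
else:
    print("No CUDA device visible")
```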
Troubleshooting
Out-of-Memory Errors
Solution:
Reduce batch size
Use smaller sequence lengths
Choose a smaller model
Upgrade GPU
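In payload terms, the usual memory-saving adjustments might look like the sketch below; batch_size and max_seq_length are assumed field names, not confirmed parts of the FineTunePayload schema.

```python
# Continuing the hypothetical payload sketch: settings that commonly
# resolve out-of-memory errors. Field names are assumptions.
payload.update({
    "batch_size": 4,        # halve or quarter the batch size
    "max_seq_length": 512,  # shorter sequences use less GPU memory
})
# Alternatively, switch to a smaller base model or to NVIDIA_TESLA_A100.
```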
Poor Model Performance
Solution:
Check data quality
Adjust learning rate
Increase epochs
Validate preprocessing
Slow Training
Solution:
Use more powerful GPU
Optimize batch size
Reduce dataset size
Simplify model architecture
Checkpointing and Early Stopping
Use early stopping
Monitor validation metrics
Implement checkpointing
Save best models
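If you reproduce a run locally with Hugging Face transformers (the base models above are Hugging Face identifiers), early stopping and checkpointing can be configured as sketched below; this illustrates the practice, not Bakery's internal setup.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Sketch of early stopping + checkpointing settings with Hugging Face
# transformers; pass these to a Trainer together with the callback below.
args = TrainingArguments(
    output_dir="checkpoints",
    eval_strategy="steps",        # "evaluation_strategy" on older transformers versions
    eval_steps=200,
    save_strategy="steps",        # write checkpoints at the same cadence
    save_steps=200,
    load_best_model_at_end=True,  # keep the best checkpoint, not the last one
    metric_for_best_model="eval_loss",
)
early_stop = EarlyStoppingCallback(early_stopping_patience=3)
```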
Evaluating Results
Test on validation set
Compare with baseline
Check for overfitting
Validate outputs
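One simple comparison is perplexity, which is just the exponential of the validation loss; evaluating the base and fine-tuned models on the same validation set shows whether fine-tuning actually helped. The numbers below are illustrative.

```python
import math

# Convert validation cross-entropy loss to perplexity and compare the
# fine-tuned model against the base model. Values shown are illustrative.
baseline_eval_loss = 2.40   # base model on your validation set
finetuned_eval_loss = 1.85  # fine-tuned model on the same validation set

print("baseline perplexity:", round(math.exp(baseline_eval_loss), 2))
print("fine-tuned perplexity:", round(math.exp(finetuned_eval_loss), 2))
```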
Model Storage
When a fine-tuning job completes, the platform provides:
Automatic save to Bakery
Version control
Metadata storage
Support
For additional assistance, contact the Bakery team.