Supported LLM Models for Fine-tuning in Bakery
Introduction
Bakery supports fine-tuning of various Large Language Models (LLMs) to meet different use cases and requirements. This guide outlines the currently supported models, their characteristics, and recommended use cases.
Supported LLM Models
1. LLaMa Family
Description: Powerful general-purpose language model.
Size: 1B, 3B, 8B parameters
Best For:
Complex text generation
Multi-turn conversations
Content summarization
Language translation
Available Variants:
LLaMa-3-8b (8B parameters)
LLaMa-3.2-1b (1B parameters)
LLaMa-3.2-3b (3B parameters)
Training Requirements:
Minimum Dataset Size: 2,000 examples
Recommended GPU: NVIDIA A100
Typical Training Time: 4-6 hours for 10k examples
2. StableCode-instruct-alpha-3b
Description: Optimized for code generation and understanding.
Size: 3B parameters
Best For:
Code completion
Code documentation
Bug fixing
Code conversion between languages
Training Requirements:
Minimum Dataset Size: 1,000 examples
Recommended GPU: NVIDIA A100
Typical Training Time: 2-4 hours for 10k examples
3. SmolLM2
Description: Lightweight and efficient model for general text generation.
Size: 135M & 1.7B parameters
Best For:
Quick prototyping
Resource-constrained environments
General text generation
Classification tasks
Training Requirements:
Minimum Dataset Size: 500 examples
Recommended GPU: NVIDIA A100
Typical Training Time: 1-2 hours for 10k examples
4. Qwen2.5-Coder-7B-Instruct
Description: Code-focused, instruction-tuned model with strong reasoning capabilities.
Size: 7B parameters
Best For:
Complex reasoning tasks
Question answering
Analysis and inference
Technical content generation
Training Requirements:
Minimum Dataset Size: 2,000 examples
Recommended GPU: NVIDIA A100
Typical Training Time: 4-6 hours for 10k examples
5. GPT-2 Family
Description: General-purpose language model.
Size: 124M, 355M, 774M parameters
Best For:
Simple text generation
Multi-turn conversations
Content summarization
Available Variants:
GPT-2 Small (124M parameters)
GPT-2 Medium (355M parameters)
GPT-2 Large (774M parameters)
Training Requirements:
Minimum Dataset Size: 1,000 examples
Recommended GPU: NVIDIA V100
Typical Training Time: 2-4 hours for 10k examples
6. BERT Family
Description: Encoder model optimized for text classification and related understanding tasks.
Size: 110M, 340M & 66M parameters
Best For:
Text Classification
Sentiment Analysis
Available Variants:
BERT Base (110M parameters)
BERT Large (340M parameters)
DistilBERT (66M parameters)
Training Requirements:
Minimum Dataset Size: 1,000 examples
Recommended GPU: NVIDIA V100
Typical Training Time: 2-4 hours for 10k examples
7. T5 Family
Description: Encoder-decoder (text-to-text) model well suited to question answering.
Size: 60M, 220M & 770M parameters
Best For:
Question Answering
Text generation
Multi-turn conversations
Available Variants:
T5 Small (60M parameters)
T5 Base (220M parameters)
T5 Large (770M parameters)
Training Requirements:
Minimum Dataset Size: 1,000 examples
Recommended GPU: NVIDIA V100
Typical Training Time: 2-4 hours for 10k examples
8. Hugging Face Models
Bakery also supports fine-tuning of various models from the Hugging Face Hub. Some popular options include:
OpenAI GPT Variants
gpt2
gpt2-medium
gpt2-large
GPT-Neo and GPT-J Models (by EleutherAI)
EleutherAI/gpt-neo-125M
EleutherAI/gpt-neo-1.3B
EleutherAI/gpt-neo-2.7B
EleutherAI/gpt-j-6B
BLOOM Models (by BigScience)
bigscience/bloom-560m
bigscience/bloom-1b7
bigscience/bloom-7b1
bigscience/bloom
OPT Models (by Meta)
facebook/opt-125m
facebook/opt-350m
facebook/opt-1.3b
facebook/opt-2.7b
XGLM Models (by Facebook)
facebook/xglm-564M
facebook/xglm-1.7B
facebook/xglm-2.9B
facebook/xglm-7.5B
Code Models
Salesforce/codegen-350M-mono
Salesforce/codegen-2B-mono
Salesforce/codegen-6B-mono
Pythia Models (by EleutherAI)
EleutherAI/pythia-70m
EleutherAI/pythia-410m
EleutherAI/pythia-1b
EleutherAI/pythia-2.8b
Miscellaneous Models
bigcode/santacoder
cerebras/Cerebras-GPT-111M
cerebras/Cerebras-GPT-1.3B
cerebras/Cerebras-GPT-2.7B
RWKV/rwkv-4-pile-169m
RWKV/rwkv-4-pile-430m
RWKV/rwkv-4-pile-3b
RWKV/rwkv-4-pile-7b
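Any of these hub identifiers can be pulled down for a quick local check before committing to a full fine-tuning run. The sketch below is a generic Hugging Face transformers example rather than Bakery-specific code; EleutherAI/pythia-410m is simply one id taken from the list above, and any other causal LM id would work the same way.

```python
# Minimal sketch: load one of the Hub ids listed above and run a quick
# generation sanity check with the standard `transformers` API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-410m"  # any causal LM id from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a few tokens to confirm the model and tokenizer load correctly.
inputs = tokenizer("Fine-tuning sanity check:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```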
Hardware Requirements
Minimum Requirements
CPU: 8 cores
RAM: 32GB
Storage: 100GB SSD
GPU: NVIDIA V100 (16GB)
Recommended Requirements
CPU: 16+ cores
RAM: 64GB+
Storage: 500GB SSD
GPU: NVIDIA A100 (40GB)
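Before choosing a model size, it is worth confirming that the available GPU meets the memory figures above. The snippet below is a minimal sketch using standard PyTorch calls; the 16 GB threshold mirrors the minimum V100 configuration listed here.

```python
# Check local GPU resources against the minimum requirements above.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024**3
        print(f"GPU {i}: {props.name}, {vram_gb:.1f} GB VRAM")
        if vram_gb < 16:
            print("  -> below the 16 GB minimum listed above")
else:
    print("No CUDA GPU detected; fine-tuning the listed models is not practical on CPU.")
```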
Best Practices
Model Selection
Start with smaller models for testing
Scale up based on results and requirements
Consider computational resources available
Dataset Preparation
Clean and preprocess data thoroughly
Match dataset size to model capacity
Ensure data quality over quantity
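As a rough illustration of the cleaning step, the sketch below assumes a JSONL file (train.jsonl is a placeholder name) with prompt and completion fields; the exact schema Bakery expects may differ, so treat the field names as assumptions.

```python
# Minimal sketch of dataset cleaning: drop empty and duplicate examples
# before training (quality over quantity). Schema is assumed, not Bakery's.
import json

def load_and_clean(path: str) -> list[dict]:
    seen = set()
    cleaned = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            prompt = record.get("prompt", "").strip()
            completion = record.get("completion", "").strip()
            # Skip empty or duplicate examples.
            if not prompt or not completion:
                continue
            key = (prompt, completion)
            if key in seen:
                continue
            seen.add(key)
            cleaned.append({"prompt": prompt, "completion": completion})
    return cleaned

examples = load_and_clean("train.jsonl")
print(f"{len(examples)} clean examples")  # compare against the minimum dataset sizes above
```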
Training Process
Start with recommended parameters
Monitor training metrics closely
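The following sketch shows one way to apply these practices with the Hugging Face Trainer. It is not Bakery's internal training pipeline, and the hyperparameters, file names, and field names are illustrative placeholders rather than recommended values; frequent logging is what makes the training metrics easy to monitor.

```python
# Minimal fine-tuning sketch with Hugging Face Trainer; hyperparameters are
# illustrative defaults, and "train.jsonl" / prompt-completion fields are assumed.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_id = "EleutherAI/pythia-410m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

dataset = load_dataset("json", data_files="train.jsonl")["train"]

def tokenize(batch):
    # Concatenate prompt and completion into a single training sequence.
    texts = [p + c for p, c in zip(batch["prompt"], batch["completion"])]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="finetune-out",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-5,
    logging_steps=10,   # log loss frequently so training can be monitored closely
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```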
Limitations and Considerations
Resource Constraints
Larger models require significant GPU memory
Training time increases with model size
Cost considerations for extended training
Performance Tradeoffs
Smaller models train faster but may have lower performance
Larger models need more data but provide better results
Consider inference speed requirements
License Considerations
Check model licenses before deployment
Understand usage restrictions
Review commercial use requirements
Support
Community Support
Technical Support
Email: support@bagel.net
Updates and Roadmap
We regularly update our supported models. Check our changelog for the latest additions and improvements. Upcoming features include:
Support for more efficient training methods
Additional model architectures
Enhanced fine-tuning capabilities
Improved resource optimization
For the latest updates and announcements, follow our official channels.