Supported LLM Models for Fine-tuning in Bakery
Introduction
Bakery supports fine-tuning of various Large Language Models (LLMs) to meet different use cases and requirements. This guide outlines the currently supported models, their characteristics, and recommended use cases.
Supported LLM Models
1. LLaMa Family
Description: Powerful general-purpose language model.
Size: 1B, 3B, 8B parameters
Best For:
Complex text generation
Multi-turn conversations
Content summarization
Language translation
Available Variants:
LLaMa-3-8b (8B parameters)
LLaMa-3.2-1b (1B parameters)
LLaMa-3.2-3b (3B parameters)
Training Requirements:
Minimum Dataset Size: 2,000 examples
Recommended GPU: NVIDIA A100
Typical Training Time: 4-6 hours for 10k examples
2. StableCode-instruct-alpha-3b
Description: Optimized for code generation and understanding.
Size: 3B parameters
Best For:
Code completion
Code documentation
Bug fixing
Code conversion between languages
Training Requirements:
Minimum Dataset Size: 1,000 examples
Recommended GPU: NVIDIA A100
Typical Training Time: 2-4 hours for 10k examples
3. SmolLM2
Description: Lightweight and efficient model for general text generation.
Size: 135M & 1.7B parameters
Best For:
Quick prototyping
Resource-constrained environments
General text generation
Classification tasks
Training Requirements:
Minimum Dataset Size: 500 examples
Recommended GPU: NVIDIA A100
Typical Training Time: 1-2 hours for 10k examples
4. Qwen2.5-Coder-7B-Instruct
Description: Code-focused, instruction-tuned model with strong reasoning capabilities.
Size: 7B parameters
Best For:
Complex reasoning tasks
Question answering
Analysis and inference
Technical content generation
Training Requirements:
Minimum Dataset Size: 2,000 examples
Recommended GPU: NVIDIA A100
Typical Training Time: 4-6 hours for 10k examples
5. GPT-2 Family
Description: General-purpose language model.
Size: 124M, 355M, 774M parameters
Best For:
Simple text generation
Multi-turn conversations
Content summarization
Available Variants:
GPT-2 Small (124M parameters)
GPT-2 Medium (355M parameters)
GPT-2 Large (774M parameters)
Training Requirements:
Minimum Dataset Size: 1,000 examples
Recommended GPU: NVIDIA V100
Typical Training Time: 2-4 hours for 10k examples
6. BERT Family
Description: Encoder model optimized for text classification and related understanding tasks.
Size: 110M, 340M & 66M parameters
Best For:
Text Classification
Sentiment Analysis
Available Variants:
BERT Base (110M parameters)
BERT Large (340M parameters)
DistilBERT (66M parameters)
Training Requirements:
Minimum Dataset Size: 1,000 examples
Recommended GPU: NVIDIA V100
Typical Training Time: 2-4 hours for 10k examples
7. T5 Family
Description: Encoder-decoder (text-to-text) model well suited to question answering.
Size: 60M, 220M & 770M parameters
Best For:
Question Answering
Text generation
Multi-turn conversations
Available Variants:
T5 Small (60M parameters)
T5 Base (220M parameters)
T5 Large (770M parameters)
Training Requirements:
Minimum Dataset Size: 1,000 examples
Recommended GPU: NVIDIA V100
Typical Training Time: 2-4 hours for 10k examples
8. Hugging Face Models
Bakery also supports fine-tuning of various models from the Hugging Face Hub. Some popular options include:
OpenAI GPT Variants
gpt2
gpt2-medium
gpt2-large
GPT-Neo and GPT-J Models (by EleutherAI)
EleutherAI/gpt-neo-125M
EleutherAI/gpt-neo-1.3B
EleutherAI/gpt-neo-2.7B
EleutherAI/gpt-j-6B
BLOOM Models (by BigScience)
bigscience/bloom-560m
bigscience/bloom-1b7
bigscience/bloom-7b1
bigscience/bloom
OPT Models (by Meta)
facebook/opt-125m
facebook/opt-350m
facebook/opt-1.3b
facebook/opt-2.7b
XGLM Models (by Facebook)
facebook/xglm-564M
facebook/xglm-1.7B
facebook/xglm-2.9B
facebook/xglm-7.5B
Code Models
Salesforce/codegen-350M-mono
Salesforce/codegen-2B-mono
Salesforce/codegen-6B-mono
Pythia Models (by EleutherAI)
EleutherAI/pythia-70m
EleutherAI/pythia-410m
EleutherAI/pythia-1b
EleutherAI/pythia-2.8b
Miscellaneous Models
bigcode/santacoder
cerebras/Cerebras-GPT-111M
cerebras/Cerebras-GPT-1.3B
cerebras/Cerebras-GPT-2.7B
RWKV/rwkv-4-pile-169m
RWKV/rwkv-4-pile-430m
RWKV/rwkv-4-pile-3b
RWKV/rwkv-4-pile-7b
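Any of these hub identifiers can be pulled down for a quick local check before committing to a full fine-tuning run. The sketch below is a generic Hugging Face transformers example rather than Bakery-specific code; EleutherAI/pythia-410m is simply one id taken from the list above, and any other causal LM id would work the same way.

```python
# Minimal sketch: load one of the Hub ids listed above and run a quick
# generation sanity check with the standard `transformers` API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-410m"  # any causal LM id from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a few tokens to confirm the model and tokenizer load correctly.
inputs = tokenizer("Fine-tuning sanity check:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```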
Hardware Requirements
Minimum Requirements
CPU: 8 cores
RAM: 32GB
Storage: 100GB SSD
GPU: NVIDIA V100 (16GB)
Recommended Requirements
CPU: 16+ cores
RAM: 64GB+
Storage: 500GB SSD
GPU: NVIDIA A100 (40GB)
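Before choosing a model size, it is worth confirming that the available GPU meets the memory figures above. The snippet below is a minimal sketch using standard PyTorch calls; the 16 GB threshold mirrors the minimum V100 configuration listed here.

```python
# Check local GPU resources against the minimum requirements above.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024**3
        print(f"GPU {i}: {props.name}, {vram_gb:.1f} GB VRAM")
        if vram_gb < 16:
            print("  -> below the 16 GB minimum listed above")
else:
    print("No CUDA GPU detected; fine-tuning the listed models is not practical on CPU.")
```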
Best Practices
Model Selection
Start with smaller models for testing
Scale up based on results and requirements
Consider computational resources available
Dataset Preparation
Clean and preprocess data thoroughly
Match dataset size to model capacity
Ensure data quality over quantity
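As a rough illustration of the cleaning step, the sketch below assumes a JSONL file (train.jsonl is a placeholder name) with prompt and completion fields; the exact schema Bakery expects may differ, so treat the field names as assumptions.

```python
# Minimal sketch of dataset cleaning: drop empty and duplicate examples
# before training (quality over quantity). Schema is assumed, not Bakery's.
import json

def load_and_clean(path: str) -> list[dict]:
    seen = set()
    cleaned = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            prompt = record.get("prompt", "").strip()
            completion = record.get("completion", "").strip()
            # Skip empty or duplicate examples.
            if not prompt or not completion:
                continue
            key = (prompt, completion)
            if key in seen:
                continue
            seen.add(key)
            cleaned.append({"prompt": prompt, "completion": completion})
    return cleaned

examples = load_and_clean("train.jsonl")
print(f"{len(examples)} clean examples")  # compare against the minimum dataset sizes above
```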
Training Process
Start with recommended parameters
Monitor training metrics closely
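The following sketch shows one way to apply these practices with the Hugging Face Trainer. It is not Bakery's internal training pipeline, and the hyperparameters, file names, and field names are illustrative placeholders rather than recommended values; frequent logging is what makes the training metrics easy to monitor.

```python
# Minimal fine-tuning sketch with Hugging Face Trainer; hyperparameters are
# illustrative defaults, and "train.jsonl" / prompt-completion fields are assumed.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_id = "EleutherAI/pythia-410m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

dataset = load_dataset("json", data_files="train.jsonl")["train"]

def tokenize(batch):
    # Concatenate prompt and completion into a single training sequence.
    texts = [p + c for p, c in zip(batch["prompt"], batch["completion"])]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="finetune-out",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-5,
    logging_steps=10,   # log loss frequently so training can be monitored closely
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```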
Limitations and Considerations
Resource Constraints
Larger models require significant GPU memory
Training time increases with model size
Cost considerations for extended training
Performance Tradeoffs
Smaller models train faster but may have lower performance
Larger models need more data but provide better results
Consider inference speed requirements
License Considerations
Check model licenses before deployment
Understand usage restrictions
Review commercial use requirements
Support
Community Support
Technical Support
Email: support@bagel.net
Updates and Roadmap
We regularly update our supported models. Check our changelog for the latest additions and improvements. Upcoming features include:
Support for more efficient training methods
Additional model architectures
Enhanced fine-tuning capabilities
Improved resource optimization
For the latest updates and announcements, follow our official channels.