Supported LLM Models for Fine-tuning in Bakery

Introduction

Bakery supports fine-tuning of various Large Language Models (LLMs) to meet different use cases and requirements. This guide outlines the currently supported models, their characteristics, and recommended use cases.

Supported LLM Models

1. LLaMa Family

Description: Powerful general-purpose language models for complex text tasks. A minimal fine-tuning sketch follows the requirements below.

  • Size: 1B, 3B, 8B parameters

  • Best For:

    • Complex text generation

    • Multi-turn conversations

    • Content summarization

    • Language translation

Supported Variants:

  • LLaMa-3-8b (8B parameters)

  • LLaMa-3.2-1b (1B parameters)

  • LLaMa-3.2-3b (3B parameters)

Training Requirements:

  • Minimum Dataset Size: 2,000 examples

  • Recommended GPU: NVIDIA A100

  • Typical Training Time: 4-6 hours for 10k examples
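
The figures above assume full fine-tuning of the model weights; parameter-efficient methods such as LoRA can reduce the memory footprint of the 1B and 3B variants considerably. The snippet below is a minimal sketch using the Hugging Face transformers and peft libraries rather than the Bakery API itself; the gated Hub ID meta-llama/Llama-3.2-1B and the LoRA hyperparameters are illustrative assumptions. The same pattern applies to the other causal language models in this guide (SmolLM2, Qwen2.5-Coder, GPT-2).

```python
# Minimal sketch: LoRA setup for a small LLaMa variant using Hugging Face
# `transformers` and `peft` (not the Bakery API). The Hub ID below is gated
# and shown for illustration; substitute the ID used in your Bakery workspace.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.2-1B"  # illustrative assumption

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Adapt only the attention projections so that just a small fraction of the
# 1B parameters is trainable (hyperparameters are illustrative).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```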

2. StableCode-instruct-alpha-3b

Description: Model optimized for code generation and understanding. An example training-record format follows the requirements below.

  • Size: 3B parameters

  • Best For:

    • Code completion

    • Code documentation

    • Bug fixing

    • Code conversion between languages

Training Requirements:

  • Minimum Dataset Size: 1,000 examples

  • Recommended GPU: NVIDIA A100

  • Typical Training Time: 2-4 hours for 10k examples
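
Fine-tuning data for code tasks is usually organized as instruction/response pairs, one JSON object per line. The sketch below shows one plausible way to write such records; the field names instruction, input, and output are illustrative assumptions rather than a required Bakery schema, so follow the Dataset Preparation Guide for the exact format your job expects.

```python
# Sketch: writing instruction-style code examples to JSONL. The field names
# are illustrative assumptions, not a fixed Bakery schema.
import json

records = [
    {
        "instruction": "Fix the off-by-one error in this function.",
        "input": "def last_item(xs):\n    return xs[len(xs)]",
        "output": "def last_item(xs):\n    return xs[len(xs) - 1]",
    },
    {
        "instruction": "Add a docstring to this function.",
        "input": "def add(a, b):\n    return a + b",
        "output": "def add(a, b):\n    \"\"\"Return the sum of a and b.\"\"\"\n    return a + b",
    },
]

with open("code_finetune.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```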

3. SmolLM2

Description: Lightweight and efficient model for general text generation.

  • Size: 135M and 1.7B parameters

  • Best For:

    • Quick prototyping

    • Resource-constrained environments

    • General text generation

    • Classification tasks

Training Requirements:

  • Minimum Dataset Size: 500 examples

  • Recommended GPU: NVIDIA A100

  • Typical Training Time: 1-2 hours for 10k examples

4. Qwen2.5-Coder-7B-Instruct

Description: Instruction-tuned code model with strong reasoning capabilities.

  • Size: 7B parameters

  • Best For:

    • Complex reasoning tasks

    • Question answering

    • Analysis and inference

    • Technical content generation

Training Requirements:

  • Minimum Dataset Size: 2,000 examples

  • Recommended GPU: NVIDIA A100

  • Typical Training Time: 4-6 hours for 10k examples

5. GPT-2 Family

Description: General-purpose language model.

  • Size: 124M, 355M, 774M parameters

  • Best For:

    • Simple text generation

    • Multi-turn conversations

    • Content summarization

Supported Variants:

  • GPT-2 Small (124M parameters)

  • GPT-2 Medium (355M parameters)

  • GPT-2 Large (774M parameters)

Training Requirements:

  • Minimum Dataset Size: 1,000 examples

  • Recommended GPU: NVIDIA V100

  • Typical Training Time: 2-4 hours for 10k examples

6. BERT Family

Description: Encoder-only models suited to text classification and other language-understanding tasks. A classification fine-tuning sketch follows the requirements below.

  • Size: 110M, 340M, 66M parameters

  • Best For:

    • Text Classification

    • Sentiment Analysis

Supported Variants:

  • BERT Base (110M parameters)

  • BERT Large (340M parameters)

  • DistilBERT (66M parameters)

Training Requirements:

  • Minimum Dataset Size: 1,000 examples

  • Recommended GPU: NVIDIA V100

  • Typical Training Time: 2-4 hours for 10k examples
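
Unlike the generative families above, BERT-style models are fine-tuned with a classification head. The snippet below is a minimal sketch using the Hugging Face transformers and datasets libraries; the distilbert-base-uncased checkpoint, the two-label sentiment setup, and the toy in-memory examples are illustrative assumptions.

```python
# Sketch: sequence-classification fine-tune with DistilBERT. The checkpoint,
# label setup, and toy in-memory data are illustrative assumptions.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Toy dataset; a real run should meet the ~1,000-example minimum noted above.
raw = Dataset.from_dict({
    "text": ["Great product, works as advertised.", "Stopped working after a week."],
    "label": [1, 0],
})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-sentiment", num_train_epochs=1,
                           per_device_train_batch_size=2, logging_steps=1),
    train_dataset=tokenized,
)
trainer.train()
```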

7. T5 Family

Description: Encoder-decoder (text-to-text) models well suited to question answering. A text-to-text fine-tuning sketch follows the requirements below.

  • Size: 60M, 220M, 770M parameters

  • Best For:

    • Question Answering

    • Text generation

    • Multi-turn conversations

Supported Variants:

  • T5 Small (60M parameters)

  • T5 Base (220M parameters)

  • T5 Large (770M parameters)

Training Requirements:

  • Minimum Dataset Size: 1,000 examples

  • Recommended GPU: NVIDIA V100

  • Typical Training Time: 2-4 hours for 10k examples
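
T5 casts every task, including question answering, as text-to-text generation, so both inputs and targets are plain strings. The snippet below is a minimal sketch with Hugging Face transformers; the t5-small checkpoint and the "question: ... context: ..." prefix format are illustrative assumptions rather than a prescribed Bakery format.

```python
# Sketch: text-to-text formatting for a T5 question-answering example.
# The t5-small checkpoint and the input prefixes are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

source = ("question: What does the Bakery fine-tune? "
          "context: The Bakery supports fine-tuning of several LLM families.")
target = "Several LLM families."

inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(text_target=target, return_tensors="pt", truncation=True).input_ids

# A single forward pass with labels returns the training loss for this pair.
loss = model(**inputs, labels=labels).loss
print(loss.item())
```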

8. Hugging Face Models

Bakery also supports fine-tuning many models from the Hugging Face Hub. Some popular options are listed below; a generic loading sketch follows the lists.

OpenAI GPT Variants

  1. gpt2

  2. gpt2-medium

  3. gpt2-large

GPT-Neo and GPT-J Models (by EleutherAI)

  1. EleutherAI/gpt-neo-125M

  2. EleutherAI/gpt-neo-1.3B

  3. EleutherAI/gpt-neo-2.7B

  4. EleutherAI/gpt-j-6B

BLOOM Models (by BigScience)

  1. bigscience/bloom-560m

  2. bigscience/bloom-1b3

  3. bigscience/bloom-7b1

  4. bigscience/bloom

OPT Models (by Meta)

  1. facebook/opt-125m

  2. facebook/opt-350m

  3. facebook/opt-1.3b

  4. facebook/opt-2.7b

XGLM Models (by Facebook)

  1. facebook/xglm-564M

  2. facebook/xglm-1.7B

  3. facebook/xglm-2.9B

  4. facebook/xglm-7.5B

Code Models

  1. Salesforce/codegen-350M-mono

  2. Salesforce/codegen-2B-mono

  3. Salesforce/codegen-6B-mono

Pythia Models (by EleutherAI)

  1. EleutherAI/pythia-70m

  2. EleutherAI/pythia-410m

  3. EleutherAI/pythia-1b

  4. EleutherAI/pythia-2.8b

Miscellaneous Models

  1. bigcode/santacoder

  2. cerebras/Cerebras-GPT-111M

  3. cerebras/Cerebras-GPT-1.3B

  4. cerebras/Cerebras-GPT-2.7B

  5. RWKV/rwkv-4-pile-169m

  6. RWKV/rwkv-4-pile-430m

  7. RWKV/rwkv-4-pile-3b

  8. RWKV/rwkv-4-pile-7b
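
Most of the checkpoints above are standard causal language models and can be loaded by their Hub ID with the transformers Auto classes, as in the hedged sketch below. EleutherAI/pythia-410m is used only as an example; a few checkpoints (for instance bigcode/santacoder) may additionally need trust_remote_code=True or a recent transformers release.

```python
# Sketch: loading a causal LM from the Hugging Face Hub by its ID and running
# a quick generation check. EleutherAI/pythia-410m is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-410m"  # any ID from the lists above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Fine-tuning works best when", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```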

Hardware Requirements

Minimum Requirements

  • CPU: 8 cores

  • RAM: 32GB

  • Storage: 100GB SSD

  • GPU: NVIDIA V100 (16GB)

Recommended Requirements

  • CPU: 16+ cores

  • RAM: 64GB+

  • Storage: 500GB SSD

  • GPU: NVIDIA A100 (40GB)
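
Before launching a job, it can help to confirm that the visible GPU actually meets these thresholds. The snippet below is a small sketch using PyTorch; the bytes-per-parameter rule of thumb in the comment is a rough assumption for full fine-tuning with an Adam-style optimizer, not a Bakery-published figure.

```python
# Sketch: pre-flight check of the visible GPU against the requirements above.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected; fine-tuning needs at least a 16GB V100.")

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")

# Rough rule of thumb (an assumption, not a Bakery figure): full fine-tuning
# with an Adam-style optimizer needs roughly 16 bytes per parameter for
# weights, gradients, and optimizer state, so ~16 GB for a 1B-parameter model
# before activations. Larger models call for the 40GB A100 tier or
# parameter-efficient methods such as LoRA.
```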

Best Practices

  1. Model Selection

    • Start with smaller models for testing

    • Scale up based on results and requirements

    • Consider computational resources available

  2. Dataset Preparation

    • Clean and preprocess data thoroughly

    • Match dataset size to model capacity

    • Ensure data quality over quantity

  3. Training Process

    • Start with recommended parameters

    • Monitor training metrics closely (a conservative starting configuration is sketched below)
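
As a concrete starting point, the configuration below is a hedged sketch of conservative Hugging Face TrainingArguments with frequent logging; the specific values are illustrative assumptions, not Bakery-recommended defaults.

```python
# Sketch: conservative TrainingArguments with frequent logging so metrics can
# be monitored closely. Values are illustrative, not Bakery-recommended defaults.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="finetune-run",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size of 32
    learning_rate=2e-5,
    warmup_ratio=0.03,
    weight_decay=0.01,
    logging_steps=10,                # frequent loss logging to catch divergence early
    save_strategy="epoch",
    report_to="none",                # or "tensorboard" / "wandb" for richer monitoring
)
```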

Limitations and Considerations

  1. Resource Constraints

    • Larger models require significant GPU memory

    • Training time increases with model size

    • Cost considerations for extended training

  2. Performance Tradeoffs

    • Smaller models train faster but may have lower performance

    • Larger models need more data but provide better results

    • Consider inference speed requirements

  3. License Considerations

    • Check model licenses before deployment

    • Understand usage restrictions

    • Review commercial use requirements

Support

Community Support

  • Discord community

  • GitHub discussions

Technical Support

  • Email: support@bagel.net

  • Priority support for users

Updates and Roadmap

We regularly update our supported models. Check our changelog for the latest additions and improvements. Upcoming features include:

  • Support for more efficient training methods

  • Additional model architectures

  • Enhanced fine-tuning capabilities

  • Improved resource optimization

For the latest updates and announcements, follow us on:

  • Discord

  • Twitter

  • LinkedIn

  • Blog