How to Fine-tune LLM Successfully in Bakery


Guide to Successful LLM Fine-tuning in Bakery

Introduction

This comprehensive guide will walk you through the process of fine-tuning Large Language Models (LLMs) in Bakery. We'll cover all the parameters available in the FineTunePayload and provide best practices for achieving optimal results.

Fine-tuning Parameters Overview

Essential Parameters

Base Model Selection

You must provide either:

  • the Asset ID of a model in Bakery, or

  • a Hugging Face model identifier (e.g., "gpt2", "facebook/opt-350m")
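
Following the snippet convention used for the training parameters below, a minimal illustrative example (substitute your own model identifier or Bakery Asset ID):

{
    "base_model": "facebook/opt-350m"  // Hugging Face identifier or Bakery Asset ID
}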

Dataset Selection

Select a dataset from "My Datasets", make sure it is in a supported file format, and select the appropriate input and output columns.

Training Parameters

Learning Rate

{
    "learning_rate": 2e-5  // Default value
}

Recommendations:

  • Small models (< 3B parameters): 2e-5 to 5e-5

  • Medium models (3B-7B parameters): 1e-5 to 3e-5

  • Large models (> 7B parameters): 5e-6 to 1e-5

Number of Epochs

{
    "epochs": 1  // Default value
}

Guidelines:

  • Small datasets (< 1000 examples): 5-10 epochs

  • Medium datasets (1000-10000 examples): 3-5 epochs

  • Large datasets (> 10000 examples): 1-3 epochs

GPU Selection

{
    "gpu": "NVIDIA_TESLA_A100"  // or "NVIDIA_TESLA_V100"
}

Choose based on:

  • Model size

    • For small models such as GPT-2 or BERT, use NVIDIA_TESLA_V100

    • For medium or large models such as Llama, StableCode, Qwen, or SmolLM2, use NVIDIA_TESLA_A100

  • Dataset size

    • For large or complex datasets, use NVIDIA_TESLA_A100

  • Training budget

  • Required training speed

    • For faster training, use NVIDIA_TESLA_A100

Step-by-Step Fine-tuning Guide

1. Prepare Your Dataset

Ensure your dataset meets these criteria:

  • Clean, high-quality data

  • Proper format (CSV, JSON, TXT, or Parquet)

  • Appropriate size for your model

  • Correct column names
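
As an illustration, a small JSON dataset for an instruction-style task could look like the following; the column names "input" and "output" are only examples and should match the columns you select during Dataset Selection:

[
    {"input": "Translate to French: Hello", "output": "Bonjour"},
    {"input": "Translate to French: Thank you", "output": "Merci"}
]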

2. Configure Your Fine-tuning Parameters

Choose the parameters described above (base model, learning rate, epochs, GPU) based on your model and dataset size.
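
For example, an illustrative FineTunePayload for a small base model trained on a medium-sized dataset, combining the recommendations above (dataset selection and any other fields are configured as described earlier and omitted here):

{
    "base_model": "facebook/opt-350m",  // Bakery Asset ID or Hugging Face identifier
    "learning_rate": 3e-5,              // small model (< 3B parameters): 2e-5 to 5e-5
    "epochs": 3,                        // medium dataset (1000-10000 examples): 3-5 epochs
    "gpu": "NVIDIA_TESLA_V100"          // sufficient for small models; use NVIDIA_TESLA_A100 for larger models or faster training
}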

3. Monitor Training Progress

The system provides real-time metrics:

  • Loss values

  • Learning rate

  • Training time

  • Training logs

Best Practices

1. Dataset Quality

  • Clean your data thoroughly

  • Remove duplicates

  • Balance your dataset

  • Validate data format

2. Model Selection

  • Start with smaller models for testing

  • Consider computational resources

  • Match model size to dataset size

  • Check model licenses

3. Training Parameters

  • Start with recommended defaults

  • Adjust based on monitoring

  • Use early stopping when needed

  • Save checkpoints regularly

4. Resource Management

  • Monitor GPU memory usage

  • Schedule large jobs appropriately

  • Implement proper error handling

  • Back up important checkpoints

Common Issues and Solutions

1. Out of Memory Errors

Solution:

  • Reduce batch size

  • Use smaller sequence lengths

  • Choose a smaller model

  • Upgrade GPU

2. Poor Training Results

Solution:

  • Check data quality

  • Adjust learning rate

  • Increase epochs

  • Validate preprocessing

3. Slow Training Speed

Solution:

  • Use more powerful GPU

  • Optimize batch size

  • Reduce dataset size

  • Simplify model architecture

Performance Optimization Tips

Epoch Planning

  • Use early stopping

  • Monitor validation metrics

  • Implement checkpointing

  • Save best models

Post-Training Steps

1. Model Evaluation

  • Test on validation set

  • Compare with baseline

  • Check for overfitting

  • Validate outputs

2. Model Storage

  • Automatic save to Bakery

  • Version control

  • Metadata storage

For additional assistance:

  • Join our Discord

  • Visit our Documentation

Related pages:

  • Available Base Models in Bakery Marketplace

  • Available Models from HuggingFace

  • Recommended Configuration Bakery Model Fine-tuning