Fine-tuning LLama3 Models on Bagel

This tutorial will guide you through the steps to fine-tune LLaMA3 models on Bagel. We will start by creating a raw asset, uploading a dataset to it, purchasing a LLaMA3 model from the Bakery marketplace, and finally fine-tuning the model.

Prerequisites

  • Create a Bakery account at bakery.bagel.net.

  • Get your user_id and api_key from the Bakery.

  • Google Colab (Chrome browser).

Installation

To install the Bagel Python client and the other required libraries, run the following commands in a Colab cell:

!pip install diffusers==0.27.1
!pip install accelerate==0.28.0
!pip install transformers==4.35.2
!pip install bitsandbytes peft
!pip install bagelML

Step 1: Bakery Setup

import bagel
# Paste your user_id from the Bakery here (see Prerequisites)
user_id = ""
client = bagel.Client()

This initializes the Bagel client, which will be used to interact with the Bagel server throughout the tutorial.

import os
from getpass import getpass

# Copy & Paste the API Key from https://bakery.bagel.net/api-key
DEMO_KEY_IN_USE = getpass("Enter your API key: ")

# Set environment variable
os.environ['BAGEL_API_KEY'] = DEMO_KEY_IN_USE

This prompts you to enter your API key securely (without displaying it on the screen) and sets it as an environment variable named BAGEL_API_KEY.
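
If you re-run the notebook, you can avoid re-typing the key by reusing an environment variable that is already set. A minimal sketch (the fallback to getpass is our own addition, not part of the official setup):

import os
from getpass import getpass

# Reuse the key if it is already set in this session; otherwise prompt for it
api_key = os.environ.get("BAGEL_API_KEY")
if not api_key:
    api_key = getpass("Enter your API key: ")
    os.environ["BAGEL_API_KEY"] = api_key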

Step 2: Create Asset on Bagel

This code defines a function, create_asset, that creates a new dataset (asset) using the Bagel client and returns the resulting asset ID.

def create_asset(client):
    title = "dataset_chatbot_test_6" #@param
    description = "Test dataset created from python client" #@param
    payload = {
                "title": title,
                "dataset_type": "RAW",
                "tags": [
                    ""
                ],
                "category": "AI",
                "details": description,
                "user_id": user_id
            }

    dataset = client.create_asset(
        payload=payload
    )
    return dataset

asset_id = create_asset(client)
print(asset_id)

You can also retrieve the asset's info by adding the following:

asset_info = client.get_asset_info(asset_id)
print(asset_info)

Step 3: Upload Asset File in Google Colab

This code uploads files from your local machine to the Google Colab environment. Please ensure you're using the Google Chrome browser.

# This will only work on Google Chrome
from google.colab import files
uploaded = files.upload()

To print the name and size (in bytes) of each uploaded file, use the code below:

for filename, file_content in uploaded.items():
    print(f"Uploaded file: {filename} - Size: {len(file_content)} bytes")

Step 4: Upload Dataset in Bakery

This code uploads the dataset file to the asset you created using the Bagel client and then prints the response from the upload operation.

file_path = filename  # the file uploaded to Colab in Step 3

response = client.file_upload(
                            file_path=file_path,
                            asset_id=asset_id
                          )
print(response)

Step 5: Purchase a LLaMA3 Model from Bakery

To fine-tune a Llama3 model, you first need to purchase it from the Bakery marketplace:

  1. Log in to your Bagel account and navigate to the Model tab on your Bakery homepage.

  2. Browse the available models and purchase the Llama3-8b model for this example.

Llama3-8b is a large language model designed for advanced text generation and dialogue applications. It is part of the Meta Llama 3 family, which includes both an 8 billion parameter and a 70 billion parameter version. The model uses an optimized transformer architecture and is fine-tuned with supervised learning and reinforcement learning from human feedback to improve its helpfulness and safety.

  3. After purchasing, the model will be available under the My Models section.

You can also purchase assets programmatically by running this code:

Buy LLama3 Model From Bakery Marketplace

llama3_model_id = "3323b6c4-06ef-4949-b239-1a2b220e211d"  # Llama3-8b model ID from the marketplace
response = client.buy_asset(
                    asset_id=llama3_model_id,
                    user_id=user_id
                )
print(response)
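
To confirm the purchase before fine-tuning, you can inspect the model asset with the same get_asset_info call used in Step 2 (whether it returns details for model assets as well as datasets is an assumption on our part):

# Inspect the purchased Llama3 model asset (assumes get_asset_info also covers model assets)
model_info = client.get_asset_info(llama3_model_id)
print(model_info)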

Step 6: Fine-tune the LLama3 Model on Bakery

Now that you have your dataset and model, you can fine-tune the LLaMA3 model using the following code:

# Get the column names from the selected RAW dataset
print(client.get_dataset_column_names(asset_id=asset_id, file_name=filename))
title = "test_model_client2005" #@param
file_name = file_path.split("/")[-1]
# 'response' here is still the buy_asset response from Step 5
base_model = response[1]['new_dataset_id']
response = client.fine_tune(title=title,
                            user_id=user_id,
                            asset_id = asset_id,
                            file_name = file_name,
                            base_model = base_model,
                            epochs = 3,
                            learning_rate = 0.01,
                            input_column="prompt",
                            output_column="completion")
print(response)

This will initiate the fine-tuning process and print the asset ID of the fine-tuned model.

Step 7: Check Job Status

# Asset ID of the fine-tuned model, taken from the fine_tune response
model_id = response[1]
response = client.get_job_by_asset_id(model_id)
response

This will print the job details, including the job ID, of the fine-tuned model.

The status of the job can also be monitored with a simple polling loop:

import time
while True:
  response = client.get_job_by_asset_id(model_id)
  print(response[1]["job_status"])
  if response[1]["job_status"] == "JobState.JOB_STATE_SUCCEEDED":
    print("Model fine-tuned successfully")
    break
  if response[1]["job_status"] == "JobState.JOB_STATE_FAILED":
    print("Model fine-tuning failed")
    break
  time.sleep(10)

Step 8: Download Fine-tuned Model from Bakery

# Download the fine-tuned adapter from the Bakery
response = client.download_model(model_id)
response
!mkdir adapter_model

# Please provide the model zip file path
!unzip /content/{model_id}.zip -d /content/adapter_model
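
If you prefer to stay in Python rather than shell commands, the same extraction can be done with the standard-library zipfile module (a sketch that assumes the archive was saved to /content/{model_id}.zip, as in the cell above):

import os
import zipfile

# Extract the downloaded adapter archive without shelling out to unzip
zip_path = f"/content/{model_id}.zip"
extract_dir = "/content/adapter_model"
os.makedirs(extract_dir, exist_ok=True)
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(extract_dir)
print(os.listdir(extract_dir))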

Step 9: Load Model using transformers

This code loads the fine-tuned adapter on top of the pre-trained base model, using a configuration that minimizes GPU memory usage by quantizing the model to 8-bit precision. It first loads the adapter configuration, then initializes the base model with the specified quantization settings. The tokenizer is loaded separately, and the model is only loaded when needed, ensuring efficient resource management.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, PeftConfig

# Adapter model path
adapter_path = "adapter_model"

# Load only the adapter config initially
peft_config = PeftConfig.from_pretrained(adapter_path)
base_model_name = "bagelnet/Llama-3-8B"

# Function to load the model with minimal GPU memory usage
def load_model():
    quantization_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
    )

    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        quantization_config=quantization_config,
        device_map="auto",
        trust_remote_code=True
    )

    model = PeftModel.from_pretrained(base_model, adapter_path)
    return model

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# This function will load the model only when needed
model = None

if model is None:
    print("Loading model... This may take a moment.")
    model = load_model()
    print("Model loaded successfully.")

Step 10: ChatBot

Now let's use a generate_response function to produce a reply from the fine-tuned model, with generation parameters chosen to keep the answer relevant and concise.

def generate_response(conversation_history, max_length=200):
    # Prepare the prompt
    prompt = f"You are a helpful AI assistant. Provide a concise and relevant answer to the user's question. Only reply to the question\n\n{conversation_history}"
    # prompt += conversation_history
    # prompt += "Assistant: "

    # Encode the input prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

    # Generate a response
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_length=input_ids.shape[1] + max_length,
            num_return_sequences=1,
            # no_repeat_ngram_size=2,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode and return the response
    response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
    return response.strip()

Let's use a chatbot function to run a simple text-based chatbot that interacts with the user in a loop, continuously prompting for input and generating replies with generate_response. It prints a welcome message, displays the chatbot's reply for each input, and ends the conversation with a farewell message when the user types "quit".

def chatbot():
    print("Chatbot: Hello! I'm your AI assistant. How can I help you today? (Type 'quit' to exit)")

    while True:
        conversation = ''
        user_input = input("You: ").strip()

        if user_input.lower() == 'quit':
            print("Chatbot: Goodbye! Have a great day!")
            break

        conversation = f"{user_input}\n"
        response = generate_response(conversation)
        print("Chatbot:", response)

if __name__ == "__main__":
    chatbot()
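
You can also call generate_response directly for a single, non-interactive query, which is handy for smoke-testing the fine-tuned adapter (the example question below is arbitrary):

# One-off query without the interactive loop
print(generate_response("What is the capital of France?\n"))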

Step 11: Clean GPU Memory

import gc
del model
del tokenizer
torch.cuda.empty_cache()
gc.collect()
torch.cuda.reset_peak_memory_stats()
torch.cuda.empty_cache()
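
To verify that memory was actually released, you can print the CUDA allocator statistics after cleanup (a quick sanity check, not part of the original notebook):

# Report current and peak GPU memory usage in megabytes
if torch.cuda.is_available():
    print(f"Allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MB")
    print(f"Peak:      {torch.cuda.max_memory_allocated() / 1024**2:.1f} MB")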

Step 12: Delete the Base Model & Fine-Tuned Model

# Delete the purchased base model asset
client.delete_asset(dataset_id=base_model)
# Delete the fine-tuned model asset
client.delete_asset(dataset_id=model_id)

Reference

You can upload any text, CSV, JSON, or Parquet format file. Parquet is the most widely used format for Llama3 model fine-tuning. You can generate your own Parquet-format data from here.
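
As an illustration, a prompt/completion dataset matching the input_column and output_column used in Step 6 can be written to Parquet with pandas (the rows below are placeholders, and pandas needs pyarrow or fastparquet installed for to_parquet):

import pandas as pd

# Two placeholder rows using the column names expected by the fine_tune call
df = pd.DataFrame({
    "prompt": ["What is Bagel?", "Where do I find my API key?"],
    "completion": ["Bagel is a platform for fine-tuning and sharing AI assets.",
                   "On the Bakery site, under the API key page."],
})
df.to_parquet("chatbot_dataset.parquet", index=False)
print(df)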

  • Please refer to the LLama3 bot tutorial on Google Colab for interactive development.

Need extra help or just want to connect with other developers? Join our community on Discord for additional support. 👾
