
HuggingFace Integration Guide


Last updated 5 months ago

Introduction

Welcome to Bagel's HuggingFace integration guide. Our platform provides seamless access to Hugging Face's extensive collection of datasets and models, enabling you to leverage state-of-the-art machine learning resources directly within your Bagel workspace. This integration streamlines the process of discovering, cloning, and fine-tuning models while maintaining enterprise-grade security and performance.

Dataset Management

Discovering Datasets

Our advanced dataset discovery system allows you to explore Hugging Face's vast collection of datasets efficiently. You can search through thousands of datasets using various filters and sorting options to find exactly what you need for your projects.

Search Filters

  • Data Types: Narrow your search by selecting specific data types such as text, images, audio, or tabular data. This helps you focus on datasets that match your project requirements.

  • Formats: Filter datasets by file format, such as JSON, CSV, or Parquet.

  • Popularity: Filter datasets by popularity metrics such as trending status, number of downloads, number of likes, or last modification time.

  • Search Option: Search for any dataset directly from the search field on this page.

Sorting and Organization

The search results can be organized based on various metrics to help you make informed decisions:

  • Download Count: Identify widely-used datasets in the community

  • Last Updated: Find the most recently maintained datasets

  • Likes: Discover highly-liked datasets from the community
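The filters and sorting options above can also be driven from a script. Below is a minimal sketch of building search parameters for the huggingface_hub library's `list_datasets` call; the Bakery applies these filters for you in the UI, so this helper and its defaults are illustrative assumptions, not part of the Bagel API:

```python
def dataset_search_kwargs(search=None, fmt=None, sort="downloads", limit=10):
    """Build keyword arguments for huggingface_hub.list_datasets that
    mirror the Bakery's filters: free-text search, file format, and
    popularity-based sorting ("downloads", "likes", "last_modified")."""
    kwargs = {"sort": sort, "direction": -1, "limit": limit}
    if search:
        kwargs["search"] = search
    if fmt:
        # Hub datasets carry format tags, e.g. "format:json".
        kwargs["filter"] = f"format:{fmt}"
    return kwargs

# Typical use (requires `pip install huggingface_hub` and network access):
#   from huggingface_hub import list_datasets
#   for info in list_datasets(**dataset_search_kwargs(search="sentiment")):
#       print(info.id)
```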

Cloning Datasets

Our platform provides a straightforward process for cloning datasets from Hugging Face to your Bagel workspace. This feature enables you to create your own copy of a dataset while maintaining all necessary version control and documentation.

Cloning Process

  1. Selection: Browse through available datasets and select the one you want to clone. Review important details such as size, format, and license before proceeding.

  2. Clone Dataset: Click the "Clone Dataset" button to start the cloning process. A full clone may take some time to complete. Once it finishes, check the "My Datasets" section for the cloned dataset and the files you want to use for fine-tuning.

  3. Verification: After cloning, our system automatically verifies the integrity of the cloned data and sets up appropriate access permissions.
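The verification step can be pictured as a checksum comparison between the source and the clone. The Bakery's actual verification logic is internal, so the sketch below is only an assumption about how such a check might work:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large dataset shards fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_clone(src_dir: Path, dst_dir: Path) -> bool:
    """Return True when every source file exists in the clone
    with an identical SHA-256 digest."""
    for src in src_dir.rglob("*"):
        if src.is_file():
            dst = dst_dir / src.relative_to(src_dir)
            if not dst.is_file() or sha256_of(src) != sha256_of(dst):
                return False
    return True
```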

Model Management

Discovering Models

Our model discovery interface provides comprehensive access to Hugging Face's model hub, allowing you to find pre-trained models that best suit your needs.

Search Capabilities

  • Model Architecture: Find models based on specific architectures like BERT, GPT, or custom implementations.

  • Task Types: Filter models by their primary tasks such as classification, generation, or translation.

  • Framework: Select models compatible with your preferred framework.

  • Model Size: Choose models that fit your computational resources.
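As a sketch of how the task-type and model-size filters narrow a result set, consider the helper below. The model records are hypothetical examples, not real hub entries:

```python
# Illustrative model records (hypothetical names and sizes).
MODELS = [
    {"id": "demo/bert-classifier", "task": "classification", "params_b": 0.11},
    {"id": "demo/llama3-8b", "task": "generation", "params_b": 8.0},
    {"id": "demo/llama3-70b", "task": "generation", "params_b": 70.0},
]

def find_models(records, task=None, max_params_b=None):
    """Apply the task-type and model-size filters described above:
    keep models matching the task whose parameter count (in billions)
    fits within the available computational budget."""
    hits = records
    if task:
        hits = [m for m in hits if m["task"] == task]
    if max_params_b is not None:
        hits = [m for m in hits if m["params_b"] <= max_params_b]
    return hits
```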

Cloning Models

The model cloning feature allows you to create your own instance of any Hugging Face model within your Bagel workspace.

Fine-tuning Capabilities

Preparation Process

Before fine-tuning a model, our system helps you prepare your data and configure the training process appropriately. This includes:

  • Data validation and formatting

  • Resource allocation planning

  • Training objective definition
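Data validation can be sketched as a pass over the dataset that checks every record for the configured input and output columns. The column names `prompt` and `completion` below are assumed defaults for illustration, not fixed Bakery names:

```python
import json

def validate_jsonl(lines, input_col="prompt", output_col="completion"):
    """Validate JSONL training records: each line must parse as JSON
    and carry non-empty string values in the input and output columns.
    Returns (valid_rows, errors)."""
    rows, errors = [], []
    for n, line in enumerate(lines, start=1):
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {n}: invalid JSON")
            continue
        for col in (input_col, output_col):
            if not isinstance(row.get(col), str) or not row[col].strip():
                errors.append(f"line {n}: missing or empty '{col}'")
                break
        else:
            rows.append(row)
    return rows, errors
```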

Training Management

Our fine-tuning interface provides comprehensive control over the training process:

Monitoring Features

  • Real-time training logs visualization
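Visualizing training logs usually starts with extracting step and loss values from the raw log stream. The log line format below is an assumption for illustration; the Bakery's actual log format may differ:

```python
import re

# Matches hypothetical log lines such as "step=120 loss=1.8432".
LOG_LINE = re.compile(r"step=(\d+)\s+loss=([\d.]+)")

def parse_losses(log_text):
    """Extract (step, loss) pairs from a training log so they can be
    plotted or streamed to a dashboard."""
    return [(int(s), float(l)) for s, l in LOG_LINE.findall(log_text)]
```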

Best Practices

Performance Optimization

We recommend following these guidelines to ensure optimal performance:

  • Select the appropriate dataset file and input/output columns for fine-tuning

  • Select the correct GPU based on the parameter count of the base model

  • Wait for the results of one fine-tuning run before starting another with different settings
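When matching a GPU to a base model, a rough rule of thumb is that the weights alone need about two bytes per parameter in fp16/bf16, plus headroom. The factors below are assumptions for a back-of-the-envelope estimate, not the Bakery's sizing logic; real fine-tuning (gradients, optimizer state) needs substantially more memory:

```python
def gpu_memory_estimate_gb(params_billion, bytes_per_param=2, overhead=1.2):
    """Rough VRAM (in GB) needed just to hold the model weights in
    fp16/bf16, with a ~20% buffer. One billion params at 2 bytes each
    is ~2 GB, so the arithmetic works directly in billions."""
    return params_billion * bytes_per_param * overhead

# An 8B-parameter model in fp16 needs very roughly:
print(f"{gpu_memory_estimate_gb(8):.1f} GB")
```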

Support and Resources

Getting Help

Our support team is available to help you with any questions or issues:

  • Documentation updates in the official docs

  • Community forums for peer support on Discord

  • HuggingFace Trending Datasets List

  • HuggingFace Trending Models List