
HuggingFace Integration Guide


Last updated 5 months ago

Introduction

Welcome to Bagel's HuggingFace integration guide. Our platform provides seamless access to Hugging Face's extensive collection of datasets and models, enabling you to leverage state-of-the-art machine learning resources directly within your Bagel workspace. This integration streamlines the process of discovering, cloning, and fine-tuning models while maintaining enterprise-grade security and performance.

Dataset Management

Discovering Datasets

Our advanced dataset discovery system allows you to explore Hugging Face's vast collection of datasets efficiently. You can search through thousands of datasets using various filters and sorting options to find exactly what you need for your projects.

Search Filters

  • Data Types: Narrow your search by selecting specific data types such as text, images, audio, or tabular data. This helps you focus on datasets that match your project requirements.

  • Formats: Filter datasets by file format, such as JSON, CSV, or Parquet.

  • Popularity: Filter datasets by popularity metrics such as trending status, number of downloads, number of likes, or last modification time.

  • Search Option: Search for any dataset directly from the search field on this page.

Sorting and Organization

The search results can be organized based on various metrics to help you make informed decisions:

  • Download Count: Identify widely-used datasets in the community

  • Last Updated: Find the most recently maintained datasets

  • Likes: Discover highly-liked datasets from the community
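The filters and sorting options above can also be driven from a script. Below is a minimal sketch of building search parameters for the huggingface_hub library's `list_datasets` call; the Bakery applies these filters for you in the UI, so this helper and its defaults are illustrative assumptions, not part of the Bagel API:

```python
def dataset_search_kwargs(search=None, fmt=None, sort="downloads", limit=10):
    """Build keyword arguments for huggingface_hub.list_datasets that
    mirror the Bakery's filters: free-text search, file format, and
    popularity-based sorting ("downloads", "likes", "last_modified")."""
    kwargs = {"sort": sort, "direction": -1, "limit": limit}
    if search:
        kwargs["search"] = search
    if fmt:
        # Hub datasets carry format tags, e.g. "format:json".
        kwargs["filter"] = f"format:{fmt}"
    return kwargs

# Typical use (requires `pip install huggingface_hub` and network access):
#   from huggingface_hub import list_datasets
#   for info in list_datasets(**dataset_search_kwargs(search="sentiment")):
#       print(info.id)
```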

Cloning Datasets

Our platform provides a straightforward process for cloning datasets from Hugging Face to your Bagel workspace. This feature enables you to create your own copy of a dataset while maintaining all necessary version control and documentation.

Cloning Process

  1. Selection: Browse through available datasets and select the one you want to clone. Review important details such as size, format, and license before proceeding.

  2. Clone Dataset: Click the "Clone Dataset" button to start the cloning process. A full clone may take some time to complete. Once it finishes, check the "My Datasets" section for the cloned dataset and the files you want to use for fine-tuning.

  3. Verification: After cloning, our system automatically verifies the integrity of the cloned data and sets up appropriate access permissions.
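The verification step can be pictured as a checksum comparison between the source and the clone. The Bakery's actual verification logic is internal, so the sketch below is only an assumption about how such a check might work:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large dataset shards fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_clone(src_dir: Path, dst_dir: Path) -> bool:
    """Return True when every source file exists in the clone
    with an identical SHA-256 digest."""
    for src in src_dir.rglob("*"):
        if src.is_file():
            dst = dst_dir / src.relative_to(src_dir)
            if not dst.is_file() or sha256_of(src) != sha256_of(dst):
                return False
    return True
```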

Model Management

Discovering Models

Our model discovery interface provides comprehensive access to Hugging Face's model hub, allowing you to find pre-trained models that best suit your needs.

Search Capabilities

  • Model Architecture: Find models based on specific architectures like BERT, GPT, or custom implementations.

  • Task Types: Filter models by their primary tasks such as classification, generation, or translation.

  • Framework: Select models compatible with your preferred framework.

  • Model Size: Choose models that fit your computational resources.
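As a sketch of how the task-type and model-size filters narrow a result set, consider the helper below. The model records are hypothetical examples, not real hub entries:

```python
# Illustrative model records (hypothetical names and sizes).
MODELS = [
    {"id": "demo/bert-classifier", "task": "classification", "params_b": 0.11},
    {"id": "demo/llama3-8b", "task": "generation", "params_b": 8.0},
    {"id": "demo/llama3-70b", "task": "generation", "params_b": 70.0},
]

def find_models(records, task=None, max_params_b=None):
    """Apply the task-type and model-size filters described above:
    keep models matching the task whose parameter count (in billions)
    fits within the available computational budget."""
    hits = records
    if task:
        hits = [m for m in hits if m["task"] == task]
    if max_params_b is not None:
        hits = [m for m in hits if m["params_b"] <= max_params_b]
    return hits
```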

Cloning Models

The model cloning feature allows you to create your own instance of any Hugging Face model within your Bagel workspace.

Fine-tuning Capabilities

Preparation Process

Before fine-tuning a model, our system helps you prepare your data and configure the training process appropriately. This includes:

  • Data validation and formatting

  • Resource allocation planning

  • Training objective definition
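Data validation can be sketched as a pass over the dataset that checks every record for the configured input and output columns. The column names `prompt` and `completion` below are assumed defaults for illustration, not fixed Bakery names:

```python
import json

def validate_jsonl(lines, input_col="prompt", output_col="completion"):
    """Validate JSONL training records: each line must parse as JSON
    and carry non-empty string values in the input and output columns.
    Returns (valid_rows, errors)."""
    rows, errors = [], []
    for n, line in enumerate(lines, start=1):
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {n}: invalid JSON")
            continue
        for col in (input_col, output_col):
            if not isinstance(row.get(col), str) or not row[col].strip():
                errors.append(f"line {n}: missing or empty '{col}'")
                break
        else:
            rows.append(row)
    return rows, errors
```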

Training Management

Our fine-tuning interface provides comprehensive control over the training process:

Monitoring Features

  • Real-time training logs visualization
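Visualizing training logs usually starts with extracting step and loss values from the raw log stream. The log line format below is an assumption for illustration; the Bakery's actual log format may differ:

```python
import re

# Matches hypothetical log lines such as "step=120 loss=1.8432".
LOG_LINE = re.compile(r"step=(\d+)\s+loss=([\d.]+)")

def parse_losses(log_text):
    """Extract (step, loss) pairs from a training log so they can be
    plotted or streamed to a dashboard."""
    return [(int(s), float(l)) for s, l in LOG_LINE.findall(log_text)]
```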

Best Practices

Performance Optimization

We recommend following these guidelines to ensure optimal performance:

  • Select the appropriate dataset file and input/output columns for fine-tuning

  • Select the correct GPU based on the parameter count of the base model

  • Wait for the results of one fine-tuning run before starting another with different settings
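When matching a GPU to a base model, a rough rule of thumb is that the weights alone need about two bytes per parameter in fp16/bf16, plus headroom. The factors below are assumptions for a back-of-the-envelope estimate, not the Bakery's sizing logic; real fine-tuning (gradients, optimizer state) needs substantially more memory:

```python
def gpu_memory_estimate_gb(params_billion, bytes_per_param=2, overhead=1.2):
    """Rough VRAM (in GB) needed just to hold the model weights in
    fp16/bf16, with a ~20% buffer. One billion params at 2 bytes each
    is ~2 GB, so the arithmetic works directly in billions."""
    return params_billion * bytes_per_param * overhead

# An 8B-parameter model in fp16 needs very roughly:
print(f"{gpu_memory_estimate_gb(8):.1f} GB")
```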

Support and Resources

Getting Help

Our support team is available to help you with any questions or issues:

  • Documentation updates in the official docs

  • Community forums for peer support on Discord

  • HuggingFace Trending Datasets List

  • HuggingFace Trending Models List