HuggingFace Integration Guide
Last updated
Welcome to Bagel's HuggingFace integration guide. Our platform provides seamless access to Hugging Face's extensive collection of datasets and models, enabling you to leverage state-of-the-art machine learning resources directly within your Bagel workspace. This integration streamlines the process of discovering, cloning, and fine-tuning models while maintaining enterprise-grade security and performance.
Our advanced dataset discovery system allows you to explore Hugging Face's vast collection of datasets efficiently. You can search through thousands of datasets using various filters and sorting options to find exactly what you need for your projects.
Search Filters
Data Types: Narrow your search by selecting specific data types such as text, images, audio, or tabular data. This helps you focus on datasets that match your project requirements.
Formats: Filter datasets by file format, such as JSON, CSV, or Parquet.
Popularity: Filter datasets by popularity measures such as trending status, number of downloads, number of likes, or last modification time.
Search Option: Search for any dataset directly from the search field on this page.
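As a rough illustration of how these filters combine, the sketch below applies a data-type filter, a format filter, and a free-text query to a small in-memory list of dataset records. The DatasetRecord fields and the filter_datasets helper are hypothetical names chosen for this example; they are not part of Bagel's or Hugging Face's API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    # Hypothetical metadata fields, for illustration only.
    name: str
    data_type: str    # e.g. "text", "image", "audio", "tabular"
    file_format: str  # e.g. "json", "csv", "parquet"


def filter_datasets(
    records: Iterable[DatasetRecord],
    data_type: Optional[str] = None,
    file_format: Optional[str] = None,
    query: Optional[str] = None,
) -> List[DatasetRecord]:
    """Keep records that match every filter that is set (None means 'any')."""
    matches = []
    for record in records:
        if data_type is not None and record.data_type.lower() != data_type.lower():
            continue
        if file_format is not None and record.file_format.lower() != file_format.lower():
            continue
        # The free-text query is a case-insensitive substring match on the name.
        if query is not None and query.lower() not in record.name.lower():
            continue
        matches.append(record)
    return matches


# Made-up catalog entries used only to exercise the filters.
catalog = [
    DatasetRecord("reviews-small", "text", "csv"),
    DatasetRecord("reviews-large", "text", "parquet"),
    DatasetRecord("street-photos", "image", "parquet"),
]

for record in filter_datasets(catalog, data_type="text", file_format="CSV"):
    print(record.name)
```

Filters left as None are ignored, so the same helper covers a single filter or any combination of them.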
Sorting and Organization
The search results can be organized based on various metrics to help you make informed decisions:
Download Count: Identify widely-used datasets in the community
Last Updated: Find the most recently maintained datasets
Likes: Discover highly-liked datasets from the community
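The three sort options above can be sketched as a descending sort over dataset metadata. This is a minimal example under stated assumptions: the dictionary keys (downloads, last_updated, likes) and the sort_datasets helper are hypothetical and do not describe how Bagel stores or orders results internally.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical sort keys mirroring the three metrics listed above.
SORT_KEYS = ("downloads", "last_updated", "likes")


def sort_datasets(records: List[Dict[str, Any]], by: str) -> List[Dict[str, Any]]:
    """Return a new list ordered from highest to lowest (or newest to oldest)."""
    if by not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {by!r}")
    return sorted(records, key=lambda record: record[by], reverse=True)


# Made-up records used only to show the ordering.
records = [
    {"name": "a", "downloads": 10, "likes": 5, "last_updated": date(2024, 1, 1)},
    {"name": "b", "downloads": 50, "likes": 1, "last_updated": date(2023, 6, 1)},
    {"name": "c", "downloads": 30, "likes": 9, "last_updated": date(2024, 3, 15)},
]

print([r["name"] for r in sort_datasets(records, "downloads")])
```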
Our platform provides a straightforward process for cloning datasets from Hugging Face to your Bagel workspace. This feature enables you to create your own copy of a dataset while maintaining all necessary version control and documentation.
Cloning Process
Selection: Browse through available datasets and select the one you want to clone. Review important details such as size, format, and license before proceeding.
Clone Dataset: Click the "Clone Dataset" button to start the cloning process. A full clone may take some time to complete. Once it finishes, check the "My Datasets" section for the files you want to use for fine-tuning.
Verification: After cloning, our system automatically verifies the integrity of the cloned data and sets up appropriate access permissions.
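This guide does not describe how the integrity check is implemented. One common approach, shown here purely as an assumption, is to compare a SHA-256 checksum of each cloned file against a checksum published by the source. The helper names sha256_of_file and verify_clone are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dataset files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_clone(path: Path, expected_sha256: str) -> bool:
    """Return True when the local file matches the expected checksum."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A mismatch would typically mean the copy is incomplete or corrupted and the file should be cloned again.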
Our model discovery interface provides comprehensive access to Hugging Face's model hub, allowing you to find pre-trained models that best suit your needs.
Search Capabilities
Model Architecture: Find models based on specific architectures like BERT, GPT, or custom implementations.
Task Types: Filter models by their primary tasks such as classification, generation, or translation.
Framework: Select models compatible with your preferred framework.
Model Size: Choose models that fit your computational resources.
The model cloning feature allows you to create your own instance of any Hugging Face model within your Bagel workspace.
Before fine-tuning a model, our system helps you prepare your data and configure the training process appropriately. This includes:
Data validation and formatting
Resource allocation planning
Training objective definition
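To make the data validation step concrete, the sketch below checks that a CSV file contains the chosen input and output columns and that no record leaves either one empty. It is an illustrative example only: the find_validation_errors helper and the column names prompt and response are hypothetical, and the checks Bagel actually runs may differ.

```python
import csv
import io
from typing import List, TextIO


def find_validation_errors(
    handle: TextIO, input_column: str, output_column: str
) -> List[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    errors = [
        f"missing column: {column}"
        for column in (input_column, output_column)
        if column not in header
    ]
    if errors:
        return errors
    # Count the header as row 1, so the first data record is row 2.
    for row_number, row in enumerate(reader, start=2):
        for column in (input_column, output_column):
            # Short rows yield None for missing fields; treat that as empty.
            if not (row[column] or "").strip():
                errors.append(f"row {row_number}: empty {column}")
    return errors


# Made-up sample with one incomplete record.
sample = io.StringIO("prompt,response\nhi,hello\nbye,\n")
print(find_validation_errors(sample, "prompt", "response"))
```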
Our fine-tuning interface provides comprehensive control over the training process:
Monitoring Features
Real-time training logs visualization
We recommend following these guidelines to ensure optimal performance:
Select the appropriate dataset file and input/output columns for fine-tuning
Select a suitable GPU based on the number of parameters in the base model
Wait for the results of one fine-tuning run before starting another with different settings
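For the GPU guideline above, a back-of-the-envelope estimate can help. The sketch assumes full fine-tuning with the Adam optimizer in mixed precision, which is commonly estimated at about 16 bytes per parameter (half-precision weights and gradients, plus full-precision master weights and two optimizer moments), and it ignores activation memory. The GPU names and the helper functions are hypothetical; parameter-efficient methods such as LoRA need far less memory.

```python
from typing import Dict, Optional

# Rule-of-thumb assumption: 2 (fp16 weights) + 2 (fp16 gradients)
# + 4 (fp32 master weights) + 4 + 4 (two Adam moments) bytes per parameter.
BYTES_PER_PARAMETER = 16


def estimate_training_memory_gb(
    num_parameters: int, bytes_per_parameter: int = BYTES_PER_PARAMETER
) -> float:
    """Estimate model-state memory in GB (1 GB = 1e9 bytes), without activations."""
    return num_parameters * bytes_per_parameter / 1e9


def smallest_fitting_gpu(
    num_parameters: int, gpu_memory_gb: Dict[str, float]
) -> Optional[str]:
    """Pick the smallest GPU whose memory covers the estimate, or None."""
    needed = estimate_training_memory_gb(num_parameters)
    candidates = [
        (memory, name) for name, memory in gpu_memory_gb.items() if memory >= needed
    ]
    return min(candidates)[1] if candidates else None


# Hypothetical GPU options, keyed by a made-up name with memory in GB.
gpus = {"gpu-24gb": 24.0, "gpu-80gb": 80.0}
print(estimate_training_memory_gb(1_000_000_000))
print(smallest_fitting_gpu(1_000_000_000, gpus))
```

Treat the result as a lower bound, since batch size and sequence length add activation memory on top of it.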
Our support team is available to help you with any questions or issues:
Documentation updates at the .
Community forums for peer support in