LLMBuilder Documentation

🤖 LLMBuilder

A toolkit for building, training, and deploying language models

What is LLMBuilder?

LLMBuilder is a framework for training and fine-tuning Large Language Models (LLMs). It provides a complete pipeline from raw documents to deployable models and supports training on both CPU and GPU.

Key Features

Easy to Use: Simple commands to train and deploy models
Multi-Format Support: Process HTML, Markdown, EPUB, PDF, and TXT files
Complete Pipeline: From data processing to model deployment
Flexible: Works on both CPU and GPU
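Multi-format support implies a normalization step that turns each input format into plain text before tokenization. The sketch below illustrates that idea with only the Python standard library, covering HTML, Markdown, and TXT. It is not LLMBuilder's implementation: the names `html_to_text`, `markdown_to_text`, and `load_document` are hypothetical, the Markdown handling is deliberately rough, and EPUB and PDF are omitted because they need third-party parsers.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text from HTML, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        return " ".join(self._parts)


def _collapse(text: str) -> str:
    # Normalize every run of whitespace (including newlines) to a single space.
    return " ".join(text.split())


def html_to_text(source: str) -> str:
    parser = _TextExtractor()
    parser.feed(source)
    parser.close()  # flush any buffered data
    return _collapse(parser.text())


def markdown_to_text(source: str) -> str:
    # Hypothetical, rough cleanup: strip heading/list markers, link syntax, emphasis/code marks.
    text = re.sub(r"^[ \t]{0,3}(?:#{1,6}|[-*+]|\d+\.)[ \t]+", "", source, flags=re.MULTILINE)
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"[*_`]+", "", text)
    return _collapse(text)


def load_document(path: str) -> str:
    """Dispatch on file extension and return normalized plain text."""
    converters = {
        ".html": html_to_text,
        ".htm": html_to_text,
        ".md": markdown_to_text,
        ".markdown": markdown_to_text,
        ".txt": _collapse,
    }
    suffix = Path(path).suffix.lower()
    if suffix not in converters:
        raise ValueError(f"unsupported format in this sketch: {suffix}")
    return converters[suffix](Path(path).read_text(encoding="utf-8"))
```

A dispatch table keyed on file extension keeps each converter independent, so supporting EPUB or PDF in this sketch would mean registering one more function backed by a suitable parser.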

Quick Start

# Install LLMBuilder
pip install llmbuilder

# Create a new project
llmbuilder init my_project

# Navigate to your project
cd my_project

# Follow the step-by-step instructions in README.md

Simple Example

import llmbuilder as lb
from llmbuilder.data import TextDataset

# Load the "cpu_small" configuration preset
cfg = lb.load_config(preset="cpu_small")

# Build a model from the model section of the config
model = lb.build_model(cfg.model)

# Prepare data, using the model's maximum sequence length as the block size
dataset = TextDataset("data.txt", block_size=cfg.model.max_seq_length)

# Train the model with the training section of the config
results = lb.train_model(model, dataset, cfg.training)

# Generate text from a saved model checkpoint and tokenizer
text = lb.generate_text(
    model_path="./checkpoints/model.pt",
    tokenizer_path="./tokenizers",
    prompt="The future of AI is",
    max_new_tokens=50,
)
print(text)
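Two parameters in the example carry most of the mechanics: `block_size` (taken from `cfg.model.max_seq_length`) fixes how many tokens the model sees per training example, and `max_new_tokens` caps how many tokens generation appends to the prompt. The plain-Python sketch below shows what these parameters conventionally mean in next-token training and decoding. It does not reproduce LLMBuilder's `TextDataset` or `generate_text`; `make_blocks`, `generate`, and the toy `next_token` callable are hypothetical, and the sketch assumes non-overlapping blocks whose targets are the inputs shifted by one token.

```python
from typing import Callable, List, Optional, Tuple


def make_blocks(token_ids: List[int], block_size: int) -> List[Tuple[List[int], List[int]]]:
    """Split a token stream into (input, target) pairs of length block_size.

    Targets are the inputs shifted one position, so the model learns to predict
    token t+1 from tokens up to t. A trailing remainder that cannot fill a whole
    block (plus its one extra target token) is dropped in this sketch.
    """
    pairs = []
    for start in range(0, len(token_ids) - block_size, block_size):
        chunk = token_ids[start : start + block_size + 1]  # block_size inputs + 1 target
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


def generate(
    prompt_ids: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    block_size: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Autoregressive loop that appends at most max_new_tokens tokens to the prompt."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]  # the model only sees the last block_size tokens
        tok = next_token(context)
        ids.append(tok)
        if eos_id is not None and tok == eos_id:
            break  # stop early on an end-of-sequence token
    return ids


# Usage with a toy "model" that predicts (last token + 1) mod 10.
blocks = make_blocks(list(range(10)), block_size=4)
output = generate([1, 2, 3], lambda ctx: (ctx[-1] + 1) % 10, max_new_tokens=5, block_size=4)
```

Dropping the ragged tail keeps every training block the same length; a real data loader might pad the last block or let consecutive blocks overlap instead.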

Getting Started

Installation - Install LLMBuilder
Quick Start - Train your first model
User Guide - Learn all features

Community & Support

GitHub: Qubasehq/llmbuilder
Issues: Report bugs

Built by Qub△se