Quick Start
Get up and running with LLMBuilder in a few steps! This guide walks you through training your first language model, from installation to text generation.
Simple Setup
Step 1: Install LLMBuilder
Step 2: Create a New Project
Step 3: Prepare Your Data
Add your text data files to the data/raw/ directory.
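If your text is spread across several folders, a short script can gather it into data/raw/. The sketch below uses only the Python standard library. It assumes your corpus is made of plain-text .txt files. collect_text_files is a hypothetical helper and is not part of LLMBuilder.

```python
import shutil
from pathlib import Path


def collect_text_files(source_dir, raw_dir="data/raw"):
    """Copy every .txt file under source_dir into raw_dir.

    Hypothetical helper, not an LLMBuilder API. Assumes plain-text .txt input;
    change the glob pattern if your data uses another extension.
    Returns the list of destination paths that were written.
    """
    raw = Path(raw_dir)
    raw.mkdir(parents=True, exist_ok=True)  # create data/raw/ if it is missing
    copied = []
    # rglob searches subfolders too; sorting keeps the copy order deterministic
    for src in sorted(Path(source_dir).rglob("*.txt")):
        dest = raw / src.name  # files are copied flat into raw_dir
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```

For example, collect_text_files("my_corpus") copies every .txt file under my_corpus/, including subfolders, into data/raw/. Files are copied flat, so two files with the same name in different subfolders will overwrite each other. Rename them first if that matters.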
Step 4: Process Your Data
Step 5: Train a Tokenizer
Step 6: Train Your Model
Step 7: Generate Text
```bash
llmbuilder generate text -m models/checkpoints/latest.pt -t tokenizer/ -p "Artificial intelligence" --max-tokens 50
```
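For a GPT-style transformer, generation is autoregressive. The model predicts one token, appends it, and repeats. So --max-tokens (max_new_tokens in the Python API) presumably limits how many tokens are added after the prompt. The toy sketch below shows that loop, with a hypothetical bigram lookup table standing in for the trained model. It is not LLMBuilder's implementation. Real models predict from a probability distribution over the whole vocabulary rather than from a fixed table.

```python
def generate(next_token, prompt_tokens, max_new_tokens):
    """Autoregressive loop: append one predicted token at a time.

    next_token is any callable that maps the tokens so far to the next token
    (a hypothetical stand-in for a trained model). It returns None to stop early.
    The result is the prompt plus at most max_new_tokens new tokens.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = next_token(tokens)
        if nxt is None:  # plays the role of an end-of-sequence token
            break
        tokens.append(nxt)
    return tokens


# Hypothetical toy "model": a fixed bigram table that only looks at the last token
BIGRAMS = {"artificial": "intelligence", "intelligence": "is", "is": "useful"}


def toy_model(tokens):
    return BIGRAMS.get(tokens[-1])


print(generate(toy_model, ["artificial"], max_new_tokens=50))
# prints ['artificial', 'intelligence', 'is', 'useful']
print(generate(toy_model, ["artificial"], max_new_tokens=2))
# prints ['artificial', 'intelligence', 'is']
```

The second call stops after two new tokens even though the toy model could keep going, which is the role a max-tokens limit plays.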
🎉 Congratulations! You've just trained and used your first language model with LLMBuilder!
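Python API Quick Start
Prefer Python code? Here's a condensed version of the same workflow using the Python API. The generation step loads a saved checkpoint and a trained tokenizer from disk, so point model_path and tokenizer_path at wherever yours were saved: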
```python
import llmbuilder as lb
from llmbuilder.data import TextDataset

# Load a preset configuration
cfg = lb.load_config(preset="cpu_small")

# Build the model from the model section of the config
model = lb.build_model(cfg.model)

# Prepare data (block_size matches the model's maximum sequence length)
dataset = TextDataset("data.txt", block_size=cfg.model.max_seq_length)

# Train the model
results = lb.train_model(model, dataset, cfg.training)

# Generate text from a saved checkpoint and tokenizer
text = lb.generate_text(
    model_path="./checkpoints/model.pt",
    tokenizer_path="./tokenizers",
    prompt="Artificial intelligence",
    max_new_tokens=50,
)
print(text)
```
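The example sets block_size to cfg.model.max_seq_length, so each training example is presumably at most one model context long. A common way to turn a token stream into training pairs is to cut it into windows of block_size tokens. The target is the same window shifted one position to the right, so the model learns next-token prediction. The sketch below shows that general technique in plain Python, under that assumption. It is not the internals of llmbuilder.data.TextDataset, and make_blocks is a hypothetical name.

```python
def make_blocks(token_ids, block_size, stride=None):
    """Split a token stream into (input, target) pairs for next-token prediction.

    Hypothetical helper illustrating the general technique, not TextDataset's code.
    Each target is its input shifted one position to the right. stride defaults
    to block_size (non-overlapping windows); a smaller stride makes windows overlap.
    Leftover tokens that cannot fill a whole window are dropped.
    """
    stride = stride or block_size
    pairs = []
    # Each window needs block_size + 1 tokens: block_size inputs plus one extra target
    for start in range(0, len(token_ids) - block_size, stride):
        chunk = token_ids[start : start + block_size + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


ids = list(range(10))  # stand-in for tokenizer output
for x, y in make_blocks(ids, block_size=4):
    print(x, "->", y)
# [0, 1, 2, 3] -> [1, 2, 3, 4]
# [4, 5, 6, 7] -> [5, 6, 7, 8]
```

A larger block size gives the model more context per example but increases memory use per training step. That trade-off is one reason a small CPU preset would keep it modest.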
What Just Happened?
- Project Creation: Set up the directory structure
- Data Processing: Loaded and cleaned your text data
- Tokenization: Created a vocabulary and tokenized the text
- Model Training: Trained a transformer model
- Text Generation: Used the model to generate new text
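Of these steps, tokenization is the least obvious. In miniature, it means building a vocabulary that maps each distinct symbol to an integer ID, then encoding text as a list of those IDs and decoding IDs back into text. The sketch below uses single characters as symbols to keep things simple. Production tokenizers usually work on subword units instead, so treat this only as an illustration of the idea. CharTokenizer is a hypothetical class, not LLMBuilder's tokenizer.

```python
class CharTokenizer:
    """Toy character-level tokenizer (hypothetical, for illustration only)."""

    def __init__(self, text):
        # Vocabulary: every distinct character, sorted so IDs are deterministic
        self.vocab = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.vocab)}  # char -> ID
        self.itos = {i: ch for ch, i in self.stoi.items()}  # ID -> char

    def encode(self, text):
        # Raises KeyError for characters not seen when the vocabulary was built
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


tok = CharTokenizer("hello world")
print(tok.vocab)        # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
ids = tok.encode("hello")
print(ids)              # [3, 2, 4, 4, 5]
print(tok.decode(ids))  # hello
```

The KeyError on unseen characters is one reason real tokenizers reserve an "unknown" token or use byte-level units.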
Next Steps
- Installation Guide - Detailed installation instructions
- User Guide - Learn about configuration options
- CLI Reference - Complete CLI documentation
You now have a working LLMBuilder setup! Try experimenting with different data and parameters.