CLI Overview

LLMBuilder provides a comprehensive command-line interface (CLI) that makes it easy to train, fine-tune, and deploy language models without writing code. This guide covers all CLI commands and their usage.

🚀 Getting Started

Installation Verification

First, verify that LLMBuilder is properly installed:

llmbuilder --version
llmbuilder --help

Welcome Command

For first-time users, start with the welcome command:

llmbuilder welcome

This interactive command guides you through:

  • Learning about LLMBuilder
  • Creating configuration files
  • Processing data
  • Training models
  • Generating text

📋 Command Structure

LLMBuilder CLI follows a hierarchical command structure:

llmbuilder [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS] [ARGS]

Global Options

Option           Description
--version        Show version and exit
--verbose, -v    Enable verbose output
--help           Show help message

Main Commands

Command    Description
welcome    Interactive getting started guide
info       Display package information
config     Configuration management
data       Data processing and loading
train      Model training
finetune   Model fine-tuning
generate   Text generation
model      Model management
export     Model export utilities

🎯 Command Categories

Information Commands

welcome

Interactive getting started experience:

llmbuilder welcome

Features:

  • Guided setup process
  • Learn about LLMBuilder capabilities
  • Quick access to common tasks
  • Beginner-friendly explanations

info

Display package information and credits:

llmbuilder info

Shows:

  • Package version and description
  • Available modules and their purposes
  • Quick command examples
  • Links to documentation and support

Configuration Commands

config create

Create configuration files with presets:

# Interactive configuration creation
llmbuilder config create --interactive

# Create from preset
llmbuilder config create --preset cpu_small --output config.json

# Available presets: cpu_small, gpu_medium, gpu_large, inference

config validate

Validate configuration files:

llmbuilder config validate config.json

config list

List available configuration presets:

llmbuilder config list

Data Processing Commands

data load

Load and preprocess text data from various formats:

# Interactive data loading
llmbuilder data load --interactive

# Process specific directory
llmbuilder data load \
  --input ./documents \
  --output clean_text.txt \
  --format all \
  --clean \
  --min-length 100

data tokenizer

Train tokenizers on text data:

llmbuilder data tokenizer \
  --input training_data.txt \
  --output ./tokenizer \
  --vocab-size 16000 \
  --model-type bpe

Training Commands

train model

Train language models from scratch:

# Interactive training setup
llmbuilder train model --interactive

# Direct training
llmbuilder train model \
  --config config.json \
  --data training_data.txt \
  --tokenizer ./tokenizer \
  --output ./model \
  --epochs 10 \
  --batch-size 16

train resume

Resume training from checkpoints:

llmbuilder train resume \
  --checkpoint ./model/checkpoint_1000.pt \
  --data training_data.txt \
  --output ./continued_model

Fine-tuning Commands

finetune model

Fine-tune pre-trained models:

llmbuilder finetune model \
  --model ./pretrained_model/model.pt \
  --dataset domain_data.txt \
  --output ./finetuned_model \
  --epochs 5 \
  --lr 5e-5 \
  --use-lora

Generation Commands

generate text

Generate text with trained models:

# Interactive generation
llmbuilder generate text --setup

# Direct generation
llmbuilder generate text \
  --model ./model/model.pt \
  --tokenizer ./tokenizer \
  --prompt "The future of AI is" \
  --max-tokens 100 \
  --temperature 0.8

# Interactive chat mode
llmbuilder generate text \
  --model ./model/model.pt \
  --tokenizer ./tokenizer \
  --interactive

Model Management Commands

model create

Create new model architectures:

llmbuilder model create \
  --vocab-size 16000 \
  --layers 12 \
  --heads 12 \
  --dim 768 \
  --output ./new_model

model info

Display model information:

llmbuilder model info ./model/model.pt

model evaluate

Evaluate model performance:

llmbuilder model evaluate \
  ./model/model.pt \
  --dataset test_data.txt \
  --batch-size 32

Export Commands

export gguf

Export models to GGUF format:

llmbuilder export gguf \
  ./model/model.pt \
  --output model.gguf \
  --quantization q4_0

export onnx

Export models to ONNX format:

llmbuilder export onnx \
  ./model/model.pt \
  --output model.onnx \
  --opset 11

export quantize

Quantize models for deployment:

llmbuilder export quantize \
  ./model/model.pt \
  --output quantized_model.pt \
  --method dynamic \
  --bits 8

🎨 Interactive Features

Guided Setup

Many commands support --interactive or --setup flags for guided experiences:

# Interactive data loading
llmbuilder data load --interactive

# Interactive model training
llmbuilder train model --interactive

# Interactive text generation setup
llmbuilder generate text --setup

Progress Indicators

LLMBuilder provides rich progress indicators:

# Training progress with real-time metrics
llmbuilder train model --data data.txt --output model/ --verbose

# Data processing with progress bars
llmbuilder data load --input docs/ --output data.txt --verbose

Colorful Output

The CLI uses colors and emojis for a better user experience:

  • 🟢 Green: Success messages
  • 🔵 Blue: Information and headers
  • 🟡 Yellow: Warnings and prompts
  • 🔴 Red: Errors
  • 🎯 Emojis: Visual indicators for different operations

🔧 Advanced Usage

Configuration Files

Use configuration files for complex setups:

# Create configuration
llmbuilder config create --preset gpu_medium --output training_config.json

# Use configuration in training
llmbuilder train model --config training_config.json --data data.txt --output model/

Environment Variables

Set environment variables for default behavior:

# Set default device
export LLMBUILDER_DEVICE=cuda

# Set cache directory
export LLMBUILDER_CACHE_DIR=/path/to/cache

# Enable debug logging
export LLMBUILDER_LOG_LEVEL=DEBUG
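Inside shell scripts, POSIX parameter expansion can give these variables a fallback when they are unset (the variable name comes from the list above; the `cpu` fallback is just an example value):

```shell
# Use the exported device if present, otherwise fall back to CPU.
# ":=" assigns the default only when the variable is unset or empty.
: "${LLMBUILDER_DEVICE:=cpu}"
echo "Using device: $LLMBUILDER_DEVICE"
```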

Batch Processing

Process multiple files or configurations:

# Process multiple data directories
llmbuilder data load \
  --input "dir1,dir2,dir3" \
  --output combined_data.txt

# Train multiple model variants
for preset in cpu_small gpu_medium gpu_large; do
  llmbuilder config create --preset $preset --output ${preset}_config.json
  llmbuilder train model --config ${preset}_config.json --data data.txt --output ${preset}_model/
done

Pipeline Automation

Chain commands for complete workflows:

#!/bin/bash
# Complete training pipeline

# 1. Process data
llmbuilder data load \
  --input ./raw_documents \
  --output training_data.txt \
  --clean --min-length 100

# 2. Train tokenizer
llmbuilder data tokenizer \
  --input training_data.txt \
  --output ./tokenizer \
  --vocab-size 16000

# 3. Create configuration
llmbuilder config create \
  --preset gpu_medium \
  --output model_config.json

# 4. Train model
llmbuilder train model \
  --config model_config.json \
  --data training_data.txt \
  --tokenizer ./tokenizer \
  --output ./trained_model

# 5. Test generation
llmbuilder generate text \
  --model ./trained_model/model.pt \
  --tokenizer ./tokenizer \
  --prompt "Test generation" \
  --max-tokens 50

echo "Training pipeline completed!"

🚨 Error Handling

Common Error Messages

Configuration Errors

❌ Configuration validation failed: num_heads (10) must divide embedding_dim (512)
💡 Try: Set num_heads to 4, 8, or 16

Data Errors

❌ No supported files found in directory: ./documents
💡 Supported formats: .txt, .pdf, .docx, .html, .md

Memory Errors

❌ CUDA out of memory
💡 Try: Reduce batch size with --batch-size 4 or use CPU with --device cpu

Model Errors

❌ Model file not found: ./model/model.pt
💡 Check the model path or train a model first with: llmbuilder train model

Debugging Tips

Enable verbose output for detailed information:

llmbuilder --verbose train model --data data.txt --output model/

Check system information:

llmbuilder info --system

Validate configurations before use:

llmbuilder config validate config.json --strict
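Because these commands behave like standard CLI tools, scripts can also branch on their exit codes. A small sketch (it assumes `llmbuilder config validate` exits non-zero on failure, the usual CLI convention):

```shell
# run_if_valid CONFIG CMD...
# Validate CONFIG first; run CMD only when validation succeeds.
run_if_valid() {
  cfg="$1"; shift
  if llmbuilder config validate "$cfg"; then
    "$@"
  else
    echo "Invalid config: $cfg" >&2
    return 1
  fi
}

# Example:
# run_if_valid config.json \
#   llmbuilder train model --config config.json --data data.txt --output model/
```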

📊 Output and Logging

Standard Output

LLMBuilder provides structured output:

🚀 Starting model training...
📊 Dataset: 10,000 samples
🧠 Model: 12.5M parameters
📈 Training progress:
  Epoch 1/10: loss=3.45, lr=0.0003, time=2m 15s
  Epoch 2/10: loss=2.87, lr=0.0003, time=2m 12s
  ...
✅ Training completed successfully!
💾 Model saved to: ./model/model.pt

Log Files

Training and processing logs are automatically saved:

./model/
├── model.pt              # Trained model
├── config.json           # Training configuration
├── training.log          # Detailed training logs
├── metrics.json          # Training metrics
└── checkpoints/          # Training checkpoints
    ├── checkpoint_1000.pt
    ├── checkpoint_2000.pt
    └── ...

JSON Output

Use --json flag for machine-readable output:

llmbuilder model info ./model/model.pt --json
{
  "model_path": "./model/model.pt",
  "parameters": 12500000,
  "architecture": {
    "num_layers": 12,
    "num_heads": 12,
    "embedding_dim": 768,
    "vocab_size": 16000
  },
  "training_info": {
    "final_loss": 2.45,
    "training_time": "45m 23s",
    "epochs": 10
  }
}
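The JSON report pipes cleanly into other tools. A small helper for extracting a single field (it parses with `python3` so no extra tools like `jq` are required; the `parameters` field matches the sample output above):

```shell
# param_count: read a model-info JSON report on stdin and print
# the value of its "parameters" field.
param_count() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["parameters"])'
}

# Example:
# llmbuilder model info ./model/model.pt --json | param_count
```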

🎯 Best Practices

1. Start Interactive

For new users, always start with interactive modes:

llmbuilder welcome
llmbuilder data load --interactive
llmbuilder train model --interactive

2. Use Configurations

Save and reuse configurations for consistency:

# Create and save configuration
llmbuilder config create --preset gpu_medium --output my_config.json

# Reuse configuration
llmbuilder train model --config my_config.json --data data.txt --output model/

3. Validate Before Training

Always validate configurations and data:

llmbuilder config validate config.json
llmbuilder data load --input data/ --output test.txt --dry-run

4. Monitor Progress

Use verbose mode for long-running operations:

llmbuilder --verbose train model --config config.json --data data.txt --output model/

5. Save Intermediate Results

Use checkpointing and intermediate saves:

llmbuilder train model \
  --config config.json \
  --data data.txt \
  --output model/ \
  --save-every 1000 \
  --eval-every 500
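When a run is interrupted, picking the newest checkpoint by hand gets tedious. A small helper, assuming the `checkpoint_<step>.pt` naming shown under `train resume` (`ls -t` sorts by modification time, newest first):

```shell
# latest_checkpoint DIR: print the most recently written
# checkpoint_*.pt file in DIR, or nothing if none exist.
latest_checkpoint() {
  ls -t "$1"/checkpoint_*.pt 2>/dev/null | head -n 1
}

# Example:
# llmbuilder train resume \
#   --checkpoint "$(latest_checkpoint ./model/checkpoints)" \
#   --data data.txt --output model/
```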

CLI Tips

  • Use tab completion if available in your shell
  • Combine --help with any command to see all options
  • Use --dry-run flags when available to test commands
  • Save successful command combinations as shell scripts
  • Use configuration files for complex setups