Frequently Asked Questions¶
Common questions and answers about LLMBuilder usage, troubleshooting, and best practices.
🚀 Getting Started¶
Q: What are the minimum system requirements?¶
A: LLMBuilder requires:
- Python 3.8+ (3.9+ recommended)
- 4GB RAM minimum (8GB+ recommended)
- 2GB free disk space for installation and basic models
- Optional: NVIDIA GPU with 4GB+ VRAM for faster training
Q: Should I use CPU or GPU for training?¶
A:
- CPU: Good for learning, small models, and development. Use `preset="cpu_small"`
- GPU: Recommended for production training and larger models. Use `preset="gpu_medium"` or `preset="gpu_large"`
- Mixed: Start with CPU for prototyping, then move to GPU for final training
Q: How long does it take to train a model?¶
A: Training time depends on several factors:
- Small model (10M params): 30 minutes - 2 hours on CPU, 5-15 minutes on GPU
- Medium model (50M params): 2-8 hours on CPU, 30 minutes - 2 hours on GPU
- Large model (200M+ params): Days on CPU, 2-12 hours on GPU
🔧 Configuration¶
Q: Which configuration preset should I use?¶
A: Choose based on your hardware and use case:
| Preset | Use Case | Hardware | Model Size | Training Time |
|---|---|---|---|---|
| `tiny` | Testing, debugging | Any | ~1M params | Minutes |
| `cpu_small` | Learning, development | CPU | ~10M params | Hours |
| `gpu_medium` | Production training | Single GPU | ~50M params | Hours |
| `gpu_large` | High-quality models | High-end GPU | ~200M+ params | Days |
Q: How do I customize model architecture?¶
A: Modify the model configuration:
```python
from llmbuilder.config import ModelConfig

config = ModelConfig(
    vocab_size=16000,     # Match your tokenizer
    num_layers=12,        # More layers = more capacity
    num_heads=12,         # Should divide embedding_dim evenly
    embedding_dim=768,    # Larger = more capacity
    max_seq_length=1024,  # Longer sequences = more memory
    dropout=0.1           # Higher = more regularization
)
```
Q: What vocabulary size should I use?¶
A: Vocabulary size depends on your data and use case:
- 8K-16K: Small datasets, specific domains
- 16K-32K: General purpose, balanced size
- 32K-64K: Large datasets, multilingual models
- 64K+: Very large datasets, maximum coverage
📊 Data and Training¶
Q: How much training data do I need?¶
A: Data requirements vary by model size and quality goals:
- Minimum: 1MB of text (~200K words) for basic functionality
- Recommended: 10MB+ of text (~2M words) for good quality
- Optimal: 100MB+ of text (~20M words) for high quality
- Production: 1GB+ of text (~200M words) for best results
Q: What file formats are supported for training data?¶
A: LLMBuilder supports:
- Text files: `.txt`, `.md` (best quality)
- Documents: `.pdf`, `.docx` (good quality)
- Web content: `.html`, `.htm` (moderate quality)
- Presentations: `.pptx` (basic support)
- Data files: `.csv`, `.json` (with proper formatting)
Q: How do I handle out-of-memory errors?¶
A: Try these solutions in order:
- Reduce batch size: Start with `batch_size=1` and scale up from there
- Enable gradient checkpointing: Trades extra compute for lower memory
- Use gradient accumulation: Simulate larger effective batches (sketched below)
- Reduce sequence length: Shorter sequences use less memory
- Use CPU training: Slower, but not limited by GPU memory
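The gradient-accumulation idea in particular is easy to see in plain PyTorch. The sketch below is generic and independent of LLMBuilder's training loop: gradients from several small batches are summed before a single optimizer step, giving the effect of a larger batch without the memory cost.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 768)                    # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
accum_steps = 8                                # effective batch = 8 micro-batches

for step in range(32):
    x = torch.randn(1, 768)                    # micro-batch of size 1
    loss = model(x).pow(2).mean()              # dummy loss for illustration
    (loss / accum_steps).backward()            # gradients add up across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()                       # one optimizer step per 8 micro-batches
        optimizer.zero_grad()
```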
Q: My model isn't learning (loss not decreasing). What's wrong?¶
A: Common causes and solutions:
- Learning rate too high: Reduce to 1e-4 or 1e-5
- Learning rate too low: Increase to 3e-4 or 5e-4
- Bad data: Check for corrupted or repetitive text
- Wrong tokenizer: Ensure vocab_size matches tokenizer
- Insufficient warmup: Increase warmup_steps to 1000+
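If you want to see what the warmup advice means concretely, here is a linear warmup over the first 1000 steps in plain PyTorch. This is a generic sketch, not LLMBuilder's scheduler; adjust `warmup_steps` to match your config.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 768)                    # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps = 1000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lambda step: min(1.0, (step + 1) / warmup_steps),  # ramp LR from ~0 up to 3e-4
)
# Call scheduler.step() once per training step, after optimizer.step().
```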
Q: How do I know if my model is overfitting?¶
A: Signs of overfitting:
- Training loss decreases but validation loss increases
- Generated text is repetitive or memorized
- Model performs poorly on new data
Solutions:
- Increase dropout rate (0.1 → 0.2)
- Add weight decay (0.01)
- Use early stopping
- Get more training data
- Reduce model size
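Two of those solutions are one-liners in plain PyTorch terms. The sketch below (generic, with a stand-in model and dummy validation losses) shows weight decay on the optimizer and a simple patience-based early-stopping check.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 768)                    # stand-in for the real model

# weight_decay=0.01 adds L2-style regularization on top of dropout
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

# Early stopping: stop once validation loss has not improved for `patience` epochs
best, patience, bad = float("inf"), 3, 0
for val_loss in [2.10, 1.90, 1.85, 1.86, 1.88, 1.90]:  # dummy per-epoch values
    if val_loss < best:
        best, bad = val_loss, 0                # improvement: reset the counter
    else:
        bad += 1
        if bad >= patience:
            break                              # validation loss has plateaued
```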
🎯 Text Generation¶
Q: How do I improve generation quality?¶
A: Try these techniques:
- Adjust temperature:
  - Lower (0.3-0.7): More focused, predictable
  - Higher (0.8-1.2): More creative, diverse
- Use nucleus sampling:

```python
config = GenerationConfig(
    temperature=0.8,
    top_p=0.9,  # Nucleus sampling
    top_k=50    # Top-k sampling
)
```

- Add repetition penalty: A value of 1.1-1.3 usually works well
- Better prompts:
  - Be specific and clear
  - Provide context and examples
  - Use consistent formatting
Q: Why is my generated text repetitive?¶
A: Common causes and fixes:
- Insufficient training: Train for more epochs
- Poor sampling: Use top-p/top-k sampling instead of greedy
- Low temperature: Increase temperature to 0.8+
- Add repetition penalty: Set to 1.1-1.3
- Prevent n-gram repetition: Set `no_repeat_ngram_size=3`
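Putting those fixes together, a sampling setup along the following lines is a reasonable starting point. The field names mirror the parameters mentioned above; double-check that they match the GenerationConfig fields in your LLMBuilder version.

```python
config = GenerationConfig(
    temperature=0.9,          # higher temperature reduces deterministic loops
    top_p=0.9,                # nucleus sampling instead of greedy decoding
    top_k=50,
    repetition_penalty=1.2,   # penalize recently generated tokens
    no_repeat_ngram_size=3,   # block exact 3-gram repeats
)
```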
Q: How do I make generation faster?¶
A: Speed optimization techniques:
- Use GPU: Much faster than CPU
- Reduce max_tokens: Generate shorter responses
- Use greedy decoding: Set `do_sample=False`
- Enable model compilation: Set `compile=True` (PyTorch 2.0+)
- Quantize model: Use 8-bit or 16-bit precision
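If you are working with the underlying PyTorch module directly, compilation is a one-line change on PyTorch 2.0+. The sketch below uses a stand-in module; how you get at the trained model object depends on your setup.

```python
import torch
import torch.nn as nn

# Stand-in for the trained model; replace with your actual torch.nn.Module
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))

model = torch.compile(model)   # PyTorch 2.0+: JIT-compiles the forward pass

with torch.no_grad():
    out = model(torch.randn(1, 768))
```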
🔄 Fine-tuning¶
Q: When should I fine-tune vs. train from scratch?¶
A:
- Fine-tune when: You have a pre-trained model and domain-specific data
- Train from scratch when: You have lots of data and need full control
- Fine-tuning advantages: Faster, less data needed, preserves general knowledge
- Training advantages: Full customization, no dependency on base model
Q: What's the difference between LoRA and full fine-tuning?¶
A:
| Aspect | LoRA | Full Fine-tuning |
|---|---|---|
| Memory | Low (~1% of params) | High (all params) |
| Speed | Fast | Slower |
| Quality | Good for most tasks | Best possible |
| Flexibility | Limited adaptation | Full adaptation |
| Use case | Domain adaptation | Major architecture changes |
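The memory row in that table follows directly from how LoRA works: the pretrained weight is frozen and only a low-rank update is trained. Below is a minimal, framework-agnostic PyTorch sketch of a LoRA-style linear layer; it illustrates the idea and is not LLMBuilder's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.requires_grad_(False)                     # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(in_dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))   # zero init: update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(768, 768, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.1%}")       # ~2% for one layer; far less model-wide
```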
Q: How do I prevent catastrophic forgetting during fine-tuning?¶
A: Use these techniques:
- Lower learning rate: 1e-5 to 5e-5
- Fewer epochs: 3-5 epochs usually sufficient
- Regularization: Add weight decay (0.01)
- LoRA: Preserves base model weights
- Mixed training: Include general data with domain data
🚀 Deployment¶
Q: How do I deploy my trained model?¶
A: LLMBuilder supports multiple deployment options:
- GGUF format: For llama.cpp and CPU/edge inference
- ONNX format: For cross-platform, mobile, and cloud deployment
- Quantized PyTorch: For production use within the PyTorch ecosystem
Q: Which export format should I choose?¶
A: Choose based on your deployment target:
- GGUF: CPU inference, llama.cpp compatibility, edge devices
- ONNX: Cross-platform, mobile apps, cloud services
- Quantized PyTorch: PyTorch ecosystem, balanced performance
- HuggingFace: Easy sharing, transformers compatibility
Q: How do I reduce model size for deployment?¶
A: Size reduction techniques:
- Quantization: 8-bit (50% smaller) or 4-bit (75% smaller)
- Pruning: Remove least important weights
- Distillation: Train smaller model to mimic larger one
- Architecture optimization: Use efficient attention mechanisms
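For the quantization route, stock PyTorch dynamic quantization gives a quick size win on CPU without retraining. This is a generic sketch with a stand-in model, not necessarily what LLMBuilder's export path does internally.

```python
import torch
import torch.nn as nn

# Stand-in for the trained model; replace with your actual torch.nn.Module
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))

# Convert Linear layers to 8-bit dynamic quantization (roughly halves their size)
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```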
🐛 Troubleshooting¶
Q: I get "CUDA out of memory" errors. What should I do?¶
A: Try these solutions:
- Reduce batch size: Start with batch_size=1
- Enable gradient checkpointing: Trades compute for memory
- Use gradient accumulation: Simulate larger batches
- Reduce sequence length: Shorter sequences use less memory
- Use CPU: Slower but no memory limits
- Clear GPU cache: `torch.cuda.empty_cache()`
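Gradient checkpointing is worth a closer look because it is the biggest memory lever after batch size. In plain PyTorch it looks like the sketch below (generic, with a stand-in block): activations inside the checkpointed block are recomputed during the backward pass instead of being stored.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))  # stand-in block
x = torch.randn(4, 768, requires_grad=True)

# Forward through the block without storing intermediate activations;
# they are recomputed when backward() needs them.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```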
Q: Training is very slow. How can I speed it up?¶
A: Speed optimization:
- Use GPU: 10-100x faster than CPU
- Increase batch size: Better GPU utilization
- Enable mixed precision: `fp16` or `bf16`
- Use multiple GPUs: Distributed training
- Optimize data loading: More workers, pin memory
- Compile model: PyTorch 2.0 compilation
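Mixed precision in particular usually gives the largest single speedup on modern GPUs. The sketch below is plain PyTorch AMP (not LLMBuilder's fp16/bf16 switch) and needs a CUDA device.

```python
import torch
import torch.nn as nn

device = "cuda"                                  # AMP as shown here requires a GPU
model = nn.Linear(768, 768).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 768, device=device)
    with torch.cuda.amp.autocast():              # forward pass runs in reduced precision
        loss = model(x).pow(2).mean()            # dummy loss for illustration
    scaler.scale(loss).backward()                # loss scaling avoids fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad()
```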
Q: My tokenizer produces weird results. What's wrong?¶
A: Common tokenizer issues:
- Wrong vocabulary size: Must match model config
- Insufficient training data: Need diverse text corpus
- Character coverage too low: Increase to 0.9999
- Wrong model type: BPE usually works best
- Missing special tokens: Include `<pad>`, `<unk>`, etc.
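If you want to experiment with these knobs outside of LLMBuilder, the standalone sentencepiece library exposes the same ideas directly. The sketch below is illustrative only; `corpus.txt` is a placeholder path, and whether LLMBuilder uses SentencePiece under the hood depends on the tokenizer backend you chose.

```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",              # placeholder: your cleaned training text
    model_prefix="tokenizer",
    vocab_size=16000,                # must match the model's vocab_size
    model_type="bpe",                # BPE usually works best
    character_coverage=0.9999,       # high coverage so rare characters are kept
    unk_id=0, bos_id=1, eos_id=2, pad_id=3,  # reserve ids for special tokens
)
```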
Q: Generated text contains strange characters or formatting¶
A: Text cleaning solutions:
- Improve data cleaning: Remove unwanted characters
- Filter by language: Keep only desired languages
- Normalize text: Fix encoding issues
- Add text filters: Remove specific patterns
- Better tokenizer training: Use cleaner training data
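A small normalization pass before tokenizer and model training catches most of these issues. The function below is a generic sketch using only the Python standard library; extend the filters to match your own data.

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    # NFKC normalization folds visually identical characters into one encoding
    text = unicodedata.normalize("NFKC", text)
    # Drop control/format characters that survive PDF and HTML extraction
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C" or ch in "\n\t")
    # Collapse runs of spaces and tabs
    return re.sub(r"[ \t]+", " ", text).strip()

print(clean_text("zero\u200bwidth\u00a0and\tweird   spacing"))
```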
💡 Best Practices¶
Q: What are the most important best practices?¶
A: Key recommendations:
- Start small: Begin with tiny models and scale up
- Clean your data: Quality over quantity
- Monitor training: Watch loss curves and generation quality
- Save checkpoints: Protect against failures
- Validate everything: Test configurations before long training
- Document experiments: Keep track of what works
Q: How do I choose hyperparameters?¶
A: Hyperparameter selection guide:
- Learning rate: Start with 3e-4, adjust based on loss curves
- Batch size: Largest that fits in memory
- Model size: Balance quality needs with resources
- Sequence length: Match your use case requirements
- Dropout: 0.1 is usually good, increase if overfitting
Q: How do I evaluate model quality?¶
A: Evaluation methods:
- Perplexity: Lower is better (< 20 is good)
- Generation quality: Manual inspection of outputs
- Task-specific metrics: BLEU, ROUGE for specific tasks
- Human evaluation: Best but most expensive
- Automated metrics: Coherence, fluency scores
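Perplexity in particular falls straight out of the validation loss: for a per-token cross-entropy loss, perplexity is just its exponential. A quick sanity-check calculation:

```python
import math

avg_val_loss = 2.7                     # mean per-token cross-entropy on held-out text
perplexity = math.exp(avg_val_loss)    # exp(2.7) ≈ 14.9, under the ~20 "good" threshold
print(f"perplexity = {perplexity:.1f}")
```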
🆘 Getting Help¶
Q: Where can I get help if I'm stuck?¶
A: Support resources:
- Documentation: Complete guides and examples
- GitHub Issues: Report bugs and request features
- GitHub Discussions: Community Q&A
- Examples: Working code samples
- Stack Overflow: Tag questions with `llmbuilder`
Q: How do I report a bug?¶
A: When reporting bugs, include:
- LLMBuilder version: `llmbuilder --version`
- Python version: `python --version`
- Operating system: Windows/macOS/Linux
- Hardware: CPU/GPU specifications
- Error message: Full traceback
- Minimal example: Code to reproduce the issue
- Configuration: Model and training configs used
Q: How can I contribute to LLMBuilder?¶
A: Ways to contribute:
- Report bugs: Help improve stability
- Request features: Suggest improvements
- Submit PRs: Code contributions welcome
- Improve docs: Fix typos, add examples
- Share examples: Help other users
- Test releases: Try beta versions
Still have questions?
If you can't find the answer here, check our GitHub Discussions or create a new issue. The community is always happy to help!