homelab/docs/voice-assistant.md
Arpad Krejczinger 9aa881d895 Document voice assistant TTS service status
Mark TTS functionality as disabled due to onnxruntime removal
(freed 1.2GB disk space during cleanup)
2025-10-11 18:25:08 +02:00

# Voice Assistant Setup
⚠️ **STATUS: DISABLED** - the `onnxruntime` package was removed to free 1.2 GB of disk space, so voice functionality is currently unavailable.
This document describes how to set up AI voice capabilities for Claude Code using local TTS (Text-to-Speech) services.
## Overview
The voice assistant setup uses:
- **Piper TTS**: Local neural text-to-speech engine for generating natural-sounding speech
- **FastAPI**: HTTP server wrapper to make Piper compatible with voice-mode
- **Ryan voice model**: Professional male US English voice for AI assistant personality
- **onnxruntime**: ML inference library required for TTS (currently removed, which is why voice is disabled)
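
Because the whole setup depends on `onnxruntime` being present, a quick status check can save some debugging. The sketch below assumes an Arch-based system where `pacman -Q <name>` exits with status 0 for an installed package, and assumes the package is named `onnxruntime`; the helper `package_installed` is a hypothetical name, not part of this repository.

```python
import subprocess
from typing import Sequence


def package_installed(name: str, query_cmd: Sequence[str] = ("pacman", "-Q")) -> bool:
    """Return True if running `query_cmd name` exits with status 0."""
    try:
        result = subprocess.run(
            [*query_cmd, name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The query tool itself is missing (for example, not an Arch system).
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Assumption: the removed dependency is the pacman package "onnxruntime".
    state = "installed" if package_installed("onnxruntime") else "missing"
    print(f"onnxruntime: {state}")
```

If this reports `missing`, the remaining steps in this document will not produce audio until the package is reinstalled.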
## Prerequisites
### System Dependencies
Install required packages from AUR:
```bash
yay -S piper-tts
```
### Voice Models
Download voice models to the piper voices directory:
```bash
# Create voice models directory
mkdir -p ~/.local/share/piper-voices
cd ~/.local/share/piper-voices
# Download Ryan voice (male US English - recommended for AI assistant)
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json
# Optional: Download Alan voice (male British English)
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/alan/medium/en_GB-alan-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/alan/medium/en_GB-alan-medium.onnx.json
```
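
Each voice needs both the `.onnx` model and its `.onnx.json` config, so an interrupted download can leave a model unusable. A minimal sketch for spotting that, assuming the directory layout above (the function name `find_incomplete_models` is made up for illustration):

```python
from pathlib import Path


def find_incomplete_models(voices_dir: Path) -> list[Path]:
    """Return .onnx model files that have no matching .onnx.json config."""
    missing = []
    for model in sorted(voices_dir.glob("*.onnx")):
        # The config sits next to the model: foo.onnx -> foo.onnx.json
        config = model.with_name(model.name + ".json")
        if not config.is_file():
            missing.append(model)
    return missing


if __name__ == "__main__":
    voices = Path.home() / ".local/share/piper-voices"
    for model in find_incomplete_models(voices):
        print(f"missing config for {model.name}")
```

An empty result only means the file pairs exist; it does not validate their contents.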
### Testing Piper TTS
Test the installation:
```bash
echo "Hello, this is a test of piper text to speech" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f /tmp/test_voice.wav
```
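The command writes the audio to `/tmp/test_voice.wav` rather than playing it. Play the file back (for example with `aplay /tmp/test_voice.wav`) and you should hear a clear male voice saying the test phrase.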
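
If no speakers are attached (for example on a headless homelab box), the generated file can still be sanity-checked. The sketch below uses only Python's standard `wave` module; `describe_wav` is a hypothetical helper, and a non-zero duration merely suggests that synthesis produced audio.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of a WAV file using the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("/tmp/test_voice.wav")` after the test command should return a dictionary with a duration greater than zero.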
## Voice Server Setup
The voice server provides an HTTP API compatible with OpenAI's TTS format, allowing Claude Code to use Piper TTS seamlessly.
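
The server code itself is not shown in this document. As a rough illustration of how such a wrapper might hand text to Piper, the sketch below spawns the `piper-tts` CLI with the same `-m` and `-f` flags used in the test command. The names `build_piper_command` and `synthesize` are hypothetical, and the real voice server may work differently.

```python
import subprocess
from pathlib import Path


def build_piper_command(model: Path, output: Path) -> list[str]:
    """Build the piper-tts invocation: -m selects the model, -f the output WAV."""
    return ["piper-tts", "-m", str(model), "-f", str(output)]


def synthesize(text: str, model: Path, output: Path) -> Path:
    """Pipe text to piper-tts on stdin and return the path of the WAV it wrote."""
    subprocess.run(
        build_piper_command(model, output),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if piper-tts fails
    )
    return output
```

Passing the command as a list avoids shell quoting problems when the text or paths contain spaces.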
### Installation
1. Navigate to the voice server directory:
```bash
cd /path/to/homelab/voice-server
```
2. Install dependencies with Poetry:
```bash
poetry install
```
3. Start the voice server:
```bash
poetry run voice-server
```
The server will start on `http://127.0.0.1:8880` and provide:
- `/v1/audio/speech` - TTS endpoint compatible with OpenAI API
- `/v1/models` - List available models
- `/health` - Health check endpoint
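
To exercise the speech endpoint without Claude Code, a small client is enough. The sketch below follows the request shape of OpenAI's speech API (`model`, `input`, `voice`); whether this server accepts the values `"tts-1"` and `"ryan"` is an assumption, as are the helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8880"


def build_speech_request(
    text: str, voice: str = "ryan", model: str = "tts-1"
) -> urllib.request.Request:
    """Build a POST request for /v1/audio/speech (field values are assumptions)."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_speech(text: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text), timeout=30) as resp:
        with open(out_path, "wb") as fh:
            fh.write(resp.read())
```

With the server running, `fetch_speech("Hello from Piper", "/tmp/api_test.wav")` should leave an audio file at that path.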
## Usage
### Starting Voice Mode
Use the custom voice script to both start the server and enable voice mode:
```bash
./scripts/enable-voice.sh
```
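
The contents of `enable-voice.sh` are not shown here. If you start the server yourself, it can help to wait until `/health` responds before sending speech requests. A minimal polling sketch, with hypothetical helper names and arbitrary timeout values:

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str = "http://127.0.0.1:8880/health") -> bool:
    """Return True if the URL answers with HTTP 200, False on connection errors."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses (refused, timeout, 5xx).
        return False


def wait_until_ready(
    probe: Callable[[], bool] = http_ok,
    timeout_s: float = 15.0,
    interval_s: float = 0.5,
) -> bool:
    """Call probe repeatedly until it succeeds or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The probe is injectable so the same loop can be reused for other readiness checks.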
### Voice Conversation
Once the server is running, you can use voice commands in Claude Code:
```python
# Text-to-speech only (no microphone input)
converse("Hello! I can now speak using the local piper TTS system.", wait_for_response=False)
```
### Configuration
The voice server uses the Ryan voice model by default. To change voices, edit the configuration in:
```
voice-server/config.py
```
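
The actual layout of `config.py` is not reproduced in this document. Purely as an illustration of what a voice setting could look like, the sketch below maps short voice names to the model files downloaded earlier. Every name in it (`VOICES_DIR`, `VOICE_FILES`, `DEFAULT_VOICE`, `model_path`) is hypothetical.

```python
from pathlib import Path

# Assumption: models live where the download step above put them.
VOICES_DIR = Path.home() / ".local/share/piper-voices"

# Only the voices whose file names appear in this document are listed.
VOICE_FILES = {
    "ryan": "en_US-ryan-medium.onnx",
    "alan": "en_GB-alan-medium.onnx",
}

DEFAULT_VOICE = "ryan"


def model_path(voice: str = DEFAULT_VOICE, voices_dir: Path = VOICES_DIR) -> Path:
    """Resolve a short voice name to the path of its .onnx model file."""
    try:
        return voices_dir / VOICE_FILES[voice]
    except KeyError:
        known = ", ".join(sorted(VOICE_FILES))
        raise ValueError(f"unknown voice {voice!r}; known voices: {known}") from None
```

Failing early on an unknown name keeps a typo in the config from surfacing later as a less obvious Piper error.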
## Available Voice Models
| Voice | Gender | Accent | Description |
|-------|--------|--------|-------------|
| ryan | Male | US English | Professional, clear, recommended for AI assistant |
| alan | Male | British English | Sophisticated, formal |
| lessac | Female | US English | Natural, conversational |
## Troubleshooting
### Voice Server Won't Start
- Ensure piper-tts is installed: `which piper-tts`
- Check voice models are downloaded: `ls ~/.local/share/piper-voices/`
- Verify port 8880 is available: `netstat -tlnp | grep 8880` (or `ss -tlnp | grep 8880` if `netstat` is not installed)
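
The three checks above can also be gathered into one report. This is a sketch using only the standard library; `port_in_use` and `startup_report` are hypothetical helpers, and the port probe only tells you that something is listening, not what.

```python
import shutil
import socket
from pathlib import Path


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def startup_report(voices_dir: Path, port: int = 8880) -> dict:
    """Collect the preconditions listed above into a single dictionary."""
    return {
        "piper_on_path": shutil.which("piper-tts") is not None,
        "models": sorted(p.name for p in voices_dir.glob("*.onnx")),
        "port_in_use": port_in_use(port),
    }
```

If `port_in_use` is true before you start the server, either another process holds port 8880 or a previous server instance is still running.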
### Poor Audio Quality
- Try a different voice model
- Check audio system: `pactl info`
- Test piper directly: `echo "test" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f /tmp/test.wav`
### Audio Not Playing
- Check PulseAudio is running: `systemctl --user status pulseaudio` (on PipeWire-based systems, check `pipewire-pulse` instead)
- Test system audio: `speaker-test -t wav -c 2`
## Future Enhancements
- **Speech-to-Text**: Add Whisper.cpp for full voice conversations
- **Voice Selection**: Runtime voice switching via API
- **Voice Cloning**: Custom voice models
- **Multi-language**: Support for other languages