Add comprehensive voice assistant documentation

- Complete setup guide for Piper TTS installation
- Voice model download instructions with multiple options
- API usage examples and troubleshooting guide
- Available voice models comparison table
- Integration instructions for Claude Code

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-17 14:56:42 +02:00
parent 16081ec85e
commit e2b79e9662

docs/voice-assistant.md Normal file

@@ -0,0 +1,126 @@
# Voice Assistant Setup
This document describes how to set up AI voice capabilities for Claude Code using local TTS (Text-to-Speech) services.
## Overview
The voice assistant setup uses:
- **Piper TTS**: Local neural text-to-speech engine for generating natural-sounding speech
- **FastAPI**: HTTP server wrapper to make Piper compatible with voice-mode
- **Ryan voice model**: Professional male US English voice for AI assistant personality
## Prerequisites
### System Dependencies
Install Piper from the AUR (this example uses the `yay` helper):
```bash
yay -S piper-tts
```
### Voice Models
Download voice models to the piper voices directory:
```bash
# Create voice models directory
mkdir -p ~/.local/share/piper-voices
cd ~/.local/share/piper-voices
# Download Ryan voice (male US English - recommended for AI assistant)
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json
# Optional: Download Alan voice (male British English)
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/alan/medium/en_GB-alan-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/alan/medium/en_GB-alan-medium.onnx.json
```
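By default Piper looks for the `.onnx.json` config next to each `.onnx` model, so a download that fetched only one of the two files tends to fail later in a confusing way. The following is a minimal sketch that checks the pairing; the function name `find_incomplete_models` is hypothetical and not part of Piper or the voice server.

```python
from pathlib import Path


def find_incomplete_models(voices_dir: Path) -> list[str]:
    """Return names of .onnx models that lack a matching .onnx.json config."""
    missing = []
    for model in sorted(voices_dir.glob("*.onnx")):
        # en_US-ryan-medium.onnx -> en_US-ryan-medium.onnx.json
        config = model.with_name(model.name + ".json")
        if not config.is_file():
            missing.append(model.name)
    return missing


if __name__ == "__main__":
    # Directory used by the download commands in this guide.
    voices = Path.home() / ".local/share/piper-voices"
    incomplete = find_incomplete_models(voices)
    if incomplete:
        print("Missing .onnx.json config for:", ", ".join(incomplete))
    else:
        print("Every model in", voices, "has a config file.")
```

An empty result means every model found has its config; it does not verify that the files themselves downloaded completely.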
### Testing Piper TTS
Test the installation:
```bash
echo "Hello, this is a test of piper text to speech" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f /tmp/test_voice.wav
```
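The command writes the audio to `/tmp/test_voice.wav` without playing it. Play the file (for example with `aplay /tmp/test_voice.wav`) and you should hear a clear male voice saying the test phrase.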
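Since the test command writes its output to a file, one way to confirm that synthesis actually produced audio is to inspect the WAV header. This is a small sketch using only Python's standard `wave` module; the helper name `wav_summary` is hypothetical.

```python
import wave


def wav_summary(path: str) -> dict:
    """Read basic properties of a WAV file using the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    # Path written by the piper-tts test command above.
    info = wav_summary("/tmp/test_voice.wav")
    print(info)
    if info["duration_s"] == 0:
        print("The file contains no audio; check the piper-tts output for errors.")
```

A non-zero duration only shows that audio was written; listening to the file is still the real test of quality.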
## Voice Server Setup
The voice server provides an HTTP API compatible with OpenAI's TTS format, allowing Claude Code to use Piper TTS seamlessly.
### Installation
1. Navigate to the voice server directory:
```bash
cd /path/to/homelab/voice-server
```
2. Install dependencies with Poetry:
```bash
poetry install
```
3. Start the voice server:
```bash
poetry run voice-server
```
The server will start on `http://127.0.0.1:8880` and provide:
- `/v1/audio/speech` - TTS endpoint compatible with OpenAI API
- `/v1/models` - List available models
- `/health` - Health check endpoint
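The endpoints above can be exercised with a short standard-library client. This is a sketch under stated assumptions: the JSON field names (`model`, `input`, `voice`) follow OpenAI's TTS request format, while the default values `"tts-1"` and `"ryan"` are guesses about what this particular server accepts. The helper `build_speech_request` is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8880"  # address stated in this guide


def build_speech_request(
    text: str, voice: str = "ryan", model: str = "tts-1"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech.

    Field names follow OpenAI's TTS request format; the default `voice`
    and `model` values are assumptions about this server.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Confirm the server is up before requesting audio.
    with urllib.request.urlopen(f"{BASE_URL}/health", timeout=5) as resp:
        print("health status:", resp.status)

    request = build_speech_request("Hello from the local voice server.")
    with urllib.request.urlopen(request, timeout=30) as resp:
        audio = resp.read()
    # The audio container depends on the server, so a neutral extension is used.
    with open("/tmp/speech_output.bin", "wb") as out:
        out.write(audio)
    print(f"wrote {len(audio)} bytes")
```

Building the request separately from sending it makes the payload easy to inspect when the server rejects a call.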
## Usage
### Starting Voice Mode
Use the custom voice script to start the server and enable voice mode in one step:
```bash
./scripts/enable-voice.sh
```
### Voice Conversation
Once the server is running, you can use voice commands in Claude Code:
```python
# Text-to-speech only (no microphone input)
converse("Hello! I can now speak using the local piper TTS system.", wait_for_response=False)
```
### Configuration
The voice server uses the Ryan voice model by default. To change voices, edit the configuration in:
```
voice-server/config.py
```
## Available Voice Models
| Voice | Gender | Accent | Description |
|-------|--------|--------|-------------|
| ryan | Male | US English | Professional, clear, recommended for AI assistant |
| alan | Male | British English | Sophisticated, formal |
| lessac | Female | US English | Natural, conversational |
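The table's short voice names map onto model files through Piper's `<locale>-<voice>-<quality>.onnx` naming pattern. The sketch below illustrates that mapping; it is not the contents of `voice-server/config.py`. The `ryan` and `alan` entries match the downloads in this guide, while the `lessac` file name is an assumption based on the same pattern, and `model_path` is a hypothetical helper.

```python
from pathlib import Path

VOICES_DIR = Path.home() / ".local/share/piper-voices"

# Voice name -> locale prefix used in Piper model file names.
# ryan and alan match the downloads in this guide; lessac is an assumption.
VOICE_LOCALES = {"ryan": "en_US", "alan": "en_GB", "lessac": "en_US"}


def model_path(voice: str, quality: str = "medium") -> Path:
    """Return the expected .onnx path for a voice listed in the table."""
    try:
        locale = VOICE_LOCALES[voice]
    except KeyError:
        raise ValueError(f"unknown voice: {voice!r}") from None
    return VOICES_DIR / f"{locale}-{voice}-{quality}.onnx"


if __name__ == "__main__":
    for name in VOICE_LOCALES:
        path = model_path(name)
        state = "found" if path.is_file() else "not downloaded"
        print(f"{name:8} {path.name:32} {state}")
```

Running it lists which of the three voices are present in the local voices directory.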
## Troubleshooting
### Voice Server Won't Start
- Ensure piper-tts is installed: `which piper-tts`
- Check voice models are downloaded: `ls ~/.local/share/piper-voices/`
- Verify port 8880 is available: `ss -tlnp | grep 8880` (or `netstat -tlnp | grep 8880` if net-tools is installed); no output means the port is free
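The port check can also be done without extra tools. This is a minimal sketch using Python's `socket` module; `port_in_use` is a hypothetical helper, and it only detects listeners reachable on the loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(8880):
        print("Port 8880 is already in use (possibly a running voice server).")
    else:
        print("Port 8880 is free.")
```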
### Poor Audio Quality
- Try a different voice model
- Check audio system: `pactl info`
- Test piper directly: `echo "test" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f /tmp/test.wav`
### Audio Not Playing
- Check PulseAudio is running: `systemctl --user status pulseaudio` (on systems that use PipeWire, check `pipewire-pulse` instead)
- Test system audio: `speaker-test -t wav -c 2`
## Future Enhancements
- **Speech-to-Text**: Add Whisper.cpp for full voice conversations
- **Voice Selection**: Runtime voice switching via API
- **Voice Cloning**: Custom voice models
- **Multi-language**: Support for other languages