Add comprehensive voice assistant documentation

- Complete setup guide for Piper TTS installation
- Voice model download instructions with multiple options
- API usage examples and troubleshooting guide
- Available voice models comparison table
- Integration instructions for Claude Code

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
126
docs/voice-assistant.md
Normal file
@@ -0,0 +1,126 @@
# Voice Assistant Setup

This document describes how to set up AI voice capabilities for Claude Code using local TTS (Text-to-Speech) services.

## Overview

The voice assistant setup uses:
- **Piper TTS**: Local neural text-to-speech engine for generating natural-sounding speech
- **FastAPI**: HTTP server wrapper to make Piper compatible with voice-mode
- **Ryan voice model**: Professional male US English voice for AI assistant personality

## Prerequisites

### System Dependencies

Install the required package from the AUR:
```bash
yay -S piper-tts
```

### Voice Models

Download voice models to the Piper voices directory. Each voice needs both its `.onnx` model file and the matching `.onnx.json` config file:
```bash
# Create voice models directory
mkdir -p ~/.local/share/piper-voices
cd ~/.local/share/piper-voices

# Download Ryan voice (male US English - recommended for AI assistant)
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json

# Optional: Download Alan voice (male British English)
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/alan/medium/en_GB-alan-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/alan/medium/en_GB-alan-medium.onnx.json
```
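
The download URLs above follow a regular pattern (language, locale, voice name, quality). As a minimal sketch, assuming other voices in the same repository use the same layout, a small Python helper can build both URLs for a voice. The name `voice_urls` is hypothetical and is not part of Piper.

```python
BASE_URL = "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0"


def voice_urls(locale: str, name: str, quality: str = "medium") -> list[str]:
    """Return the model (.onnx) and config (.onnx.json) URLs for one voice.

    Assumes the layout seen in the wget commands above:
    <language>/<locale>/<name>/<quality>/<locale>-<name>-<quality>.onnx
    """
    language = locale.split("_")[0]  # "en_US" -> "en"
    stem = f"{locale}-{name}-{quality}"
    model_url = f"{BASE_URL}/{language}/{locale}/{name}/{quality}/{stem}.onnx"
    return [model_url, model_url + ".json"]
```

Calling `voice_urls("en_US", "ryan")` reproduces the two Ryan URLs used above, so the result can be passed straight to `wget`.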

### Testing Piper TTS

Test the installation:
```bash
echo "Hello, this is a test of piper text to speech" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f /tmp/test_voice.wav
```

The command writes the audio to `/tmp/test_voice.wav` rather than playing it. Play the file (for example with `aplay /tmp/test_voice.wav`) and you should hear a clear male voice saying the test phrase.
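
The same check can be scripted. The sketch below mirrors the shell command from Python and then reads the resulting WAV header to confirm that audio was actually written. The helper names `synthesize` and `wav_duration_seconds` are hypothetical; only the `-m` and `-f` flags come from the command above.

```python
import subprocess
import wave
from pathlib import Path


def synthesize(text: str, model: Path, out_path: Path) -> None:
    """Pipe text into piper-tts, mirroring the shell command above."""
    subprocess.run(
        ["piper-tts", "-m", str(model), "-f", str(out_path)],
        input=text,
        text=True,
        check=True,  # raise CalledProcessError if piper-tts exits non-zero
    )


def wav_duration_seconds(path: Path) -> float:
    """Return the length of a WAV file in seconds, read from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

A typical use would be to call `synthesize(...)` with the Ryan model path and `/tmp/test_voice.wav`, then treat a duration of zero as a failed synthesis.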

## Voice Server Setup

The voice server provides an HTTP API compatible with OpenAI's TTS format, allowing Claude Code to use Piper TTS seamlessly.

### Installation

1. Navigate to the voice server directory:
   ```bash
   cd /path/to/homelab/voice-server
   ```

2. Install dependencies with Poetry:
   ```bash
   poetry install
   ```

3. Start the voice server:
   ```bash
   poetry run voice-server
   ```

The server will start on `http://127.0.0.1:8880` and provide:
- `/v1/audio/speech` - TTS endpoint compatible with OpenAI API
- `/v1/models` - List available models
- `/health` - Health check endpoint

## Usage

### Starting Voice Mode

Use the custom voice command to start the server and enable voice mode in one step:
```bash
./scripts/enable-voice.sh
```

### Voice Conversation

Once the server is running, you can use voice commands in Claude Code:
```python
# Text-to-speech only (no microphone input)
converse("Hello! I can now speak using the local piper TTS system.", wait_for_response=False)
```

### Configuration

The voice server uses the Ryan voice model by default. To change voices, edit the configuration in:
```
voice-server/config.py
```
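
The contents of `config.py` are not reproduced in this document. Purely as a hypothetical sketch of what a voice setting could look like, assuming the server resolves a voice name to an `.onnx` file in the voices directory, something along these lines would work. `VOICES_DIR`, `DEFAULT_VOICE`, and `model_path` are illustrative names, not the project's actual settings.

```python
from pathlib import Path

# Hypothetical settings; the real voice-server/config.py may look different.
VOICES_DIR = Path.home() / ".local/share/piper-voices"
DEFAULT_VOICE = "en_US-ryan-medium"


def model_path(voice: str = DEFAULT_VOICE) -> Path:
    """Resolve a voice name to its .onnx file in the voices directory."""
    return VOICES_DIR / f"{voice}.onnx"
```

Under this assumption, switching to the Alan voice would mean setting the default to `en_GB-alan-medium` after downloading that model.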

## Available Voice Models

| Voice | Gender | Accent | Description |
|-------|--------|--------|-------------|
| ryan | Male | US English | Professional, clear, recommended for AI assistant |
| alan | Male | British English | Sophisticated, formal |
| lessac | Female | US English | Natural, conversational |

## Troubleshooting

### Voice Server Won't Start
- Ensure piper-tts is installed: `which piper-tts`
- Check that the voice models are downloaded: `ls ~/.local/share/piper-voices/`
- Verify that port 8880 is available: `netstat -tlnp | grep 8880` (or `ss -tlnp | grep 8880` where `netstat` is not installed); no output means the port is free
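
The three checks above can also be bundled into one preflight script. This is a minimal sketch with hypothetical helper names; it treats the port as available when nothing accepts a connection on it, which is an approximation of what `netstat` reports.

```python
import shutil
import socket
from pathlib import Path


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) != 0


def preflight(voices_dir: Path, port: int = 8880) -> dict[str, bool]:
    """Run the three checks from the list above and report each result."""
    return {
        "piper-tts on PATH": shutil.which("piper-tts") is not None,
        "voice models present": any(voices_dir.glob("*.onnx")),
        "port available": port_is_free(port),
    }
```

Printing the result of `preflight(Path.home() / ".local/share/piper-voices")` shows at a glance which prerequisite is missing.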

### Poor Audio Quality
- Try a different voice model
- Check audio system: `pactl info`
- Test piper directly: `echo "test" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f /tmp/test.wav`

### Audio Not Playing
- Check that PulseAudio is running: `systemctl --user status pulseaudio` (on PipeWire-based systems, check `pipewire-pulse` instead)
- Test system audio: `speaker-test -t wav -c 2`

## Future Enhancements

- **Speech-to-Text**: Add Whisper.cpp for full voice conversations
- **Voice Selection**: Runtime voice switching via API
- **Voice Cloning**: Custom voice models
- **Multi-language**: Support for other languages