diff --git a/docs/voice-assistant.md b/docs/voice-assistant.md
new file mode 100644
index 0000000..8f2a6f6
--- /dev/null
+++ b/docs/voice-assistant.md
@@ -0,0 +1,126 @@
+# Voice Assistant Setup
+
+This document describes how to set up AI voice capabilities for Claude Code using local TTS (Text-to-Speech) services.
+
+## Overview
+
+The voice assistant setup uses:
+- **Piper TTS**: Local neural text-to-speech engine for generating natural-sounding speech
+- **FastAPI**: HTTP server wrapper to make Piper compatible with voice-mode
+- **Ryan voice model**: Professional male US English voice for AI assistant personality
+
+## Prerequisites
+
+### System Dependencies
+
+Install required packages from AUR:
+```bash
+yay -S piper-tts
+```
+
+### Voice Models
+
+Download voice models to the piper voices directory:
+```bash
+# Create voice models directory
+mkdir -p ~/.local/share/piper-voices
+cd ~/.local/share/piper-voices
+
+# Download Ryan voice (male US English - recommended for AI assistant)
+wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx
+wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json
+
+# Optional: Download Alan voice (male British English)
+wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/alan/medium/en_GB-alan-medium.onnx
+wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_GB/alan/medium/en_GB-alan-medium.onnx.json
+```
+
+### Testing Piper TTS
+
+Test the installation:
+```bash
+echo "Hello, this is a test of piper text to speech" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f /tmp/test_voice.wav
+```
+
+This writes `/tmp/test_voice.wav`. Play it back (for example, `aplay /tmp/test_voice.wav`) and you should hear a clear male voice saying the test phrase.
+
+## Voice Server Setup
+
+The voice server provides an HTTP API compatible with OpenAI's TTS format, allowing Claude Code to use Piper TTS seamlessly.
+
+### Installation
+
+1. Navigate to the voice server directory:
+   ```bash
+   cd /path/to/homelab/voice-server
+   ```
+
+2. Install dependencies with Poetry:
+   ```bash
+   poetry install
+   ```
+
+3. Start the voice server:
+   ```bash
+   poetry run voice-server
+   ```
+
+The server will start on `http://127.0.0.1:8880` and provide:
+- `/v1/audio/speech` - TTS endpoint compatible with OpenAI API
+- `/v1/models` - List available models
+- `/health` - Health check endpoint
+
+## Usage
+
+### Starting Voice Mode
+
+Use the custom voice command to both start the server and enable voice mode:
+```bash
+./scripts/enable-voice.sh
+```
+
+### Voice Conversation
+
+Once the server is running, you can use voice commands in Claude Code:
+```python
+# Text-to-speech only (no microphone input)
+converse("Hello! I can now speak using the local piper TTS system.", wait_for_response=False)
+```
+
+### Configuration
+
+The voice server uses the Ryan voice model by default. To change voices, edit the configuration in:
+```
+voice-server/config.py
+```
+
+## Available Voice Models
+
+| Voice | Gender | Accent | Description |
+|-------|--------|--------|-------------|
+| ryan | Male | US English | Professional, clear, recommended for AI assistant |
+| alan | Male | British English | Sophisticated, formal |
+| lessac | Female | US English | Natural, conversational |
+
+## Troubleshooting
+
+### Voice Server Won't Start
+- Ensure piper-tts is installed: `which piper-tts`
+- Check voice models are downloaded: `ls ~/.local/share/piper-voices/`
+- Verify port 8880 is available: `netstat -tlnp | grep 8880`
+
+### Poor Audio Quality
+- Try a different voice model
+- Check audio system: `pactl info`
+- Test piper directly: `echo "test" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f /tmp/test.wav`
+
+### Audio Not Playing
+- Check PulseAudio is running: `systemctl --user status pulseaudio`
+- Test system audio: `speaker-test -t wav -c 2`
+
+## Future Enhancements
+
+- **Speech-to-Text**: Add Whisper.cpp for full voice conversations
+- **Voice Selection**: Runtime voice switching via API
+- **Voice Cloning**: Custom voice models
+- **Multi-language**: Support for other languages
\ No newline at end of file
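
The doc above says the server exposes `/v1/audio/speech` in OpenAI's TTS format on `http://127.0.0.1:8880`, but it does not show a request. A minimal client sketch using only the standard library might look like the following. The JSON field names follow OpenAI's speech API; the `model` value `"piper"`, the use of `"ryan"` as a voice id, and WAV as a supported `response_format` are assumptions, since the patch does not say what this server accepts. The helper names `build_speech_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Address the document says the voice server listens on.
BASE_URL = "http://127.0.0.1:8880"


def build_speech_request(text, voice="ryan", model="piper", base_url=BASE_URL):
    """Build a POST request shaped like OpenAI's /v1/audio/speech call."""
    payload = {
        "model": model,            # assumption: the server may expect another model id
        "voice": voice,            # assumption: voice ids match the names in the table
        "input": text,
        "response_format": "wav",  # assumption: the server can return WAV audio
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)  # number of bytes written
```

With the server running, `synthesize("Hello from Piper", "/tmp/hello.wav")` would write the response body to disk for playback. Keeping request construction separate from sending makes it possible to inspect the payload without a live server.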