Add professional voice assistant server implementation

- FastAPI-based TTS server using Piper neural text-to-speech
- Poetry for dependency management and virtual environments
- OpenAI-compatible API endpoints for seamless integration
- Support for multiple voice models (Ryan, Alan, Lessac)
- Robust error handling and voice fallback system
- Professional logging and configuration management
- Docker-ready with proper Python packaging

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
130
voice-server/README.md
Normal file
@@ -0,0 +1,130 @@
# Homelab Voice Server

A local text-to-speech server using Piper TTS, designed to work with Claude Code voice assistant functionality.

## Features

- **Local TTS**: Uses Piper neural TTS for natural-sounding speech
- **OpenAI Compatible**: Drop-in replacement for the OpenAI TTS API
- **Multiple Voices**: Support for different voice models and languages
- **FastAPI**: Modern, fast web framework with automatic API documentation
- **Poetry**: Dependency management and virtual environments

## Quick Start

1. **Install dependencies:**
   ```bash
   cd voice-server
   poetry install
   ```

2. **Download voice models:**
   ```bash
   mkdir -p ~/.local/share/piper-voices
   cd ~/.local/share/piper-voices

   # Ryan voice (recommended - male US English)
   wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx
   wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json
   ```

3. **Start the server:**
   ```bash
   poetry run voice-server
   ```

4. **Test the API:**
   ```bash
   curl -X POST "http://127.0.0.1:8880/v1/audio/speech" \
     -H "Content-Type: application/json" \
     -d '{"input": "Hello from the voice server!", "voice": "ryan"}' \
     --output test.wav
   ```

## API Endpoints

### Health Check
- `GET /health` - Server health and status

### Models (OpenAI Compatible)
- `GET /v1/models` - List available models

### Voices
- `GET /v1/voices` - List all available voices
- `GET /v1/voices/{voice_name}` - Get specific voice information

### Speech Synthesis (OpenAI Compatible)
- `POST /v1/audio/speech` - Generate speech from text

#### Request Body
```json
{
  "input": "Text to speak",
  "voice": "ryan",
  "speed": 1.0
}
```
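
The same request can also be issued from Python. The sketch below uses only the standard library and assumes the server is running at its default address, `http://127.0.0.1:8880`. The helper names `build_speech_request` and `synthesize` are illustrative and not part of the server, and the code assumes the response body is the raw audio file, as the `--output test.wav` flag in the curl example suggests.

```python
import json
import urllib.request

# Assumption: the server runs at its default host and port.
BASE_URL = "http://127.0.0.1:8880"


def build_speech_request(text, voice="ryan", speed=1.0, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"input": text, "voice": voice, "speed": speed}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="test.wav", **kwargs):
    """Send the request and write the response body to out_path.

    Assumes the response body is the audio file itself.
    """
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

With the server running, `synthesize("Hello from the voice server!", voice="ryan")` should write the audio to `test.wav` in the current directory.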

## Configuration

Environment variables (all use the `VOICE_SERVER_` prefix):

- `VOICE_SERVER_HOST`: Server host (default: 127.0.0.1)
- `VOICE_SERVER_PORT`: Server port (default: 8880)
- `VOICE_SERVER_DEFAULT_VOICE`: Default voice (default: ryan)
- `VOICE_SERVER_LOG_LEVEL`: Logging level (default: info)
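
To illustrate how these variables and their defaults fit together, the following sketch reads them with `os.environ`. It is not the server's actual configuration code, which is not shown in this README; the `Settings` class and `load_settings` function are hypothetical names used only for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the documented variables and defaults."""

    host: str = "127.0.0.1"
    port: int = 8880
    default_voice: str = "ryan"
    log_level: str = "info"


def load_settings(environ=None):
    """Read VOICE_SERVER_* variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    defaults = Settings()
    return Settings(
        host=env.get("VOICE_SERVER_HOST", defaults.host),
        port=int(env.get("VOICE_SERVER_PORT", defaults.port)),
        default_voice=env.get("VOICE_SERVER_DEFAULT_VOICE", defaults.default_voice),
        log_level=env.get("VOICE_SERVER_LOG_LEVEL", defaults.log_level),
    )
```

Passing a plain dictionary instead of the real environment, for example `load_settings({"VOICE_SERVER_PORT": "9000"})`, makes it easy to check which values a given set of overrides would produce.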

## Available Voices

| Voice  | Gender | Language | Description                              |
|--------|--------|----------|------------------------------------------|
| ryan   | Male   | en-US    | Professional, clear (recommended for AI) |
| alan   | Male   | en-GB    | Sophisticated British accent             |
| lessac | Female | en-US    | Natural, conversational                  |

## Development

### API Documentation
Visit `http://127.0.0.1:8880/docs` when the server is running for interactive API documentation.

### Adding New Voices
1. Download voice model files to `~/.local/share/piper-voices/`
2. Add voice configuration to `src/voice_server/config.py`
3. Restart the server
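
The layout of `src/voice_server/config.py` is not shown in this README, so the following is only a guess at what a voice entry and the fallback to the default voice might look like. The `VOICES` mapping, `model_path`, and `resolve_voice` are hypothetical, and the `alan` file name assumes the same naming pattern as the Ryan model downloaded in Quick Start.

```python
from pathlib import Path

# Directory used in the Quick Start section.
VOICE_DIR = Path.home() / ".local" / "share" / "piper-voices"

# Hypothetical registry: voice name -> model file name.
VOICES = {
    "ryan": "en_US-ryan-medium.onnx",
    "alan": "en_GB-alan-medium.onnx",  # assumed name, same pattern as ryan
}


def model_path(voice, voice_dir=VOICE_DIR):
    """Return the expected model path for a configured voice, else None."""
    filename = VOICES.get(voice)
    return Path(voice_dir) / filename if filename else None


def resolve_voice(requested, default="ryan", voice_dir=VOICE_DIR):
    """Pick the requested voice if its model and JSON config exist, else the default."""
    for candidate in (requested, default):
        path = model_path(candidate, voice_dir)
        if path and path.is_file() and path.with_name(path.name + ".json").is_file():
            return candidate
    raise FileNotFoundError(
        f"No usable model for {requested!r} or {default!r} in {voice_dir}"
    )
```

Under this sketch, adding a voice would mean dropping its `.onnx` and `.onnx.json` files into the voice directory and adding one entry to the mapping; a request for a voice whose files are missing would fall back to the default.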

### Running Tests
```bash
poetry run pytest
```

### Code Formatting
```bash
poetry run black src/
poetry run isort src/
```

## Integration with Claude Code

The voice server is designed to work with Claude Code's voice-mode functionality:

```python
# In Claude Code
converse("Hello! I can now speak using the local voice server.", wait_for_response=False)
```

## Troubleshooting

### Server Won't Start
- Check that piper-tts is installed: `which piper-tts`
- Verify voice models are downloaded
- Check port 8880 is available
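
One way to perform the port check from Python is to try connecting to it. This is a minimal sketch using only the standard library; `port_is_free` is an illustrative helper, not something the server provides, and it only detects listeners reachable on the given host.

```python
import socket


def port_is_free(port=8880, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0
```

If `port_is_free()` returns `False` before the server has been started, another process is already listening on port 8880, and either that process or `VOICE_SERVER_PORT` needs to change.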

### No Audio Output
- Test piper directly: `echo "test" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f test.wav`
- Check audio system settings
- Verify file permissions on voice models

### Voice Not Available
- Check voice files exist: `ls ~/.local/share/piper-voices/`
- Verify file naming matches configuration
- Check server logs for detailed error messages
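
The file checks above can be scripted. The sketch below assumes each voice consists of a `<model>.onnx` file plus a matching `<model>.onnx.json` file, as in the Quick Start download step; `missing_voice_files` is a hypothetical helper name.

```python
from pathlib import Path


def missing_voice_files(model_name, voice_dir=None):
    """Return the expected files for model_name that are absent from voice_dir."""
    # Default to the directory used in the Quick Start section.
    base = Path(voice_dir) if voice_dir else Path.home() / ".local/share/piper-voices"
    expected = [base / f"{model_name}.onnx", base / f"{model_name}.onnx.json"]
    return [path.name for path in expected if not path.is_file()]
```

Calling `missing_voice_files("en_US-ryan-medium")` returns an empty list when both files are in place, and the names of the missing files otherwise.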