homelab/voice-server/README.md
Arpad Krejczinger 572434d42e Add professional voice assistant server implementation
- FastAPI-based TTS server using Piper neural text-to-speech
- Poetry for dependency management and virtual environments
- OpenAI-compatible API endpoints for seamless integration
- Support for multiple voice models (Ryan, Alan, Lessac)
- Robust error handling and voice fallback system
- Professional logging and configuration management
- Docker-ready with proper Python packaging

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-17 14:56:01 +02:00

Homelab Voice Server

A local text-to-speech server built on Piper TTS, designed to work with Claude Code's voice-mode functionality.

Features

  • Local TTS: Uses Piper neural TTS for natural-sounding speech
  • OpenAI Compatible: Drop-in replacement for OpenAI TTS API
  • Multiple Voices: Support for different voice models and languages
  • FastAPI: Modern, fast web framework with automatic API documentation
  • Poetry: Dependency management and virtual environments

Quick Start

  1. Install dependencies:

    cd voice-server
    poetry install
    
  2. Download voice models:

    mkdir -p ~/.local/share/piper-voices
    cd ~/.local/share/piper-voices
    
    # Ryan voice (recommended - male US English)
    wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx
    wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json
    
  3. Start the server:

    poetry run voice-server
    
  4. Test the API:

    curl -X POST "http://127.0.0.1:8880/v1/audio/speech" \
      -H "Content-Type: application/json" \
      -d '{"input": "Hello from the voice server!", "voice": "ryan"}' \
      --output test.wav
    

API Endpoints

Health Check

  • GET /health - Server health and status

Models (OpenAI Compatible)

  • GET /v1/models - List available models

Voices

  • GET /v1/voices - List all available voices
  • GET /v1/voices/{voice_name} - Get specific voice information
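
The read-only endpoints above can be queried from Python with the standard library alone. The following is a minimal sketch, not part of the project: it assumes the default host and port from this README and assumes the endpoints return JSON bodies, which is typical for FastAPI but not stated here. The helper names (endpoint_url, voice_url, get_json, main) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Default host and port taken from this README; adjust if you changed them.
BASE_URL = "http://127.0.0.1:8880"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def voice_url(voice_name: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /v1/voices/{voice_name}, percent-encoding the name."""
    encoded = urllib.parse.quote(voice_name, safe="")
    return endpoint_url("/v1/voices/" + encoded, base_url)


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumption: the server replies with JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def main() -> None:
    """Print the raw responses of the read-only endpoints; needs a running server."""
    for url in (
        endpoint_url("/health"),
        endpoint_url("/v1/models"),
        endpoint_url("/v1/voices"),
        voice_url("ryan"),
    ):
        print(url, "->", get_json(url))
```

Calling main() with the server running prints whatever each endpoint returns; the response schema is not documented in this README, so the sketch makes no assumptions about individual fields.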

Speech Synthesis (OpenAI Compatible)

  • POST /v1/audio/speech - Generate speech from text

Request Body

{
  "input": "Text to speak",
  "voice": "ryan",
  "speed": 1.0
}
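
For callers that prefer Python over curl, the request body above can be sent with the standard library. This is a sketch under stated assumptions: the URL uses the default host and port, the response body is treated as raw audio bytes (as in the Quick Start, which saves it to a .wav file), and the function names are hypothetical rather than part of the project.

```python
import json
import urllib.request

# Default endpoint from this README; change host/port if you reconfigured the server.
SPEECH_URL = "http://127.0.0.1:8880/v1/audio/speech"


def build_speech_request(text, voice="ryan", speed=1.0, url=SPEECH_URL):
    """Build the POST request with the three documented fields: input, voice, speed."""
    payload = {"input": text, "voice": voice, "speed": speed}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, voice="ryan", speed=1.0, url=SPEECH_URL):
    """Send the request and write the response bytes to out_path.

    Assumption: the response body is the audio itself. Returns the byte count.
    """
    request = build_speech_request(text, voice=voice, speed=speed, url=url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        audio = resp.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the server running, synthesize_to_file("Hello from the voice server!", "test.wav") should mirror the curl command in the Quick Start.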

Configuration

Environment variables (all share the VOICE_SERVER_ prefix):

  • VOICE_SERVER_HOST: Server host (default: 127.0.0.1)
  • VOICE_SERVER_PORT: Server port (default: 8880)
  • VOICE_SERVER_DEFAULT_VOICE: Default voice (default: ryan)
  • VOICE_SERVER_LOG_LEVEL: Logging level (default: info)
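
To illustrate how prefixed environment variables with defaults can be resolved, here is a small standalone sketch. It is not the project's actual configuration code (that lives in src/voice_server/config.py and may work differently); the Settings class and load_settings function are hypothetical, and only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Defaults copied from the list above."""
    host: str = "127.0.0.1"
    port: int = 8880
    default_voice: str = "ryan"
    log_level: str = "info"


def load_settings(environ=None, prefix="VOICE_SERVER_"):
    """Read settings from a mapping (os.environ by default), falling back to defaults."""
    env = os.environ if environ is None else environ
    defaults = Settings()
    return Settings(
        host=env.get(prefix + "HOST", defaults.host),
        # Environment values are strings, so the port is converted explicitly.
        port=int(env.get(prefix + "PORT", defaults.port)),
        default_voice=env.get(prefix + "DEFAULT_VOICE", defaults.default_voice),
        log_level=env.get(prefix + "LOG_LEVEL", defaults.log_level),
    )
```

For example, load_settings({"VOICE_SERVER_PORT": "9000"}) yields port 9000 and leaves the other values at their defaults.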

Available Voices

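| Voice  | Gender | Language | Description                              |
|--------|--------|----------|------------------------------------------|
| ryan   | Male   | en-US    | Professional, clear (recommended for AI) |
| alan   | Male   | en-GB    | Sophisticated British accent             |
| lessac | Female | en-US    | Natural, conversational                  |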

Development

API Documentation

Visit http://127.0.0.1:8880/docs when the server is running for interactive API documentation.

Adding New Voices

  1. Download voice model files to ~/.local/share/piper-voices/
  2. Add voice configuration to src/voice_server/config.py
  3. Restart the server

Running Tests

poetry run pytest

Code Formatting

poetry run black src/
poetry run isort src/

Integration with Claude Code

The voice server is designed to work with Claude Code's voice-mode functionality:

# In Claude Code
converse("Hello! I can now speak using the local voice server.", wait_for_response=False)

Troubleshooting

Server Won't Start

  • Check that piper-tts is installed: which piper-tts (depending on how Piper was installed, the binary may be named piper instead)
  • Verify that the voice models are downloaded
  • Check that port 8880 is available
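
Two of the checks above, the binary lookup and the port check, can be scripted. The following is a rough preflight sketch with hypothetical function names. The binary name piper-tts is taken from this README and may differ on your system; the port check simply tries to bind the address, so it can report a port as busy slightly more often than strictly necessary.

```python
import shutil
import socket


def binary_on_path(name="piper-tts"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


def port_is_free(host="127.0.0.1", port=8880):
    """Return True if the address can be bound, i.e. nothing else is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def preflight(binary="piper-tts", host="127.0.0.1", port=8880):
    """Collect both checks into a dict for easy printing."""
    return {
        "binary_found": binary_on_path(binary),
        "port_free": port_is_free(host, port),
    }
```

Running print(preflight()) before starting the server gives a quick view of which precondition is failing.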

No Audio Output

  • Test piper directly: echo "test" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f test.wav
  • Check audio system settings
  • Verify file permissions on voice models
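
When a generated file plays as silence or not at all, it can help to inspect the WAV header before looking at the audio system. The sketch below uses Python's standard wave module; the function name wav_summary is hypothetical, and it assumes the file is an uncompressed PCM WAV, which is what the wave module can read.

```python
import wave


def wav_summary(path):
    """Return basic header facts for a PCM WAV file: channels, rate, frames, duration."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "frames": frames,
            # Guard against a zero sample rate in a malformed header.
            "seconds": frames / rate if rate else 0.0,
        }
```

A result with zero frames for test.wav suggests that synthesis itself produced no audio, which points at Piper or the voice model rather than at playback.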

Voice Not Available

  • Check voice files exist: ls ~/.local/share/piper-voices/
  • Verify file naming matches configuration
  • Check server logs for detailed error messages
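
The file checks above can be automated. The Quick Start downloads each voice as a pair of files, a .onnx model and a matching .onnx.json config, so this sketch lists every model in the voices directory and reports whether its config file sits next to it. The function name is hypothetical, and how the server maps voice names such as ryan to file names is defined in its configuration, which this sketch does not inspect.

```python
from pathlib import Path

# Directory used in the Quick Start of this README.
VOICES_DIR = Path.home() / ".local" / "share" / "piper-voices"


def find_voice_models(voices_dir=VOICES_DIR):
    """Map each *.onnx file name to True if its .onnx.json config exists beside it."""
    voices_dir = Path(voices_dir)
    return {
        model.name: model.with_name(model.name + ".json").is_file()
        for model in sorted(voices_dir.glob("*.onnx"))
    }
```

An empty result means no models were found in the directory; an entry mapped to False means the model is present but its .onnx.json file is missing.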