Add professional voice assistant server implementation

- FastAPI-based TTS server using Piper neural text-to-speech - Poetry for dependency management and virtual environments - OpenAI-compatible API endpoints for seamless integration - Support for multiple voice models (Ryan, Alan, Lessac) - Robust error handling and voice fallback system - Professional logging and configuration management - Docker-ready with proper Python packaging 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-17 14:56:01 +02:00
parent 82f9cc4990
commit 572434d42e
13 changed files with 1722 additions and 0 deletions
--- a/voice-server/README.md
+++ b/voice-server/README.md
@@ -0,0 +1,130 @@
+# Homelab Voice Server
+
+A local text-to-speech server using Piper TTS, designed to work with Claude Code voice assistant functionality.
+
+## Features
+
+- **Local TTS**: Uses Piper neural TTS for natural-sounding speech
+- **OpenAI Compatible**: Drop-in replacement for OpenAI TTS API
+- **Multiple Voices**: Support for different voice models and languages
+- **FastAPI**: Modern, fast web framework with automatic API documentation
+- **Poetry**: Dependency management and virtual environments
+
+## Quick Start
+
+1. **Install dependencies:**
+   ```bash
+   cd voice-server
+   poetry install
+   ```
+
+2. **Download voice models:**
+   ```bash
+   mkdir -p ~/.local/share/piper-voices
+   cd ~/.local/share/piper-voices
+   
+   # Ryan voice (recommended - male US English)
+   wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx
+   wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json
+   ```
+
+3. **Start the server:**
+   ```bash
+   poetry run voice-server
+   ```
+
+4. **Test the API:**
+   ```bash
+   curl -X POST "http://127.0.0.1:8880/v1/audio/speech" \
+     -H "Content-Type: application/json" \
+     -d '{"input": "Hello from the voice server!", "voice": "ryan"}' \
+     --output test.wav
+   ```
+
+## API Endpoints
+
+### Health Check
+- `GET /health` - Server health and status
+
+### Models (OpenAI Compatible)
+- `GET /v1/models` - List available models
+
+### Voices
+- `GET /v1/voices` - List all available voices
+- `GET /v1/voices/{voice_name}` - Get specific voice information
+
+### Speech Synthesis (OpenAI Compatible)
+- `POST /v1/audio/speech` - Generate speech from text
+
+#### Request Body
+```json
+{
+  "input": "Text to speak",
+  "voice": "ryan",
+  "speed": 1.0
+}
+```
+
+## Configuration
+
+Environment variables (prefix with `VOICE_SERVER_`):
+
+- `VOICE_SERVER_HOST`: Server host (default: 127.0.0.1)
+- `VOICE_SERVER_PORT`: Server port (default: 8880)
+- `VOICE_SERVER_DEFAULT_VOICE`: Default voice (default: ryan)
+- `VOICE_SERVER_LOG_LEVEL`: Logging level (default: info)
+
+## Available Voices
+
+| Voice | Gender | Language | Description |
+|-------|--------|----------|-------------|
+| ryan  | Male   | en-US    | Professional, clear (recommended for AI) |
+| alan  | Male   | en-GB    | Sophisticated British accent |
+| lessac| Female | en-US    | Natural, conversational |
+
+## Development
+
+### API Documentation
+Visit `http://127.0.0.1:8880/docs` when the server is running for interactive API documentation.
+
+### Adding New Voices
+1. Download voice model files to `~/.local/share/piper-voices/`
+2. Add voice configuration to `src/voice_server/config.py`
+3. Restart the server
+
+### Running Tests
+```bash
+poetry run pytest
+```
+
+### Code Formatting
+```bash
+poetry run black src/
+poetry run isort src/
+```
+
+## Integration with Claude Code
+
+The voice server is designed to work with Claude Code's voice-mode functionality:
+
+```python
+# In Claude Code
+converse("Hello! I can now speak using the local voice server.", wait_for_response=False)
+```
+
+## Troubleshooting
+
+### Server Won't Start
+- Check that piper-tts is installed: `which piper-tts`
+- Verify voice models are downloaded
+- Check port 8880 is available
+
+### No Audio Output
+- Test piper directly: `echo "test" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f test.wav`
+- Check audio system settings
+- Verify file permissions on voice models
+
+### Voice Not Available
+- Check voice files exist: `ls ~/.local/share/piper-voices/`
+- Verify file naming matches configuration
+- Check server logs for detailed error messages
--- a/voice-server/poetry.lock
+++ b/voice-server/poetry.lock
--- a/voice-server/pyproject.toml
+++ b/voice-server/pyproject.toml
@@ -0,0 +1,36 @@
+[tool.poetry]
+name = "homelab-voice-server"
+version = "0.1.0"
+description = "Local TTS server for Claude Code voice assistant using Piper"
+authors = ["Homelab <homelab@ak-homelab.duckdns.org>"]
+readme = "README.md"
+packages = [{include = "voice_server", from = "src"}]
+
+[tool.poetry.dependencies]
+python = "^3.10"
+fastapi = "^0.115.0"
+uvicorn = {extras = ["standard"], version = "^0.30.0"}
+pydantic = "^2.10.0"
+pydantic-settings = "^2.7.0"
+python-dotenv = "^1.0.0"
+
+[tool.poetry.group.dev.dependencies]
+pytest = "^8.0.0"
+httpx = "^0.26.0"
+black = "^24.0.0"
+isort = "^5.13.0"
+
+[tool.poetry.scripts]
+voice-server = "voice_server.main:main"
+
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
+
+[tool.black]
+line-length = 88
+target-version = ['py310']
+
+[tool.isort]
+profile = "black"
+multi_line_output = 3
--- a/voice-server/src/voice_server/init.py
+++ b/voice-server/src/voice_server/init.py
@@ -0,0 +1,11 @@
+"""Homelab Voice Server - Local TTS server for Claude Code."""
+
+__version__ = "0.1.0"
+__author__ = "Homelab"
+__description__ = "Local TTS server using Piper for Claude Code voice assistant"
+
+from .config import config
+from .tts import TTSService
+from .api import app
+
+__all__ = ["config", "TTSService", "app"]
--- a/voice-server/src/voice_server/pycache/init.cpython-313.pyc
+++ b/voice-server/src/voice_server/pycache/init.cpython-313.pyc
--- a/voice-server/src/voice_server/pycache/api.cpython-313.pyc
+++ b/voice-server/src/voice_server/pycache/api.cpython-313.pyc
--- a/voice-server/src/voice_server/pycache/config.cpython-313.pyc
+++ b/voice-server/src/voice_server/pycache/config.cpython-313.pyc
--- a/voice-server/src/voice_server/pycache/main.cpython-313.pyc
+++ b/voice-server/src/voice_server/pycache/main.cpython-313.pyc
--- a/voice-server/src/voice_server/pycache/tts.cpython-313.pyc
+++ b/voice-server/src/voice_server/pycache/tts.cpython-313.pyc
--- a/voice-server/src/voice_server/api.py
+++ b/voice-server/src/voice_server/api.py
@@ -0,0 +1,169 @@
+"""FastAPI application for voice server."""
+import logging
+from typing import Optional
+from fastapi import FastAPI, HTTPException, Response
+from pydantic import BaseModel, Field
+from .tts import TTSService
+from .config import config
+
+# Configure logging
+logging.basicConfig(level=getattr(logging, config.log_level.upper()))
+logger = logging.getLogger(__name__)
+
+# Initialize TTS service
+try:
+    tts_service = TTSService()
+except Exception as e:
+    logger.error(f"Failed to initialize TTS service: {e}")
+    tts_service = None
+
+app = FastAPI(
+    title="Homelab Voice Server",
+    description="Local TTS server for Claude Code voice assistant using Piper",
+    version="0.1.0"
+)
+
+
+class TTSRequest(BaseModel):
+    """Request model for TTS synthesis."""
+    input: str = Field(..., description="Text to synthesize")
+    model: str = Field(default="tts-1", description="Model to use (for compatibility)")
+    voice: str = Field(default="alloy", description="Voice to use")
+    response_format: str = Field(default="mp3", description="Audio format (ignored, always returns wav)")
+    speed: float = Field(default=1.0, ge=0.25, le=4.0, description="Speech speed")
+
+
+class ModelInfo(BaseModel):
+    """Model information."""
+    id: str
+    object: str = "model"
+    created: int = 1677649963
+    owned_by: str = "piper"
+
+
+class ModelsResponse(BaseModel):
+    """Response for models endpoint."""
+    object: str = "list"
+    data: list[ModelInfo]
+
+
+@app.get("/health")
+async def health_check():
+    """Health check endpoint."""
+    if tts_service is None:
+        raise HTTPException(status_code=503, detail="TTS service not available")
+    
+    return {
+        "status": "healthy",
+        "tts_available": True,
+        "default_voice": config.default_voice,
+        "voices_available": len(config.available_voices)
+    }
+
+
+@app.get("/v1/models", response_model=ModelsResponse)
+async def list_models():
+    """List available models (OpenAI compatible)."""
+    return ModelsResponse(
+        object="list",
+        data=[
+            ModelInfo(id="tts-1", owned_by="piper"),
+            ModelInfo(id="tts-1-hd", owned_by="piper")
+        ]
+    )
+
+
+@app.get("/v1/voices")
+async def list_voices():
+    """List available voices."""
+    if tts_service is None:
+        raise HTTPException(status_code=503, detail="TTS service not available")
+    
+    return {"voices": tts_service.list_voices()}
+
+
+@app.get("/v1/voices/{voice_name}")
+async def get_voice_info(voice_name: str):
+    """Get information about a specific voice."""
+    if tts_service is None:
+        raise HTTPException(status_code=503, detail="TTS service not available")
+    
+    try:
+        voice_info = tts_service.get_voice_info(voice_name)
+        return voice_info
+    except ValueError as e:
+        raise HTTPException(status_code=404, detail=str(e))
+
+
+@app.post("/v1/audio/speech")
+async def create_speech(request: TTSRequest):
+    """
+    Create speech from text (OpenAI compatible).
+    
+    Returns raw audio data as wav format.
+    """
+    if tts_service is None:
+        raise HTTPException(status_code=503, detail="TTS service not available")
+    
+    # Map common voice names to our voices
+    voice_mapping = {
+        # OpenAI voices
+        "alloy": config.default_voice,
+        "echo": config.default_voice,
+        "fable": config.default_voice,
+        "onyx": config.default_voice,
+        "nova": "lessac",  # Female voice
+        "shimmer": "lessac",  # Female voice
+        # Common defaults
+        "default": config.default_voice,
+        "male": config.default_voice,
+        "female": "lessac"
+    }
+    
+    # Get voice name, with fallback to default
+    voice_name = voice_mapping.get(request.voice, request.voice)
+    
+    # If the requested voice doesn't exist in our available voices, use default
+    if voice_name not in config.available_voices:
+        logger.warning(f"Requested voice '{voice_name}' not available, using default: {config.default_voice}")
+        voice_name = config.default_voice
+    
+    try:
+        audio_data, audio_format = tts_service.synthesize(
+            text=request.input,
+            voice=voice_name,
+            speed=request.speed
+        )
+        
+        # Return raw audio data
+        return Response(
+            content=audio_data,
+            media_type="audio/wav",
+            headers={
+                "Content-Disposition": "attachment; filename=speech.wav"
+            }
+        )
+        
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e))
+    except RuntimeError as e:
+        logger.error(f"TTS synthesis failed: {e}")
+        raise HTTPException(status_code=500, detail=f"TTS synthesis failed: {e}")
+
+
+@app.get("/")
+async def root():
+    """Root endpoint with API information."""
+    return {
+        "service": "Homelab Voice Server",
+        "version": "0.1.0",
+        "description": "Local TTS server using Piper",
+        "endpoints": {
+            "health": "/health",
+            "models": "/v1/models",
+            "voices": "/v1/voices",
+            "speech": "/v1/audio/speech"
+        },
+        "default_voice": config.default_voice,
+        "available_voices": list(config.available_voices.keys()) if tts_service else []
+    }
--- a/voice-server/src/voice_server/config.py
+++ b/voice-server/src/voice_server/config.py
@@ -0,0 +1,90 @@
+"""Configuration for the voice server."""
+import os
+from pathlib import Path
+from typing import Dict, Any
+from pydantic_settings import BaseSettings
+from pydantic import Field
+
+
+class VoiceServerConfig(BaseSettings):
+    """Voice server configuration."""
+    
+    host: str = Field(default="127.0.0.1", description="Server host")
+    port: int = Field(default=8880, description="Server port")
+    
+    # Voice model configuration
+    default_voice: str = Field(default="ryan", description="Default voice model")
+    voices_dir: Path = Field(
+        default_factory=lambda: Path.home() / ".local/share/piper-voices",
+        description="Directory containing voice models"
+    )
+    
+    # Available voice models
+    available_voices: Dict[str, Dict[str, Any]] = Field(
+        default_factory=lambda: {
+            "ryan": {
+                "model_file": "en_US-ryan-medium.onnx",
+                "config_file": "en_US-ryan-medium.onnx.json",
+                "language": "en-US",
+                "gender": "male",
+                "description": "Professional US male voice"
+            },
+            "alan": {
+                "model_file": "en_GB-alan-medium.onnx", 
+                "config_file": "en_GB-alan-medium.onnx.json",
+                "language": "en-GB",
+                "gender": "male",
+                "description": "Sophisticated British male voice"
+            },
+            "lessac": {
+                "model_file": "en_US-lessac-medium.onnx",
+                "config_file": "en_US-lessac-medium.onnx.json", 
+                "language": "en-US",
+                "gender": "female",
+                "description": "Natural US female voice"
+            }
+        }
+    )
+    
+    # Piper TTS configuration
+    piper_executable: str = Field(default="piper-tts", description="Piper TTS executable")
+    audio_format: str = Field(default="wav", description="Audio output format")
+    
+    # Server configuration
+    log_level: str = Field(default="info", description="Logging level")
+    
+    class Config:
+        env_prefix = "VOICE_SERVER_"
+        env_file = ".env"
+    
+    def get_voice_model_path(self, voice_name: str = None) -> Path:
+        """Get the full path to a voice model file."""
+        voice_name = voice_name or self.default_voice
+        if voice_name not in self.available_voices:
+            raise ValueError(f"Voice '{voice_name}' not found in available voices")
+        
+        voice_config = self.available_voices[voice_name]
+        return self.voices_dir / voice_config["model_file"]
+    
+    def get_voice_config_path(self, voice_name: str = None) -> Path:
+        """Get the full path to a voice config file."""
+        voice_name = voice_name or self.default_voice
+        if voice_name not in self.available_voices:
+            raise ValueError(f"Voice '{voice_name}' not found in available voices")
+        
+        voice_config = self.available_voices[voice_name]
+        return self.voices_dir / voice_config["config_file"]
+    
+    def validate_voice_files(self, voice_name: str = None) -> bool:
+        """Check if voice model files exist."""
+        voice_name = voice_name or self.default_voice
+        try:
+            model_path = self.get_voice_model_path(voice_name)
+            config_path = self.get_voice_config_path(voice_name)
+            return model_path.exists() and config_path.exists()
+        except ValueError:
+            return False
+
+
+# Global configuration instance
+config = VoiceServerConfig()
--- a/voice-server/src/voice_server/main.py
+++ b/voice-server/src/voice_server/main.py
@@ -0,0 +1,82 @@
+"""Main entry point for the voice server."""
+import logging
+import sys
+from pathlib import Path
+import uvicorn
+from .config import config
+from .api import app
+
+logger = logging.getLogger(__name__)
+
+
+def check_prerequisites():
+    """Check if all prerequisites are met."""
+    errors = []
+    
+    # Check if voices directory exists
+    if not config.voices_dir.exists():
+        errors.append(f"Voices directory not found: {config.voices_dir}")
+        errors.append("Run: mkdir -p ~/.local/share/piper-voices")
+    
+    # Check if default voice files exist
+    if not config.validate_voice_files():
+        voice_name = config.default_voice
+        model_path = config.get_voice_model_path()
+        errors.append(f"Default voice '{voice_name}' files not found")
+        errors.append(f"Expected model at: {model_path}")
+        errors.append("Download voice models from: https://huggingface.co/rhasspy/piper-voices")
+    
+    # Check available voices
+    available_count = sum(
+        1 for voice in config.available_voices 
+        if config.validate_voice_files(voice)
+    )
+    
+    if available_count == 0:
+        errors.append("No voice models available")
+        errors.append("Please download at least one voice model")
+    
+    return errors
+
+
+def main():
+    """Main entry point."""
+    # Set up logging
+    logging.basicConfig(
+        level=getattr(logging, config.log_level.upper()),
+        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+    )
+    
+    logger.info("Starting Homelab Voice Server")
+    logger.info(f"Configuration: {config.dict()}")
+    
+    # Check prerequisites
+    errors = check_prerequisites()
+    if errors:
+        logger.error("Prerequisites not met:")
+        for error in errors:
+            logger.error(f"  - {error}")
+        sys.exit(1)
+    
+    # Log available voices
+    available_voices = [
+        voice for voice in config.available_voices 
+        if config.validate_voice_files(voice)
+    ]
+    logger.info(f"Available voices: {available_voices}")
+    logger.info(f"Default voice: {config.default_voice}")
+    
+    # Start server
+    logger.info(f"Starting server on {config.host}:{config.port}")
+    
+    uvicorn.run(
+        app,
+        host=config.host,
+        port=config.port,
+        log_level=config.log_level,
+        access_log=True
+    )
+
+
+if __name__ == "__main__":
+    main()
--- a/voice-server/src/voice_server/tts.py
+++ b/voice-server/src/voice_server/tts.py
@@ -0,0 +1,158 @@
+"""Text-to-speech service using Piper."""
+import subprocess
+import tempfile
+import os
+import logging
+from pathlib import Path
+from typing import Optional, Tuple
+from .config import config
+
+logger = logging.getLogger(__name__)
+
+
+class TTSService:
+    """Text-to-speech service using Piper."""
+    
+    def __init__(self):
+        self.config = config
+        self._validate_setup()
+    
+    def _validate_setup(self):
+        """Validate that piper and voice models are available."""
+        # Check if piper-tts is available
+        try:
+            result = subprocess.run(
+                [self.config.piper_executable, "--help"],
+                capture_output=True,
+                timeout=10
+            )
+            if result.returncode != 0:
+                raise RuntimeError(f"Piper TTS not working: {result.stderr.decode()}")
+        except (subprocess.TimeoutExpired, FileNotFoundError) as e:
+            raise RuntimeError(f"Piper TTS not found or not working: {e}")
+        
+        # Check if default voice model exists
+        if not self.config.validate_voice_files():
+            default_voice = self.config.default_voice
+            model_path = self.config.get_voice_model_path()
+            raise RuntimeError(
+                f"Default voice '{default_voice}' model not found at {model_path}. "
+                f"Please download the voice model files."
+            )
+        
+        logger.info(f"TTS service initialized with voice: {self.config.default_voice}")
+    
+    def synthesize(
+        self, 
+        text: str, 
+        voice: Optional[str] = None,
+        speed: float = 1.0
+    ) -> Tuple[bytes, str]:
+        """
+        Synthesize text to speech.
+        
+        Args:
+            text: Text to synthesize
+            voice: Voice to use (defaults to configured default)
+            speed: Speech speed multiplier
+            
+        Returns:
+            Tuple of (audio_data, audio_format)
+            
+        Raises:
+            ValueError: If voice is not available
+            RuntimeError: If synthesis fails
+        """
+        voice = voice or self.config.default_voice
+        
+        if not self.config.validate_voice_files(voice):
+            available_voices = list(self.config.available_voices.keys())
+            raise ValueError(
+                f"Voice '{voice}' not available. Available voices: {available_voices}"
+            )
+        
+        model_path = self.config.get_voice_model_path(voice)
+        
+        # Create temporary file for output
+        with tempfile.NamedTemporaryFile(suffix=f".{self.config.audio_format}", delete=False) as temp_file:
+            temp_path = temp_file.name
+        
+        try:
+            # Build piper command
+            cmd = [
+                self.config.piper_executable,
+                "-m", str(model_path),
+                "-f", temp_path
+            ]
+            
+            # Add speed if different from default
+            if speed != 1.0:
+                cmd.extend(["--length-scale", str(1.0 / speed)])
+            
+            logger.debug(f"Running piper command: {' '.join(cmd)}")
+            
+            # Run piper-tts
+            process = subprocess.Popen(
+                cmd,
+                stdin=subprocess.PIPE,
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True
+            )
+            
+            stdout, stderr = process.communicate(input=text, timeout=30)
+            
+            if process.returncode != 0:
+                raise RuntimeError(f"TTS synthesis failed: {stderr}")
+            
+            # Read the generated audio file
+            with open(temp_path, "rb") as f:
+                audio_data = f.read()
+            
+            if not audio_data:
+                raise RuntimeError("Generated audio file is empty")
+            
+            logger.info(f"Successfully synthesized {len(text)} characters with voice '{voice}'")
+            return audio_data, self.config.audio_format
+            
+        except subprocess.TimeoutExpired:
+            process.kill()
+            raise RuntimeError("TTS synthesis timed out")
+        except Exception as e:
+            logger.error(f"TTS synthesis error: {e}")
+            raise
+        finally:
+            # Clean up temp file
+            try:
+                os.unlink(temp_path)
+            except OSError:
+                pass
+    
+    def list_voices(self) -> dict:
+        """List available voices with their information."""
+        voices = {}
+        for voice_name, voice_config in self.config.available_voices.items():
+            voices[voice_name] = {
+                "name": voice_name,
+                "language": voice_config["language"],
+                "gender": voice_config["gender"],
+                "description": voice_config["description"],
+                "available": self.config.validate_voice_files(voice_name)
+            }
+        return voices
+    
+    def get_voice_info(self, voice_name: str) -> dict:
+        """Get information about a specific voice."""
+        if voice_name not in self.config.available_voices:
+            raise ValueError(f"Voice '{voice_name}' not found")
+        
+        voice_config = self.config.available_voices[voice_name]
+        return {
+            "name": voice_name,
+            "language": voice_config["language"],
+            "gender": voice_config["gender"],
+            "description": voice_config["description"],
+            "available": self.config.validate_voice_files(voice_name),
+            "model_path": str(self.config.get_voice_model_path(voice_name)),
+            "config_path": str(self.config.get_voice_config_path(voice_name))
+        }