Add professional voice assistant server implementation
- FastAPI-based TTS server using Piper neural text-to-speech
- Poetry for dependency management and virtual environments
- OpenAI-compatible API endpoints for seamless integration
- Support for multiple voice models (Ryan, Alan, Lessac)
- Robust error handling and voice fallback system
- Professional logging and configuration management
- Docker-ready with proper Python packaging

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
130
voice-server/README.md
Normal file
@@ -0,0 +1,130 @@
# Homelab Voice Server

A local text-to-speech server using Piper TTS, designed to work with Claude Code voice assistant functionality.

## Features

- **Local TTS**: Uses Piper neural TTS for natural-sounding speech
- **OpenAI Compatible**: Accepts OpenAI-style requests on `/v1/audio/speech` and `/v1/models`; audio is always returned as WAV, whatever `response_format` is requested
- **Multiple Voices**: Support for different voice models and languages
- **FastAPI**: Modern, fast web framework with automatic API documentation
- **Poetry**: Dependency management and virtual environments

## Quick Start

1. **Install dependencies:**
   ```bash
   cd voice-server
   poetry install
   ```

2. **Download voice models:**
   ```bash
   mkdir -p ~/.local/share/piper-voices
   cd ~/.local/share/piper-voices

   # Ryan voice (recommended - male US English)
   wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx
   wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/ryan/medium/en_US-ryan-medium.onnx.json
   ```

3. **Start the server:**
   ```bash
   poetry run voice-server
   ```

4. **Test the API:**
   ```bash
   curl -X POST "http://127.0.0.1:8880/v1/audio/speech" \
     -H "Content-Type: application/json" \
     -d '{"input": "Hello from the voice server!", "voice": "ryan"}' \
     --output test.wav
   ```
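
The same test request can be sent from Python. The following is a minimal sketch using only the standard library; `build_speech_request` and `save_speech` are hypothetical helper names, and the URL assumes the default host and port shown above.

```python
import json
import urllib.request

# Assumes the server runs on the default host and port from this README.
BASE_URL = "http://127.0.0.1:8880"


def build_speech_request(
    text: str, voice: str = "ryan", speed: float = 1.0
) -> urllib.request.Request:
    """Build a POST request for /v1/audio/speech (hypothetical helper)."""
    payload = json.dumps({"input": text, "voice": voice, "speed": speed})
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, path: str = "test.wav") -> None:
    """Send the request and write the returned WAV bytes to disk."""
    request = build_speech_request(text)
    with urllib.request.urlopen(request, timeout=30) as response:
        with open(path, "wb") as out:
            out.write(response.read())
```

With the server running, `save_speech("Hello from the voice server!")` should produce the same `test.wav` as the curl command.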

## API Endpoints

### Health Check
- `GET /health` - Server health and status

### Models (OpenAI Compatible)
- `GET /v1/models` - List available models

### Voices
- `GET /v1/voices` - List all available voices
- `GET /v1/voices/{voice_name}` - Get specific voice information

### Speech Synthesis (OpenAI Compatible)
- `POST /v1/audio/speech` - Generate speech from text

#### Request Body
```json
{
  "input": "Text to speak",
  "voice": "ryan",
  "speed": 1.0
}
```
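
As a rough illustration of how `speed` is applied: `tts.py` in this commit passes `--length-scale` with the inverse of the requested speed to Piper, and `api.py` restricts `speed` to the range 0.25 to 4.0. The sketch below restates that logic in isolation; `piper_speed_args` is a hypothetical name, not a function in the codebase.

```python
def piper_speed_args(speed: float = 1.0) -> list[str]:
    """Translate an API `speed` value into extra Piper CLI arguments."""
    # The request model accepts speeds between 0.25 and 4.0 inclusive.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if speed == 1.0:
        return []  # default pace, no extra flag needed
    # A larger length scale means slower speech, so it is the inverse of speed.
    return ["--length-scale", str(1.0 / speed)]
```

For example, `speed: 2.0` maps to a length scale of `0.5`, which roughly halves the duration of the generated audio.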

## Configuration

Settings are read from environment variables prefixed with `VOICE_SERVER_`, or from a `.env` file in the working directory:

- `VOICE_SERVER_HOST`: Server host (default: 127.0.0.1)
- `VOICE_SERVER_PORT`: Server port (default: 8880)
- `VOICE_SERVER_DEFAULT_VOICE`: Default voice (default: ryan)
- `VOICE_SERVER_LOG_LEVEL`: Logging level (default: info)
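
The server itself loads these values through `pydantic-settings`. To illustrate the prefix-and-default behaviour without that dependency, here is a standard-library sketch; the `Settings` dataclass and `load_settings` function are stand-ins written for this example, not the server's actual classes.

```python
import os
from dataclasses import dataclass

ENV_PREFIX = "VOICE_SERVER_"


@dataclass(frozen=True)
class Settings:
    """Documented defaults for the four variables listed above."""

    host: str = "127.0.0.1"
    port: int = 8880
    default_voice: str = "ryan"
    log_level: str = "info"


def load_settings(environ=None) -> Settings:
    """Overlay VOICE_SERVER_* variables on the documented defaults."""
    env = os.environ if environ is None else environ
    defaults = Settings()
    return Settings(
        host=env.get(ENV_PREFIX + "HOST", defaults.host),
        port=int(env.get(ENV_PREFIX + "PORT", defaults.port)),
        default_voice=env.get(ENV_PREFIX + "DEFAULT_VOICE", defaults.default_voice),
        log_level=env.get(ENV_PREFIX + "LOG_LEVEL", defaults.log_level),
    )
```

Passing a plain dictionary instead of `os.environ` makes the override behaviour easy to check in isolation.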

## Available Voices

| Voice  | Gender | Language | Description |
|--------|--------|----------|-------------|
| ryan   | Male   | en-US    | Professional, clear (recommended for AI) |
| alan   | Male   | en-GB    | Sophisticated British accent |
| lessac | Female | en-US    | Natural, conversational |
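
Requests that use OpenAI voice names are mapped onto these voices in `api.py`, and unknown names fall back to the default voice. The sketch below restates that mapping as a standalone function; `resolve_voice` and the two constants are hypothetical names, and the default is assumed to be `ryan`.

```python
# Names that api.py maps to the configured default voice.
DEFAULT_ALIASES = {"alloy", "echo", "fable", "onyx", "default", "male"}
# Names that api.py maps to the female "lessac" voice.
LESSAC_ALIASES = {"nova", "shimmer", "female"}


def resolve_voice(requested: str, available: set[str], default: str = "ryan") -> str:
    """Map a requested voice name to a local voice, falling back to the default."""
    if requested in DEFAULT_ALIASES:
        name = default
    elif requested in LESSAC_ALIASES:
        name = "lessac"
    else:
        name = requested
    # Anything not configured locally falls back to the default voice.
    return name if name in available else default
```

One consequence of this design is that a misspelled voice name does not produce an error; the request is served with the default voice and a warning is logged.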

## Development

### API Documentation
Visit `http://127.0.0.1:8880/docs` when the server is running for interactive API documentation.

### Adding New Voices
1. Download voice model files to `~/.local/share/piper-voices/`
2. Add voice configuration to `src/voice_server/config.py`
3. Restart the server
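
A voice entry in `config.py` is a dictionary with `model_file`, `config_file`, `language`, `gender`, and `description` keys. The sketch below shows what a new entry might look like and how to confirm its files are in place before restarting. The `amy` voice, its file names, and the `missing_files` helper are illustrative assumptions, not part of this commit.

```python
from pathlib import Path

VOICES_DIR = Path.home() / ".local/share/piper-voices"

# Hypothetical entry following the shape of the existing voices in config.py.
NEW_VOICE = {
    "amy": {
        "model_file": "en_US-amy-medium.onnx",
        "config_file": "en_US-amy-medium.onnx.json",
        "language": "en-US",
        "gender": "female",
        "description": "Example additional voice",
    }
}


def missing_files(voice_entry: dict, voices_dir: Path = VOICES_DIR) -> list[Path]:
    """Return the model and config paths that are not present on disk."""
    paths = [
        voices_dir / voice_entry["model_file"],
        voices_dir / voice_entry["config_file"],
    ]
    return [path for path in paths if not path.exists()]
```

An empty list from `missing_files(NEW_VOICE["amy"])` suggests the download step is complete and the entry can be added to `available_voices`.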

### Running Tests
```bash
poetry run pytest
```

### Code Formatting
```bash
poetry run black src/
poetry run isort src/
```

## Integration with Claude Code

The voice server is designed to work with Claude Code's voice-mode functionality:

```python
# In Claude Code
converse("Hello! I can now speak using the local voice server.", wait_for_response=False)
```

## Troubleshooting

### Server Won't Start
- Check that piper-tts is installed: `which piper-tts`
- Verify voice models are downloaded
- Check port 8880 is available

### No Audio Output
- Test piper directly: `echo "test" | piper-tts -m ~/.local/share/piper-voices/en_US-ryan-medium.onnx -f test.wav`
- Check audio system settings
- Verify file permissions on voice models

### Voice Not Available
- Check voice files exist: `ls ~/.local/share/piper-voices/`
- Verify file naming matches configuration
- Check server logs for detailed error messages
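
Several of these checks can be scripted. The following is a small diagnostic sketch using only the standard library; the function names are hypothetical, and the defaults assume the executable name, port, and voice directory used elsewhere in this README.

```python
import shutil
import socket
from pathlib import Path


def piper_on_path(executable: str = "piper-tts") -> bool:
    """Equivalent of `which piper-tts`."""
    return shutil.which(executable) is not None


def port_is_free(host: str = "127.0.0.1", port: int = 8880) -> bool:
    """Return True if the address can be bound, i.e. nothing else is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def voice_files_present(voices_dir: Path, stem: str = "en_US-ryan-medium") -> bool:
    """Check that both the .onnx model and its .onnx.json config exist."""
    model = voices_dir / f"{stem}.onnx"
    config = voices_dir / f"{stem}.onnx.json"
    return model.exists() and config.exists()


def run_checks() -> dict[str, bool]:
    """Run all three checks with the default settings."""
    voices_dir = Path.home() / ".local/share/piper-voices"
    return {
        "piper_on_path": piper_on_path(),
        "port_8880_free": port_is_free(),
        "default_voice_files": voice_files_present(voices_dir),
    }
```

Note that `port_8880_free` is expected to be `False` while the voice server itself is running, so run the checks with the server stopped.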
1046
voice-server/poetry.lock
generated
Normal file
File diff suppressed because it is too large
36
voice-server/pyproject.toml
Normal file
@@ -0,0 +1,36 @@
[tool.poetry]
name = "homelab-voice-server"
version = "0.1.0"
description = "Local TTS server for Claude Code voice assistant using Piper"
authors = ["Homelab <homelab@ak-homelab.duckdns.org>"]
readme = "README.md"
packages = [{include = "voice_server", from = "src"}]

[tool.poetry.dependencies]
python = "^3.10"
fastapi = "^0.115.0"
uvicorn = {extras = ["standard"], version = "^0.30.0"}
pydantic = "^2.10.0"
pydantic-settings = "^2.7.0"
python-dotenv = "^1.0.0"

[tool.poetry.group.dev.dependencies]
pytest = "^8.0.0"
httpx = "^0.26.0"
black = "^24.0.0"
isort = "^5.13.0"

[tool.poetry.scripts]
voice-server = "voice_server.main:main"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.black]
line-length = 88
target-version = ['py310']

[tool.isort]
profile = "black"
multi_line_output = 3
11
voice-server/src/voice_server/__init__.py
Normal file
@@ -0,0 +1,11 @@
"""Homelab Voice Server - Local TTS server for Claude Code."""

__version__ = "0.1.0"
__author__ = "Homelab"
__description__ = "Local TTS server using Piper for Claude Code voice assistant"

from .config import config
from .tts import TTSService
from .api import app

__all__ = ["config", "TTSService", "app"]
Binary file not shown.
BIN
voice-server/src/voice_server/__pycache__/api.cpython-313.pyc
Normal file
Binary file not shown.
BIN
voice-server/src/voice_server/__pycache__/config.cpython-313.pyc
Normal file
Binary file not shown.
BIN
voice-server/src/voice_server/__pycache__/main.cpython-313.pyc
Normal file
Binary file not shown.
BIN
voice-server/src/voice_server/__pycache__/tts.cpython-313.pyc
Normal file
Binary file not shown.
169
voice-server/src/voice_server/api.py
Normal file
@@ -0,0 +1,169 @@
"""FastAPI application for voice server."""
import logging
from typing import Optional
from fastapi import FastAPI, HTTPException, Response
from pydantic import BaseModel, Field
from .tts import TTSService
from .config import config

# Configure logging
logging.basicConfig(level=getattr(logging, config.log_level.upper()))
logger = logging.getLogger(__name__)

# Initialize TTS service
try:
    tts_service = TTSService()
except Exception as e:
    logger.error(f"Failed to initialize TTS service: {e}")
    tts_service = None

app = FastAPI(
    title="Homelab Voice Server",
    description="Local TTS server for Claude Code voice assistant using Piper",
    version="0.1.0"
)


class TTSRequest(BaseModel):
    """Request model for TTS synthesis."""
    input: str = Field(..., description="Text to synthesize")
    model: str = Field(default="tts-1", description="Model to use (for compatibility)")
    voice: str = Field(default="alloy", description="Voice to use")
    response_format: str = Field(default="mp3", description="Audio format (ignored, always returns wav)")
    speed: float = Field(default=1.0, ge=0.25, le=4.0, description="Speech speed")


class ModelInfo(BaseModel):
    """Model information."""
    id: str
    object: str = "model"
    created: int = 1677649963
    owned_by: str = "piper"


class ModelsResponse(BaseModel):
    """Response for models endpoint."""
    object: str = "list"
    data: list[ModelInfo]


@app.get("/health")
async def health_check():
    """Health check endpoint."""
    if tts_service is None:
        raise HTTPException(status_code=503, detail="TTS service not available")

    return {
        "status": "healthy",
        "tts_available": True,
        "default_voice": config.default_voice,
        "voices_available": len(config.available_voices)
    }


@app.get("/v1/models", response_model=ModelsResponse)
async def list_models():
    """List available models (OpenAI compatible)."""
    return ModelsResponse(
        object="list",
        data=[
            ModelInfo(id="tts-1", owned_by="piper"),
            ModelInfo(id="tts-1-hd", owned_by="piper")
        ]
    )


@app.get("/v1/voices")
async def list_voices():
    """List available voices."""
    if tts_service is None:
        raise HTTPException(status_code=503, detail="TTS service not available")

    return {"voices": tts_service.list_voices()}


@app.get("/v1/voices/{voice_name}")
async def get_voice_info(voice_name: str):
    """Get information about a specific voice."""
    if tts_service is None:
        raise HTTPException(status_code=503, detail="TTS service not available")

    try:
        voice_info = tts_service.get_voice_info(voice_name)
        return voice_info
    except ValueError as e:
        raise HTTPException(status_code=404, detail=str(e))


@app.post("/v1/audio/speech")
async def create_speech(request: TTSRequest):
    """
    Create speech from text (OpenAI compatible).

    Returns raw audio data as wav format.
    """
    if tts_service is None:
        raise HTTPException(status_code=503, detail="TTS service not available")

    # Map common voice names to our voices
    voice_mapping = {
        # OpenAI voices
        "alloy": config.default_voice,
        "echo": config.default_voice,
        "fable": config.default_voice,
        "onyx": config.default_voice,
        "nova": "lessac",  # Female voice
        "shimmer": "lessac",  # Female voice
        # Common defaults
        "default": config.default_voice,
        "male": config.default_voice,
        "female": "lessac"
    }

    # Get voice name, with fallback to default
    voice_name = voice_mapping.get(request.voice, request.voice)

    # If the requested voice doesn't exist in our available voices, use default
    if voice_name not in config.available_voices:
        logger.warning(f"Requested voice '{voice_name}' not available, using default: {config.default_voice}")
        voice_name = config.default_voice

    try:
        audio_data, audio_format = tts_service.synthesize(
            text=request.input,
            voice=voice_name,
            speed=request.speed
        )

        # Return raw audio data
        return Response(
            content=audio_data,
            media_type="audio/wav",
            headers={
                "Content-Disposition": "attachment; filename=speech.wav"
            }
        )

    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except RuntimeError as e:
        logger.error(f"TTS synthesis failed: {e}")
        raise HTTPException(status_code=500, detail=f"TTS synthesis failed: {e}")


@app.get("/")
async def root():
    """Root endpoint with API information."""
    return {
        "service": "Homelab Voice Server",
        "version": "0.1.0",
        "description": "Local TTS server using Piper",
        "endpoints": {
            "health": "/health",
            "models": "/v1/models",
            "voices": "/v1/voices",
            "speech": "/v1/audio/speech"
        },
        "default_voice": config.default_voice,
        "available_voices": list(config.available_voices.keys()) if tts_service else []
    }
90
voice-server/src/voice_server/config.py
Normal file
@@ -0,0 +1,90 @@
"""Configuration for the voice server."""
import os
from pathlib import Path
from typing import Any, Dict, Optional
from pydantic_settings import BaseSettings
from pydantic import Field


class VoiceServerConfig(BaseSettings):
    """Voice server configuration."""

    host: str = Field(default="127.0.0.1", description="Server host")
    port: int = Field(default=8880, description="Server port")

    # Voice model configuration
    default_voice: str = Field(default="ryan", description="Default voice model")
    voices_dir: Path = Field(
        default_factory=lambda: Path.home() / ".local/share/piper-voices",
        description="Directory containing voice models"
    )

    # Available voice models
    available_voices: Dict[str, Dict[str, Any]] = Field(
        default_factory=lambda: {
            "ryan": {
                "model_file": "en_US-ryan-medium.onnx",
                "config_file": "en_US-ryan-medium.onnx.json",
                "language": "en-US",
                "gender": "male",
                "description": "Professional US male voice"
            },
            "alan": {
                "model_file": "en_GB-alan-medium.onnx",
                "config_file": "en_GB-alan-medium.onnx.json",
                "language": "en-GB",
                "gender": "male",
                "description": "Sophisticated British male voice"
            },
            "lessac": {
                "model_file": "en_US-lessac-medium.onnx",
                "config_file": "en_US-lessac-medium.onnx.json",
                "language": "en-US",
                "gender": "female",
                "description": "Natural US female voice"
            }
        }
    )

    # Piper TTS configuration
    piper_executable: str = Field(default="piper-tts", description="Piper TTS executable")
    audio_format: str = Field(default="wav", description="Audio output format")

    # Server configuration
    log_level: str = Field(default="info", description="Logging level")

    class Config:
        env_prefix = "VOICE_SERVER_"
        env_file = ".env"

    def get_voice_model_path(self, voice_name: Optional[str] = None) -> Path:
        """Get the full path to a voice model file."""
        voice_name = voice_name or self.default_voice
        if voice_name not in self.available_voices:
            raise ValueError(f"Voice '{voice_name}' not found in available voices")

        voice_config = self.available_voices[voice_name]
        return self.voices_dir / voice_config["model_file"]

    def get_voice_config_path(self, voice_name: Optional[str] = None) -> Path:
        """Get the full path to a voice config file."""
        voice_name = voice_name or self.default_voice
        if voice_name not in self.available_voices:
            raise ValueError(f"Voice '{voice_name}' not found in available voices")

        voice_config = self.available_voices[voice_name]
        return self.voices_dir / voice_config["config_file"]

    def validate_voice_files(self, voice_name: Optional[str] = None) -> bool:
        """Check if voice model files exist."""
        voice_name = voice_name or self.default_voice
        try:
            model_path = self.get_voice_model_path(voice_name)
            config_path = self.get_voice_config_path(voice_name)
            return model_path.exists() and config_path.exists()
        except ValueError:
            return False


# Global configuration instance
config = VoiceServerConfig()
82
voice-server/src/voice_server/main.py
Normal file
@@ -0,0 +1,82 @@
"""Main entry point for the voice server."""
import logging
import sys
from pathlib import Path
import uvicorn
from .config import config
from .api import app

logger = logging.getLogger(__name__)


def check_prerequisites():
    """Check if all prerequisites are met."""
    errors = []

    # Check if voices directory exists
    if not config.voices_dir.exists():
        errors.append(f"Voices directory not found: {config.voices_dir}")
        errors.append("Run: mkdir -p ~/.local/share/piper-voices")

    # Check if default voice files exist
    if not config.validate_voice_files():
        voice_name = config.default_voice
        model_path = config.get_voice_model_path()
        errors.append(f"Default voice '{voice_name}' files not found")
        errors.append(f"Expected model at: {model_path}")
        errors.append("Download voice models from: https://huggingface.co/rhasspy/piper-voices")

    # Check available voices
    available_count = sum(
        1 for voice in config.available_voices
        if config.validate_voice_files(voice)
    )

    if available_count == 0:
        errors.append("No voice models available")
        errors.append("Please download at least one voice model")

    return errors


def main():
    """Main entry point."""
    # Set up logging
    logging.basicConfig(
        level=getattr(logging, config.log_level.upper()),
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )

    logger.info("Starting Homelab Voice Server")
    logger.info(f"Configuration: {config.model_dump()}")

    # Check prerequisites
    errors = check_prerequisites()
    if errors:
        logger.error("Prerequisites not met:")
        for error in errors:
            logger.error(f"  - {error}")
        sys.exit(1)

    # Log available voices
    available_voices = [
        voice for voice in config.available_voices
        if config.validate_voice_files(voice)
    ]
    logger.info(f"Available voices: {available_voices}")
    logger.info(f"Default voice: {config.default_voice}")

    # Start server
    logger.info(f"Starting server on {config.host}:{config.port}")

    uvicorn.run(
        app,
        host=config.host,
        port=config.port,
        log_level=config.log_level,
        access_log=True
    )


if __name__ == "__main__":
    main()
158
voice-server/src/voice_server/tts.py
Normal file
@@ -0,0 +1,158 @@
"""Text-to-speech service using Piper."""
import subprocess
import tempfile
import os
import logging
from pathlib import Path
from typing import Optional, Tuple
from .config import config

logger = logging.getLogger(__name__)


class TTSService:
    """Text-to-speech service using Piper."""

    def __init__(self):
        self.config = config
        self._validate_setup()

    def _validate_setup(self):
        """Validate that piper and voice models are available."""
        # Check if piper-tts is available
        try:
            result = subprocess.run(
                [self.config.piper_executable, "--help"],
                capture_output=True,
                timeout=10
            )
            if result.returncode != 0:
                raise RuntimeError(f"Piper TTS not working: {result.stderr.decode()}")
        except (subprocess.TimeoutExpired, FileNotFoundError) as e:
            raise RuntimeError(f"Piper TTS not found or not working: {e}")

        # Check if default voice model exists
        if not self.config.validate_voice_files():
            default_voice = self.config.default_voice
            model_path = self.config.get_voice_model_path()
            raise RuntimeError(
                f"Default voice '{default_voice}' model not found at {model_path}. "
                f"Please download the voice model files."
            )

        logger.info(f"TTS service initialized with voice: {self.config.default_voice}")

    def synthesize(
        self,
        text: str,
        voice: Optional[str] = None,
        speed: float = 1.0
    ) -> Tuple[bytes, str]:
        """
        Synthesize text to speech.

        Args:
            text: Text to synthesize
            voice: Voice to use (defaults to configured default)
            speed: Speech speed multiplier

        Returns:
            Tuple of (audio_data, audio_format)

        Raises:
            ValueError: If voice is not available
            RuntimeError: If synthesis fails
        """
        voice = voice or self.config.default_voice

        if not self.config.validate_voice_files(voice):
            available_voices = list(self.config.available_voices.keys())
            raise ValueError(
                f"Voice '{voice}' not available. Available voices: {available_voices}"
            )

        model_path = self.config.get_voice_model_path(voice)

        # Create temporary file for output
        with tempfile.NamedTemporaryFile(suffix=f".{self.config.audio_format}", delete=False) as temp_file:
            temp_path = temp_file.name

        try:
            # Build piper command
            cmd = [
                self.config.piper_executable,
                "-m", str(model_path),
                "-f", temp_path
            ]

            # Add speed if different from default
            if speed != 1.0:
                cmd.extend(["--length-scale", str(1.0 / speed)])

            logger.debug(f"Running piper command: {' '.join(cmd)}")

            # Run piper-tts
            process = subprocess.Popen(
                cmd,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True
            )

            stdout, stderr = process.communicate(input=text, timeout=30)

            if process.returncode != 0:
                raise RuntimeError(f"TTS synthesis failed: {stderr}")

            # Read the generated audio file
            with open(temp_path, "rb") as f:
                audio_data = f.read()

            if not audio_data:
                raise RuntimeError("Generated audio file is empty")

            logger.info(f"Successfully synthesized {len(text)} characters with voice '{voice}'")
            return audio_data, self.config.audio_format

        except subprocess.TimeoutExpired:
            process.kill()
            raise RuntimeError("TTS synthesis timed out")
        except Exception as e:
            logger.error(f"TTS synthesis error: {e}")
            raise
        finally:
            # Clean up temp file
            try:
                os.unlink(temp_path)
            except OSError:
                pass

    def list_voices(self) -> dict:
        """List available voices with their information."""
        voices = {}
        for voice_name, voice_config in self.config.available_voices.items():
            voices[voice_name] = {
                "name": voice_name,
                "language": voice_config["language"],
                "gender": voice_config["gender"],
                "description": voice_config["description"],
                "available": self.config.validate_voice_files(voice_name)
            }
        return voices

    def get_voice_info(self, voice_name: str) -> dict:
        """Get information about a specific voice."""
        if voice_name not in self.config.available_voices:
            raise ValueError(f"Voice '{voice_name}' not found")

        voice_config = self.config.available_voices[voice_name]
        return {
            "name": voice_name,
            "language": voice_config["language"],
            "gender": voice_config["gender"],
            "description": voice_config["description"],
            "available": self.config.validate_voice_files(voice_name),
            "model_path": str(self.config.get_voice_model_path(voice_name)),
            "config_path": str(self.config.get_voice_config_path(voice_name))
        }