Features:
- Bilingual support (English/Mandarin Chinese)
- Hotword detection: 'Hey Osiris' / '你好 Osiris'
- Music playback control (MP3, WAV, OGG, FLAC)
- OpenClaw integration for AI responses
- Google AIY Voice Kit V1 compatible
- Text-to-speech in both languages
- Voice command recognition
- Raspberry Pi ready with installation script
AI Now Inc - Del Mar Demo Unit 🏭
# 🎤 Bilingual Voice Assistant - Google AIY Voice Kit V1

**AI Now Inc - Del Mar Demo Unit**
**Laboratory Assistant:** Claw 🏭

A bilingual (English/Mandarin) voice-activated assistant for Google AIY Voice Kit V1 with music playback capability.

## Features

- ✅ **Bilingual Support** - English and Mandarin Chinese speech recognition
- ✅ **Text-to-Speech** - Responds in the detected language
- ✅ **Music Playback** - Plays MP3 files by voice command
- ✅ **Remote Communication** - Connects to the OpenClaw assistant via API
- ✅ **Offline Capability** - Basic commands work without an internet connection
- ✅ **Hotword Detection** - "Hey Assistant" / "你好助手" wake words

## Hardware Requirements

- **Google AIY Voice Kit V1** (with Voice HAT)
- **Raspberry Pi** (3B/3B+/4B recommended)
- **MicroSD Card** (8GB+)
- **Speaker** (3.5mm or HDMI audio)
- **Microphone** (included with AIY Kit)
- **Internet Connection** (WiFi/Ethernet)

## Software Architecture

```
┌─────────────────────────────────────────────────────────┐
│                 Google AIY Voice Kit V1                 │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │  Hotword    │  │   Speech     │  │   Command    │    │
│  │  Detection  │→ │ Recognition  │→ │  Processing  │    │
│  └─────────────┘  └──────────────┘  └──────────────┘    │
│                          ↓                 ↓            │
│  ┌──────────────────────────────────────────────────┐   │
│  │            Language Detection (en/zh)            │   │
│  └──────────────────────────────────────────────────┘   │
│                            ↓                            │
│  ┌──────────────────────────────────────────────────┐   │
│  │            OpenClaw API Communication            │   │
│  └──────────────────────────────────────────────────┘   │
│                            ↓                            │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │     TTS     │  │ Music Player │  │   Response   │    │
│  │   (en/zh)   │  │    (MP3)     │  │   Handler    │    │
│  └─────────────┘  └──────────────┘  └──────────────┘    │
└─────────────────────────────────────────────────────────┘
```

## Installation

### 1. Set Up Google AIY Voice Kit

```bash
# Update system
sudo apt-get update
sudo apt-get upgrade

# Install AIY Voice Kit software
cd ~
git clone https://github.com/google/aiyprojects-raspbian.git
cd aiyprojects-raspbian
bash install.sh
sudo reboot
```

### 2. Install Dependencies

```bash
# Python dependencies
pip3 install google-cloud-speech google-cloud-texttospeech
pip3 install pygame mutagen
pip3 install requests websocket-client
pip3 install langdetect
```

### 3. Configure Google Cloud (Optional - for cloud services)

```bash
# Set up Google Cloud credentials
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
```

## Configuration

Edit `config.json`:

```json
{
  "openclaw": {
    "enabled": true,
    "ws_url": "ws://192.168.1.100:18790",
    "api_key": "your_api_key"
  },
  "speech": {
    "language": "auto",
    "hotword": "hey assistant|你好助手"
  },
  "music": {
    "library_path": "/home/pi/Music",
    "default_volume": 0.7
  },
  "tts": {
    "english_voice": "en-US-Standard-A",
    "chinese_voice": "zh-CN-Standard-A"
  }
}
```

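
A loader for `config.json` might fill in missing keys and reject obviously bad values before the assistant starts. The sketch below is one way to do that, not the project's actual loader: `DEFAULTS`, `merge_defaults`, `validate`, and `load_config` are hypothetical names, and the 0.0 to 1.0 volume range is an assumption based on the sample value of 0.7.

```python
import json
from pathlib import Path

# Hypothetical defaults that mirror the keys of the sample config.json.
DEFAULTS = {
    "openclaw": {"enabled": False, "ws_url": "", "api_key": ""},
    "speech": {"language": "auto", "hotword": "hey assistant|你好助手"},
    "music": {"library_path": "/home/pi/Music", "default_volume": 0.7},
    "tts": {
        "english_voice": "en-US-Standard-A",
        "chinese_voice": "zh-CN-Standard-A",
    },
}


def merge_defaults(user: dict, defaults: dict) -> dict:
    """Return the defaults overlaid with user values, one section at a time."""
    merged = {}
    for section, section_defaults in defaults.items():
        merged[section] = {**section_defaults, **user.get(section, {})}
    return merged


def validate(config: dict) -> dict:
    """Raise ValueError for values this sketch assumes are invalid."""
    volume = config["music"]["default_volume"]
    if not 0.0 <= volume <= 1.0:  # assumed range, not confirmed by the project
        raise ValueError(f"default_volume must be between 0.0 and 1.0, got {volume}")
    openclaw = config["openclaw"]
    if openclaw["enabled"] and not openclaw["ws_url"].startswith(("ws://", "wss://")):
        raise ValueError("openclaw.ws_url must start with ws:// or wss://")
    return config


def load_config(path: str = "config.json") -> dict:
    """Read the JSON file, fill in missing keys, and validate the result."""
    user = json.loads(Path(path).read_text(encoding="utf-8"))
    return validate(merge_defaults(user, DEFAULTS))
```

Merging section by section keeps a partial `config.json` usable: a file that sets only `music.default_volume` still ends up with the default `library_path`.
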
## Usage

### Start the Assistant

```bash
cd /home/pi/voice-assistant
python3 main.py
```

### Voice Commands

#### General Commands
- "Hey Assistant, what time is it?" / "你好助手,现在几点?"
- "Hey Assistant, how are you?" / "你好助手,你好吗?"
- "Hey Assistant, tell me a joke" / "你好助手,讲个笑话"

#### Music Commands
- "Hey Assistant, play [song name]" / "你好助手,播放 [歌曲名]"
- "Hey Assistant, pause" / "你好助手,暂停"
- "Hey Assistant, resume" / "你好助手,继续"
- "Hey Assistant, stop" / "你好助手,停止"
- "Hey Assistant, next track" / "你好助手,下一首"
- "Hey Assistant, volume up" / "你好助手,音量加大"

#### OpenClaw Commands
- "Hey Assistant, ask Claw: [your question]"
- "你好助手,问 Claw:[你的问题]"

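
The command phrases above could be mapped to intents with a small text parser. The sketch below assumes the recognizer returns a transcript that still contains the wake word; `HOTWORDS`, `SIMPLE_COMMANDS`, `strip_hotword`, `parse_command`, and the intent labels are hypothetical names chosen for illustration, not the contents of `assistant.py`.

```python
import re

# Wake words in the same "a|b" form as the "hotword" value in config.json.
HOTWORDS = "hey assistant|你好助手".split("|")

# (intent, English phrase, Mandarin phrase) for commands without arguments.
SIMPLE_COMMANDS = [
    ("pause", "pause", "暂停"),
    ("resume", "resume", "继续"),
    ("stop", "stop", "停止"),
    ("next_track", "next track", "下一首"),
    ("volume_up", "volume up", "音量加大"),
]


def strip_hotword(utterance: str):
    """Return the text after a leading wake word, or None if there is none."""
    text = utterance.strip()
    for hotword in HOTWORDS:
        if text.lower().startswith(hotword):
            # Drop the separator (ASCII or full-width comma/colon) after it.
            return text[len(hotword):].lstrip(" ,,::").strip()
    return None


def parse_command(utterance: str):
    """Map an utterance to (intent, argument); intent is None without a wake word."""
    body = strip_hotword(utterance)
    if body is None:
        return (None, "")
    # "ask Claw: ..." / "问 Claw:..." forwards the question to OpenClaw.
    match = re.match(r"(?:ask|问)\s*claw\s*[::]\s*(.+)", body, flags=re.IGNORECASE)
    if match:
        return ("ask_claw", match.group(1).strip())
    # "play <song>" / "播放 <song>" carries the song name as its argument.
    match = re.match(r"(?:play\s+|播放\s*)(.+)", body, flags=re.IGNORECASE)
    if match:
        return ("play", match.group(1).strip())
    for intent, english, mandarin in SIMPLE_COMMANDS:
        if body.lower() == english or body == mandarin:
            return (intent, "")
    # Anything else is treated as a general question.
    return ("general", body)
```

Checking the argument-carrying patterns before the fixed phrases keeps a song title such as "Stop" from being read as the stop command when it follows "play".
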
## Project Structure

```
voice-assistant/
├── main.py              # Main entry point
├── config.json          # Configuration file
├── assistant.py         # Core assistant logic
├── speech_recognizer.py # Speech recognition (en/zh)
├── tts_engine.py        # Text-to-speech engine
├── music_player.py      # MP3 playback control
├── openclaw_client.py   # OpenClaw API client
├── hotword_detector.py  # Wake word detection
├── requirements.txt     # Python dependencies
└── samples/             # Sample audio files
```

## Language Detection

The system automatically detects the spoken language:

- **English keywords** → English response
- **Chinese keywords** → Mandarin response
- **Mixed input** → Response in the dominant language

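
One simple way to pick the dominant language of a mixed utterance is to compare the number of Chinese characters with the number of English words. The sketch below works under that assumption and is not the project's detection logic, which may instead rely on `langdetect` from the dependency list; `detect_language` and its tie-breaking `default` parameter are hypothetical.

```python
import re


def detect_language(text: str, default: str = "en") -> str:
    """Return "zh" or "en" depending on which language dominates the text."""
    # Each character in the CJK Unified Ideographs block counts as one unit.
    chinese = len(re.findall(r"[\u4e00-\u9fff]", text))
    # Each run of ASCII letters counts as one English word.
    english = len(re.findall(r"[A-Za-z]+", text))
    if chinese == english:
        # Ties, including empty input, fall back to the default language.
        return default
    return "zh" if chinese > english else "en"
```

Counting English words rather than letters avoids biasing mixed input toward English, since a single Chinese character usually carries about as much meaning as a whole English word.
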
## Music Library

Organize your MP3 files:

```
/home/pi/Music/
├── artist1/
│   ├── song1.mp3
│   └── song2.mp3
├── artist2/
│   └── song3.mp3
└── playlist/
    └── favorites.mp3
```

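
With a layout like this, a "play [song name]" request could be resolved by indexing the library once and matching the spoken name against file names. This is a minimal sketch of that idea, assuming songs are identified by file name alone (no ID3 tags); `index_library` and `find_song` are hypothetical helpers, not functions from `music_player.py`.

```python
from pathlib import Path


def index_library(library_path: str) -> dict:
    """Map lower-cased file stems to paths for every MP3 under library_path."""
    index = {}
    for path in sorted(Path(library_path).rglob("*")):
        # Compare the suffix case-insensitively so "song.MP3" is also found.
        if path.is_file() and path.suffix.lower() == ".mp3":
            # Keep the first path seen when two files share a name.
            index.setdefault(path.stem.lower(), path)
    return index


def find_song(index: dict, query: str):
    """Return the path for an exact stem match, else the first substring match."""
    wanted = query.strip().lower()
    if not wanted:
        return None
    if wanted in index:
        return index[wanted]
    for stem, path in index.items():
        if wanted in stem:
            return path
    return None
```

A caller might build the index at startup with `index_library("/home/pi/Music")` and pass the result of `find_song` to the player only when it is not `None`.
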
## Advanced Features

### Custom Hotword
Train your own hotword using Porcupine or Snowboy.

### Offline Speech Recognition
Use Vosk or PocketSphinx for offline recognition.

### Multi-room Audio
Stream audio to multiple devices via Snapcast.

### Voice Profiles
Recognize different users and personalize responses.

## Troubleshooting

### Microphone not detected
```bash
arecord -l  # List audio devices
alsamixer   # Check levels
```

### Poor speech recognition
- Speak clearly and closer to the microphone
- Reduce background noise
- Check internet connection for cloud recognition

### Music playback issues
```bash
# Test audio output
speaker-test -t wav

# Check volume
alsamixer
```

## Next Steps

- [ ] Add voice profile recognition
- [ ] Implement offline speech recognition
- [ ] Add Spotify/Apple Music integration
- [ ] Create web UI for music library management
- [ ] Add support for more languages (Spanish, French, etc.)
- [ ] Implement voice commands for industrial control

---

**AI Now Inc** - Del Mar Show Demo Unit
**Contact:** Laboratory Assistant Claw 🏭
**Version:** 1.0.0