🎤 Bilingual Voice Assistant - Google AIY Voice Kit V1
AI Now Inc - Del Mar Demo Unit
Laboratory Assistant: Claw 🏭
A bilingual (English/Mandarin) voice-activated assistant for Google AIY Voice Kit V1 with music playback capability.
Features
- ✅ Bilingual Support - English and Mandarin Chinese speech recognition
- ✅ Text-to-Speech - Responds in the detected language
- ✅ Music Playback - Plays MP3 files by voice command
- ✅ Remote Communication - Connects to the OpenClaw assistant over a WebSocket API
- ✅ Offline Capability - Basic commands work without internet
- ✅ Hotword Detection - "Hey Assistant" / "你好助手" wake word
Hardware Requirements
- Google AIY Voice Kit V1 (with Voice HAT)
- Raspberry Pi (3B/3B+/4B recommended)
- MicroSD Card (8GB+)
- Speaker (3.5mm or HDMI audio)
- Microphone (included with AIY Kit)
- Internet Connection (WiFi/Ethernet) for cloud speech services; basic commands work offline
Software Architecture
┌────────────────────────────────────────────────────────┐
│                Google AIY Voice Kit V1                 │
│ ┌──────────────┐   ┌──────────────┐   ┌──────────────┐ │
│ │ Hotword      │   │ Speech       │   │ Command      │ │
│ │ Detection    │ → │ Recognition  │ → │ Processing   │ │
│ └──────────────┘   └──────────────┘   └──────────────┘ │
│                           ↓                  ↓         │
│ ┌────────────────────────────────────────────────────┐ │
│ │             Language Detection (en/zh)             │ │
│ └────────────────────────────────────────────────────┘ │
│                           ↓                            │
│ ┌────────────────────────────────────────────────────┐ │
│ │             OpenClaw API Communication             │ │
│ └────────────────────────────────────────────────────┘ │
│                           ↓                            │
│ ┌──────────────┐   ┌──────────────┐   ┌──────────────┐ │
│ │ TTS          │   │ Music Player │   │ Response     │ │
│ │ (en/zh)      │   │ (MP3)        │   │ Handler      │ │
│ └──────────────┘   └──────────────┘   └──────────────┘ │
└────────────────────────────────────────────────────────┘
Installation
1. Set Up Google AIY Voice Kit
# Update system
sudo apt-get update
sudo apt-get upgrade
# Install AIY Voice Kit software
cd ~
git clone https://github.com/google/aiyprojects-raspbian.git
cd aiyprojects-raspbian
bash install.sh
sudo reboot
2. Install Dependencies
# Python dependencies
pip3 install google-cloud-speech google-cloud-texttospeech
pip3 install pygame mutagen
pip3 install requests websocket-client
pip3 install langdetect
3. Configure Google Cloud (Optional - for cloud services)
# Set up Google Cloud credentials
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
Configuration
Edit config.json:
{
  "openclaw": {
    "enabled": true,
    "ws_url": "ws://192.168.1.100:18790",
    "api_key": "your_api_key"
  },
  "speech": {
    "language": "auto",
    "hotword": "hey assistant|你好助手"
  },
  "music": {
    "library_path": "/home/pi/Music",
    "default_volume": 0.7
  },
  "tts": {
    "english_voice": "en-US-Standard-A",
    "chinese_voice": "zh-CN-Standard-A"
  }
}
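A mistyped key in config.json is easier to diagnose if the file is checked when it is loaded. The following is a minimal sketch of such a check, not the project's actual loader: `REQUIRED_KEYS`, `validate_config`, and `load_config` are hypothetical names, the key list is copied from the sample above, and the 0.0 to 1.0 volume range is an assumption based on the sample value 0.7.

```python
import json

# Sections and keys copied from the sample config.json in this README.
REQUIRED_KEYS = {
    "openclaw": ["enabled", "ws_url", "api_key"],
    "speech": ["language", "hotword"],
    "music": ["library_path", "default_volume"],
    "tts": ["english_voice", "chinese_voice"],
}


def validate_config(config):
    """Return a list of problems; an empty list means the config looks OK."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in config:
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in config[section]:
                problems.append(f"missing key: {section}.{key}")
    # Assumption: volume is a fraction between 0.0 and 1.0.
    volume = config.get("music", {}).get("default_volume")
    if isinstance(volume, (int, float)) and not 0.0 <= volume <= 1.0:
        problems.append("music.default_volume should be between 0.0 and 1.0")
    return problems


def load_config(path="config.json"):
    """Read and validate the file; raise ValueError if anything is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    problems = validate_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_config()` from the project directory would then either return the parsed dictionary or fail early with a message naming every missing entry.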
Usage
Start the Assistant
cd /home/pi/voice-assistant
python3 main.py
Voice Commands
General Commands
- "Hey Assistant, what time is it?" / "你好助手,现在几点?"
- "Hey Assistant, how are you?" / "你好助手,你好吗?"
- "Hey Assistant, tell me a joke" / "你好助手,讲个笑话"
Music Commands
- "Hey Assistant, play [song name]" / "你好助手,播放 [歌曲名]"
- "Hey Assistant, pause" / "你好助手,暂停"
- "Hey Assistant, resume" / "你好助手,继续"
- "Hey Assistant, stop" / "你好助手,停止"
- "Hey Assistant, next track" / "你好助手,下一首"
- "Hey Assistant, volume up" / "你好助手,音量加大"
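The "volume up" command implies some step size and an upper limit, neither of which is specified here. As a rough sketch, assuming a step of 0.1 and the same 0.0 to 1.0 scale as `default_volume` (both assumptions, as are the names `VOLUME_STEP` and `adjust_volume` and the matching "down" direction):

```python
VOLUME_STEP = 0.1  # hypothetical step size; not specified in this README


def adjust_volume(current, direction, step=VOLUME_STEP):
    """Move the volume one step up or down, clamped to the 0.0-1.0 range."""
    delta = step if direction == "up" else -step
    return round(min(1.0, max(0.0, current + delta)), 2)
```

Starting from the default of 0.7, three "volume up" commands would reach 1.0, and further ones would leave it there.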
OpenClaw Commands
- "Hey Assistant, ask Claw: [your question]"
- "你好助手,问 Claw:[你的问题]"
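The commands above all follow the same shape: a hotword, then a command phrase in either language. The sketch below shows one way a recognized transcript could be mapped to an intent. It is illustrative only and works on text, whereas real hotword detection runs on audio; the intent names, `INTENT_RULES`, and `parse_command` are hypothetical and are not taken from assistant.py or hotword_detector.py.

```python
import re

HOTWORD_PATTERN = "hey assistant|你好助手"  # same value as speech.hotword

# (intent, regex) pairs checked in order; the intent names are hypothetical.
INTENT_RULES = [
    ("ask_claw", r"^(?:ask claw|问\s*claw)\s*[::,,]?\s*(?P<arg>.+)$"),
    ("play", r"^(?:play|播放)\s*(?P<arg>.+)$"),
    ("pause", r"^(?:pause|暂停)$"),
    ("resume", r"^(?:resume|继续)$"),
    ("stop", r"^(?:stop|停止)$"),
    ("next", r"^(?:next track|下一首)$"),
    ("volume_up", r"^(?:volume up|音量加大)$"),
]


def parse_command(transcript):
    """Return (intent, argument), or None if the hotword is absent."""
    text = transcript.strip()
    match = re.match(rf"^(?:{HOTWORD_PATTERN})[\s,,]*", text, re.IGNORECASE)
    if match is None:
        return None
    # Drop the hotword and any trailing sentence punctuation.
    rest = text[match.end():].strip().rstrip("??.。!!")
    for intent, pattern in INTENT_RULES:
        found = re.match(pattern, rest, re.IGNORECASE)
        if found:
            return intent, found.groupdict().get("arg")
    # Anything else is treated as a general question.
    return "general", rest
```

For example, `parse_command("Hey Assistant, play Yesterday")` yields the `play` intent with `Yesterday` as its argument, while a transcript without the hotword yields `None`.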
Project Structure
voice-assistant/
├── main.py # Main entry point
├── config.json # Configuration file
├── assistant.py # Core assistant logic
├── speech_recognizer.py # Speech recognition (en/zh)
├── tts_engine.py # Text-to-speech engine
├── music_player.py # MP3 playback control
├── openclaw_client.py # OpenClaw API client
├── hotword_detector.py # Wake word detection
├── requirements.txt # Python dependencies
└── samples/ # Sample audio files
Language Detection
The system automatically detects the spoken language:
- English keywords → English response
- Chinese keywords → Mandarin response
- Mixed input → Respond in dominant language
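How the dominant language is measured is not spelled out here. One simple possibility, shown as a sketch rather than as the project's actual method, is to compare the number of Chinese characters with the number of English words in the transcript. The names `detect_language` and `VOICES` are hypothetical; the voice names are the ones from the sample config.json.

```python
import re

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")           # common Chinese characters
LATIN_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")  # English words

# Voice names copied from the "tts" section of the sample config.json.
VOICES = {"en": "en-US-Standard-A", "zh": "zh-CN-Standard-A"}


def detect_language(text, default="en"):
    """Return "zh" or "en" depending on which script dominates the text."""
    cjk_count = len(CJK_CHAR.findall(text))
    word_count = len(LATIN_WORD.findall(text))
    if cjk_count > word_count:
        return "zh"
    if word_count > cjk_count:
        return "en"
    return default  # empty input or an exact tie
```

With this heuristic, `VOICES[detect_language(text)]` selects the TTS voice for the reply. Counting English words rather than letters keeps a single long English word from outweighing a short Chinese phrase.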
Music Library
Organize your MP3 files:
/home/pi/Music/
├── artist1/
│   ├── song1.mp3
│   └── song2.mp3
├── artist2/
│   └── song3.mp3
└── playlist/
    └── favorites.mp3
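How a spoken song name is matched to a file in this tree is not described. A minimal sketch, assuming the title is compared against the file name without its extension and that an approximate match is acceptable, could use the standard library's `difflib`. The function names `index_library` and `find_song` and the 0.6 cutoff are assumptions, not part of music_player.py.

```python
import difflib
from pathlib import Path


def index_library(library_path):
    """Map lowercase song titles (file stems) to their MP3 paths.

    If two files share a title, the one that sorts last wins.
    """
    root = Path(library_path)
    return {path.stem.lower(): path for path in sorted(root.rglob("*.mp3"))}


def find_song(index, spoken_title, cutoff=0.6):
    """Return the best-matching path, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        spoken_title.lower(), list(index), n=1, cutoff=cutoff
    )
    return index[matches[0]] if matches else None
```

The index would typically be built once at startup with `index_library("/home/pi/Music")` and reused for each "play" command.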
Advanced Features
Custom Hotword
Train your own hotword using Porcupine or Snowboy.
Offline Speech Recognition
Use Vosk or PocketSphinx for offline recognition.
Multi-room Audio
Stream audio to multiple devices via Snapcast.
Voice Profiles
Recognize different users and personalize responses.
Troubleshooting
Microphone not detected
arecord -l # List capture (recording) devices
alsamixer # Check input levels
Poor speech recognition
- Speak clearly and closer to the microphone
- Reduce background noise
- Check internet connection for cloud recognition
Music playback issues
# Test audio output
speaker-test -t wav
# Check volume
alsamixer
Next Steps
- Add voice profile recognition
- Implement offline speech recognition
- Add Spotify/Apple Music integration
- Create web UI for music library management
- Add support for more languages (Spanish, French, etc.)
- Implement voice commands for industrial control
AI Now Inc - Del Mar Show Demo Unit
Contact: Laboratory Assistant Claw 🏭
Version: 1.0.0