# AI Voice Interaction Command-Line Tool

`AI agents` `voice interaction` `CLI tool` `OpenAI` `Realtime API` `Node.js`

# Agent Voice

Hands-free AI agents. `agent-voice` is a command-line tool that lets AI agents talk with humans by voice through the OpenAI Realtime API (or any compatible provider).

It has two primitives: **say** (speak to the user) and **ask** (speak, then wait for a reply). That is all.

```bash
agent-voice say -m "Deploying to production."
agent-voice ask -m "Should we use Postgres or SQLite?"
# → user speaks → "Postgres"
```

## Agent Skill

Agent Voice is published as an [Agent Skill](https://skills.sh/adriancooney/agent-voice) for AI coding agents.

```bash
npx skills add adriancooney/agent-voice
```

The `/voice` skill starts a hands-free voice conversation. The agent speaks with `say` and listens with `ask`, so no screen is needed.

## Installation

```bash
npm install -g agent-voice
```

## Configuration

Agent Voice needs an API key for an OpenAI-compatible Realtime API. Run the auth wizard:

```bash
agent-voice auth
```

Credentials are saved to `~/.agent-voice/config.json` (mode `0600`). You can also set the `OPENAI_API_KEY` environment variable. The CLI checks both and prefers the config file.

Any OpenAI-compatible Realtime API works: supply a custom base URL when running `agent-voice auth`.

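
The lookup order described above (config file first, then `OPENAI_API_KEY`) can be sketched as a small pure function. This is an illustration only, not the tool's actual code: the `StoredConfig` shape, its `apiKey` field, and the `resolveApiKey` name are assumptions.

```typescript
// Hypothetical shape of the saved config; the real file's field names may differ.
interface StoredConfig {
  apiKey?: string;
}

// Prefer the key from the config file, then fall back to the environment.
function resolveApiKey(
  config: StoredConfig,
  env: Record<string, string | undefined>,
): string | undefined {
  // Treat empty strings as "not set" so a blank entry does not mask the fallback.
  return config.apiKey || env["OPENAI_API_KEY"] || undefined;
}

const fromConfig = resolveApiKey({ apiKey: "sk-config" }, { OPENAI_API_KEY: "sk-env" });
const fromEnv = resolveApiKey({}, { OPENAI_API_KEY: "sk-env" });
const missing = resolveApiKey({}, {});
```

Passing the environment in as a plain record keeps the function easy to test; a caller on Node.js would typically hand it `process.env`.
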
## CLI

### `say`

Plays a message. No microphone and no reply are needed: fire and forget.

```bash
agent-voice say -m "Build finished with no errors."
```

| Flag | Default | Description |
|------|---------|-------------|
| `-m, --message` | - | Text to speak (or pass it via stdin) |
| `--voice` | `ash` | Voice to use |
| `--no-daemon` | - | Skip the daemon and run directly |

### `ask`

Plays a message, then listens for the user's spoken reply. The transcript is printed to stdout.

```bash
agent-voice ask -m "What should this component be called?"
# stdout: SearchBar
```

| Flag | Default | Description |
|------|---------|-------------|
| `-m, --message` | - | Text to speak (or pass it via stdin) |
| `--voice` | `ash` | Voice to use |
| `--timeout` | `120` | Seconds to wait for speech |
| `--ack` | `false` | Play a short acknowledgement after the user replies |
| `--no-daemon` | - | Skip the daemon and run directly |

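
For callers that launch the CLI from a program, the flags in the table above can be assembled into an argument list. The sketch below is hypothetical: the `AskOptions` interface and `buildAskArgs` helper are not part of the package, and it assumes `--voice` and `--timeout` take their value as the next argument, which the table does not confirm.

```typescript
// Options mirroring the documented flags; the interface itself is hypothetical.
interface AskOptions {
  message: string;
  voice?: string;
  timeoutSeconds?: number;
  ack?: boolean;
  noDaemon?: boolean;
}

// Build the argument list for `agent-voice ask`, emitting only flags that were set.
function buildAskArgs(options: AskOptions): string[] {
  const args = ["ask", "-m", options.message];
  if (options.voice !== undefined) args.push("--voice", options.voice);
  if (options.timeoutSeconds !== undefined) {
    args.push("--timeout", String(options.timeoutSeconds));
  }
  if (options.ack) args.push("--ack");
  if (options.noDaemon) args.push("--no-daemon");
  return args;
}

const askArgs = buildAskArgs({
  message: "What should this component be called?",
  voice: "coral",
  timeoutSeconds: 30,
  ack: true,
});
```

Keeping the arguments as an array, rather than one concatenated string, means the message never needs shell quoting when it is handed to something like Node's `execFile`.
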
### `voices`

```bash
agent-voice voices              # list all voices
agent-voice voices set coral    # set the default voice
```

### `config`

```bash
agent-voice config get              # show all settings
agent-voice config set debug true   # set a value
agent-voice config reset            # reset to defaults (keeps auth credentials)
```

### `daemon`

```bash
agent-voice daemon start     # start (no-op if already running)
agent-voice daemon stop      # stop gracefully
agent-voice daemon restart   # stop, then start again
agent-voice daemon status    # show PID, uptime, and command count
agent-voice daemon logs -f   # follow the event log in real time
```

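## Daemon

A background process that keeps the audio engine warm between commands, which reduces startup latency. It starts automatically on the first `say` or `ask`; no manual setup is needed.

- Listens on a Unix socket: `~/.agent-voice/daemon.sock`
- Runs commands serially (the audio hardware allows a single consumer)
- Exits automatically after 30 minutes of inactivity (configurable)
- Falls back to direct execution if the daemon cannot start
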
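
Serial execution of this kind is often implemented by chaining each command onto the promise of the previous one. The following is a minimal sketch of that general technique, not the daemon's actual code; the `SerialQueue` class and the stand-in tasks are hypothetical.

```typescript
// Hypothetical sketch: a queue that runs async tasks strictly one at a time.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  // Enqueue a task; it starts only after every earlier task has settled.
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the internal chain so a failed task
    // does not block the tasks queued behind it.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Usage: the second command cannot start until the first one finishes.
const queue = new SerialQueue();
const order: string[] = [];

const first = queue.run(async () => {
  order.push("say:start");
  await Promise.resolve(); // stand-in for audio playback
  order.push("say:end");
});

const second = queue.run(async () => {
  order.push("ask:start");
  order.push("ask:end");
  return "Postgres"; // stand-in for a transcript
});
```

Callers still receive each task's own result or error through the returned promise; only the internal chain ignores failures.
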
## Debug logging

```bash
agent-voice config set debug true         # NDJSON event tracing
agent-voice config set debug.audio true   # also capture WAV files
```

When enabled, every command writes a structured trace to `~/.agent-voice/logs/events.ndjson`, and WAV captures (assistant, microphone, model input) go to `~/.agent-voice/logs/audio/`. Audio files use a ring buffer: captures from the 50 most recent commands are kept and the oldest are deleted automatically.

```bash
agent-voice daemon logs -f       # follow in real time
agent-voice daemon logs -n 100   # last 100 entries
```

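
The retention rule for audio captures (keep the 50 most recent commands, drop the oldest) comes down to a small selection step. The sketch below shows that logic under stated assumptions: the `selectExpired` helper and the `cmd-N` identifiers are hypothetical, and how the tool actually names or orders its capture files is not documented here.

```typescript
// Given command IDs ordered oldest to newest, return the ones whose audio should be deleted.
function selectExpired(commandIds: string[], keep: number = 50): string[] {
  const excess = commandIds.length - keep;
  return excess > 0 ? commandIds.slice(0, excess) : [];
}

// 53 commands recorded, so the 3 oldest fall outside the 50-command window.
const ids = Array.from({ length: 53 }, (_, i) => `cmd-${i + 1}`);
const expired = selectExpired(ids); // ["cmd-1", "cmd-2", "cmd-3"]
```
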
## Node.js API

```bash
npm install agent-voice
```

```typescript
import { say, ask } from "agent-voice";

await say("Deployment complete.");

const answer = await ask("Which database should we use?");
// → "Postgres"
```

The full [API reference](./packages/agent-voice/README.md#nodejs-api) covers options such as audio callbacks and trace events.

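
Because `ask` returns free-form speech, an agent will usually need to map the transcript onto a known choice before acting on it. The sketch below illustrates one way to do that. The `Voice` interface is an assumption inferred from the example calls (the real functions accept further options), and `parseDatabase`, `chooseDatabase`, and `fakeVoice` are hypothetical names used so the flow runs without audio hardware.

```typescript
// The two primitives, typed to match the example calls (an assumption).
interface Voice {
  say(text: string): Promise<void>;
  ask(text: string): Promise<string>;
}

type Database = "postgres" | "sqlite";

// Map a free-form transcript onto a known choice; undefined if nothing matches.
function parseDatabase(transcript: string): Database | undefined {
  const text = transcript.toLowerCase();
  if (text.includes("postgres")) return "postgres";
  if (text.includes("sqlite")) return "sqlite";
  return undefined;
}

// Ask once, confirm what was understood, and return the choice.
async function chooseDatabase(voice: Voice): Promise<Database | undefined> {
  const transcript = await voice.ask("Should we use Postgres or SQLite?");
  const choice = parseDatabase(transcript);
  await voice.say(choice ? `Using ${choice}.` : "Sorry, I did not catch that.");
  return choice;
}

// A scripted stand-in so the flow can be exercised without a microphone.
const spoken: string[] = [];
const fakeVoice: Voice = {
  say: async (text) => {
    spoken.push(text);
  },
  ask: async () => "Postgres, I think.",
};

const chosen = chooseDatabase(fakeVoice);
```

In real use, the object passed to `chooseDatabase` would wrap the `say` and `ask` functions imported from `agent-voice`, assuming their signatures are compatible with `Voice`.
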
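## How it works

Agent Voice connects to the OpenAI Realtime API over a WebSocket. Text is sent as a conversation item and the model reads it aloud. In `ask` mode, the microphone opens once the message has finished playing and audio is streamed to the API, which transcribes it with `gpt-4o-transcribe` and uses semantic VAD to decide when the user has stopped speaking.

Audio is 24 kHz mono PCM16, processed by a Rust audio engine with built-in acoustic echo cancellation (AEC) so the assistant does not hear its own playback.
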
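
To make the audio format concrete: 24 kHz mono PCM16 means 24,000 samples per second at 2 bytes each, or 48,000 bytes per second. The sketch below works through that arithmetic and shows a common way to convert floating-point samples to 16-bit integers. The helper names are hypothetical, and the float conversion is a general convention rather than something the source says the Rust engine does.

```typescript
const SAMPLE_RATE_HZ = 24000; // 24 kHz
const BYTES_PER_SAMPLE = 2; // PCM16 = 16 bits per sample
const CHANNELS = 1; // mono

// Size in bytes of a clip of the given duration.
function pcm16ByteLength(seconds: number): number {
  return Math.round(seconds * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE * CHANNELS;
}

// Convert float samples in [-1, 1] to signed 16-bit integers, clamping out-of-range input.
function floatToPcm16(samples: number[]): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = Math.round(clamped < 0 ? clamped * 32768 : clamped * 32767);
  });
  return out;
}

const oneSecond = pcm16ByteLength(1); // 48000 bytes
const pcm = floatToPcm16([0, 1, -1, 2]); // 0, 32767, -32768, 32767
```
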
## Packages

| Package | Description |
|---------|-------------|
| [`agent-voice`](./packages/agent-voice) | CLI, Node.js API, daemon, debug logging |
| [`agent-voice-audio`](./packages/agent-voice-audio) | Rust audio engine with AEC |

## License

MIT