API Contracts for VLA Module

Voice-to-Action Interface

Interface Overview

The Voice-to-Action interface converts audio input into transcribed text commands. It connects audio input sources to the Whisper transcription service.

API Endpoints

POST /api/vla/transcribe

Transcribes audio data using Whisper and returns the transcribed text.

Request:

{
  "audioData": "base64 encoded audio data or file reference",
  "language": "en",
  "model": "base",
  "timeout": 30
}
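
As a rough illustration, the request body above could be assembled from raw audio bytes as follows. The helper name `build_transcribe_request` is hypothetical, and the sketch assumes that `audioData` carries base64-encoded bytes (the contract also allows a file reference) and that `timeout` is expressed in seconds.

```python
import base64
import json


def build_transcribe_request(audio_bytes, language="en", model="base", timeout=30):
    """Return the JSON body for POST /api/vla/transcribe (hypothetical helper)."""
    return {
        # Assumption: raw audio is sent inline as base64 text.
        "audioData": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "model": model,
        # Assumption: the timeout is given in seconds.
        "timeout": timeout,
    }


body = build_transcribe_request(b"RIFF....WAVE")
payload = json.dumps(body)  # string to send as the request body
```

Keeping the body builder separate from the HTTP call makes it easy to check the payload without a running service.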
Response:

{
  "id": "transcription-12345",
  "transcription": "Hello world",
  "confidence": 0.98,
  "language": "en",
  "processingTime": 1200,
  "status": "success"
}
Error Response:

{
  "error": "Invalid audio format",
  "code": "INVALID_REQUEST",
  "details": "Audio file must be WAV or MP3"
}
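
A client has to tell the success shape from the error shape. The sketch below does this by checking for the `error` key, which is an assumption about how the two bodies are distinguished (a real client would likely also look at the HTTP status code). The names `Transcription`, `TranscriptionError`, and `parse_transcribe_response` are hypothetical.

```python
import json
from dataclasses import dataclass


class TranscriptionError(Exception):
    """Raised when the body matches the documented error shape."""

    def __init__(self, error, code, details):
        super().__init__(f"{code}: {error} ({details})")
        self.code = code
        self.details = details


@dataclass
class Transcription:
    id: str
    text: str
    confidence: float
    language: str


def parse_transcribe_response(raw):
    """Decode a /api/vla/transcribe body into a Transcription or raise."""
    data = json.loads(raw)
    if "error" in data:
        # Assumption: an "error" key marks the error shape.
        raise TranscriptionError(data["error"], data.get("code"), data.get("details"))
    return Transcription(
        id=data["id"],
        text=data["transcription"],
        confidence=data["confidence"],
        language=data["language"],
    )
```

Callers can then branch on `TranscriptionError.code` rather than parsing message strings.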

POST /api/vla/realtime-transcribe

Processes an audio stream in real time for continuous transcription.

Request:

Content-Type: application/octet-stream

Transfer-Encoding: chunked

Binary audio stream data

Response (Server-Sent Events):

data: {"transcription": "partial text", "isFinal": false, "timestamp": "2023-01-01T10:00:00Z"}
data: {"transcription": "complete sentence", "isFinal": true, "timestamp": "2023-01-01T10:00:05Z"}
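
The event stream above can be consumed line by line. The following is a minimal sketch that assumes each event fits on a single `data:` line, as in the examples; the Server-Sent Events format also allows multi-line events, which this sketch does not handle. The function names are hypothetical.

```python
import json


def iter_transcription_events(lines):
    """Yield decoded JSON payloads from Server-Sent Events `data:` lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        yield json.loads(line[len("data:"):].strip())


def final_transcripts(lines):
    """Keep only the transcriptions flagged as final."""
    return [
        event["transcription"]
        for event in iter_transcription_events(lines)
        if event.get("isFinal")
    ]


stream = [
    'data: {"transcription": "partial text", "isFinal": false, "timestamp": "2023-01-01T10:00:00Z"}',
    "",
    'data: {"transcription": "complete sentence", "isFinal": true, "timestamp": "2023-01-01T10:00:05Z"}',
]
print(final_transcripts(stream))  # ['complete sentence']
```

Partial events can drive a live caption display, while only final events are forwarded to planning.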

Error Handling

Return an error message if audio quality is poor

Enforce a 30-second timeout for processing

Provide confidence scores for transcriptions

LLM Cognitive Planning Interface

POST /api/vla/plan

Request:

{
  "command": "Move the robot to point A",
  "context": {
    "robotPosition": [0, 0, 0],
    "environmentMap": "map-1",
    "objectLocations": {},
    "robotCapabilities": ["navigation", "manipulation"],
    "constraints": {
      "safeZones": [[0,0,10,10]],
      "forbiddenAreas": [[5,5,6,6]]
    }
  },
  "timeout": 60
}
Response:

{
  "id": "plan-12345",
  "command": "Move the robot to point A",
  "actionSequence": [
    {
      "id": "action-1",
      "type": "navigation",
      "action": "move",
      "parameters": {"destination": [1, 2, 0]},
      "dependencies": [],
      "timeout": 30
    }
  ],
  "contextUsed": {
    "robotPosition": [0, 0, 0],
    "environmentMap": "map-1"
  },
  "confidence": 0.95,
  "processingTime": 500,
  "status": "success"
}
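
Each action in `actionSequence` carries a `dependencies` list. The contract does not define its semantics; the sketch below assumes it holds the ids of actions that must finish first, and uses the standard library's `graphlib` to derive a valid order. `execution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(action_sequence):
    """Return action ids so that every action follows its dependencies.

    Assumption: "dependencies" lists ids of actions that must complete first.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    graph = {
        action["id"]: set(action.get("dependencies", []))
        for action in action_sequence
    }
    return list(TopologicalSorter(graph).static_order())


actions = [
    {"id": "action-2", "type": "manipulation", "dependencies": ["action-1"]},
    {"id": "action-1", "type": "navigation", "dependencies": []},
]
print(execution_order(actions))  # ['action-1', 'action-2']
```

Rejecting cyclic plans before execution avoids submitting a sequence that can never complete.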

POST /api/vla/plan/clarify

Request:

{
  "command": "Move to the location",
  "context": {
    "robotPosition": [0, 0, 0],
    "environmentMap": "map-1"
  }
}
Response:

{
  "clarificationNeeded": "Which specific location should the robot move to?",
  "options": ["Point A", "Point B"],
  "suggestedRephrasing": "Move to Point A or Point B"
}

ROS2 Action Execution Interface

POST /api/vla/execute

Request:

{
  "actionSequence": [
    {
      "id": "action-1",
      "type": "navigation",
      "action": "move",
      "parameters": {"destination": [1,2,0]},
      "timeout": 30
    }
  ],
  "executionOptions": {
    "continueOnError": false,
    "maxRetries": 3,
    "safetyCheck": true
  }
}
Response:

{
  "executionId": "exec-12345",
  "status": "executing",
  "estimatedCompletionTime": 30,
  "actionResults": [
    {
      "actionId": "action-1",
      "status": "pending",
      "result": null,
      "executionTime": null
    }
  ]
}
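
The execute request can be derived from a `/api/vla/plan` response. The sketch below copies only the action fields that appear in the execute request example (which omits `dependencies`); whether the endpoint accepts extra fields is not stated, so dropping them is an assumption. `build_execute_request` is a hypothetical helper.

```python
# Fields shown in the execute request example; "dependencies" is not among them.
EXECUTE_ACTION_FIELDS = ("id", "type", "action", "parameters", "timeout")


def build_execute_request(plan, continue_on_error=False, max_retries=3, safety_check=True):
    """Turn a plan response into a body for POST /api/vla/execute."""
    actions = [
        {key: action[key] for key in EXECUTE_ACTION_FIELDS if key in action}
        for action in plan["actionSequence"]
    ]
    return {
        "actionSequence": actions,
        "executionOptions": {
            "continueOnError": continue_on_error,
            "maxRetries": max_retries,
            "safetyCheck": safety_check,
        },
    }


plan = {
    "id": "plan-12345",
    "actionSequence": [
        {
            "id": "action-1",
            "type": "navigation",
            "action": "move",
            "parameters": {"destination": [1, 2, 0]},
            "dependencies": [],
            "timeout": 30,
        }
    ],
}
request_body = build_execute_request(plan)
```

The defaults mirror the `executionOptions` values in the example request.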

GET /api/vla/execute/{executionId}

Response:

{
  "executionId": "exec-12345",
  "overallStatus": "executing",
  "currentActionIndex": 0,
  "actionResults": [
    {
      "actionId": "action-1",
      "status": "pending",
      "result": null,
      "executionTime": null
    }
  ],
  "progress": 0,
  "estimatedRemainingTime": 30
}
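
A client would typically poll this endpoint until the execution finishes. The sketch below takes the HTTP call as an injected `fetch_status` function so it stays transport-agnostic. The terminal status names other than `cancelled` are assumptions, since the contract only shows `executing`, `pending`, and `cancelled`. The 30-second default interval is chosen to stay within the limit of 2 requests per minute on execution endpoints.

```python
import time

# Assumption: only "cancelled" appears in this contract; the others are guesses.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_execution(fetch_status, execution_id, poll_interval=30.0,
                       max_polls=20, sleep=time.sleep):
    """Poll until overallStatus is terminal, then return the last status body.

    fetch_status(execution_id) is a caller-supplied function that performs
    GET /api/vla/execute/{executionId} and returns the decoded JSON body.
    """
    for _ in range(max_polls):
        status = fetch_status(execution_id)
        if status["overallStatus"] in TERMINAL_STATUSES:
            return status
        sleep(poll_interval)
    raise TimeoutError(f"execution {execution_id} did not finish after {max_polls} polls")
```

Injecting `sleep` as a parameter lets tests run the loop without waiting.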

POST /api/vla/execute/{executionId}/cancel

Response:

{
  "executionId": "exec-12345",
  "status": "cancelled",
  "actionResults": []
}

Common Error Responses

400 Bad Request

{
  "error": "Invalid request format or parameters",
  "code": "INVALID_REQUEST",
  "details": "Specific details about what was invalid"
}

408 Request Timeout

{
  "error": "Request timed out",
  "code": "REQUEST_TIMEOUT",
  "details": "Operation did not complete within the specified timeout"
}

500 Internal Server Error

{
  "error": "Internal server error",
  "code": "INTERNAL_ERROR",
  "details": "Additional details about the error"
}

503 Service Unavailable

{
  "error": "Service temporarily unavailable",
  "code": "SERVICE_UNAVAILABLE",
  "details": "Service is temporarily down for maintenance or overload"
}
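
The `code` field in these bodies gives clients a stable value to branch on. As a sketch only, the policy below treats `REQUEST_TIMEOUT` and `SERVICE_UNAVAILABLE` as retryable with exponential backoff. The contract does not prescribe a retry policy, so the choice of retryable codes, the attempt cap, and the delays are all assumptions.

```python
# Assumption: timeouts and temporary unavailability are worth retrying;
# INVALID_REQUEST and INTERNAL_ERROR are not.
RETRYABLE_CODES = {"REQUEST_TIMEOUT", "SERVICE_UNAVAILABLE"}


def should_retry(error_body, attempt, max_attempts=3):
    """Decide whether to retry, given a decoded error body and a 0-based attempt."""
    return error_body.get("code") in RETRYABLE_CODES and attempt < max_attempts


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff in seconds: base * 2**attempt, capped at `cap`."""
    return min(cap, base * 2 ** attempt)
```

For example, `backoff_delay(3)` yields 8.0 seconds, and any attempt whose delay would exceed the cap waits 30.0 seconds.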

Authentication and Authorization

All API endpoints require authentication with an API key, passed as a bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY
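
A small helper can keep the header format in one place. In this sketch, the environment variable name `VLA_API_KEY` and the function `auth_headers` are hypothetical.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for VLA API requests."""
    # Assumption: the key may come from a (hypothetical) VLA_API_KEY variable.
    key = api_key or os.environ.get("VLA_API_KEY")
    if not key:
        raise ValueError("an API key is required")
    return {"Authorization": f"Bearer {key}"}
```

Reading the key from the environment keeps it out of source control.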

Rate Limiting

API endpoints are subject to rate limiting:

Transcription endpoints: 10 requests per minute

Planning endpoints: 5 requests per minute

Execution endpoints: 2 requests per minute

Rate limit responses include the following headers:

X-RateLimit-Limit: Maximum requests allowed

X-RateLimit-Remaining: Remaining requests in the current window

X-RateLimit-Reset: Time when the rate limit resets
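
Clients can avoid hitting these limits by throttling themselves. The sketch below is a client-side sliding-window limiter configured with the per-minute limits listed above. Whether the server counts requests in fixed or sliding windows is not stated, so the sliding window is an assumption, and the class and dictionary names are hypothetical.

```python
import time
from collections import deque

# Per-minute limits taken from the list above.
LIMITS_PER_MINUTE = {"transcription": 10, "planning": 5, "execution": 2}


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any `window`-second span."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self):
        """Record a call and return True if allowed, else return False."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


limiters = {name: SlidingWindowLimiter(limit) for name, limit in LIMITS_PER_MINUTE.items()}
```

A caller would check `limiters["execution"].try_acquire()` before each request and wait when it returns `False`.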
