Simulation Test Environment for Voice Commands
Overview
This document describes the simulation environment for testing voice command processing and robot action execution. It gives students a safe, controlled setting for exercising the VLA pipeline without physical hardware.
Simulation Architecture
Gazebo Environment Setup
The simulation environment uses Gazebo as the primary physics simulator with the following components:
- Robot Model: Humanoid robot model with appropriate sensors and actuators
- Environment: Indoor environment with rooms, furniture, and objects
- Sensors: Camera, LIDAR, and audio simulation capabilities
- Control Interface: ROS2 bridge for command execution
Required Components
Robot Model Configuration
<!-- Example robot configuration for Gazebo -->
<sdf version="1.6">
  <model name="humanoid_robot">
    <link name="base_link">
      <!-- Robot base with collision and visual properties -->
    </link>
    <joint name="camera_joint" type="fixed">
      <parent>base_link</parent>
      <child>camera_link</child>
    </joint>
    <link name="camera_link">
      <sensor name="camera" type="camera">
        <!-- Camera sensor configuration -->
      </sensor>
    </link>
    <joint name="lidar_joint" type="fixed">
      <parent>base_link</parent>
      <child>lidar_link</child>
    </joint>
    <link name="lidar_link">
      <sensor name="lidar" type="ray">
        <!-- LIDAR sensor configuration -->
      </sensor>
    </link>
  </model>
</sdf>
World Configuration
<!-- Example world configuration -->
<sdf version="1.6">
  <world name="vla_test_world">
    <!-- Physics engine configuration -->
    <physics type="ode">
      <max_step_size>0.001</max_step_size>
      <real_time_factor>1.0</real_time_factor>
    </physics>
    <!-- Environment with rooms and objects -->
    <include>
      <uri>model://kitchen</uri>
      <pose>0 0 0 0 0 0</pose>
    </include>
    <include>
      <uri>model://living_room</uri>
      <pose>5 0 0 0 0 0</pose>
    </include>
    <model name="red_cup">
      <pose>2 1 0.8 0 0 0</pose>
      <link name="cup_link">
        <visual name="visual">
          <geometry>
            <cylinder>
              <radius>0.05</radius>
              <length>0.1</length>
            </cylinder>
          </geometry>
        </visual>
      </link>
    </model>
  </world>
</sdf>
Setting Up the Simulation Environment
Prerequisites
- ROS2 Installation: Ensure ROS2 (Humble Hawksbill or later) is installed
- Gazebo Garden: Install Gazebo Garden for simulation
- Navigation2: Install Navigation2 stack for path planning
- Robot Models: Download or create appropriate robot models
Installation Steps
1. Install ROS2 and Gazebo:
# Follow the official ROS2 installation guide for your platform
# Note: ros-humble-gazebo-ros-pkgs targets Gazebo Classic; for Gazebo Garden,
# install the ros_gz bridge packages instead
sudo apt install ros-humble-gazebo-ros-pkgs
2. Install Navigation2:
sudo apt install ros-humble-navigation2 ros-humble-nav2-bringup
3. Set up the workspace:
mkdir -p ~/vla_ws/src
cd ~/vla_ws
colcon build
source install/setup.bash
4. Launch the simulation environment:
# Launch Gazebo with the test world
ros2 launch vla_simulation bringup.launch.py
# In another terminal, start the voice command processing
ros2 run vla_pipeline voice_processor
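For reference, here is a minimal sketch of what a bringup launch file along these lines might contain, assuming the world is loaded through ros_gz_sim (the Gazebo Garden integration). The package layout and world file path are illustrative, not part of an actual vla_simulation package:
# Hypothetical bringup.launch.py sketch (assumes ros_gz_sim / Gazebo Garden)
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource

def generate_launch_description():
    # Reuse the standard Gazebo launch file shipped with ros_gz_sim
    gz_launch = os.path.join(
        get_package_share_directory('ros_gz_sim'), 'launch', 'gz_sim.launch.py')
    return LaunchDescription([
        IncludeLaunchDescription(
            PythonLaunchDescriptionSource(gz_launch),
            # 'vla_test_world.sdf' is the world defined above (illustrative path)
            launch_arguments={'gz_args': 'vla_test_world.sdf'}.items(),
        ),
    ])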
Testing Basic Commands
Simple Movement Commands
Test Case 1: Move Forward
- Command: "Move forward 1 meter"
- Expected Behavior: Robot moves forward approximately 1 meter
- Validation: Check robot position before and after the command (see the odometry sketch after the script)
# Example test script for move forward
import time
import rclpy
from geometry_msgs.msg import Twist

def test_move_forward():
    rclpy.init()
    node = rclpy.create_node('test_move_forward')
    # Publisher for robot movement
    cmd_vel_pub = node.create_publisher(Twist, '/cmd_vel', 10)
    # Send forward movement command
    msg = Twist()
    msg.linear.x = 0.5  # 0.5 m/s forward
    msg.angular.z = 0.0
    # Move for 2 seconds (should move ~1 meter at 0.5 m/s)
    start_time = time.time()
    while time.time() - start_time < 2.0:
        cmd_vel_pub.publish(msg)
        time.sleep(0.1)
    # Stop the robot
    msg.linear.x = 0.0
    cmd_vel_pub.publish(msg)
    node.destroy_node()
    rclpy.shutdown()
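The script above is open-loop: it commands a velocity for a fixed time but never checks the result. A minimal sketch of one way to perform the validation step, assuming the robot publishes nav_msgs/Odometry on the standard /odom topic:
# Sketch: reading the robot's planar position from odometry (assumes /odom)
import time
import rclpy
from nav_msgs.msg import Odometry

def get_position(node, timeout=2.0):
    """Spin until one Odometry message arrives; return (x, y) or None."""
    received = {}
    def callback(msg):
        received['xy'] = (msg.pose.pose.position.x, msg.pose.pose.position.y)
    sub = node.create_subscription(Odometry, '/odom', callback, 10)
    start = time.time()
    while 'xy' not in received and time.time() - start < timeout:
        rclpy.spin_once(node, timeout_sec=0.1)
    node.destroy_subscription(sub)
    return received.get('xy')

# Usage: call get_position() before and after the movement loop and check
# that the displacement is approximately 1 meter.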
Test Case 2: Turn Left
- Command: "Turn left 90 degrees"
- Expected Behavior: Robot rotates left by approximately 90 degrees
- Validation: Check robot orientation before and after the command (see the sketch below)
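No reference script is given for this case, but a minimal open-loop sketch in the same style as Test Case 1 might look like this, assuming a yaw rate of 0.5 rad/s (so π/2 radians takes about 3.14 seconds):
# Sketch: open-loop left turn of ~90 degrees (same style as test_move_forward)
import math
import time
import rclpy
from geometry_msgs.msg import Twist

def test_turn_left():
    rclpy.init()
    node = rclpy.create_node('test_turn_left')
    cmd_vel_pub = node.create_publisher(Twist, '/cmd_vel', 10)
    msg = Twist()
    msg.angular.z = 0.5  # 0.5 rad/s counter-clockwise (left)
    duration = (math.pi / 2) / 0.5  # ~3.14 s for a quarter turn
    start_time = time.time()
    while time.time() - start_time < duration:
        cmd_vel_pub.publish(msg)
        time.sleep(0.1)
    # Stop the rotation
    msg.angular.z = 0.0
    cmd_vel_pub.publish(msg)
    node.destroy_node()
    rclpy.shutdown()
As with the forward test, this is timing-based; for a real check, compare the yaw extracted from odometry before and after the turn.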
Navigation Commands
Test Case 3: Go to Location
- Command: "Go to the kitchen"
- Expected Behavior: Robot navigates to the kitchen area
- Validation: Check if robot reaches the target location
# Example test script for navigation
import rclpy
from rclpy.action import ActionClient
from nav2_msgs.action import NavigateToPose

def test_navigation():
    rclpy.init()
    node = rclpy.create_node('test_navigation')
    # Create action client for navigation
    action_client = ActionClient(node, NavigateToPose, 'navigate_to_pose')
    # Wait for action server
    action_client.wait_for_server()
    # Create goal for navigation to kitchen
    goal_msg = NavigateToPose.Goal()
    goal_msg.pose.header.frame_id = 'map'
    goal_msg.pose.pose.position.x = 2.0  # Kitchen x-coordinate
    goal_msg.pose.pose.position.y = 1.0  # Kitchen y-coordinate
    goal_msg.pose.pose.position.z = 0.0
    goal_msg.pose.pose.orientation.w = 1.0
    # Send goal and wait until it is accepted
    send_goal_future = action_client.send_goal_async(goal_msg)
    rclpy.spin_until_future_complete(node, send_goal_future)
    goal_handle = send_goal_future.result()
    # Wait for the navigation result (send_goal_async alone only
    # confirms acceptance, not completion)
    result_future = goal_handle.get_result_async()
    rclpy.spin_until_future_complete(node, result_future)
    node.destroy_node()
    rclpy.shutdown()
Testing Complex Commands
Multi-step Commands
Test Case 4: Navigate and Manipulate
- Command: "Go to the kitchen and pick up the red cup"
- Expected Behavior:
- Robot navigates to kitchen
- Detects red cup
- Approaches and grasps the cup
- Validation: Check if both navigation and manipulation succeed
# Example test script for multi-step command
import rclpy
from vla_pipeline.command_executor import CommandExecutor

def test_multistep_command():
    rclpy.init()
    node = rclpy.create_node('test_multistep')
    # Initialize command executor
    executor = CommandExecutor(node)
    # Define complex command sequence
    command_sequence = [
        {
            "type": "navigation",
            "action": "navigate_to",
            "parameters": {"location": "kitchen"}
        },
        {
            "type": "perception",
            "action": "detect_object",
            "parameters": {"object": "red_cup"}
        },
        {
            "type": "manipulation",
            "action": "pick_up",
            "parameters": {"object": "red_cup"}
        }
    ]
    # Execute sequence
    results = executor.execute_sequence(command_sequence)
    # Validate results
    success = all(result["status"] == "success" for result in results)
    print(f"Multi-step command success: {success}")
    node.destroy_node()
    rclpy.shutdown()
Audio Simulation and Testing
Simulating Audio Input
Since we're in a simulation environment, we need to simulate audio input for testing:
# Audio simulation for testing
import wave
import struct
import numpy as np

class AudioSimulator:
    def __init__(self):
        self.sample_rate = 16000
        self.channels = 1
        self.sample_width = 2  # 16-bit

    def generate_test_audio(self, text_command: str, filename: str):
        """
        Generate a simulated audio file for a text command.
        In a real implementation, this would use text-to-speech.
        """
        # For simulation purposes, create a simple placeholder:
        # a tone pattern standing in for actual speech
        duration = len(text_command) * 0.1  # Approximate duration
        frames = int(duration * self.sample_rate)
        # Generate a simple waveform (in practice, use TTS)
        wave_data = []
        for i in range(frames):
            # Simple 440 Hz sine oscillation
            value = int(10000 * np.sin(2 * np.pi * 440 * i / self.sample_rate))
            wave_data.append(struct.pack('<h', value))
        # Write to WAV file
        with wave.open(filename, 'w') as wav_file:
            wav_file.setnchannels(self.channels)
            wav_file.setsampwidth(self.sample_width)
            wav_file.setframerate(self.sample_rate)
            wav_file.writeframes(b''.join(wave_data))
        print(f"Generated test audio for: '{text_command}' -> {filename}")
        return filename

# Example usage
simulator = AudioSimulator()
audio_file = simulator.generate_test_audio("Move forward 2 meters", "test_command.wav")
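For inputs that exercise the transcription stage more realistically than a tone, an offline TTS library can render actual speech. A short sketch using pyttsx3, assuming it is installed (pip install pyttsx3); note the output format depends on the platform's speech engine:
# Sketch: generating real speech audio with pyttsx3 (offline TTS)
import pyttsx3

def generate_speech_audio(text_command: str, filename: str) -> str:
    engine = pyttsx3.init()
    # Queue the utterance to be rendered into a file instead of the speakers
    engine.save_to_file(text_command, filename)
    engine.runAndWait()  # Blocks until the file is written
    return filename

generate_speech_audio("Move forward 2 meters", "test_command_speech.wav")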
Performance Testing
Success Rate Testing
def test_success_rate():
    """Test the success rate of voice command processing"""
    test_commands = [
        ("Move forward", 0.95),  # Expected high success rate
        ("Turn left", 0.95),
        ("Go to the kitchen", 0.90),  # Slightly lower due to complexity
        ("Pick up the red cup", 0.85),  # Lower due to manipulation complexity
        ("Navigate to the living room and turn on the lamp", 0.80)  # Multi-step complexity
    ]
    total_tests = len(test_commands)
    successful_tests = 0
    for command_text, expected_success_rate in test_commands:
        # Simulate processing the command multiple times
        successes = 0
        for _ in range(20):  # Test each command 20 times
            result = simulate_command_processing(command_text)
            if result["success"]:
                successes += 1
        actual_success_rate = successes / 20
        print(f"Command: '{command_text}'")
        print(f"  Expected: {expected_success_rate:.2f}, Actual: {actual_success_rate:.2f}")
        # Count as successful if within 0.1 of the expected rate
        if abs(actual_success_rate - expected_success_rate) <= 0.1:
            successful_tests += 1
    overall_success_rate = successful_tests / total_tests
    print(f"\nOverall success rate: {overall_success_rate:.2f} ({successful_tests}/{total_tests})")
    # Validate against 90% requirement
    if overall_success_rate >= 0.90:
        print("✅ Success rate requirement (90%) met")
        return True
    else:
        print("❌ Success rate requirement (90%) not met")
        return False
Latency Testing
def test_processing_latency():
    """Test the latency of voice command processing"""
    import time
    test_commands = [
        "Move forward",
        "Turn left",
        "Go to the kitchen"
    ]
    latencies = []
    for command in test_commands:
        start_time = time.time()
        # Simulate full processing pipeline
        result = simulate_full_pipeline(command)
        end_time = time.time()
        latency = (end_time - start_time) * 1000  # Convert to milliseconds
        latencies.append(latency)
        print(f"Command: '{command}', Latency: {latency:.2f}ms")
    avg_latency = sum(latencies) / len(latencies)
    max_latency = max(latencies)
    print(f"\nAverage latency: {avg_latency:.2f}ms")
    print(f"Maximum latency: {max_latency:.2f}ms")
    # Validate against performance requirements
    # Requirement: < 2 seconds for voice-to-text, < 5 seconds for planning
    if avg_latency < 7000:  # 7 seconds total (2 s + 5 s for the two phases)
        print("✅ Latency requirements met")
        return True
    else:
        print("❌ Latency requirements not met")
        return False
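Both scripts above call helpers (simulate_command_processing and simulate_full_pipeline) that the real test harness is expected to provide. For dry-running the harness logic itself, minimal stand-in stubs might look like the following; the random success rate and the sleep are placeholders, not measurements:
# Hypothetical stubs for dry-running the test harness (not real pipeline calls)
import random
import time

def simulate_command_processing(command_text):
    """Stand-in for the real pipeline: succeed ~90% of the time at random."""
    return {"success": random.random() < 0.90}

def simulate_full_pipeline(command_text):
    """Stand-in that sleeps briefly to mimic end-to-end processing delay."""
    time.sleep(0.5)  # Placeholder delay
    return {"success": True, "command": command_text}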
Troubleshooting Guide
Common Issues and Solutions
Issue 1: Audio Not Detected
- Symptoms: Voice commands not being processed
- Solutions:
- Check audio input device configuration
- Verify microphone permissions
- Test with pre-recorded audio files
Issue 2: Low Transcription Accuracy
- Symptoms: Commands misinterpreted frequently
- Solutions:
- Improve audio quality (reduce background noise)
- Adjust Whisper model settings
- Use higher quality Whisper model (medium/large instead of base)
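For example, switching to a larger model with the openai-whisper package is a one-line change (the tradeoff is slower transcription):
# Sketch: transcribing a test clip with a larger Whisper model
import whisper

model = whisper.load_model("medium")  # "base" is faster but less accurate
result = model.transcribe("test_command.wav")
print(result["text"])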
Issue 3: Navigation Failures
- Symptoms: Robot fails to reach destinations
- Solutions:
- Verify map accuracy
- Check navigation parameters
- Ensure proper localization
Issue 4: Manipulation Failures
- Symptoms: Robot fails to grasp objects
- Solutions:
- Verify object detection accuracy
- Check robot kinematics
- Adjust manipulation parameters
Debugging Commands
For debugging purposes, the simulation environment supports special commands:
- debug show_map: Display the current navigation map
- debug show_objects: List detected objects in the environment
- debug reset_position: Reset robot to starting position
- debug log_level [level]: Set logging level (debug, info, warn, error)
Validation Criteria
Success Metrics
The simulation test environment validates against these criteria:
- 90% Success Rate: Basic voice commands should succeed 90% of the time
- Sub-10 Second Response: Voice-to-action should complete within 10 seconds
- Robust Error Handling: Clear feedback for ambiguous or impossible commands
- Safety Compliance: No unsafe robot behaviors during execution
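The two test functions defined earlier cover the first two criteria directly; a small driver can roll them into a single pass/fail report. A sketch reusing those functions:
# Sketch: aggregating the metric checks into one pass/fail report
def run_validation_suite():
    checks = {
        "success_rate": test_success_rate(),
        "latency": test_processing_latency(),
    }
    for name, passed in checks.items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
    return all(checks.values())

if __name__ == "__main__":
    run_validation_suite()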
Test Scenarios
The environment includes predefined test scenarios:
- Basic Navigation Test: Simple movement and navigation commands
- Object Manipulation Test: Pick-and-place operations
- Multi-step Task Test: Complex commands with multiple actions
- Error Recovery Test: Handling of failed commands and recovery
- Stress Test: Continuous command processing over extended periods
This simulation test environment provides a comprehensive framework for validating the VLA pipeline with voice commands in a safe, reproducible setting.