Complex Command Test Scenarios

Overview

This document outlines comprehensive test scenarios for complex natural language commands that require the full cognitive planning and execution capabilities of the VLA pipeline. These scenarios test the system's ability to handle multi-step tasks, ambiguous commands, and complex environmental interactions.

Test Scenario Categories

Scenario 1.1: "Go to the kitchen and pick up the red cup"

Command: "Go to the kitchen and pick up the red cup"
Expected Actions:
1. Navigate to kitchen area
2. Locate red cup using perception
3. Approach and grasp the red cup
Environmental Context:
- Kitchen at coordinates [5, 0, 0]
- Red cup at coordinates [5.5, 0.5, 0.8]
Success Criteria: Robot navigates to kitchen and successfully grasps the red cup
Validation: Check robot's end-effector position and object attachment

Scenario 1.2: "Move to the living room, find the blue ball, and bring it to me"

Command: "Move to the living room, find the blue ball, and bring it to me"
Expected Actions:
1. Navigate to living room
2. Detect blue ball using perception system
3. Approach and grasp the blue ball
4. Navigate back to user position
5. Release the ball at delivery point
Environmental Context:
- Living room at coordinates [2, 3, 0]
- Blue ball at coordinates [2.5, 3.5, 0.8]
- User position at coordinates [0, 0, 0]
Success Criteria: Robot delivers the blue ball to the user
Validation: Verify object is at user's location

Scenario 1.3: "Go to the office, turn on the lamp, and wait for 30 seconds"

Command: "Go to the office, turn on the lamp, and wait for 30 seconds"
Expected Actions:
1. Navigate to office
2. Locate and activate lamp using interaction system
3. Wait for specified duration
Environmental Context:
- Office at coordinates [-2, 1, 0]
- Lamp device available at coordinates [-1.5, 1.5, 1.0]
Success Criteria: Lamp is turned on and robot waits for 30 seconds
Validation: Check lamp status and timing

Category 2: Conditional and Context-Aware Tasks

Scenario 2.1: "If you see the red book, pick it up; otherwise, go to the kitchen"

Command: "If you see the red book, pick it up; otherwise, go to the kitchen"
Expected Actions:
1. Perform perception to detect red book
2. If book detected: approach and grasp it
3. If book not detected: navigate to kitchen
Environmental Context:
- Red book at coordinates [1, 1, 0.8] (for positive case)
- Kitchen at coordinates [5, 0, 0]
Success Criteria: Robot performs correct action based on book detection
Validation: Check if robot picked up book OR is in kitchen

Scenario 2.2: "Navigate to the kitchen only if it's not occupied"

Command: "Navigate to the kitchen only if it's not occupied"
Expected Actions:
1. Check kitchen occupancy using perception
2. If not occupied: navigate to kitchen
3. If occupied: report inability to proceed
Environmental Context:
- Kitchen at coordinates [5, 0, 0]
- Occupancy status available through perception system
Success Criteria: Robot only enters kitchen if unoccupied
Validation: Verify robot's position and occupancy check

Category 3: Time-Sensitive Tasks

Scenario 3.1: "Go to the charging station before 6 PM"

Command: "Go to the charging station before 6 PM"
Expected Actions:
1. Check current time
2. If before 6 PM: navigate to charging station
3. If after 6 PM: report deadline missed
Environmental Context:
- Charging station at coordinates [8, 0, 0]
- Current time check capability available
Success Criteria: Robot reaches charging station before deadline OR reports appropriately
Validation: Check arrival time and robot status

Scenario 3.2: "Monitor the living room for 5 minutes and report any movement"

Command: "Monitor the living room for 5 minutes and report any movement"
Expected Actions:
1. Navigate to vantage point in living room
2. Activate monitoring system
3. Detect and log any movement during 5-minute window
4. Report findings after timeout
Environmental Context:
- Living room at coordinates [2, 3, 0]
- Perception system with motion detection capability
Success Criteria: System monitors for full duration and reports findings
Validation: Check monitoring logs and duration

Category 4: Complex Manipulation Tasks

Scenario 4.1: "Stack the red block on the blue block"

Command: "Stack the red block on the blue block"
Expected Actions:
1. Locate red and blue blocks using perception
2. Approach and grasp red block
3. Navigate to blue block location
4. Precisely place red block on top of blue block
Environmental Context:
- Red block at coordinates [1, 1, 0.5]
- Blue block at coordinates [2, 1, 0.5]
- Manipulation system with precision placement capability
Success Criteria: Red block is successfully stacked on blue block
Validation: Verify final positions and stability

Scenario 4.2: "Pour water from the bottle into the glass"

Command: "Pour water from the bottle into the glass"
Expected Actions:
1. Locate bottle and glass using perception
2. Grasp bottle with proper orientation
3. Navigate to position above glass
4. Execute pouring motion with appropriate control
Environmental Context:
- Bottle and glass at known coordinates
- Manipulation system with pouring capability
Success Criteria: Water is transferred from bottle to glass
Validation: Verify relative positions and simulated liquid transfer

Category 5: Human Interaction Tasks

Scenario 5.1: "Follow me to the garden"

Command: "Follow me to the garden"
Expected Actions:
1. Activate person-following mode
2. Detect and track human movement
3. Maintain safe following distance
4. Navigate to garden as human leads
Environmental Context:
- Human detection and tracking capabilities
- Garden destination at coordinates [10, 5, 0]
Success Criteria: Robot successfully follows human to garden
Validation: Track robot-to-human distance and final position

Scenario 5.2: "Greet the person in the room and ask for their name"

Command: "Greet the person in the room and ask for their name"
Expected Actions:
1. Detect person using perception system
2. Navigate to appropriate interaction distance
3. Execute greeting sequence (verbal and physical)
4. Request name using speech system
Environmental Context:
- Person detection capability
- Speech synthesis and recognition systems
Success Criteria: Robot greets person and attempts to get their name
Validation: Check greeting execution and interaction attempt

Test Implementation Framework

Test Case Structure

Each test scenario follows this structure:

class VLATestCase:
    def __init__(self, name, command, environment_context, expected_actions):
        self.name = name
        self.command = command
        self.environment_context = environment_context
        self.expected_actions = expected_actions

    def setup_environment(self):
        """Set up the simulation environment for the test"""
        pass

    def execute_test(self):
        """Execute the test scenario"""
        pass

    def validate_results(self):
        """Validate that the results match expectations"""
        pass

    def cleanup(self):
        """Clean up after the test"""
        pass

Test Execution Example

def test_scenario_1_1():
    """Test Scenario 1.1: Go to kitchen and pick up red cup"""

    # Define the test case
    test_case = VLATestCase(
        name="Kitchen Navigation and Cup Pickup",
        command="Go to the kitchen and pick up the red cup",
        environment_context={
            "robot_position": [0, 0, 0],
            "object_locations": {
                "red_cup": [5.5, 0.5, 0.8],
                "kitchen": [5.0, 0.0, 0.0]
            },
            "robot_capabilities": ["navigation", "manipulation"],
            "map_bounds": [-10, -10, 10, 10]
        },
        expected_actions=[
            {"type": "navigation", "action": "navigate_to", "params": {"location": [5.0, 0.0, 0.0]}},
            {"type": "perception", "action": "detect_object", "params": {"object": "red_cup"}},
            {"type": "manipulation", "action": "pick_up", "params": {"object": "red_cup"}}
        ]
    )

    # Setup environment
    test_case.setup_environment()

    # Execute the VLA pipeline
    planner = LLMPlanningModule(api_key="test-key")
    plan_result = planner.plan_command(
        test_case.command,
        PlanContext(**test_case.environment_context)
    )

    if plan_result["success"]:
        executor = ROS2ActionExecutor(simulation_node)
        execution_result = executor.execute_action_sequence(
            plan_result["plan"]["action_sequence"],
            test_case.environment_context["robot_capabilities"],
            test_case.environment_context
        )

        # Validate results
        success = test_case.validate_results(execution_result)
        return success
    else:
        print(f"Planning failed: {plan_result['error']}")
        return False

# Run the test
result = test_scenario_1_1()
print(f"Test Scenario 1.1 Result: {'PASS' if result else 'FAIL'}")

Success Metrics

Each test scenario is evaluated on:

Task Completion Rate: Percentage of tasks completed successfully
Action Accuracy: How closely the executed actions match expected actions
Time Efficiency: Whether tasks are completed within expected timeframes
Safety Compliance: Whether safety constraints are respected
Error Recovery: How well the system handles and recovers from errors

Expected Success Rates

Based on the system requirements:

Simple Multi-Step Tasks: 85% success rate
Conditional Tasks: 80% success rate
Time-Sensitive Tasks: 75% success rate
Complex Manipulation: 70% success rate
Human Interaction: 80% success rate

Edge Case Scenarios

Scenario EC-1: Ambiguous Object Reference

Command: "Pick up the cup" (when multiple cups are present)
Expected Behavior: System asks for clarification or identifies the closest cup

Scenario EC-2: Impossible Task

Command: "Go through the wall to get the object"
Expected Behavior: System explains why the task is impossible and suggests alternatives

Scenario EC-3: Resource Conflict

Command: "Use the arm to do something while it's already busy"
Expected Behavior: System manages resource conflicts and queues tasks appropriately

Scenario EC-4: Perception Failure

Command: "Find the blue object" (when no blue object exists)
Expected Behavior: System reports inability to find the requested object

Validation Framework

The test scenarios are validated using:

Automated Checks: Scripted validation of robot positions, object states, and action outcomes
Performance Metrics: Tracking of success rates, response times, and resource usage
Manual Verification: Human validation of complex interaction scenarios
Repeatability Tests: Ensuring consistent results across multiple executions

These comprehensive test scenarios ensure the VLA pipeline can handle complex, real-world commands that require sophisticated cognitive planning and multi-step execution.

Overview​

Test Scenario Categories​

Category 1: Multi-Step Navigation Tasks​

Scenario 1.1: "Go to the kitchen and pick up the red cup"​

Scenario 1.2: "Move to the living room, find the blue ball, and bring it to me"​

Scenario 1.3: "Go to the office, turn on the lamp, and wait for 30 seconds"​

Category 2: Conditional and Context-Aware Tasks​

Scenario 2.1: "If you see the red book, pick it up; otherwise, go to the kitchen"​

Scenario 2.2: "Navigate to the kitchen only if it's not occupied"​

Category 3: Time-Sensitive Tasks​

Scenario 3.1: "Go to the charging station before 6 PM"​

Scenario 3.2: "Monitor the living room for 5 minutes and report any movement"​

Category 4: Complex Manipulation Tasks​

Scenario 4.1: "Stack the red block on the blue block"​

Scenario 4.2: "Pour water from the bottle into the glass"​

Category 5: Human Interaction Tasks​

Scenario 5.1: "Follow me to the garden"​

Scenario 5.2: "Greet the person in the room and ask for their name"​

Test Implementation Framework​

Test Case Structure​

Test Execution Example​

Success Metrics​

Expected Success Rates​

Edge Case Scenarios​

Scenario EC-1: Ambiguous Object Reference​

Scenario EC-2: Impossible Task​

Scenario EC-3: Resource Conflict​

Scenario EC-4: Perception Failure​

Validation Framework​