Complex Command Test Scenarios
Overview
This document outlines comprehensive test scenarios for complex natural language commands that require the full cognitive planning and execution capabilities of the VLA pipeline. These scenarios test the system's ability to handle multi-step tasks, ambiguous commands, and complex environmental interactions.
Test Scenario Categories
Category 1: Multi-Step Navigation Tasks
Scenario 1.1: "Go to the kitchen and pick up the red cup"
- Command: "Go to the kitchen and pick up the red cup"
- Expected Actions:
- Navigate to kitchen area
- Locate red cup using perception
- Approach and grasp the red cup
- Environmental Context:
- Kitchen at coordinates [5, 0, 0]
- Red cup at coordinates [5.5, 0.5, 0.8]
- Success Criteria: Robot navigates to kitchen and successfully grasps the red cup
- Validation: Check robot's end-effector position and object attachment
Scenario 1.2: "Move to the living room, find the blue ball, and bring it to me"
- Command: "Move to the living room, find the blue ball, and bring it to me"
- Expected Actions:
- Navigate to living room
- Detect blue ball using perception system
- Approach and grasp the blue ball
- Navigate back to user position
- Release the ball at delivery point
- Environmental Context:
- Living room at coordinates [2, 3, 0]
- Blue ball at coordinates [2.5, 3.5, 0.8]
- User position at coordinates [0, 0, 0]
- Success Criteria: Robot delivers the blue ball to the user
- Validation: Verify object is at user's location
Scenario 1.3: "Go to the office, turn on the lamp, and wait for 30 seconds"
- Command: "Go to the office, turn on the lamp, and wait for 30 seconds"
- Expected Actions:
- Navigate to office
- Locate and activate lamp using interaction system
- Wait for specified duration
- Environmental Context:
- Office at coordinates [-2, 1, 0]
- Lamp device available at coordinates [-1.5, 1.5, 1.0]
- Success Criteria: Lamp is turned on and robot waits for 30 seconds
- Validation: Check lamp status and timing
Category 2: Conditional and Context-Aware Tasks
Scenario 2.1: "If you see the red book, pick it up; otherwise, go to the kitchen"
- Command: "If you see the red book, pick it up; otherwise, go to the kitchen"
- Expected Actions:
- Perform perception to detect red book
- If book detected: approach and grasp it
- If book not detected: navigate to kitchen
- Environmental Context:
- Red book at coordinates [1, 1, 0.8] (for positive case)
- Kitchen at coordinates [5, 0, 0]
- Success Criteria: Robot performs correct action based on book detection
- Validation: Check if robot picked up book OR is in kitchen
Scenario 2.2: "Navigate to the kitchen only if it's not occupied"
- Command: "Navigate to the kitchen only if it's not occupied"
- Expected Actions:
- Check kitchen occupancy using perception
- If not occupied: navigate to kitchen
- If occupied: report inability to proceed
- Environmental Context:
- Kitchen at coordinates [5, 0, 0]
- Occupancy status available through perception system
- Success Criteria: Robot only enters kitchen if unoccupied
- Validation: Verify robot's position and occupancy check
Category 3: Time-Sensitive Tasks
Scenario 3.1: "Go to the charging station before 6 PM"
- Command: "Go to the charging station before 6 PM"
- Expected Actions:
- Check current time
- If before 6 PM: navigate to charging station
- If after 6 PM: report deadline missed
- Environmental Context:
- Charging station at coordinates [8, 0, 0]
- Current time check capability available
- Success Criteria: Robot reaches charging station before deadline OR reports appropriately
- Validation: Check arrival time and robot status
Scenario 3.2: "Monitor the living room for 5 minutes and report any movement"
- Command: "Monitor the living room for 5 minutes and report any movement"
- Expected Actions:
- Navigate to vantage point in living room
- Activate monitoring system
- Detect and log any movement during 5-minute window
- Report findings after timeout
- Environmental Context:
- Living room at coordinates [2, 3, 0]
- Perception system with motion detection capability
- Success Criteria: System monitors for full duration and reports findings
- Validation: Check monitoring logs and duration
Category 4: Complex Manipulation Tasks
Scenario 4.1: "Stack the red block on the blue block"
- Command: "Stack the red block on the blue block"
- Expected Actions:
- Locate red and blue blocks using perception
- Approach and grasp red block
- Navigate to blue block location
- Precisely place red block on top of blue block
- Environmental Context:
- Red block at coordinates [1, 1, 0.5]
- Blue block at coordinates [2, 1, 0.5]
- Manipulation system with precision placement capability
- Success Criteria: Red block is successfully stacked on blue block
- Validation: Verify final positions and stability
Scenario 4.2: "Pour water from the bottle into the glass"
- Command: "Pour water from the bottle into the glass"
- Expected Actions:
- Locate bottle and glass using perception
- Grasp bottle with proper orientation
- Navigate to position above glass
- Execute pouring motion with appropriate control
- Environmental Context:
- Bottle and glass at known coordinates
- Manipulation system with pouring capability
- Success Criteria: Water is transferred from bottle to glass
- Validation: Verify relative positions and simulated liquid transfer
Category 5: Human Interaction Tasks
Scenario 5.1: "Follow me to the garden"
- Command: "Follow me to the garden"
- Expected Actions:
- Activate person-following mode
- Detect and track human movement
- Maintain safe following distance
- Navigate to garden as human leads
- Environmental Context:
- Human detection and tracking capabilities
- Garden destination at coordinates [10, 5, 0]
- Success Criteria: Robot successfully follows human to garden
- Validation: Track robot-to-human distance and final position
Scenario 5.2: "Greet the person in the room and ask for their name"
- Command: "Greet the person in the room and ask for their name"
- Expected Actions:
- Detect person using perception system
- Navigate to appropriate interaction distance
- Execute greeting sequence (verbal and physical)
- Request name using speech system
- Environmental Context:
- Person detection capability
- Speech synthesis and recognition systems
- Success Criteria: Robot greets person and attempts to get their name
- Validation: Check greeting execution and interaction attempt
Test Implementation Framework
Test Case Structure
Each test scenario follows this structure:
class VLATestCase:
def __init__(self, name, command, environment_context, expected_actions):
self.name = name
self.command = command
self.environment_context = environment_context
self.expected_actions = expected_actions
def setup_environment(self):
"""Set up the simulation environment for the test"""
pass
def execute_test(self):
"""Execute the test scenario"""
pass
def validate_results(self):
"""Validate that the results match expectations"""
pass
def cleanup(self):
"""Clean up after the test"""
pass
Test Execution Example
def test_scenario_1_1():
"""Test Scenario 1.1: Go to kitchen and pick up red cup"""
# Define the test case
test_case = VLATestCase(
name="Kitchen Navigation and Cup Pickup",
command="Go to the kitchen and pick up the red cup",
environment_context={
"robot_position": [0, 0, 0],
"object_locations": {
"red_cup": [5.5, 0.5, 0.8],
"kitchen": [5.0, 0.0, 0.0]
},
"robot_capabilities": ["navigation", "manipulation"],
"map_bounds": [-10, -10, 10, 10]
},
expected_actions=[
{"type": "navigation", "action": "navigate_to", "params": {"location": [5.0, 0.0, 0.0]}},
{"type": "perception", "action": "detect_object", "params": {"object": "red_cup"}},
{"type": "manipulation", "action": "pick_up", "params": {"object": "red_cup"}}
]
)
# Setup environment
test_case.setup_environment()
# Execute the VLA pipeline
planner = LLMPlanningModule(api_key="test-key")
plan_result = planner.plan_command(
test_case.command,
PlanContext(**test_case.environment_context)
)
if plan_result["success"]:
executor = ROS2ActionExecutor(simulation_node)
execution_result = executor.execute_action_sequence(
plan_result["plan"]["action_sequence"],
test_case.environment_context["robot_capabilities"],
test_case.environment_context
)
# Validate results
success = test_case.validate_results(execution_result)
return success
else:
print(f"Planning failed: {plan_result['error']}")
return False
# Run the test
result = test_scenario_1_1()
print(f"Test Scenario 1.1 Result: {'PASS' if result else 'FAIL'}")
Success Metrics
Each test scenario is evaluated on:
- Task Completion Rate: Percentage of tasks completed successfully
- Action Accuracy: How closely the executed actions match expected actions
- Time Efficiency: Whether tasks are completed within expected timeframes
- Safety Compliance: Whether safety constraints are respected
- Error Recovery: How well the system handles and recovers from errors
Expected Success Rates
Based on the system requirements:
- Simple Multi-Step Tasks: 85% success rate
- Conditional Tasks: 80% success rate
- Time-Sensitive Tasks: 75% success rate
- Complex Manipulation: 70% success rate
- Human Interaction: 80% success rate
Edge Case Scenarios
Scenario EC-1: Ambiguous Object Reference
- Command: "Pick up the cup" (when multiple cups are present)
- Expected Behavior: System asks for clarification or identifies the closest cup
Scenario EC-2: Impossible Task
- Command: "Go through the wall to get the object"
- Expected Behavior: System explains why the task is impossible and suggests alternatives
Scenario EC-3: Resource Conflict
- Command: "Use the arm to do something while it's already busy"
- Expected Behavior: System manages resource conflicts and queues tasks appropriately
Scenario EC-4: Perception Failure
- Command: "Find the blue object" (when no blue object exists)
- Expected Behavior: System reports inability to find the requested object
Validation Framework
The test scenarios are validated using:
- Automated Checks: Scripted validation of robot positions, object states, and action outcomes
- Performance Metrics: Tracking of success rates, response times, and resource usage
- Manual Verification: Human validation of complex interaction scenarios
- Repeatability Tests: Ensuring consistent results across multiple executions
These comprehensive test scenarios ensure the VLA pipeline can handle complex, real-world commands that require sophisticated cognitive planning and multi-step execution.