Noise Triggers Tool Dropping

Models systematically drop enhancement features when processing semantic noise

90% of clean requests use 4 tools

67% of poem-noise requests use 4 tools

14/27 four-tool requests dropped to 3 tools
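The headline figures are easier to compare on a common scale. The short sketch below only restates the numbers reported above as a percentage-point gap and a share; the variable names are illustrative, and no new data is introduced.

```python
# Figures as reported in the summary above.
clean_rate = 0.90          # share of clean requests using 4 tools
noisy_rate = 0.67          # share of poem-noise requests using 4 tools
dropped, total = 14, 27    # four-tool requests that fell to three tools

# Gap between the two conditions, in percentage points.
gap_points = round((clean_rate - noisy_rate) * 100)

# Share of four-tool requests that lost one tool.
dropped_share = dropped / total

print(f"Gap: {gap_points} percentage points")
print(f"Dropped to 3 tools: {dropped_share:.1%}")
```

Read this way, roughly half of the requests that would otherwise use four tools lose one of them.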

🔍 The Pattern

When models encounter semantic noise, they maintain the core workflow (Search → Check → Reserve) but systematically drop "nice-to-have" features like:

Getting detailed restaurant information
Reading reviews
Sending confirmation emails

The drop in optional tool calls is statistically significant (p = 0.028), which suggests that models prioritize essential tasks when noise makes a request harder to process.
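The core-versus-optional split described above can be made concrete with a small check over the tool calls of two runs. This is a minimal sketch, not the analysis used to produce the results: the tool names are hypothetical (the summary only describes the tools by role), and the two call lists are invented for illustration.

```python
# Hypothetical tool names; the source describes the tools only by role.
CORE_TOOLS = {"search_restaurants", "check_availability", "make_reservation"}
OPTIONAL_TOOLS = {"get_restaurant_details", "read_reviews", "send_confirmation_email"}


def dropped_optional_tools(clean_calls, noisy_calls):
    """Return optional tools used in the clean run but missing from the noisy run."""
    return (set(clean_calls) - set(noisy_calls)) & OPTIONAL_TOOLS


def core_workflow_preserved(calls):
    """Return True if every core tool appears somewhere in the call list."""
    return CORE_TOOLS <= set(calls)


# Illustrative call lists for the same request, without and with noise.
clean = ["search_restaurants", "check_availability", "read_reviews", "make_reservation"]
noisy = ["search_restaurants", "check_availability", "make_reservation"]

dropped = dropped_optional_tools(clean, noisy)
print(sorted(dropped))                 # ['read_reviews']
print(core_workflow_preserved(noisy))  # True
```

A run matches the pattern when `core_workflow_preserved` stays true while `dropped_optional_tools` is non-empty.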

🎯 Why This Matters

This suggests that LLMs have an implicit task hierarchy. When noise makes a request harder to process, they shed non-essential features while preserving core functionality, much as humans do under stress.