Function Calling Done Right: Tool Schemas, Validation & Recovery
How to design LLM tool/function schemas that actually work in production. Strict schemas, parallel calls, error recovery, multi-step reasoning, and the failure modes nobody warns you about.
Why Function Calling is Harder Than the Docs Suggest
Production tools rarely look like the single-argument examples in the docs. They tend to have:
- 20+ parameters with complex types
- Required vs optional fields
- Mutually exclusive arguments
- Side effects you can't roll back
- Multi-step dependencies
If you don't design for these, your AI silently does the wrong thing.
Strict Schemas with Pydantic
```python
from datetime import datetime, timezone
from enum import Enum

from pydantic import AwareDatetime, BaseModel, ConfigDict, Field, model_validator


class AppointmentType(str, Enum):
    in_person = "in_person"
    telehealth = "telehealth"


class CreateAppointment(BaseModel):
    """Create a new patient appointment."""

    # Reject unknown keys instead of silently ignoring hallucinated arguments
    model_config = ConfigDict(extra="forbid")

    patient_id: str = Field(description="UUID of the patient")
    provider_id: str = Field(description="UUID of the provider")
    # AwareDatetime rejects timestamps that carry no timezone offset
    starts_at: AwareDatetime = Field(description="ISO 8601 timestamp in UTC")
    duration_minutes: int = Field(ge=15, le=120, description="15-120 minutes")
    type: AppointmentType
    reason: str = Field(min_length=10, max_length=500)
    insurance_id: str | None = None

    @model_validator(mode="after")
    def must_be_future(self):
        # Compare aware datetimes; naive datetime.utcnow() would raise TypeError here
        if self.starts_at < datetime.now(timezone.utc):
            raise ValueError("starts_at must be in the future")
        return self
```
Generating the Tool Schema
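```python
from pydantic import BaseModel


def tool_schema(model: type[BaseModel]) -> dict:
    """Turn a Pydantic model into a tool definition (name, description, input_schema)."""
    return {
        "name": model.__name__,
        "description": (model.__doc__ or "").strip(),
        "input_schema": model.model_json_schema(),
    }


# CancelAppointment is assumed to be defined like CreateAppointment
TOOL_MODELS = [CreateAppointment, CancelAppointment]
tools = [tool_schema(m) for m in TOOL_MODELS]
TOOL_MAP = {m.__name__: m for m in TOOL_MODELS}  # tool name -> model, used to validate calls
```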
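It is worth printing what the model actually receives. A small self-contained sketch with a hypothetical `LookupPatient` tool: fields without a default land in `required`, fields with a default do not, numeric bounds become `minimum`/`maximum`, and `extra="forbid"` is emitted as `"additionalProperties": false`.

```python
import json

from pydantic import BaseModel, ConfigDict, Field


class LookupPatient(BaseModel):
    """Look up a patient record (hypothetical example tool)."""

    model_config = ConfigDict(extra="forbid")  # emitted as "additionalProperties": false

    patient_id: str = Field(description="UUID of the patient")  # no default -> required
    limit: int = Field(default=10, ge=1, le=50, description="Max visits to return")  # optional


def tool_schema(model: type[BaseModel]) -> dict:
    return {
        "name": model.__name__,
        "description": (model.__doc__ or "").strip(),
        "input_schema": model.model_json_schema(),
    }


schema = tool_schema(LookupPatient)
print(json.dumps(schema, indent=2))
```

Reading this output is the quickest way to catch a field you meant to be optional showing up as required, or vice versa.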
The Execution Loop
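```python
import anthropic
from pydantic import ValidationError

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

# Uses `tools` and TOOL_MAP from above; execute_tool(name, args) is your own dispatcher.


async def run_agent(user_message: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_steps):
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,  # required by the Messages API
            tools=tools,
            messages=messages,
        )

        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        if response.stop_reason != "tool_use":
            # e.g. max_tokens: don't reply with an empty tool_result turn
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

        # Tool use: the model may request several tools in one turn
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            try:
                args = TOOL_MAP[block.name](**block.input)  # Pydantic validates here
                result = await execute_tool(block.name, args)
                content, is_error = str(result), False
            except ValidationError as e:
                # Return the error so the model can fix its arguments and retry
                content, is_error = f"Validation failed: {e}", True
            tool_results.append({
                "type": "tool_result",  # required on every tool result block
                "tool_use_id": block.id,
                "content": content,
                "is_error": is_error,
            })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    raise RuntimeError(f"Exceeded {max_steps} steps")
```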
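Sending `str(e)` back works, but Pydantic v2's default error text also echoes the input and appends documentation URLs, which costs tokens. One option, sketched here with a hypothetical `format_validation_error` helper and a trimmed-down `Booking` model, is to build the message from `ValidationError.errors()` so the model sees one line per bad field:

```python
from pydantic import BaseModel, Field, ValidationError


class Booking(BaseModel):
    """Trimmed-down stand-in for CreateAppointment (hypothetical)."""

    patient_id: str
    duration_minutes: int = Field(ge=15, le=120)


def format_validation_error(e: ValidationError) -> str:
    """One line per failing field, e.g. 'duration_minutes: <message>'."""
    lines = []
    for err in e.errors():
        # loc is a path tuple such as ("duration_minutes",); () for model-level errors
        field = ".".join(str(part) for part in err["loc"]) or "<root>"
        lines.append(f"{field}: {err['msg']}")
    return "Validation failed:\n" + "\n".join(lines)


def tool_result_for(tool_use_id: str, model: type[BaseModel], raw_input: dict) -> dict:
    """Validate raw tool input and build a tool_result block (error or success)."""
    try:
        args = model(**raw_input)
    except ValidationError as e:
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "is_error": True, "content": format_validation_error(e)}
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "content": args.model_dump_json()}


result = tool_result_for("toolu_123", Booking, {"duration_minutes": 5})
print(result["content"])
```

Here the model gets back two short lines, one for the missing `patient_id` and one for the out-of-range `duration_minutes`, which is usually all it needs to correct the call.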
Failure Modes to Test
1. Hallucinated arguments: IDs that don't exist. Validate them against your database before executing.
2. Wrong tool selection: add few-shot examples to your system prompt for ambiguous cases.
3. Infinite loops: always set max_steps, log every step, and alert on loops.
4. Parallel side effects: if the LLM proposes five tool calls in one turn, are they safe to run concurrently?
5. Partial failure: if call 3 of 5 fails, what's the recovery story?
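For parallel side effects and partial failure, one approach is to run read-only tools concurrently, run mutating tools one at a time in the order the model proposed them, turn each failure into its own error result, and skip the remaining writes once one write fails so the model can re-plan. A minimal sketch; the `READ_ONLY` set, the `run_one` callable, and the tool names are hypothetical:

```python
import asyncio
from typing import Any, Awaitable, Callable

READ_ONLY = {"get_patient", "list_slots"}  # hypothetical: tools with no side effects

ToolCall = dict[str, Any]  # {"id": ..., "name": ..., "input": {...}}
Runner = Callable[[str, dict], Awaitable[Any]]


async def _safe_run(call: ToolCall, run_one: Runner) -> dict:
    """Run one call; convert any exception into an error tool_result."""
    try:
        result = await run_one(call["name"], call["input"])
        return {"type": "tool_result", "tool_use_id": call["id"], "content": str(result)}
    except Exception as e:  # one failing call must not sink the whole batch
        return {"type": "tool_result", "tool_use_id": call["id"],
                "is_error": True, "content": f"{type(e).__name__}: {e}"}


async def run_batch(calls: list[ToolCall], run_one: Runner) -> list[dict]:
    reads = [c for c in calls if c["name"] in READ_ONLY]
    writes = [c for c in calls if c["name"] not in READ_ONLY]

    # Reads are safe to overlap
    read_results = await asyncio.gather(*(_safe_run(c, run_one) for c in reads))

    # Writes run sequentially; stop applying side effects after the first failure
    write_results, failed = [], False
    for c in writes:
        if failed:
            write_results.append({"type": "tool_result", "tool_use_id": c["id"], "is_error": True,
                                  "content": "Skipped: an earlier write in this batch failed"})
            continue
        r = await _safe_run(c, run_one)
        failed = r.get("is_error", False)
        write_results.append(r)

    # Return results in the order the model issued the calls
    by_id = {r["tool_use_id"]: r for r in [*read_results, *write_results]}
    return [by_id[c["id"]] for c in calls]


async def fake_runner(name: str, args: dict) -> dict:
    if args.get("fail"):
        raise RuntimeError("slot taken")
    return {"ok": name}


calls = [
    {"id": "t1", "name": "get_patient", "input": {}},
    {"id": "t2", "name": "create_appointment", "input": {"fail": True}},
    {"id": "t3", "name": "cancel_appointment", "input": {}},
]
results = asyncio.run(run_batch(calls, fake_runner))
```

Skipping later writes is a deliberate choice: the model sees exactly which calls ran, which failed, and which never started, instead of inheriting a half-applied batch.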
Tooling Anti-Patterns
- Tools with > 8 parameters → split into multiple tools
- Boolean parameters with unclear meaning → use enums
- Tools that return free text instead of structured data → return JSON the model can parse reliably
- Tools that mutate state without an idempotency key → require one so retries don't repeat side effects
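A minimal sketch of an idempotency guard, assuming an in-memory dict stands in for what would normally be a database table with a unique constraint on the key. The `IdempotentExecutor` class and the `book` function are hypothetical. A retried call with the same key returns the stored result instead of repeating the side effect:

```python
from typing import Any, Callable


class IdempotentExecutor:
    """Run a mutating tool at most once per idempotency key (in-memory sketch)."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    def run(self, key: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        if key in self._results:
            # Duplicate or retried call: return the original result, no new side effect
            return self._results[key]
        result = fn(**kwargs)
        self._results[key] = result  # only successful results are stored
        return result


bookings: list[dict] = []


def book(patient_id: str, slot: str) -> dict:
    booking = {"id": len(bookings) + 1, "patient_id": patient_id, "slot": slot}
    bookings.append(booking)  # the side effect we must not repeat
    return booking


executor = IdempotentExecutor()
first = executor.run("req-42", book, patient_id="p1", slot="2030-01-01T09:00Z")
retry = executor.run("req-42", book, patient_id="p1", slot="2030-01-01T09:00Z")
```

This check-then-store is not atomic; with concurrent callers, two requests can both miss the cache, which is why a production version would lean on a unique constraint or an equivalent atomic insert.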