AI Architecture · Intermediate · 2026-05-12 · 10 min read

Function Calling Done Right: Tool Schemas, Validation & Recovery

How to design LLM tool/function schemas that actually work in production. Strict schemas, parallel calls, error recovery, multi-step reasoning, and the failure modes nobody warns you about.

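Why Function Calling Is Harder Than the Docs Suggest

Quick-start examples tend to show one tool with a couple of string parameters. Production tools bring:
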
  • 20+ parameters with complex types
  • Required vs optional fields
  • Mutually exclusive arguments
  • Side effects you can't roll back
  • Multi-step dependencies

If you don't design for these, your AI silently does the wrong thing.

Strict Schemas with Pydantic

```python
from datetime import datetime, timezone
from enum import Enum

from pydantic import AwareDatetime, BaseModel, Field, model_validator


class AppointmentType(str, Enum):
    in_person = "in_person"
    telehealth = "telehealth"


class CreateAppointment(BaseModel):
    """Create a new patient appointment."""

    patient_id: str = Field(description="UUID of the patient")
    provider_id: str = Field(description="UUID of the provider")
    # AwareDatetime rejects naive timestamps, so the comparison below can't
    # fail with a naive-vs-aware TypeError
    starts_at: AwareDatetime = Field(description="ISO 8601 timestamp in UTC")
    duration_minutes: int = Field(ge=15, le=120, description="15-120 minutes")
    type: AppointmentType
    reason: str = Field(min_length=10, max_length=500)
    insurance_id: str | None = None

    @model_validator(mode="after")
    def must_be_future(self):
        # Compare against an aware "now"; datetime.utcnow() is naive and deprecated
        if self.starts_at < datetime.now(timezone.utc):
            raise ValueError("starts_at must be in the future")
        return self
```
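When the model sends arguments that fail these checks, the raw `ValidationError` string is usable but verbose. A sketch of condensing it into one `field: message` line per problem, which tends to be easier for the model to act on. `Reschedule` and `format_errors` are hypothetical names; `errors()` is Pydantic v2's structured error list, where each entry carries a `loc` path and a `msg`.

```python
from pydantic import BaseModel, Field, ValidationError


class Reschedule(BaseModel):
    """Hypothetical tool used only to demonstrate error formatting."""

    appointment_id: str
    duration_minutes: int = Field(ge=15, le=120)


def format_errors(exc: ValidationError) -> str:
    # One line per failing field; model-level errors have an empty loc
    lines = []
    for err in exc.errors():
        field = ".".join(str(part) for part in err["loc"]) or "(model)"
        lines.append(f"{field}: {err['msg']}")
    return "\n".join(lines)


try:
    Reschedule(duration_minutes=5)  # missing appointment_id, duration too short
except ValidationError as e:
    print(format_errors(e))
```

Pydantic collects every field error in one pass, so the model can fix all of its mistakes in a single retry instead of one per round trip.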

Generating the Tool Schema

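```python
from pydantic import BaseModel


def tool_schema(model: type[BaseModel]) -> dict:
    """Convert a Pydantic model into an Anthropic-style tool definition."""
    return {
        "name": model.__name__,
        # __doc__ is None when the model has no docstring; the LLM reads this text
        "description": (model.__doc__ or "").strip(),
        "input_schema": model.model_json_schema(),
    }


# CancelAppointment is defined the same way as CreateAppointment (not shown)
tools = [tool_schema(CreateAppointment), tool_schema(CancelAppointment)]
```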

The Execution Loop

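```python
import anthropic
from pydantic import ValidationError

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

# Tool name -> Pydantic model used to validate that tool's arguments
TOOL_MAP = {m.__name__: m for m in (CreateAppointment, CancelAppointment)}


async def run_agent(user_message: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for step in range(max_steps):
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,  # required by the Messages API
            tools=tools,
            messages=messages,
        )

        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        if response.stop_reason != "tool_use":
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

        # Tool use: the model may request several calls in a single turn
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            result = {"type": "tool_result", "tool_use_id": block.id}
            try:
                args = TOOL_MAP[block.name](**block.input)  # Pydantic validates
                result["content"] = str(await execute_tool(block.name, args))  # your dispatcher
            except ValidationError as e:
                # Return the error to the model so it can correct its arguments
                result.update(is_error=True, content=f"Validation failed: {e}")
            tool_results.append(result)

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    raise RuntimeError(f"Exceeded {max_steps} steps")
```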

Failure Modes to Test

  1. Hallucinated arguments: IDs that pass schema validation but don't exist. Check them against your database before executing.
  2. Wrong tool selection: add few-shot examples to your system prompt for ambiguous cases.
  3. Infinite loops: always set max_steps, log every step, and alert when a run keeps repeating the same call.
  4. Parallel side effects: if the LLM proposes five tool calls in one turn, are they actually safe to run concurrently?
  5. Partial failure: if call 3 of 5 fails, what's the recovery story?
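For the infinite-loop case, `max_steps` is the backstop, but a run can spend all of its steps doing the same thing. A minimal sketch of a detector that flags a tool called with identical arguments more than a set number of times; `LoopDetector` and its default threshold are hypothetical choices, not part of any library.

```python
import json
from collections import Counter


class LoopDetector:
    """Flags repeated identical tool calls within one agent run (hypothetical helper)."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()

    def record(self, tool_name: str, args: dict) -> bool:
        # Canonicalize args so different key order doesn't hide a repeat
        key = (tool_name, json.dumps(args, sort_keys=True, default=str))
        self.seen[key] += 1
        # True means this exact call has now exceeded the allowed repeats
        return self.seen[key] > self.max_repeats


detector = LoopDetector(max_repeats=2)
for _ in range(3):
    looping = detector.record("get_slots", {"provider_id": "p1", "date": "2026-05-12"})
print(looping)
```

In the execution loop you would call `record` for each `tool_use` block and, when it returns True, log the step and stop the run (or return an `is_error` result telling the model to try a different approach).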

Tooling Anti-Patterns

  • Tools with > 8 parameters → split into multiple tools
  • Boolean parameters with unclear meaning → use enums
  • Tools that return free text instead of structured data
  • Tools that mutate state without an idempotency key
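For the last anti-pattern, a sketch of one common approach: the mutating tool takes an idempotency key and, on a retry with the same key, returns the stored result instead of acting twice. `IdempotentExecutor` and its in-memory dict are hypothetical stand-ins; a production version would persist keys (for example in a table with a unique constraint) and derive each key from something stable per logical operation.

```python
from typing import Any, Callable


class IdempotentExecutor:
    """Runs a side-effecting function at most once per idempotency key (hypothetical)."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}  # in-memory stand-in for a persistent store

    def run(self, key: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        # Replay: a key we've already completed returns the original result
        if key in self._results:
            return self._results[key]
        result = fn(**kwargs)
        self._results[key] = result  # only stored on success, so failures can be retried
        return result


calls = []


def create_booking(patient_id: str) -> str:
    calls.append(patient_id)  # the side effect we want to happen exactly once
    return f"booking-{len(calls)}"


ex = IdempotentExecutor()
first = ex.run("appt-req-42", create_booking, patient_id="p1")
retry = ex.run("appt-req-42", create_booking, patient_id="p1")
print(first, retry, len(calls))
```

This in-memory version is not safe under concurrent calls with the same key; a database unique constraint is what closes that race in practice.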
Dilip Singh
Lead Software Architect · Hureka Technologies

14+ years building enterprise software and AI systems. Architecting multi-agent AI platforms, RAG pipelines, voice AI, and high-performance SaaS for global clients.