3. Error Handling in Agent Tools¶

Domain 1 (18%) — Tool Design & MCP Integration

The three error categories


Transient	Network timeouts, 503s. Handle with automatic retry + backoff inside the tool. Model never sees it.
Permanent	Business rule violations. Return immediately with structured detail. `retryable: false`.
Uncertain state	Write-operation timeout — you don't know if it succeeded. Never mark retryable. "Status unknown. May have been sent. Do not retry."

Danger

Treating all errors identically is the single most common mistake — the model has to guess, producing unpredictable behavior.

Structured error response + isError flag

{
  "isError": true,
  "content": [{
    "type": "text",
    "text": "{ \"error_category\": \"NOT_FOUND\", \"retryable\": false, \"message\": \"Customer not found\", \"customer_explanation\": \"We couldn't locate that account.\" }"
  }]
}

isError: true signals structured failure — Claude receives this as a tool result and adapts
Include error_category + retryable flag for Claude to decide next action
Different from returning bad data with isError: false — always use the flag!

Danger

When a tool throws an exception, the framework presents a generic error to the model, stripping all detail. Always return errors as tool content.

MCP error tiers


Protocol-level errors	Malformed requests, missing required params, invalid JSON-RPC. → JSON-RPC error response. The request itself was invalid.
Application-level errors	API 404, service unavailable, business rule violation. → Tool result with `isError: true`. Tool was invoked correctly but operation failed.