Chain-of-Thought 还是有用的,特别是数学和逻辑推理类任务。
久念
-
别再往 ChatGPT 里复制粘贴了:构建一个提示词流水线 -
GitHub Copilot 订阅指南:学生和开源贡献者免费支付被拒是什么原因?换了好几张卡都不行。
-
Building AI Agents That Don't Hallucinate: A Practical Guide to Function Calling in 2026
If you've built anything with LLMs in the last year, you've probably hit the same wall everyone does: the model confidently invents a function signature, hallucinates parameter values, or calls the wrong tool entirely. Function calling was supposed to fix this. In practice, it often makes things worse because now your agent is confidently wrong at scale.
Most implementations look like a simple chat.completions.create with tools schema. This works fine for demos. It falls apart in production for three reasons: Schema bloat — you pass 15 tools, the model picks the wrong one. Parameter hallucination — the model invents values that match the type but not the intent. Cascading errors — one bad tool call leads to a chain of incorrect reasoning.
The fix isn't bigger models. It's better architecture.
Pattern 1: Narrow the Tool Space. Never pass all available tools in every turn. Use a two-stage router: first classify intent with a cheap model, then only expose relevant tools. This single pattern reduces wrong-tool errors by 60-70%.
Pattern 2: Structured Outputs as a Hard Constraint. Stop relying on the model to "mostly" return valid JSON. Use structured outputs enforced at the API level with Pydantic models. Constraints reduce hallucination more than prompt engineering does.
Pattern 3: The Validation Sandwich. Every tool call should go through: User Input → Pre-validation → Model → Post-validation → Execution. When validation fails, return the error back to the model as a tool response — models fix their own parameter errors 80% of the time on the second attempt.
Pattern 4: Token Budgeting for Agent Loops. The #1 production failure mode is infinite loops. Hard limits are not a hack — they're a requirement. Any agent system without a maximum iteration count will eventually loop forever on some edge case.
Pattern 5: Multi-Model Orchestration. Different models have different strengths. A practical system uses: small/fast models for intent routing, mid-tier models for tool selection, frontier models for complex planning, and small/fast models for output formatting. This cuts costs by 10-15x with negligible quality loss.
Common Pitfalls: Don't trust tool descriptions alone — add examples. Don't return raw API responses as tool results. Don't chain agents without checkpoints.
Measuring Success: Tool Selection Accuracy, Parameter Validity Rate, Task Completion Rate. If any one drops below 90%, you have a production problem.
The future of AI development isn't prompt engineering. It's system design — constraints, validation, fallbacks, and smart orchestration. Start with narrow tool spaces. Add structured outputs. Build validation layers. Set hard limits. Orchestrate multiple models.
(此帖无评论)