Your LLM app can make a fool of you. Ours did.
Your LLM app can suddenly stop working in production. Ours did.
Your LLM app can run without you knowing how it’s used. Ours did.
You don’t want to run into these issues. Make your LLM apps reliable: make sure you can debug them, monitor them, and fall back to a backup model when something breaks.
The Three Pillars of LLM Application Reliability
1. Development Observability: Your AI Development Safety Net
When building LLM applications, you’re often dealing with complex chains of prompts, especially when using agent frameworks and automated decision-making.
The challenge? You’re essentially giving up direct control of the flow, making it crucial to implement proper observability.
Real-world Example:
We built an LLM agent with Mistral that can scrape websites. It gave us amazing results during our tests, so good that we couldn’t even tell which subpage the agent had pulled its information from. We thought “wow, we achieved AGI, an agent doing better work than a human”.
Then we checked LangSmith and realized:
Our agent hadn’t used any tools at all. It had just hallucinated the tool usage! It invented everything. Lol.
Without proper observability, this hallucination would have gone unnoticed.
Observability tools show you all the important metrics and intermediate steps in an LLM pipeline: how long the pipeline took, which LLM calls occurred, which tools were called, what metadata was sent to the LLMs, and so on.
We’re big fans of LangSmith for debugging our agents in development. LangSmith tracing is very easy to set up:
- Create an account on https://langsmith.com
- Get your API Key
- Add 2 environment variables to your project
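In a Python project, this can be as simple as setting the variables before your app starts. A minimal sketch (variable names as per LangSmith’s docs at the time of writing; the project name is just an example):

```python
import os

# Enable LangSmith tracing and authenticate.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"

# Optional: group traces under a named project (example name).
os.environ["LANGCHAIN_PROJECT"] = "my-llm-app"
```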
And boom! You’re ready to troubleshoot the most complicated LLM setups!
Step 1 of 3 done to increase LLM Application Reliability ✅
Tools for observability: LangSmith is our go-to, with alternatives like OpenTelemetry and Weights & Biases.

2. Production Monitoring: Keeping Your AI Apps Healthy
Remember: Risk management is boring until it suddenly becomes very exciting.
Production monitoring is your early warning system for potential issues.
Just today, we experienced a situation where OpenAI didn’t automatically top up our credits. Thanks to our monitoring setup in LangSmith, we quickly detected the surge in errors in our DentroChat application and could immediately address the issue.
Monitoring and observability are very close cousins: observability serves you during development, monitoring in production.
Finding out why an LLM setup fails in production is difficult if all you have is your application logs. That’s why we recommend using the same observability tools for monitoring in production as well.
Depending on your app, you might also want to track other metrics: how users interact with the web page, transactional emails, server load and so on. But the first step is to have LLM monitoring in place to increase your LLM Application Reliability!
Best Practices for Production Monitoring:
- Track intermediate steps
- Track user inputs/outputs only when you can stay privacy compliant
- Monitor system health metrics
- Analyse your app usage
- Track custom metadata such as user IDs to identify power users (see the sketch after this list)
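For the custom metadata point, here’s a minimal sketch of how you could attach a user ID to a LangChain call so it shows up in your LangSmith traces (the chain, user ID and tags are made-up examples):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# A toy chain: prompt -> model.
prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

# metadata and tags are attached to the trace, so you can later filter
# for a specific user or environment in LangSmith.
answer = chain.invoke(
    {"question": "What are your opening hours?"},
    config={
        "metadata": {"user_id": "user-1234"},  # example ID, not a real user
        "tags": ["production", "chatbot"],
    },
)
```

In LangSmith you can then filter traces by that metadata, e.g. to spot power users or isolate a single user’s failing requests.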
Tools for monitoring: the same tools as for observability.

3. Implementing Fallbacks: Your Safety Net
With the increasing frequency of model outages (just check status.openai.com for recent examples), implementing fallbacks isn’t optional – it’s essential for LLM Application Reliability.
Example: You have an AI chatbot that uses the Anthropic API under the hood.
One day, the requests to the Anthropic API suddenly fail!
There can be many reasons for failed requests:
- an Anthropic outage
- an unpaid Anthropic bill
- model deprecation, i.e. the model you requested no longer exists
- the Anthropic API blocking you because of your IP address
- an expired API key
We just took Anthropic as an example here, but we at Dentro have run into all of these issues in the past with various LLM APIs!
A fallback routes your LLM requests to LLM B in case LLM A fails.
If the request to Anthropic fails, it sends the request to e.g. OpenAI instead!
That way your users can still be served, even when there’s an issue with a certain model provider.
Fallbacks can also do more, such as retrying a few times before routing to the backup LLM.
At Dentro, we often use the built-in fallback functionality of LangChain. But you can also just write custom code to handle failed requests gracefully.
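As a rough sketch of how that can look with LangChain’s with_fallbacks (the model names are just examples, use whatever you actually run):

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# Primary model: Anthropic, with a few retries before we give up on it.
primary = ChatAnthropic(model="claude-3-5-sonnet-latest").with_retry(stop_after_attempt=3)

# If the primary still fails, LangChain re-sends the same input to the backup.
llm = primary.with_fallbacks([ChatOpenAI(model="gpt-4o")])

reply = llm.invoke("Hi, can you help me reset my password?")
print(reply.content)
```

If Anthropic is down, has deprecated the model or rejects your key, the user still gets an answer from OpenAI instead of an error.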
How to Implement Fallbacks:
- Set up multiple model providers (e.g., OpenAI and Anthropic)
- Configure automatic failover in your framework
- Consider using multiple models from the same provider for better compatibility (use an OpenAI model as a fallback for an OpenAI main model)
- Consider using models from different providers for higher fail safety (use an Anthropic model as a fallback for an OpenAI main model); the sketch below shows both combined
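To combine the last two points, you can pass several fallbacks in order, from most similar to most independent. A minimal sketch (again with example model names):

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# Tried in order: same-provider backup first (most similar behaviour),
# then a different provider in case all of OpenAI is unreachable.
llm = ChatOpenAI(model="gpt-4o").with_fallbacks([
    ChatOpenAI(model="gpt-4o-mini"),
    ChatAnthropic(model="claude-3-5-sonnet-latest"),
])
```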

Taking Action: Your LLM Application Reliability Checklist
Ready to make your LLM applications more reliable? Here’s how to start:
- Assess Your Current Setup
  - Can you fully understand what your LLM app is doing?
  - Do you have visibility into each step of your AI pipeline?
- Implement Observability
  - Sign up for LangSmith if using LangChain
  - Or integrate alternative tools like OpenTelemetry or Weights & Biases
- Set Up Production Monitoring
  - Configure privacy-compliant tracking
  - Set up usage analytics
- Use LLM Fallbacks
  - Identify backup model providers
  - Configure automatic failover
The Path Forward
Building fail-safe LLM applications isn’t just about preventing failures; it’s about building reliability that your users can trust. By implementing these three pillars, you’re not just building better applications. You’re building the foundation for the future of AI development.
Remember: Don’t wait for failure. Act now.