The Need to Self-Host Retrieval Augmented Generation
How do you self-host retrieval augmented generation (RAG)? We’ve received this question from three German companies in the past month alone. This growing interest highlights an important shift towards data privacy and operational efficiency.
More and more companies want to use large language models (LLMs) like ChatGPT with their internal business data. But how can they do this while maintaining control over their sensitive information?
We came up with an elegant approach: using cloud infrastructure as a blueprint for on-premise setups, a strategy that mitigates risks and optimizes resources.
The Potential
Imagine your company’s internal chat system answering employees’ policy questions precisely. Or your customer support team augmented with AI-driven first-level support. Companies using RAG for lead preparation and document generation increase both efficiency and satisfaction.
Tap into your most sensitive business data with AI
The Cloud Blueprint: A Strategic Approach
When self-hosting retrieval augmented generation pipelines, you need substantial processing power. This often means buying costly hardware in the range of tens of thousands of dollars, specifically graphics cards such as NVIDIA H100s. That is a hefty upfront cost.
Solution
Deploy the RAG setup in the cloud first and test it with less sensitive data. This way, companies can test and refine their AI solution. The cloud-based pilot serves as a blueprint that can later be mirrored on-premise; highly sensitive data is only used once we are confident in the system’s effectiveness. This reduces financial risk, because the company can buy the same GPUs that were used in the cloud, and it ensures data privacy while keeping the process efficient.
Key Components of a Self-Hosted Retrieval Augmented Generation System
To successfully self-host a retrieval augmented generation setup, companies need to manage several critical components (a minimal code sketch of how they fit together follows this list):
- Large Language Model: This is the core of the system, processing natural language queries and generating responses (e.g. Llama 3.3 70B).
- Embedding Model: Essential for transforming text into numerical representations, enabling efficient data retrieval (e.g. Nomic Embed Text v1.5).
- Vector Store: A database that stores and retrieves these embeddings, ensuring quick access to relevant information (e.g. Weaviate).
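To make these components concrete, here is a minimal sketch of how they interact. It assumes a self-hosted, OpenAI-compatible inference server (such as vLLM or Ollama) running on localhost; the endpoint URL and model names are illustrative placeholders, and the tiny in-memory similarity search stands in for a production vector store such as Weaviate.

```python
# Minimal RAG sketch: embed documents, retrieve the most relevant one,
# and let a self-hosted LLM answer using that context.
# BASE_URL and model names are assumptions for illustration only.
import math
import requests

BASE_URL = "http://localhost:8000/v1"      # assumed local inference server
EMBED_MODEL = "nomic-embed-text-v1.5"      # placeholder model identifiers
CHAT_MODEL = "llama-3.3-70b-instruct"

def embed(text: str) -> list[float]:
    """Turn text into a numerical vector via the embedding model."""
    resp = requests.post(f"{BASE_URL}/embeddings",
                         json={"model": EMBED_MODEL, "input": text})
    return resp.json()["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# 1) Index internal documents (the vector-store role, here just a list).
documents = ["Employees get 30 vacation days per year.",
             "Support tickets must be answered within 24 hours."]
index = [(doc, embed(doc)) for doc in documents]

# 2) Retrieve the most relevant document for a question.
question = "How many vacation days do I have?"
q_vec = embed(question)
best_doc = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3) Let the LLM answer, grounded in the retrieved context.
answer = requests.post(f"{BASE_URL}/chat/completions", json={
    "model": CHAT_MODEL,
    "messages": [
        {"role": "system", "content": f"Answer using this context: {best_doc}"},
        {"role": "user", "content": question},
    ],
}).json()["choices"][0]["message"]["content"]
print(answer)
```

In a real deployment, the in-memory list would be replaced by the vector store, and documents would be chunked and indexed ahead of time rather than at query time.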
By self-hosting these components, businesses maintain full control over their data and can adhere to privacy regulations while leveraging AI’s full potential.
Mitigating Risks with the Cloud Blueprint
The cloud blueprint approach offers several advantages:
- Cost Efficiency: Avoids the upfront investment in expensive hardware; the system is tested in a flexible, scalable cloud environment first.
- Data Privacy: Allows companies to keep sensitive data on their internal system at all times.
- Customization: Provides a tailored solution that fits the unique needs of each organization and can be adapted and optimized over time.
Steps to Implementing the Cloud Blueprint Approach
- Cloud Pilot Setup: Begin by deploying your RAG system in the cloud, focusing on non-sensitive data to ensure compliance.
- Evaluate and Refine: Use the cloud environment to test system performance. Make necessary adjustments to meet your operational needs.
- Blueprint Creation: Document the cloud setup, creating a detailed blueprint that can be mirrored on-premise.
- On-Premise Transition: Once you are satisfied with the cloud pilot, replicate the setup on your internal infrastructure so that all components are self-hosted on-premise (see the configuration sketch after this list).
- Continuous Optimization: Reactivate your cloud setup to develop improvements and new use cases, again using it as the blueprint.
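One practical way to make the blueprint easy to mirror is to keep the pipeline code identical in both environments and push every environment-specific detail into configuration. The sketch below illustrates that idea; the environment variable names, hostnames, and defaults are assumptions for illustration, not part of any specific product.

```python
# Hypothetical configuration sketch: the same RAG code runs against the cloud
# pilot and the on-premise deployment, only endpoints and model names change.
# Variable names, hostnames, and defaults are illustrative assumptions.
import os
from dataclasses import dataclass

@dataclass
class RagConfig:
    llm_base_url: str       # OpenAI-compatible chat endpoint
    embed_base_url: str     # embedding endpoint
    vector_store_url: str   # e.g. a Weaviate instance
    chat_model: str
    embed_model: str

def load_config() -> RagConfig:
    """Read the deployment target from environment variables.

    During the cloud pilot these point at cloud-hosted services; after the
    on-premise transition they point at internal hosts, with no code changes.
    """
    return RagConfig(
        llm_base_url=os.getenv("LLM_BASE_URL", "http://llm.internal:8000/v1"),
        embed_base_url=os.getenv("EMBED_BASE_URL", "http://embed.internal:8001/v1"),
        vector_store_url=os.getenv("VECTOR_STORE_URL", "http://weaviate.internal:8080"),
        chat_model=os.getenv("CHAT_MODEL", "llama-3.3-70b-instruct"),
        embed_model=os.getenv("EMBED_MODEL", "nomic-embed-text-v1.5"),
    )

config = load_config()
print(f"RAG pipeline will talk to {config.llm_base_url}")
```

With this pattern, the on-premise transition is a matter of pointing the same variables at internal hosts rather than rewriting the pipeline.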
About Us
At Dentro, we guide companies through this innovative process and deliver fast, tangible results. By adopting the cloud blueprint approach, businesses can use RAG while mitigating risk. Contact us to explore how we can help you self-host a retrieval augmented generation system tailored to your needs.