🍏 Basic Retrieval-Augmented Generation (RAG) with AIProjectClient 🍎¶
In this notebook, we'll demonstrate a basic RAG flow using:
- azure-ai-projects (AIProjectClient)
- azure-ai-inference (Embeddings, ChatCompletions)
- azure-ai-search (for vector or hybrid search)
Our theme is Health & Fitness 🍏, so we'll create a simple set of health tips, embed them, store them in a search index, then run a query that retrieves the relevant tips and pass them to an LLM to produce a final answer.
Disclaimer: This is not medical advice. For real health questions, consult a professional.
What is RAG?¶
Retrieval-Augmented Generation (RAG) is a technique where the LLM (Large Language Model) uses relevant retrieved text chunks from your data to craft a final answer. This helps ground the model's response in real data, reducing hallucinations.
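As a toy illustration of that grounding step before we wire up any Azure services (the "retrieved" chunk below is hard-coded; in the rest of the notebook it will come from Azure AI Search):
# Toy illustration of grounding: the retrieved text is pasted into the prompt so
# the model answers from it instead of from memory. The chunk is hard-coded here;
# later it will come from a vector search over our health tips.
retrieved_chunk = "Stay hydrated by drinking 8-10 cups of water per day."
user_question = "How much water should I drink daily?"
grounded_prompt = (
    "Answer using ONLY this context:\n"
    f"{retrieved_chunk}\n\n"
    f"Question: {user_question}"
)
print(grounded_prompt)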
1. Setup¶
We'll import libraries, load environment variables, and create an AIProjectClient.
Prerequisites¶
- Python 3.8+
- pip install azure-ai-projects azure-ai-inference azure-search-documents azure-identity
- A .env file with:
  PROJECT_CONNECTION_STRING=<your-conn-string>
  MODEL_DEPLOYMENT_NAME=<some-chat-model>
  SEARCH_INDEX_NAME=<your-search-index>
- An Azure AI Search connection in your project, or any index ready to store embeddings.
- A deployed chat model plus an embeddings model deployment (like text-embedding-ada-002 or any other embedding model).
import os
import time
import json
from dotenv import load_dotenv
# azure-ai-projects
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
# We'll embed with azure-ai-inference
from azure.ai.inference import EmbeddingsClient, ChatCompletionsClient
from azure.ai.inference.models import UserMessage, SystemMessage
# For vector search or hybrid search
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.core.credentials import AzureKeyCredential
load_dotenv()
conn_string = os.environ.get("PROJECT_CONNECTION_STRING")
chat_model = os.environ.get("MODEL_DEPLOYMENT_NAME", "gpt-4o-mini")
search_index_name = os.environ.get("SEARCH_INDEX_NAME", "healthtips-index")
try:
    project_client = AIProjectClient.from_connection_string(
        credential=DefaultAzureCredential(),
        conn_str=conn_string,
    )
    print("✅ AIProjectClient created successfully!")
except Exception as e:
    print("❌ Error creating AIProjectClient:", e)
2. Create Sample Health Data¶
We'll create a few short doc chunks. In a real scenario, you might read from CSVs or PDFs, chunk them up (see the splitter sketch after the sample data below), embed them, and store them in your search index.
health_tips = [
    {
        "id": "doc1",
        "content": "Daily 30-minute walks help maintain a healthy weight and reduce stress.",
        "source": "General Fitness"
    },
    {
        "id": "doc2",
        "content": "Stay hydrated by drinking 8-10 cups of water per day.",
        "source": "General Fitness"
    },
    {
        "id": "doc3",
        "content": "Consistent sleep patterns (7-9 hours) improve muscle recovery.",
        "source": "General Fitness"
    },
    {
        "id": "doc4",
        "content": "For cardio endurance, try interval training like HIIT.",
        "source": "Workout Advice"
    },
    {
        "id": "doc5",
        "content": "Warm up with dynamic stretches before running to reduce injury risk.",
        "source": "Workout Advice"
    },
    {
        "id": "doc6",
        "content": "Balanced diets typically include protein, whole grains, fruits, vegetables, and healthy fats.",
        "source": "Nutrition"
    },
]
print("Created a small list of health tips.")
3. Generate Embeddings + Store in Azure Search¶
We'll show a minimal approach:
- Get embeddings client.
- Embed each doc.
- Upsert the docs into the Azure AI Search index with an embedding field.
3.1. Connect to Azure Search¶
We'll do so by retrieving the default search connection from the project, then building a SearchClient from the azure.search.documents library. After that, we embed each doc and upsert it into your index.
from azure.ai.projects.models import ConnectionType
# Get the default search connection, with credentials
search_conn = project_client.connections.get_default(
    connection_type=ConnectionType.AZURE_AI_SEARCH, include_credentials=True
)
if not search_conn:
    raise RuntimeError("❌ No default Azure AI Search connection found!")

search_client = SearchClient(
    endpoint=search_conn.endpoint_url,
    index_name=search_index_name,
    credential=AzureKeyCredential(search_conn.key)
)
print("✅ Created Azure SearchClient.")
# Now create embeddings client
embeddings_client = project_client.inference.get_embeddings_client()
print("✅ Created embeddings client.")
search_docs = []
for doc in health_tips:
    # Embed the doc content
    emb_response = embeddings_client.embed(input=[doc["content"]])
    emb_vec = emb_response.data[0].embedding

    # Build a search doc with 'id', 'content', 'source', 'embedding'
    search_docs.append(
        {
            "id": doc["id"],
            "content": doc["content"],
            "source": doc["source"],
            "embedding": emb_vec,
        }
    )
result = search_client.upload_documents(documents=search_docs)
print(f"Uploaded {len(search_docs)} docs to Search index '{search_index_name}'.")
4. Basic RAG Flow¶
4.1. Retrieve¶
When a user queries, we:
- Embed the user question.
- Search the vector index with that embedding to get the top docs.
4.2. Generate answer¶
We then pass the retrieved docs to the chat model.
In a real scenario, you'd have a more advanced approach to chunking & summarizing. We'll keep it simple.
def rag_chat(query: str, top_k: int = 3) -> str:
    # 1) Embed the user query
    user_vec = embeddings_client.embed(input=[query]).data[0].embedding

    # 2) Vector search
    # We assume the index has a vector field named 'embedding'; this is a
    # minimal vector-only query rather than a hybrid search.
    results = search_client.search(
        search_text="",  # pure vector search, so no textual query
        vector=user_vec,
        vector_fields="embedding",
        top=top_k
    )

    # Gather the top docs
    top_docs_content = []
    for r in results:
        # Each result exposes the stored document fields like a dict.
        c = r["content"]
        s = r["source"]
        top_docs_content.append(f"Source: {s} => {c}")

    # 3) Chat with the retrieved docs.
    # A system message instructs the model to answer only from these docs.
    system_text = (
        "You are a health & fitness assistant.\n"
        "Answer user questions using ONLY the text from these docs.\n"
        "Docs:\n"
        + "\n".join(top_docs_content)
        + "\nIf unsure, say 'I'm not sure'.\n"
    )

    with project_client.inference.get_chat_completions_client() as chat_client:
        response = chat_client.complete(
            model=chat_model,
            messages=[
                SystemMessage(content=system_text),
                UserMessage(content=query)
            ]
        )

    return response.choices[0].message.content
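One caveat: the keyword arguments that search() accepts for vector queries have changed across azure-search-documents releases. The vector=/vector_fields= form above matches earlier 11.4 preview builds; on the 11.4 GA package and later, the equivalent retrieval would look roughly like this sketch instead:
# Alternative retrieval for newer azure-search-documents versions (11.4 GA+):
# the query vector is passed via vector_queries instead of vector / vector_fields.
from azure.search.documents.models import VectorizedQuery

def retrieve_top_docs(query_vec, top_k=3):
    vector_query = VectorizedQuery(
        vector=query_vec,
        k_nearest_neighbors=top_k,
        fields="embedding",
    )
    results = search_client.search(search_text=None, vector_queries=[vector_query])
    return [f"Source: {r['source']} => {r['content']}" for r in results]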
5. Try a Query 🎉¶
Let's ask a question about cardio for busy people.
user_query = "What's a good short cardio routine for me if I'm busy?"
answer = rag_chat(user_query)
print("🗣️ User Query:", user_query)
print("🤖 RAG Answer:", answer)
6. Conclusion¶
We've demonstrated a basic RAG pipeline with:
- Embedding docs & storing them in Azure AI Search.
- Retrieving the top docs for a user question.
- Chatting with the retrieved docs to produce a grounded answer.
🔎 You can expand this by adding advanced chunking, more robust retrieval, and quality checks. Enjoy your healthy coding! 🍎