Arturo Nereu - MongoDB
Embeddings
DC Comics / Dave Gibbons / John Higgins
RETRIEVAL-AUGMENTED GENERATION (RAG)
Enhances LLMs' knowledge by providing up-to-date or domain-specific expertise that wasn't in their original training data.
INGEST DATA
STORE DATA
CHUNK DATA
GENERATE EMBEDDINGS
PERFORM SEMANTIC SEARCH
PROVIDE A RESPONSE
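The deck lists CHUNK DATA as a step, but the code that follows embeds each short review whole. For longer texts, a common approach is to split them into overlapping windows before embedding, so each chunk fits the embedding model and keeps some surrounding context. This is a minimal sketch that assumes fixed-size character windows. `chunk_text`, `chunk_size` and `overlap` are illustrative names that are not part of the repo, and real pipelines often split on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    Illustrative helper (not from the original repo); sizes are counted in characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    review = "Probably the best resource out there for building solid intuition."
    for chunk in chunk_text(review, chunk_size=30, overlap=10):
        print(repr(chunk))
```

Each chunk would then be embedded and stored as its own document, or as an array of chunks inside the book document. Which option works better depends on how results should be grouped at query time.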
https://github.com/ArturoNereu/AI-Study-Group
{
  "books": [
    {
      "_id": "ObjectId",
      "title": "string",   // Book title
      "author": "string",  // Author name
      "review": "string",  // Personal review/summary
      "link": "string"     // Purchase link
    }
  ]
}
{
  "_id": "67f4a74759d7b45f2e180317",
  "title": "AI Engineering: Building Applications with Foundation Models",
  "author": "Chip Huyen",
  "review": "If you feel lost and don't know where to start, ...",
  "link": "https://www.oreilly.com/library/view/ai-engineering/9781098166298/"
}
> pip3 install pymongo
Why MongoDB?
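import json
import os

from pymongo import MongoClient

def store_data():
    # MongoDB connection (use an environment variable to keep credentials out of the code)
    connection_string = os.getenv('MONGODB_CONNECTION_STRING')
    if not connection_string:
        raise RuntimeError("Please set the MONGODB_CONNECTION_STRING environment variable")
    client = MongoClient(connection_string)
    db = client['books_db']
    collection = db['ai_books']

    # Load books from JSON
    with open('books.json', 'r') as f:
        books = json.load(f)
    # If the file follows the schema above ({"books": [...]}), unwrap the list:
    # insert_many needs a list of documents, not a dict
    if isinstance(books, dict):
        books = books['books']

    # Clear existing data and insert new books
    collection.delete_many({})
    result = collection.insert_many(books)
    print(f"✅ Successfully stored {len(result.inserted_ids)} books")
    client.close()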
{
  "title": "Deep Learning - A Visual Approach",
  "author": "Andrew Glassner",
  "review": "Probably the best resource out there for building solid intuition about the many concepts surrounding deep learning. Andrew, the author, did a wonderful job illustrating these concepts, making it much easier to develop a real understanding of them.",
  "link": "https://www.glassner.com/portfolio/deep-learning-a-visual-approach/"
}
TL;DR. If you only have limited time to learn Artificial Intelligence, here's what I recommend:
📘 Read this book: AI Engineering: Building Applications with Foundation Models
🎥 Watch this video: Deep Dive into LLMs like ChatGPT
🧠 Follow this course: 🤗 Agents Course
If you want more (and there's a lot more), keep reading.
Why this repo exists. Learning often feels like walking down a road that forks every few meters; you're always exploring, never really arriving. And that's the beauty of it.
When I was working in games, people would ask me: "How do I learn to make games?" My answer was always: "Pick a game and build it; learn the tools and concepts along the way." I've taken the same approach with AI.
This repository is a collection of the material I've used (and continue to use) to learn AI: books, courses, papers, tools, models, datasets, and notes. It's not a curriculum; it's more like a journal, one that's helped me build, get stuck, and keep going.
Do I know AI? Not really. But I'm learning, building, and having a great time doing it.
I hope something in here is useful to you too. And if you have suggestions or feedback, I'd love to hear it.
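Numerical representation of data: each item (a word, a sentence, an image) becomes a vector of numbers, and similar items end up with nearby vectors.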
triangle = [0.0, 1.0, -1.0]
square = [1.0, 0.0, -1.0]
circle = [1.0, 1.0, 0.0]
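Similarity between embeddings is usually measured with cosine similarity, the same metric the vector index later in this deck is configured with. The sketch below uses the three toy vectors above plus a hypothetical `query` vector, invented only for illustration, and ranks the shapes by how close they are to it.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

shapes = {
    "triangle": [0.0, 1.0, -1.0],
    "square": [1.0, 0.0, -1.0],
    "circle": [1.0, 1.0, 0.0],
}

# Hypothetical query vector, made up for this example only
query = [0.0, 1.0, -0.8]

# Sort shape names from most to least similar to the query
ranked = sorted(shapes, key=lambda name: cosine_similarity(query, shapes[name]), reverse=True)
for name in ranked:
    print(f"{name}: {cosine_similarity(query, shapes[name]):.3f}")
```

The query lands closest to `triangle`, then `circle`, then `square`. Incidentally, the three toy vectors are equally similar to one another: every pair has a cosine similarity of 0.5.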
Embedding Models
> pip3 install voyageai
import os

import voyageai

voyage_client = voyageai.Client(api_key=os.getenv("VOYAGE_API_KEY"))

# Generate embeddings for each book review
for book in books:
    try:
        # Generate embedding using Voyage AI client
        result = voyage_client.embed(
            texts=[book['review']],
            model='voyage-3'
        )
        embedding = result.embeddings[0]
        print(f"Generated embedding (dim: {len(embedding)})")
    except Exception as e:
        print(f"Failed to generate embedding: {e}")
# Update the book document with embedding
# Re-read the stored books so each one carries the _id MongoDB assigned on insert
# (the documents in books.json don't include one)
books = list(collection.find())
for i, book in enumerate(books, 1):
    print(f"Processing book {i}/{len(books)}: {book['title']}")
    try:
        # Generate embedding using Voyage AI client
        result = voyage_client.embed(
            texts=[book['review']],
            model='voyage-3'
        )
        embedding = result.embeddings[0]
        # Update the book document with embedding
        collection.update_one(
            {'_id': book['_id']},
            {'$set': {'embedding': embedding}}
        )
        print(f" ✅ Generated embedding (dim: {len(embedding)})")
    except Exception as e:
        print(f" ❌ Failed to generate embedding: {e}")
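Both loops above send one request per review. Because `voyage_client.embed` accepts a list of texts, a common refinement is to embed in batches. This is a minimal sketch that assumes a batch size of 128, an arbitrary choice you should check against Voyage's documented per-request limits. The embedding call is passed in as a plain function so it can be swapped out or faked. `embed_in_batches` and `embed_fn` are hypothetical names.

```python
from typing import Callable, Sequence

Vector = list[float]

def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[list[str]], list[Vector]],
    batch_size: int = 128,  # assumed value; check the provider's per-request limits
) -> list[Vector]:
    """Embed texts in batches of at most batch_size, preserving input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    embeddings: list[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors = embed_fn(batch)
        # Guard against a provider silently dropping or adding vectors
        if len(vectors) != len(batch):
            raise RuntimeError("embed_fn returned a different number of vectors than texts")
        embeddings.extend(vectors)
    return embeddings


if __name__ == "__main__":
    # Fake embedder for a dry run: maps each text to [length, word count]
    fake = lambda batch: [[float(len(t)), float(len(t.split()))] for t in batch]
    print(embed_in_batches(["a b", "cde", "f"], fake, batch_size=2))
```

With the client from the slides, `embed_fn` could be `lambda batch: voyage_client.embed(batch, model='voyage-3').embeddings`. Each resulting vector would then be written back with `collection.update_one`, as in the second loop.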
import voyageai

def semantic_search(query, top_k=3):
    # Example query: "I want to learn the very basics of AI"
    voyage_api_key = os.getenv('VOYAGE_API_KEY')
    # ...
    result = voyage_client.embed(
        texts=[query],
        model='voyage-3'
    )
    query_embedding = result.embeddings[0]
def semantic_search(query, top_k=3):
    # ...
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",      # Name of your vector search index
                "path": "embedding",          # Field containing the embeddings
                "queryVector": query_embedding,
                "numCandidates": top_k * 20,  # Candidates to consider; must be >= limit (~20x limit is the usual advice)
                "limit": top_k                # Number of results to return
            }
        },
        {
            "$project": {
                "title": 1, "author": 1, "review": 1, "link": 1,
                "score": {"$meta": "vectorSearchScore"}
            }
        }
    ]
    results = list(collection.aggregate(pipeline))
    return results
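`$vectorSearch` with `numCandidates` runs an approximate nearest-neighbor search inside Atlas. For a collection as small as a reading list, an exact brute-force scan in Python is a handy way to sanity-check which books should come back. The sketch below makes that assumption and computes plain cosine similarity over in-memory documents shaped like the `ai_books` collection. `exact_top_k` is a hypothetical helper, and its scores are raw cosine values.

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm

def exact_top_k(query_embedding, docs, top_k=3, path="embedding"):
    """Score every doc that has an embedding and return the top_k, best first.

    Like the $project stage above, the embedding field is dropped from the output
    and a "score" field is added.
    """
    scored = [(cosine(query_embedding, doc[path]), doc) for doc in docs if doc.get(path)]
    # key= avoids comparing dicts when two scores tie
    best = heapq.nlargest(top_k, scored, key=lambda pair: pair[0])
    return [
        {**{k: v for k, v in doc.items() if k != path}, "score": score}
        for score, doc in best
    ]
```

If `list(collection.find())` fed into `exact_top_k` returns the same titles as `semantic_search`, the approximate search is behaving as expected for that query.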
{
  "fields": [
    {
      "numDimensions": 1024,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
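With `"similarity": "cosine"`, the `vectorSearchScore` projected in the search pipeline is, as I understand the Atlas documentation, not the raw cosine. It is rescaled to the 0 to 1 range as score = (1 + cosine) / 2. Treat that formula as an assumption to verify against the docs. The sketch just shows the mapping and its inverse, which helps when comparing Atlas scores with a cosine computed elsewhere. Both function names are hypothetical.

```python
def cosine_to_atlas_score(cosine: float) -> float:
    """Map a cosine in [-1, 1] to [0, 1], assuming Atlas's (1 + cosine) / 2 normalization."""
    return (1.0 + cosine) / 2.0

def atlas_score_to_cosine(score: float) -> float:
    """Inverse of cosine_to_atlas_score: recover the raw cosine from a normalized score."""
    return 2.0 * score - 1.0
```

Under this assumption, a score of 0.5 means the vectors are orthogonal, not that they are "half similar".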
> pip3 install openai
import os

from openai import OpenAI

def provide_response(query):
    """
    Generate final AI-powered book recommendation using OpenAI
    """
    # Check OpenAI API key
    openai_api_key = os.getenv('OPENAI_API_KEY')
    if not openai_api_key:
        print("❌ Please set OPENAI_API_KEY environment variable")
        return

    # Initialize OpenAI client
    client = OpenAI(api_key=openai_api_key)

    # Step 1: Get search results from vector database (from previous step)
    search_results = get_search_results(query)
def provide_response(query):
    #...
    context = ""
    for i, book in enumerate(search_results, 1):
        context += f"{i}. {book['title']} by {book['author']}\n"
        context += f" Review: {book['review']}\n"
        context += f" Link: {book['link']}\n\n"

    prompt = f"""You are an AI book recommendation assistant specializing in AI and machine learning books.
User Query: {query}
Based on the following relevant books from our database:
{context}
Please provide a helpful recommendation response that:
1. Addresses the user's specific query
2. Recommends the most suitable books from the list above
3. Explains why each book is relevant to their needs
4. Provides a brief summary of what they can expect from each recommendation
5. Suggests a reading order if applicable
Keep your response conversational and helpful."""
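The context string grows with every retrieved review. If reviews get long, one option is to cap each review before it goes into the prompt. This is a minimal sketch that uses a hypothetical per-review character budget (`max_review_chars`, set to an arbitrary 300) instead of counting tokens. Otherwise it builds the same numbered title/author/review/link layout as the loop above.

```python
def build_context(books: list[dict], max_review_chars: int = 300) -> str:
    """Format retrieved books as a numbered list, truncating long reviews.

    max_review_chars is an illustrative, assumed budget measured in characters.
    """
    parts = []
    for i, book in enumerate(books, 1):
        review = book["review"]
        if len(review) > max_review_chars:
            # Cut at the budget and mark the truncation
            review = review[:max_review_chars].rstrip() + "..."
        parts.append(
            f"{i}. {book['title']} by {book['author']}\n"
            f" Review: {review}\n"
            f" Link: {book['link']}\n\n"
        )
    return "".join(parts)
```

Calling `context = build_context(search_results)` in place of the loop would keep the prompt size roughly bounded by `top_k * max_review_chars` plus titles and links.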
def provide_response(query):
    #...
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are a helpful AI book recommendation assistant."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=800,
            temperature=0.7
        )
        ai_response = response.choices[0].message.content
        return ai_response
    except Exception as e:
        print(f"❌ Error generating AI response: {e}")
        return None
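Put together, the slides follow the retrieval-augmented generation steps listed at the start: embed the query, retrieve similar books, then generate a grounded answer. Below is a dependency-injected sketch of that flow. `embed_query`, `search` and `generate` are hypothetical callables that stand in for the Voyage AI call, the `$vectorSearch` aggregation and the OpenAI chat completion. Passing them in keeps the flow testable without network access. The prompt here is a shortened stand-in for the deck's full prompt.

```python
from typing import Callable, Optional

def answer_query(
    query: str,
    embed_query: Callable[[str], list[float]],
    search: Callable[[list[float], int], list[dict]],
    generate: Callable[[str], Optional[str]],
    top_k: int = 3,
) -> Optional[str]:
    """Retrieve top_k books for the query and ask the LLM for a recommendation."""
    query_embedding = embed_query(query)           # 1. embed the query
    books = search(query_embedding, top_k)         # 2. retrieve similar books
    if not books:
        return None  # nothing retrieved: skip the LLM call rather than answer ungrounded
    context = "".join(
        f"{i}. {b['title']} by {b['author']}\n Review: {b['review']}\n Link: {b['link']}\n\n"
        for i, b in enumerate(books, 1)
    )
    # Shortened version of the deck's prompt, for illustration only
    prompt = f"User Query: {query}\nBased on the following relevant books:\n{context}"
    return generate(prompt)                        # 3. generate the response
```

Returning early when nothing is retrieved is a design choice in this sketch. The deck's version always calls the model, which is also reasonable if the prompt tells it how to handle an empty list.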
@ArturoNereu