Vector databases explained for people who just want to understand.

You have 10,000 product descriptions. User searches for "comfortable outdoor furniture."

Traditional database:
- Searches for exact word matches
- Finds products containing "comfortable" OR "outdoor" OR "furniture"
- Misses "cozy patio seating" even though it's the same thing
- Keyword matching is stupid

Vector database approach:
- Convert search into numbers representing meaning: [0.2, 0.8, 0.1, 0.9, ...]
- Convert every product description to similar numbers
- Find products with similar number patterns
- Returns "cozy patio seating" because the numbers are close
- Meaning matching is smart

How it works:

Step 1: Turn text into vectors (arrays of numbers)
- "comfortable chair" becomes [0.2, 0.7, 0.1, 0.4, ...]
- "cozy seat" becomes [0.3, 0.8, 0.2, 0.5, ...]
- Similar meanings = similar numbers
- Uses AI models like OpenAI embeddings

Step 2: Store vectors efficiently
- Traditional database: stores text
- Vector database: stores arrays of numbers per item
- Indexes them for fast similarity search
- Optimized for "find similar," not "find exact"

Step 3: Search by similarity
- User query: "outdoor furniture"
- Convert to vector: [0.3, 0.6, 0.2, 0.8, ...]
- Find closest vectors using math (cosine similarity)
- Returns items ranked by similarity score

Use cases:
- Product search that understands intent
- Documentation search that finds relevant answers
- Recommendation engines
- Chatbots that find similar questions
- Anomaly detection

Popular vector databases:
- Pinecone: Managed, easy, expensive
- Weaviate: Open source, feature-rich
- Milvus: Fast, scalable, complex
- pgvector: Postgres extension, simple
- Qdrant: Fast, Rust-based

Controversial take: You don't need a vector database for most projects. Start with Postgres + the pgvector extension. Vector databases are great for scale. For under 1 million vectors, your regular database with a vector extension works fine.
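To make the steps above concrete, here is a rough brute-force sketch in Python. Nothing in it is a specific product's API: `embed` is a placeholder for whatever embedding model you choose (e.g. an OpenAI embedding call), and `catalog`, `search`, and `top_k` are just illustrative names.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Compare the direction of two vectors, ignoring their length.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query: str, catalog: dict[str, np.ndarray], embed, top_k: int = 3):
    # Steps 1 and 3: embed the query with the SAME model used for the catalog,
    # then score every stored vector against it and rank by similarity.
    query_vec = np.asarray(embed(query))
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Usage sketch (step 2 is just keeping the precomputed vectors around):
# catalog = {desc: np.asarray(embed(desc)) for desc in product_descriptions}
# search("comfortable outdoor furniture", catalog, embed)
```

A dedicated vector database does the same ranking, but behind an approximate nearest neighbor index (HNSW and friends) so it doesn't have to scan every vector, plus the usual storage and scaling machinery.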

Oct 26, 2025 · 9:29 PM UTC

Replying to @brankopetric00
Postgres + pgvector is good enough until you can afford to hire someone to own the replacement.
Replying to @brankopetric00
Helpful...
Replying to @brankopetric00
Great explanation. It’s like the semantic search we were promised
Replying to @brankopetric00
Love how you broke this down, Branko. It’s like teaching databases to speak human. Makes tackling complex tech a bit less daunting!
Replying to @brankopetric00
What is the process by which text is converted to vectors?
Replying to @brankopetric00
> Convert search into numbers representing meaning: [0.2, 0.8, 0.1, 0.9, ...]
This feels like an important step that's hand-waved around. How does it work? A human curates a word-to-number database?
Replying to @brankopetric00
great summary! one thing about step 3 (user query: "outdoor furniture" -> convert to vector: [0.3, 0.6, 0.2, 0.8, ...]): worth mentioning that you must use the same model you used when creating the embeddings. So if a new model or version comes out, you can't just all of a sudden start running user queries through it. Also I think hybrid approaches are a good way to get the best of both worlds - i.e., combining semantic and keyword search. Use an LLM that has the tools to execute a keyword search and combine it with the vector semantic search.
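As a rough sketch of that hybrid idea (same caveats as the earlier snippet: the vectors come from a placeholder embedding model, `keyword_score` is a naive word-overlap stand-in for a real full-text score such as BM25, and the `alpha` weight is arbitrary):

```python
import numpy as np

def keyword_score(query: str, text: str) -> float:
    # Naive lexical score: fraction of query words that appear verbatim in the text.
    words = query.lower().split()
    return sum(word in text.lower() for word in words) / len(words)

def hybrid_score(query: str, text: str,
                 query_vec: np.ndarray, text_vec: np.ndarray,
                 alpha: float = 0.5) -> float:
    # Blend semantic (vector) similarity with exact keyword overlap.
    semantic = float(np.dot(query_vec, text_vec) /
                     (np.linalg.norm(query_vec) * np.linalg.norm(text_vec)))
    return alpha * semantic + (1 - alpha) * keyword_score(query, text)
```

Production setups often merge the two ranked lists with something like reciprocal rank fusion instead of a fixed weighted sum, but the blend above captures the idea.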
Replying to @brankopetric00
you forgot the training step. someone actually needs to associate those word meanings that are close with each other and that is no easy feat. what makes a better search is how precise the training data is. arguably this is where a lot of the competition is now.
Replying to @brankopetric00
What is populating the vector database, and how does it determine that "cozy" and "comfortable" are close?
Replying to @brankopetric00
We do use Postgres + pgvector for a RAG focused on an archival system we built. It performs really well.
Replying to @brankopetric00
This is why I hate buying from Amazon... It never returns the product I want.
Replying to @brankopetric00
Nice overview. Now you can go deep on each step to optimize - which embedding model best, which vector DB, which search algo, etc.
Replying to @brankopetric00
Nicely done, Branko. A great follow-on could be on pushing the asymptote on accurate, deep representation of a real-world entity into vectors. Show the gen-pop how that's hard.
Replying to @brankopetric00
Nicely put Branko.
Replying to @brankopetric00
Just use pg
Replying to @brankopetric00
One of the easiest explanations I have seen...
Replying to @brankopetric00
this example nails it. i'd add that the embedding model choice totally changes what you actually find, been burned by that more than once
Replying to @brankopetric00
Pretty easy to get to 1M+ vectors. I was doing a toy web scraping -> vector DB project, and it got there in 1-2 months.
Replying to @brankopetric00
A small account like me doesn't get enough attention. I also wrote something a while ago about how vector DBs work. Give constructive feedback here
Vector Database Explained with blog link 💡 Spotify knows your mood. YouTube predicts your guilty pleasures. The secret? Vector Databases. Not magic → just math + embeddings. Let’s break down how they work 🧵👇
Replying to @brankopetric00
Missing keywords for readers: k-Nearest Neighbors for vector search, Approximate Nearest Neighbors and Local Coordinate Coding.
Replying to @brankopetric00
In the 1990s, I worked on similar things - statistical matching. It became a really interesting problem when you constrain it (e.g., multiple matches, but anything could be matched only once and everything has to be matched) - an optimisation without a deterministic solution
Replying to @brankopetric00
i sure hope you have answers as i have questions: why does it break at 1M vectors? also, how do hnsw indices scale and perform given they are graphs right? will more data lead to recall loss and slower growth?
Replying to @brankopetric00
> How it works: Step 1: Turn text into vectors
- "comfortable chair" becomes [0.2, 0.7, 0.1, ..]
- "cozy seat" becomes [0.3, 0.8, 0.2, ..]
- Similar meanings = similar numbers
- Uses AI models like OpenAI embeddings
- Vector database: stores & indexes them for fast similarity search
Replying to @brankopetric00
Exact match searches miss so many valuable results.
Replying to @brankopetric00
Did Endeca use it for, like, decades? It's called "by weight"
Replying to @brankopetric00
Don’t even bother with pgvector. You need full-text search anyway, so just have an LLM conduct a few searches to find what the user is looking for. Way easier to reason about, and much better out-of-the-box ranking functions. Many such things where trad search >> vector search.
Replying to @brankopetric00
👏👏👏
Replying to @brankopetric00
Didn't know qdrant was rust-based. nice
Replying to @brankopetric00
For most projects, MongoDB and Supabase have great vector database support too!
Replying to @brankopetric00
Autonomous Database 26ai is also a vector database, with built-in support for a dedicated VECTOR data type.
Replying to @brankopetric00
“Convert search into numbers” and “turn text into numbers” - how computationally expensive are these steps? Does it remain feasible for a fast-changing dataset, compared to traditional Lucene-based lexical search?
Replying to @brankopetric00
I always look for no-nonsense people to follow and glad I found a gem. Thank you for the byte-sized info that can readily be used. Followed!
Replying to @brankopetric00
Great primer for laymen!
Replying to @brankopetric00
@grok please summarize for people who want the summary