
Vector database showdown 2026: Pinecone vs Weaviate vs pgvector vs Qdrant

A production comparison of Pinecone, Weaviate, pgvector, and Qdrant in 2026 — pricing, latency, hybrid search, filtering, and when pgvector is good enough.

Most vector database comparisons turn into feature checklists. That is not how the decision actually gets made in production. The real questions are how hybrid search behaves under load, how filtered queries scale when every tenant has its own subset, how much DevOps the team has to spare, and what the bill looks like at 10x today's volume. This is the 2026 head-to-head for Pinecone, Weaviate, pgvector, and Qdrant — with real performance numbers, the pricing patterns that bite, and a straight answer on when pgvector is simply good enough.

The field, in one table

Each of the four represents a different bet on who runs the infrastructure, how hybrid search should work, and where filtering should happen.

| Database | Shape | Best for | Watch out for |
| --- | --- | --- | --- |
| Pinecone | Fully managed SaaS | Teams that want zero ops, serverless scaling | Filtered-query latency, per-QPS cost at scale |
| Weaviate | Managed or self-hosted | Built-in hybrid search, modular architecture | More concepts to learn, ops overhead if self-hosted |
| pgvector | Postgres extension | Already running Postgres, under 10M vectors | Performance degrades past tens of millions |
| Qdrant | Managed or self-hosted, Rust-native | Filtered search performance, cost at scale | Smaller ecosystem than Pinecone |

The best vector database is the one the team can operate confidently at the scale the product will actually reach. Benchmarks on a laptop with ten million random vectors tell you almost nothing about how any of these behave at 99% recall under filtered multi-tenant load.

Latency and recall under load

Raw query latency is where the four begin to separate. Published benchmarks in 2026 put all four in the same order of magnitude at modest scale, but they diverge under filtering, at higher recall thresholds, and on multi-tenant workloads.

| Database | p50 latency (1M vectors, 99% recall) | Filtered-query cost | Hybrid search |
| --- | --- | --- | --- |
| Qdrant | ~4–5 ms | Low — filtering is first-class | Native (vector + payload + BM25) |
| pgvector (HNSW) | ~5–8 ms | Low when filter matches an index | Native (via tsvector + vector) |
| Weaviate | ~20–40 ms | Moderate | Native, well-tuned |
| Pinecone (serverless) | ~30–80 ms | Higher — filter evaluation adds latency | Supported via sparse-dense vectors |

Two things worth flagging before anyone quotes these numbers in a meeting: Pinecone's pod-based tier is meaningfully faster than serverless but significantly more expensive, and every one of these numbers shifts once end-to-end embedding generation gets added. Total search latency including embedding time is typically 200–400 ms regardless of which database sits underneath.
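Before quoting anyone's numbers, it is worth computing p50 and p99 from your own traffic. A minimal sketch with no dependencies, using nearest-rank percentiles over timed query samples (the latency values below are illustrative, not measurements):

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    # nearest-rank: ceil(pct / 100 * n), as a 1-based index
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]

# Illustrative samples; in practice, time real queries with real filters
latencies_ms = [4.2, 4.8, 5.1, 4.5, 38.0, 4.9, 5.3, 4.4, 4.7, 5.0]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

Note how a single slow outlier (38 ms here) leaves p50 untouched but dominates p99, which is why tail latency, not the median, is usually what filtered multi-tenant workloads expose.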

Pricing at scale

Sticker pricing is easy. Real-world economics at 10M vectors, at 100 QPS, with filtering and metadata, look different. A few patterns hold up across client projects:

  • Managed services cost 1.5–3x self-hosted at 10M vectors once sustained QPS is in the hundreds.
  • Pinecone serverless is competitive below 10 QPS and scales expensively above — it is the right pick when the workload is bursty and idle most of the day.
  • Qdrant and Weaviate self-hosted on existing Kubernetes are the cheapest options at scale if the team has the DevOps capacity.
  • pgvector on an existing Postgres cluster is effectively free until index memory becomes the bottleneck.
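The managed-versus-self-hosted comparison is ultimately a crossover calculation. The sketch below deliberately takes all rates as inputs rather than hard-coding any vendor's pricing, which changes too often to bake in:

```python
def monthly_cost(base_usd, per_million_vectors_usd, per_qps_usd,
                 vectors_millions, sustained_qps):
    """Rough monthly cost model: flat base + storage + sustained-query
    terms. All rates are caller-supplied assumptions; check the vendor's
    current pricing page, none of these numbers are built in."""
    return (base_usd
            + per_million_vectors_usd * vectors_millions
            + per_qps_usd * sustained_qps)

def crossover_qps(managed, self_hosted, vectors_millions):
    """Smallest sustained QPS at which self-hosting becomes cheaper.
    `managed` and `self_hosted` are (base, per_M_vectors, per_qps)
    tuples; returns None if managed stays cheaper up to 1000 QPS."""
    for qps in range(0, 1001):
        m = monthly_cost(*managed, vectors_millions, qps)
        s = monthly_cost(*self_hosted, vectors_millions, qps)
        if s < m:
            return qps
    return None
```

With hypothetical rates where self-hosting has a high fixed cost but near-zero per-query cost, the crossover lands in the tens-to-hundreds of QPS range, which matches the pattern in the bullets above.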

If the team is already running Postgres and the corpus is under 10M vectors, pgvector is almost certainly good enough. HNSW indexing, filtered queries that hit existing Postgres indexes, and no new infrastructure to operate — the dedicated vector database can wait until measured performance says otherwise.
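In practice, "good enough" is two SQL statements: an HNSW index and a filtered nearest-neighbor query. A sketch of both, held as strings so they can run through any Postgres driver; the `documents` table, its columns, and the `m`/`ef_construction` values are hypothetical starting points, not tuned recommendations:

```python
# pgvector DDL and query sketches; table and column names are made up.
CREATE_INDEX = """
CREATE INDEX CONCURRENTLY idx_documents_embedding
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
"""

# <=> is pgvector's cosine-distance operator. The tenant_id predicate
# can use an ordinary btree index, which is why filtered queries that
# match an existing Postgres index stay cheap.
FILTERED_KNN = """
SELECT id, content, embedding <=> %(query_vec)s AS distance
FROM documents
WHERE tenant_id = %(tenant_id)s
ORDER BY embedding <=> %(query_vec)s
LIMIT 10;
"""
```

`CREATE INDEX CONCURRENTLY` avoids locking the table during the build, which matters once the table is serving production traffic.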

Hybrid search and filtering

Hybrid search — combining dense vector similarity with keyword BM25 and sometimes rerankers — has become the default for RAG systems that need to handle both fuzzy semantic matches and precise exact-keyword queries. The four databases handle it differently.

  • Weaviate's hybrid search is the most polished out of the box — one query blends vector and BM25 with a tunable alpha.
  • Qdrant pairs vector search with structured payload filters at very low cost and integrates well with external BM25 stages.
  • pgvector gets hybrid for free via tsvector full-text search combined with vector ordering in the same query.
  • Pinecone supports hybrid through sparse-dense vectors, but the plumbing requires a separate sparse-encoder pipeline.
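Whatever the database, most hybrid implementations reduce to blending two ranked lists. A minimal sketch of alpha-weighted score fusion, the pattern Weaviate's tunable alpha follows; the document IDs and scores are illustrative, and real systems normalize per query exactly as done here:

```python
def hybrid_scores(vector_scores, bm25_scores, alpha=0.5):
    """Blend min-max-normalized dense and keyword scores:
    alpha=1.0 is pure vector search, alpha=0.0 is pure BM25."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    dense = normalize(vector_scores)
    sparse = normalize(bm25_scores)
    docs = set(dense) | set(sparse)
    return {doc: alpha * dense.get(doc, 0.0)
                 + (1 - alpha) * sparse.get(doc, 0.0)
            for doc in docs}

# Illustrative scores: "a" wins on vectors, "b" wins on keywords
blended = hybrid_scores({"a": 0.9, "b": 0.7, "c": 0.2},
                        {"a": 1.0, "b": 12.0, "c": 4.0},
                        alpha=0.7)
best = max(blended, key=blended.get)
```

Sliding alpha toward 1.0 flips the winner back to the pure-vector ranking, which is exactly the knob the exact-keyword-versus-fuzzy-match tradeoff turns on.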

Filtering matters more than raw speed

A multi-tenant SaaS app almost always runs filtered queries — customer ID, document type, date range — and filter performance dominates real-world latency. Qdrant and pgvector (with proper indexing) handle filters cleanly because filters evaluate against indexed structures. Pinecone's metadata filtering adds noticeable latency at scale because the filter is evaluated inside the vector index itself rather than against a separate indexed structure first.
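The distinction is easy to see in miniature: pre-filtering shrinks the candidate set before any distance math runs. A brute-force sketch in plain Python, with made-up documents and tenant IDs:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def prefiltered_search(docs, query_vec, tenant_id, k=2):
    """Filter to the tenant first, then rank only those vectors --
    the shape Qdrant and indexed pgvector filters take."""
    candidates = [d for d in docs if d["tenant"] == tenant_id]
    candidates.sort(key=lambda d: cosine(d["vec"], query_vec),
                    reverse=True)
    return [d["id"] for d in candidates[:k]]

docs = [
    {"id": "d1", "tenant": "acme",   "vec": [1.0, 0.0]},
    {"id": "d2", "tenant": "acme",   "vec": [0.6, 0.8]},
    {"id": "d3", "tenant": "globex", "vec": [1.0, 0.1]},
]
top = prefiltered_search(docs, [1.0, 0.0], "acme")
```

Note that "d3" never enters the ranking at all despite being the closest vector overall: the tenant filter removed it before any similarity was computed, which is the whole point.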

Operational complexity

The hidden cost of a vector database is the cost of operating it. Index rebuilds, version upgrades, snapshots, monitoring, and capacity planning all take engineering time. The right pick depends on how much of that the team wants to own.

| Database | Ops burden | Multi-tenant isolation | Backups and recovery |
| --- | --- | --- | --- |
| Pinecone | Near zero | Namespaces per tenant | Managed, limited control |
| Weaviate Cloud | Low | Multi-tenancy is first-class | Managed |
| pgvector | Already handled by Postgres ops | Row-level security or per-tenant tables | Whatever Postgres backup already does |
| Qdrant Cloud | Low | Collections or payload filters | Managed, snapshot API available |

pgvector at scale

pgvector has crossed the line from 'fine for prototypes' to 'genuinely fine for a lot of production workloads' over the past two years. HNSW indexes build in reasonable time up to tens of millions of vectors; queries are milliseconds once the index fits in shared buffers. The limit is memory, not correctness — HNSW indexes consume 2–5x the memory of IVFFlat, and when the index spills to disk, query latency climbs sharply.

  • Under 10M vectors — pgvector is almost always the right default, especially on teams already running Postgres.
  • 10M–50M vectors — pgvector still works on beefy instances, but the ops cost of tuning Postgres for vector workloads starts to approach the cost of a dedicated database.
  • 50M+ vectors — reach for Qdrant, Weaviate, or Pinecone. pgvector can get there but the margin for error shrinks.

pgvector's HNSW is memory-hungry. Budget at least 1.5x the raw index size in shared memory, and plan for a rebuild window when dimensions or distance metrics change. A forgotten REINDEX on a large table can stall a production database for longer than anyone expects.
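The 1.5x rule turns into a quick back-of-envelope calculator. The sketch below assumes float32 vectors and takes the index overhead as a caller-supplied factor; real HNSW sizes also depend on `m` and per-tuple Postgres overhead, so treat the result as a floor, not a quote:

```python
def hnsw_memory_budget_gb(n_vectors, dims,
                          index_overhead=2.0, headroom=1.5):
    """Rough shared-buffers budget for a pgvector HNSW index.

    raw vector data = n * dims * 4 bytes (float32); `index_overhead`
    is an assumed multiplier for graph links and tuple overhead;
    `headroom` is the 1.5x budgeting margin discussed above.
    """
    raw_bytes = n_vectors * dims * 4
    index_bytes = raw_bytes * index_overhead
    return headroom * index_bytes / (1024 ** 3)

# 10M vectors at 1536 dims: the raw float32 data alone is ~61 GB,
# before any index overhead or headroom is applied
budget = hnsw_memory_budget_gb(10_000_000, 1536)
```

Running the example lands the budget well north of 150 GiB, which makes the "memory is the limit" point concrete: at that scale the instance class, not the query planner, is the constraint.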

How we pick

When our team walks into a new project and the question is which vector database to use, the decision usually falls out of four questions.

  1. Is the team already running Postgres and the corpus under 10M vectors? Use pgvector. Ship it.
  2. Is ops capacity the scarce resource and the workload bursty? Use Pinecone serverless. Pay the premium for zero ops and move on.
  3. Does the workload need first-class hybrid search with a simple API? Use Weaviate. The hybrid implementation is the most polished of the four.
  4. Is filter-heavy multi-tenant search at scale the core workload? Use Qdrant. Filtering is its strongest suit and self-hosted economics are the cheapest at volume.
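The four questions above collapse into a short, order-sensitive decision function — a sketch of the heuristic, not a substitute for measuring on real traffic:

```python
def pick_vector_db(running_postgres, corpus_millions,
                   ops_capacity_scarce, bursty_workload,
                   needs_hybrid, filter_heavy_multitenant):
    """Encode the four-question decision flow; earlier questions win."""
    if running_postgres and corpus_millions < 10:
        return "pgvector"
    if ops_capacity_scarce and bursty_workload:
        return "pinecone-serverless"
    if needs_hybrid:
        return "weaviate"
    if filter_heavy_multitenant:
        return "qdrant"
    return "benchmark-on-real-traffic"

choice = pick_vector_db(running_postgres=True, corpus_millions=5,
                        ops_capacity_scarce=False, bursty_workload=False,
                        needs_hybrid=True, filter_heavy_multitenant=False)
# pgvector wins here even though hybrid search is wanted: the Postgres
# question comes first, and tsvector covers hybrid under 10M vectors
```

The ordering is deliberate: a team already on Postgres with a small corpus ships pgvector even when later questions would point elsewhere.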

Key takeaways

  • All four databases are fast enough for most RAG workloads. The decision is driven by ops capacity, filtering patterns, and scale, not raw latency.
  • pgvector is the right default under 10M vectors — simpler, cheaper, and already handled by existing Postgres operations.
  • Pinecone serverless is the right default when the team cannot afford to operate anything and the workload is bursty.
  • Qdrant wins on filtered search and self-hosted cost at scale. Weaviate wins on out-of-the-box hybrid search.
  • Benchmarks on synthetic data mislead. Measure on real traffic with real filters before committing.
#vector-database #pinecone #pgvector #qdrant #weaviate #rag #ai-infrastructure
Working on something similar?

Let's build it together.

We ship production SaaS, marketplaces, and web apps. If you want an engineering partner — not a consultancy — let's talk.