On large catalog and marketplace sites, search is not a nice-to-have feature; it is the primary navigation layer. When you have hundreds of thousands or millions of products, categories, filters and sellers, the way you design your search infrastructure directly affects conversion rate, average order value and even SEO. Slow or irrelevant results mean users give up quickly, while fast, relevant search with rich filters makes your site feel as solid as leading global marketplaces.
In this article, we will walk through how to design search infrastructure for large catalog and marketplace sites using Elasticsearch or OpenSearch, how to size VPS and server resources realistically, and which hosting architectures make sense at different stages of growth. We will also share practical sizing examples from real-world projects at dchost.com and show how to evolve from a single VPS to a dedicated search cluster or even colocation, without drama and without overpaying from day one.
Table of Contents
- 1 Why Search Infrastructure Matters for Large Catalogs and Marketplaces
- 2 Elasticsearch vs OpenSearch for Marketplace Search
- 3 Designing Indexes for Large Catalogs and Marketplaces
- 4 Sizing VPS and Servers for Elasticsearch/OpenSearch
- 5 Hosting Architecture Choices: VPS, Dedicated or Colocation?
- 6 Operations: Monitoring, Backups, and Scaling Your Search Cluster
- 7 Step‑By‑Step Rollout Plan for Teams
- 8 Bringing It All Together for Your Marketplace at dchost.com
Why Search Infrastructure Matters for Large Catalogs and Marketplaces
On small e‑commerce sites, you can sometimes get away with basic database LIKE queries or simple full‑text indexes. Large catalogs and marketplaces are very different. Here is what changes once your catalog grows:
- Query volume and complexity explode: Users combine keyword search with multiple filters, price ranges, availability, brands, attributes and sorting.
- Freshness becomes critical: Inventory, prices and promotions must be reflected in search results within seconds or minutes, not hours.
- Relevance expectations rise: Users expect typo tolerance, synonyms (“tee” vs “t‑shirt”), localized results, and smart ranking by popularity and conversion.
- SEO depends on fast faceted navigation: Category and filter pages generated via search must load quickly to satisfy Core Web Vitals and search engine crawlers.
Relational databases are excellent for transactions but struggle with complex free‑text search, scoring and aggregations at scale. That is why most serious marketplaces offload product discovery to a dedicated search engine like Elasticsearch or OpenSearch, and keep the main database focused on orders, users and transactional integrity. The challenge is doing this in a way that fits your current scale and budget, while leaving a clear path to grow.
Elasticsearch vs OpenSearch for Marketplace Search
Elasticsearch and OpenSearch are both distributed search and analytics engines built on Apache Lucene. For most catalog and marketplace scenarios, they feel very similar in day‑to‑day use: you index JSON documents and query them with a rich DSL (Domain Specific Language) that supports full‑text search, filters, aggregations and sorting.
Core Concepts You Must Understand
Before deciding on hosting and sizing, it helps to understand a few basic concepts that drive resource usage:
- Index: A logical collection of documents (e.g. products_en, products_de). Each index is split into shards.
- Document: A single JSON record (e.g. one product or one offer).
- Shard: A partition of an index stored on a node. More shards mean more parallelism, but also more overhead.
- Replica: A copy of a shard stored on a different node for high availability and extra read throughput.
- Mapping: The schema for your documents (field types, analyzers, nested fields, etc.).
For large catalogs, the way you choose shard counts, replicas and mappings determines how much CPU, RAM and disk you actually need on your VPS or dedicated servers.
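To make these concepts concrete, here is a minimal sketch of creating a per‑language product index with explicit shard and replica settings, using the official Elasticsearch Python client (opensearch-py exposes a near‑identical API). The index name and fields are illustrative, not a prescription:

```python
from elasticsearch import Elasticsearch  # opensearch-py's OpenSearch class works the same way

es = Elasticsearch("http://localhost:9200")

# One index per language; shard and replica counts directly drive CPU, RAM and disk needs.
es.indices.create(
    index="products_en",
    body={
        "settings": {
            "number_of_shards": 3,    # partitions spread across data nodes
            "number_of_replicas": 1,  # one extra copy per shard for HA and read throughput
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "english"},
                "brand": {"type": "keyword"},  # exact values for filtering and facets
                "price": {"type": "scaled_float", "scaling_factor": 100},
                "in_stock": {"type": "boolean"},
            }
        },
    },
)
```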
Features Marketplaces Typically Need
Most marketplaces and large catalogs end up using a similar set of search features:
- Autocomplete and suggestions: Prefix search and completion suggesters for fast typeahead.
- Typo tolerance and fuzzy matching: Handling minor spelling errors (“iphon” → “iphone”).
- Faceted filtering: Instant filters for categories, brands, attributes, price ranges and availability.
- Sorting and boosting: By price, popularity, rating, recency, or promotion flags.
- Language‑aware analyzers: Stemming, stop words and tokenization per language.
- Aggregations and analytics: For reporting (top brands, price distributions, etc.).
Both Elasticsearch and OpenSearch handle these well. The bigger questions are how you model your data, how you size the underlying servers, and how you design a hosting architecture that can evolve.
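As a rough illustration of how these features combine, the sketch below sends a single request that mixes fuzzy keyword matching, filters, a brand facet and sorting. The field names continue the hypothetical mapping from the previous example:

```python
# Keyword search with typo tolerance, filters, facets and sorting in one request.
response = es.search(
    index="products_en",
    body={
        "query": {
            "bool": {
                "must": [
                    # "iphon" still matches "iphone" thanks to fuzzy matching
                    {"match": {"title": {"query": "iphon", "fuzziness": "AUTO"}}}
                ],
                "filter": [
                    {"term": {"in_stock": True}},
                    {"range": {"price": {"gte": 100, "lte": 900}}},
                ],
            }
        },
        "aggs": {  # facet counts rendered next to the result list
            "brands": {"terms": {"field": "brand", "size": 20}}
        },
        "sort": ["_score", {"price": "asc"}],  # relevance first, price as tiebreaker
        "size": 24,
    },
)
hits = response["hits"]["hits"]
```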
Designing Indexes for Large Catalogs and Marketplaces
Good index design often saves more hardware than any fancy optimization. Poor index design can make even powerful servers struggle.
Single vs Multiple Indexes
For a typical marketplace, you will usually have at least:
- One index per language (e.g. products_en, products_fr) so you can use language‑specific analyzers.
- Optionally, separate indexes for products and offers/listings if many sellers share the same product.
- Sometimes a separate index for categories and content (blog, guides) if you want unified search.
For small and mid‑sized catalogs, a single index per language is simpler and fast enough. You can keep most fields (title, description, attributes) in that index and use filters for brand, seller, category and availability.
Multi‑Tenant Design for Marketplaces
If you run a marketplace with thousands of sellers, you usually have two options:
- Single shared index: Add a seller_id field and filter by it. This is the most common pattern and scales well if your documents are reasonably small.
- One index per seller: Only works if you have a small number of large sellers. Otherwise you quickly end up with too many small indexes, which adds overhead for the cluster.
In most cases, we recommend a shared index with a seller_id field and careful use of filters and routing. That keeps your shard count under control and simplifies capacity planning on your VPS or dedicated servers.
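A minimal sketch of that shared‑index pattern, reusing the hypothetical client and index from the earlier examples — routing by seller keeps each seller's documents on one shard, so seller‑scoped queries touch fewer shards:

```python
# Index a seller's offer with a routing value so all of that seller's
# documents land on the same shard.
es.index(
    index="products_en",
    id="sku-123-seller-42",
    routing="seller-42",
    body={"title": "Wireless mouse", "seller_id": "seller-42", "price": 19.99},
)

# Seller storefront search: the routing hint plus an explicit term filter.
es.search(
    index="products_en",
    routing="seller-42",
    body={"query": {"bool": {"filter": [{"term": {"seller_id": "seller-42"}}]}}},
)
```

One caveat: routing by seller can produce uneven shard sizes if a handful of sellers dominate your catalog, so treat it as an optimization to validate against your data, not a default.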
Handling Variants, Prices and Inventory
Product variants (size, color, packaging) and offers (multiple sellers per product) are often what bloat documents and slow queries. There are three typical patterns:
- Flattened variants: Keep a simplified set of variant attributes (e.g. min/max price, available colors) on the main product document. Good for fast category and search listings.
- Nested documents: Use nested fields for variants or offers when you need precise filtering (e.g. show only sizes that are actually in stock). More accurate, but heavier on memory and CPU.
- Separate offer index: Keep products and offers in separate indexes and join them in your application. This offloads some complexity from search but increases application logic.
Each choice affects how many documents you store, how big they are, and how heavy your queries become. When we work with clients at dchost.com, we usually prototype queries on a staging cluster first and then decide index structure based on actual latency and resource profiles.
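For the nested‑documents option, a query like the following sketch (field names hypothetical, with variants mapped as type nested) matches only products that have at least one variant satisfying all conditions at once:

```python
# Nested variants let you filter on combinations inside a single variant,
# e.g. "size M AND in stock" — a flattened document cannot express this safely,
# because size and stock values from different variants would cross-match.
es.search(
    index="products_en",
    body={
        "query": {
            "nested": {
                "path": "variants",  # mapped as {"type": "nested"} in the index
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"variants.size": "M"}},
                            {"term": {"variants.in_stock": True}},
                        ]
                    }
                },
            }
        }
    },
)
```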
Synonyms, Analyzers and Relevance
Marketplace search rarely works well with default analyzers. You almost always need:
- Custom analyzers per language (lowercasing, stop words, stemming) matched to your content.
- Synonym lists for common variants (“t‑shirt” vs “tee”, “tv” vs “television”).
- Keyword fields for exact matches (SKUs, brand codes, model numbers).
- Boosts for high‑margin products, promoted brands or better‑converting items.
These decisions do not just impact relevance; they also influence index size and memory usage. More fields and more complex analyzers increase the size of your index and the working set in RAM, which affects how you size your VPS or dedicated servers.
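As an illustration of what such a configuration can look like, here is a hedged sketch of an index with a custom English analyzer, an inline synonym filter and a keyword subfield for exact matches. The names and synonym pairs are placeholders for your own lists:

```python
es.indices.create(
    index="products_en_v2",
    body={
        "settings": {
            "analysis": {
                "filter": {
                    "en_synonyms": {
                        "type": "synonym",
                        "synonyms": ["tee, t-shirt", "tv, television"],
                    }
                },
                "analyzer": {
                    "en_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "en_synonyms", "porter_stem"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "en_search",
                    # keyword subfield for exact SKU/brand/model matching
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        },
    },
)
```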
Sizing VPS and Servers for Elasticsearch/OpenSearch
Elasticsearch and OpenSearch are memory‑hungry and IO‑sensitive. For a comfortable production setup, especially for large catalogs, you should treat them as first‑class workloads, not as a side process on an overloaded web server.
CPU, RAM and Disk Basics
A few practical rules of thumb for sizing:
- RAM: Search engines love RAM. You typically allocate up to 50% of system RAM to the JVM heap (e.g. 8 GB heap on a 16 GB server), keep the heap below roughly 30–32 GB so the JVM can still use compressed object pointers, and rely on the OS page cache for Lucene segments. Many medium marketplaces start at 16–32 GB RAM per search node.
- CPU: Search and indexing are CPU‑intensive, especially with complex analyzers and aggregations. 4 vCPUs is a minimum for small setups; 8–16 vCPUs per node is common as you grow.
- Disk: You want fast SSD or preferably NVMe storage with good IOPS and low latency. Spinning disks are usually a bottleneck. Our NVMe VPS hosting guide explains in detail why NVMe makes such a difference for search‑heavy workloads.
Also remember to reserve headroom. If you expect your index to be 200 GB, planning for 400–500 GB of disk on the data nodes is more realistic once you factor in segment merges, snapshots and growth.
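The arithmetic is simple enough to encode. The sketch below turns the rules of thumb above into a back‑of‑envelope calculator; the ratios are assumptions to adjust against your own measurements, not fixed laws:

```python
def suggest_node_specs(index_size_gb: float, growth_factor: float = 2.5) -> dict:
    """Back-of-envelope node sizing from the rules of thumb above (not a benchmark)."""
    # Headroom for segment merges, snapshots and catalog growth.
    disk_gb = index_size_gb * growth_factor
    # Aim to keep the hot working set in the OS page cache; the 1:4 RAM-to-index
    # ratio here is an illustrative assumption, as are the 16/64 GB bounds.
    ram_gb = min(max(16, index_size_gb / 4), 64)
    # Heap is ~50% of RAM, capped near 30 GB to keep compressed object pointers.
    heap_gb = min(ram_gb / 2, 30)
    return {"ram_gb": ram_gb, "heap_gb": heap_gb, "disk_gb": disk_gb}

print(suggest_node_specs(200))
# {'ram_gb': 50.0, 'heap_gb': 25.0, 'disk_gb': 500.0}
```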
Small to Medium Catalog Sizing Examples
Let us look at realistic scenarios we often see at dchost.com. Numbers below assume optimized mappings and decent query design; poorly designed queries can easily double or triple resource needs.
Scenario 1: Early‑Stage Marketplace MVP
- Up to ~200,000 products, 1–2 languages.
- Peak search rate ~10–20 queries per second.
- Moderate use of filters and autocomplete.
Typical starting point:
- One dedicated VPS for search: 4 vCPU, 8–16 GB RAM, NVMe storage (200–400 GB).
- Application and database can run on a separate VPS.
- 1 index per language, 3–5 primary shards, 1 replica (for a small MVP you may start with a single node and no replicas, but plan for a 2–3‑node cluster soon).
You can co‑locate search and application on the same VPS in the very beginning, but as traffic or catalog size grows, separating them is one of the best upgrades you can make.
Scenario 2: Growing Marketplace
- 500,000–2,000,000 products, 2–4 languages.
- Peak search rate ~50–150 queries per second.
- Heavy use of faceted filters and aggregations.
Typical architecture:
- 3‑node search cluster on VPS or dedicated servers.
- Each node: 8 vCPU, 32 GB RAM, NVMe (500 GB–1 TB).
- All nodes in a combined master+data role for simplicity (every node holds data, 1 replica).
At this stage, you should definitely keep search on its own servers, separate from application and primary database. This is also where careful capacity planning for vCPU, RAM and IOPS becomes critical. Our article on choosing VPS specs for WooCommerce, Laravel and Node.js uses a similar sizing mindset and is worth reading alongside this one.
Scenario 3: Established Marketplace with Heavy Traffic
- Millions of products, multiple languages.
- Peak search rate in the hundreds or thousands of queries per second.
- Advanced personalization, recommendation and reporting queries.
Typical architecture:
- 3–6+ dedicated search nodes, sometimes combined with separate master nodes.
- Each node: 16–32 vCPU, 64–128 GB RAM, NVMe or enterprise SSD with high IOPS.
- Possibility of tiered storage (hot/warm indices) and dedicated reporting nodes.
At this scale, you are usually deciding between large high‑end VPS plans, dedicated servers, or even colocation of your own hardware in our data centers, depending on your budget, compliance needs and in‑house expertise.
Right‑Sizing, Not Over‑Paying
It is very easy to over‑ or under‑spec search servers. Under‑spec and you fight constant timeouts; over‑spec and you burn budget on idle resources. The key is to:
- Measure current query rate, index size and latency.
- Run realistic load tests on staging.
- Plan for 2–3x headroom for peaks and growth.
We covered this philosophy in our guide on cutting hosting costs by right‑sizing VPS, bandwidth and storage. The same approach applies perfectly to Elasticsearch/OpenSearch clusters.
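Even a crude load test tells you more than any rule of thumb. The sketch below fires concurrent queries at a staging cluster (reusing the client from the earlier examples) and reports p95 latency; for serious testing you would replay real production query logs rather than a fixed list:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_search(query: str) -> float:
    """Run one search and return its latency in milliseconds."""
    start = time.perf_counter()
    es.search(index="products_en", body={"query": {"match": {"title": query}}})
    return (time.perf_counter() - start) * 1000

queries = ["wireless mouse", "iphone case", "running shoes"] * 100
with ThreadPoolExecutor(max_workers=20) as pool:  # ~20 concurrent "users"
    latencies = sorted(pool.map(timed_search, queries))

p95 = latencies[int(len(latencies) * 0.95)]
print(f"p95 latency: {p95:.1f} ms over {len(latencies)} queries")
```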
Hosting Architecture Choices: VPS, Dedicated or Colocation?
Once you have a rough idea of how much CPU, RAM and disk you need, the next question is what kind of hosting architecture to choose. At dchost.com, we usually walk clients through a simple decision tree.
When a VPS Is Enough for Search
VPS is ideal when:
- Your index size is modest (tens of GB, not TB).
- You want quick provisioning and easy scaling (vertical and horizontal).
- You prefer managed virtualization over owning hardware.
For many small and medium marketplaces, a 2–3 node Elasticsearch/OpenSearch cluster on NVMe‑backed VPS instances is a sweet spot between cost, performance and flexibility. You can add nodes, upgrade plans or split roles (master/data) over time without hardware purchases.
When Dedicated Servers Make Sense
Dedicated servers are a good fit when:
- Your index is large (hundreds of GB or more) and IO‑heavy.
- You need predictable performance, pinned CPUs and full control over hardware.
- You want to separate noisy workloads (e.g. heavy analytics, reporting) from customer‑facing traffic.
Our article Dedicated Server vs VPS: Which One Fits Your Business? walks through this comparison in general. For search clusters, the decision often comes down to index size, IO requirements and whether you prefer scaling out with more VPS nodes or scaling up with fewer, more powerful dedicated servers.
When Colocation Becomes Attractive
At very large scales or in regulated environments, some teams prefer to own hardware but still use a professional data center. In that case, colocation services are a strong option. Typical reasons:
- Custom hardware (e.g. very high RAM nodes, special NVMe configurations).
- Regulatory or contractual requirements around asset ownership.
- Long‑term cost optimization once hardware is amortized.
dchost.com can provide rack space, power, cooling, network and remote hands while you control the exact search hardware profile you want.
Separating Search from Application and Database
Even on a VPS, you will eventually want to separate search from your main application/database stack. This follows the pattern we described in our article on when to separate database and application servers, and the benefits carry over directly:
- Search spikes no longer slow down checkouts or API responses.
- You can tune OS, JVM and storage just for search.
- Scaling search (add nodes) does not require touching the app/DB stack.
In practice, we often see a three‑tier model emerge: web/API servers, database servers, and search servers — each on their own VPS or dedicated nodes, but all within the same dchost.com data center region to keep latency low.
Server Location and Latency
Search queries are latency‑sensitive. Even 100–150 ms extra round‑trip time is noticeable in autocomplete and filter updates. That is why we recommend keeping search nodes:
- In the same data center region as your web/API servers.
- As close as possible to your primary user base.
For a deeper discussion on how geography affects performance and SEO, take a look at our article Does server location affect SEO and speed?.
Operations: Monitoring, Backups, and Scaling Your Search Cluster
Good infrastructure is not just about initial sizing; it is about keeping the cluster healthy over time.
Monitoring and Alerting
At minimum, you should monitor:
- Cluster health (green/yellow/red).
- Indexing and search latency.
- Heap usage and garbage collection time.
- Disk usage, IO wait and read/write latency.
- Number of shards per node and per index.
If you already use tools like Prometheus and Grafana for your VPS and applications, you can extend them with exporters for Elasticsearch/OpenSearch. We discussed general VPS monitoring patterns in our guide on VPS monitoring and alerts with Prometheus and Grafana; the same approach applies neatly to search clusters.
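Under the hood, most exporters simply scrape the cluster health and node stats APIs. Here is a quick sketch of polling the same numbers directly from Python, handy for ad‑hoc checks before full monitoring is in place:

```python
# Cluster-level view: status plus node and shard counts.
health = es.cluster.health()
print(health["status"], health["number_of_nodes"], health["active_shards"])

# Per-node JVM heap and disk headroom.
stats = es.nodes.stats(metric="jvm,fs")
for node_id, node in stats["nodes"].items():
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    free_gb = node["fs"]["total"]["available_in_bytes"] / 1e9
    print(f'{node["name"]}: heap {heap_pct}%, disk free {free_gb:.0f} GB')
```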
Backups and Snapshots
Elasticsearch/OpenSearch provide snapshot APIs that let you take incremental backups of indices to external storage. Best practice is:
- Use object storage (S3‑compatible) as the snapshot repository.
- Configure regular automatic snapshots (e.g. hourly or daily).
- Test restoring snapshots to a staging cluster, not just production.
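A minimal sketch of wiring this up via the snapshot API; the repository and bucket names are illustrative, and S3‑compatible repositories require the corresponding repository plugin to be installed on every node:

```python
# Register an S3-compatible repository once per cluster.
es.snapshot.create_repository(
    repository="nightly_backups",
    body={"type": "s3", "settings": {"bucket": "search-snapshots"}},
)

# Take an incremental snapshot of all product indices.
es.snapshot.create(
    repository="nightly_backups",
    snapshot="snapshot-2024-01-15",
    body={"indices": "products_*", "include_global_state": False},
    wait_for_completion=False,
)
```

In production you would schedule these through the engine's built‑in snapshot lifecycle/management policies rather than ad‑hoc scripts.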
Choosing between block, file and object storage for your environment is an important early decision. Our article Object Storage vs Block Storage vs File Storage explains the trade‑offs in the context of web apps and backups; search snapshots fit right into that picture.
Scaling Strategies
As your marketplace grows, you have several levers:
- Vertical scaling: Upgrade VPS plans or use bigger dedicated servers (more vCPU, RAM, NVMe).
- Horizontal scaling: Add more data nodes and rebalance shards.
- Index optimization: Reduce unnecessary fields, tune mappings, and merge indices when practical.
- Query optimization: Avoid heavy wildcard queries, reduce nested aggregations, precompute expensive metrics.
Often, a round of index and query tuning cuts resource usage dramatically, postponing the need for more hardware. That is why we always recommend load testing and profiling before committing to a big capacity jump.
Zero‑Downtime Reindexing
Large catalogs inevitably need reindexing: new analyzers, improved mappings, or changed document structure. Basic pattern:
- Create a new index (e.g. products_v2) with the new settings.
- Use the reindex API or your ETL pipeline to populate it from the old index or source DB.
- Point the products_current alias at the new index.
- Keep the old index around temporarily for rollback.
If your cluster is sized properly and you plan reindex windows carefully, you can do all this without impacting live customers.
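Sketching that pattern in code, again with the hypothetical Python client from the earlier examples — the alias swap is atomic, so clients querying products_current never see a half‑migrated state:

```python
# 1. Create products_v2 with the new mappings/analyzers (as in earlier sketches).
# 2. Copy documents server-side; run asynchronously for large indices.
task = es.reindex(
    body={"source": {"index": "products_v1"}, "dest": {"index": "products_v2"}},
    wait_for_completion=False,
)
print("reindex task:", task["task"])  # poll via the tasks API until finished

# 3. Atomically repoint the read alias once products_v2 is verified.
es.indices.update_aliases(
    body={
        "actions": [
            {"remove": {"index": "products_v1", "alias": "products_current"}},
            {"add": {"index": "products_v2", "alias": "products_current"}},
        ]
    }
)
# 4. Keep products_v1 around for a rollback window, then delete it.
```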
Step‑By‑Step Rollout Plan for Teams
To tie everything together, here is a pragmatic rollout plan for adding or improving search on a large catalog or marketplace.
Phase 1: Validate the Model on a Small Cluster
- Start with a modest VPS: 4 vCPU, 8–16 GB RAM, NVMe storage.
- Index a representative subset of your catalog with realistic mappings and analyzers.
- Implement core search features: keyword search, filters, sorting, autocomplete.
- Load test with realistic traffic levels plus some growth room.
At this stage, your goal is to validate relevance and basic performance, not yet to build a final cluster.
Phase 2: Separate Search and Harden for Production
- Move search to its own VPS or small 2–3 node cluster.
- Ensure application and database live on separate servers.
- Set up snapshots to external (object) storage and monitoring/alerts.
- Introduce index aliases so you can reindex without breaking clients.
Here, you are targeting consistent sub‑100 ms search responses under normal peak load, and predictable behavior during campaigns.
Phase 3: Scale Out and Add Redundancy
- Upgrade nodes (more vCPU/RAM/NVMe) or add more nodes to the cluster.
- Fine‑tune shard counts and set 1–2 replicas for high availability.
- Consider separate hot/warm nodes if you have very old data that is rarely queried.
- Introduce dedicated nodes for heavy analytics if needed.
This is where you may switch from purely VPS‑based clusters to a mix of large VPS plans, dedicated servers, or even colocated hardware, depending on your long‑term cost and control preferences.
Phase 4: Continuous Tuning
- Regularly review slow queries and heavy aggregations.
- Optimize analyzers and synonym lists based on real user behavior.
- Adjust index structure as your marketplace model evolves.
- Plan capacity for seasonal peaks and major marketing campaigns.
Our hosting scaling checklist for traffic spikes and big campaigns is a good complement here — many of the same principles apply specifically to search traffic surges.
Bringing It All Together for Your Marketplace at dchost.com
Designing search infrastructure for large catalog and marketplace sites is a balancing act between relevance, performance, cost and operational simplicity. Elasticsearch and OpenSearch give you powerful tools, but the real magic comes from choosing the right index design, sizing your VPS or servers realistically, and building a hosting architecture that can grow with your business.
For early‑stage projects, a well‑tuned NVMe VPS with a single‑node or small cluster can take you surprisingly far. As your catalog, traffic and feature set expand, moving to multi‑node clusters, dedicated servers or colocation inside our data centers lets you keep latency low and availability high, without rewriting your application or search logic.
At dchost.com, we work with teams running everything from lean MVP marketplaces to high‑traffic, multi‑country platforms. If you are planning a new search deployment or struggling with an existing one, we can help you choose the right combination of VPS, dedicated servers or colocation, and design a search topology that fits both your current needs and your growth roadmap. Reach out to our team, share your catalog size, traffic profile and business goals, and we will help you turn search into one of the strongest assets of your marketplace instead of a constant bottleneck.
