Trie-Based Search Systems (Typeahead)
“The user knows what they want. Your job is to tell them before they finish typing.”
TL;DR
Typeahead systems must return suggestions in under 50ms at millions of QPS. A weighted trie with pre-computed top-K candidates at each node enables O(1) lookups by avoiding runtime subtree traversal. The architecture combines a static trie built offline from query logs with a dynamic trie for real-time trending topics, all served behind CDN caching and prefix-based sharding. Memory optimization through Radix Trees reduces pointer overhead by 60%. For the data structure foundations behind this, see the broader ML System Design series, and for how vector search complements keyword retrieval, see Vector Databases.

1. Problem Statement
Typeahead (or Autocomplete) is one of the most heavily used features on the internet.
- Volume: Every keystroke triggers a query. If a user types “iphone case”, that’s 11 requests.
- Latency: Must be < 50ms (Human perception of “instant”).
- Freshness: If “Earthquake in Japan” happens, it must appear in suggestions within minutes.
The System Design: Build a backend for a Google-style search bar that suggests top 10 completions based on prefix.
2. Understanding the Requirements
2.1 The Scale
- DAU: 500 Million.
- Queries: 5 billion searches / day.
- Keystrokes: ~50 billion typeahead requests / day.
- QPS: Peak 1 million QPS. (This requires massive horizontal scaling).
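These figures are a quick back-of-envelope exercise; the keystrokes-per-query count and peak multiplier below are assumptions used only to show how the numbers connect:

```python
# Back-of-envelope sanity check (assumed figures, not measurements)
searches_per_day = 5_000_000_000          # 5 billion searches/day
keystrokes_per_search = 10                # assumption: ~10 keystrokes per query
requests_per_day = searches_per_day * keystrokes_per_search  # ~50 billion

avg_qps = requests_per_day / 86_400       # seconds per day
peak_qps = avg_qps * 1.7                  # assumption: peak is ~1.7x average

print(f"average QPS: {avg_qps:,.0f}")     # ~578,704
print(f"peak QPS:    {peak_qps:,.0f}")    # ~983,796
```

That lands near the 1 million peak QPS stated above.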
2.2 The Ranking Problem
We don’t just want matching strings.
- Prefix: “ja”
- Candidates: “java”, “japan”, “javascript”, “jacket”.
- We need to return the top 5 sorted by Score.
- Score = Transformation(Frequency, User_History, Location, Freshness)
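As a sketch, such a transformation might combine a log-damped frequency with an exponential freshness decay and a personalization term. The weights and the decay window below are illustrative assumptions, not values from this design:

```python
import math
import time

def score(frequency, last_seen_ts, user_affinity=0.0, now=None):
    """Illustrative scoring sketch: the log damping, the 7-day decay,
    and the weights (2.0, 1.5) are assumptions for demonstration."""
    now = time.time() if now is None else now
    freshness = math.exp(-(now - last_seen_ts) / (7 * 86_400))  # decays over ~a week
    return math.log1p(frequency) + 2.0 * freshness + 1.5 * user_affinity

# A fresh query outranks an equally frequent stale one:
t = time.time()
print(score(1_000, t, now=t) > score(1_000, t - 30 * 86_400, now=t))  # True
```

In practice the transformation is learned or hand-tuned per product; the point is that it runs offline, so the trie only stores final scores.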
3. High-Level Architecture
We need a dedicated Typeahead Service separate from the main Search Service.
[Browser]
| (GET /suggest?q=sys)
v
[Load Balancer] --> [CDN (Caches popular: "fa"->"facebook")]
|
v
[Typeahead Aggregator]
|
+---> [Service Node 1 (A-C)]
+---> [Service Node 2 (D-F)]
+---> [Personalization Service (Reranking)]
4. Component Deep-Dives
4.1 The Data Structure: The Weighted Trie
In RAM, we store a specialized Trie. Each Node contains:
- Children Pointers: {'a': Node, 'b': Node}.
- Top-K Cache: A pre-sorted list of the top 10 most popular completed queries that pass through this node.
Why Pre-compute Top-K?
If we wait until runtime to traverse the whole subtree of “sys” to find “system”, “systolic”… it’s too slow.
We aggregate scores offline. At node “s-y-s”, we store ["system", "sysadmin"] directly.
Time Complexity: O(1) with respect to dictionary size (a short pointer chase down the prefix, then return the cached list).
4.2 Data Sharding
The Dictionary is too big for one machine.
- Option A: Range Partitioning (A-C, D-F)
- Risk: Hotspots. ‘S’ is popular. ‘X’ is quiet.
- Option B: Hash Partitioning (ShardID = Hash(prefix) % 100)
- Risk: We can only hash the prefix. If the user types “s”, do we query all shards? No. Usually we shard by the first 2-3 characters.
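A minimal sketch of that routing rule, assuming a stable hash over the first 2 characters (the shard count and hash choice here are arbitrary):

```python
import hashlib

def shard_for(prefix: str, shard_count: int = 8) -> int:
    """Hypothetical router: shard on the first 2 characters, so every
    extension of a 2-char prefix lands on the same shard."""
    key = prefix[:2].lower()
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shard_count

# "sy", "sys", and "system" all route to one shard:
print(shard_for("sy") == shard_for("sys") == shard_for("system"))  # True
```

One-character prefixes still need special handling (e.g. a small dedicated cache), since "s" and "sy" hash to different keys in this scheme.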
5. Data Flow: The Log Processing Pipeline
How do queries get into the Trie?
- Log Collection: Browser emits “User searched: java” -> Kafka.
- Aggregation (Spark/MapReduce):
- Window: Last 7 days.
- Group By Query, Count Frequency.
- Trie Building (Offline):
- Worker reads aggregated list.
- Builds the Weighted Trie.
- Serializes to file (trie_snapshot_v100.bin).
- Deployment:
- Push file to S3.
- Typeahead Servers download and hot-swap memory.
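The aggregation step can be sketched in miniature; this is a toy stand-in for the Spark/MapReduce job, with windowing and score transformation omitted:

```python
from collections import Counter

def aggregate_logs(log_lines):
    """Normalize raw searched queries, group by query, count frequency,
    and return them sorted by count descending."""
    counts = Counter(line.strip().lower() for line in log_lines)
    return counts.most_common()

print(aggregate_logs(["java", "Java", "japan", "java"]))
# [('java', 3), ('japan', 1)]
```

The real job partitions by query across the cluster, but the shape of the output, a (query, score) list, is the same thing the trie builder consumes.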
6. Scaling Strategies
6.1 The “Trending” Problem (Real-time)
Offline builds happen weekly. But “Breaking News” needs to appear now. Hybrid Architecture:
- Result = Merge(Static_Trie, Dynamic_Trie).
- Dynamic Trie: Size limited (e.g., 200MB). Stores only high-velocity queries from the last 15 mins (via a Storm/Flink stream).
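The merge step might look like the sketch below; the trending boost factor is an assumption for illustration:

```python
def merge_suggestions(static_results, dynamic_results, k=10, trend_boost=1.5):
    """Merge (score, text) candidates from the static and dynamic tries.
    Dynamic (trending) scores get a boost; trend_boost is an assumed value."""
    combined = {}
    for score, text in static_results:
        combined[text] = max(combined.get(text, 0), score)
    for score, text in dynamic_results:
        combined[text] = max(combined.get(text, 0), score * trend_boost)
    ranked = sorted(combined.items(), key=lambda kv: -kv[1])
    return [text for text, _ in ranked[:k]]

print(merge_suggestions([(100, "japan"), (80, "java")], [(90, "japan earthquake")]))
# ['japan earthquake', 'japan', 'java']
```

Taking the max (rather than summing) keeps a query that appears in both tries from being double-counted.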
6.2 Sampling
We don’t log every keystroke. We log the final submission. Or we sample 1% of traffic for analytics.
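Sampling is usually deterministic per user rather than random per request, so a sampled user's whole session is captured consistently. A sketch, assuming hash-based bucketing:

```python
import hashlib

def in_sample(user_id: str, rate: float = 0.01) -> bool:
    """Deterministic ~1% sampling sketch: the same user always hashes to
    the same bucket, so they are consistently in or out of the sample."""
    bucket = int.from_bytes(hashlib.md5(user_id.encode()).digest()[:4], "big") % 10_000
    return bucket < int(rate * 10_000)

sampled = sum(in_sample(f"user-{i}") for i in range(20_000))
print(sampled)  # roughly 200, i.e. about 1% of 20,000
```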
7. Implementation: A Basic Typeahead Class
```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top_candidates = []  # Stores top 5 tuples (score, text)

    def insert(self, word, score):
        # Drop any stale entry for this word, then re-insert and keep the top 5
        self.top_candidates = [c for c in self.top_candidates if c[1] != word]
        self.top_candidates.append((score, word))
        self.top_candidates.sort(key=lambda x: -x[0])
        self.top_candidates = self.top_candidates[:5]


class TypeaheadSystem:
    def __init__(self):
        self.root = TrieNode()

    def add_query(self, query, score):
        node = self.root
        for char in query:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            # Crucial: update the pre-computed top-k at every node on the path
            node.insert(query, score)

    def suggest(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        return [text for score, text in node.top_candidates]


# Usage
ta = TypeaheadSystem()
ta.add_query("amazon", 100)
ta.add_query("amazing", 50)
ta.add_query("am", 10)
print(ta.suggest("am"))
# ['amazon', 'amazing', 'am'], already sorted by score because we sorted on insert
```
8. Monitoring & Metrics
- Zero Results Rate: % of prefixes that yield no suggestions.
- Keystroke Latency: p99 latency (must be < 50ms).
- Selection Rate: How often users click suggestion #3 vs completing the typing.
- High click rate on #1 = Good Ranking.
9. Failure Modes
- The “Dirty Words” Problem: The algorithm is unbiased. If users search for hate speech, the Trie suggests it.
- Mitigation: Strict “Blacklist” filter applied during the Offline Build phase.
- Shard Failure: If ‘S’ shard goes down, nobody can search ‘System’.
- Mitigation: Replica sets (3 replicas per shard).
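The blacklist filter can be as simple as a token-set check applied while building the trie. The blocked list below is a placeholder; production systems combine curated lists with classifiers:

```python
BLOCKED_TERMS = {"badterm"}  # placeholder list, not a real blocklist

def is_allowed(query: str) -> bool:
    """Applied during the offline build: blocked queries never enter
    the trie, so they can never surface as suggestions."""
    return not (set(query.lower().split()) & BLOCKED_TERMS)

queries = ["java tutorial", "badterm pics"]
print([q for q in queries if is_allowed(q)])  # ['java tutorial']
```

Filtering at build time is cheap; filtering at query time would add latency to every keystroke.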
10. Real-World Case Study: Facebook Search
Facebook uses a system called Unicorn, a graph-based search engine. When you type “John”, it doesn’t just look for the string “John”; it looks for Friends named John > Friends of Friends > Celebrities. They use Social Graph Context as a ranking signal, not just global frequency.
11. Cost Analysis
Storing 1 billion queries in a naive Trie is RAM-heavy. Optimization: Ternary Search Trees (TST) or Radix Trees, which compress the storage.
- Nodes with a single child are merged (sys->tem).
- Reduces RAM/pointer overhead by 60%.
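The merging idea can be sketched on a plain dict-based trie; this toy version omits end-of-word markers and the Top-K caches:

```python
def compress(trie):
    """Collapse single-child chains of a {char: subtrie} dict-trie into
    labeled edges, Radix-Tree style: "s"->"y"->"s" becomes one "sys" edge."""
    compressed = {}
    for ch, child in trie.items():
        label = ch
        while len(child) == 1:  # follow the chain while there is exactly one child
            next_ch, next_child = next(iter(child.items()))
            label += next_ch
            child = next_child
        compressed[label] = compress(child)
    return compressed

# "system" and "sysadmin" share the chain s->y->s:
trie = {"s": {"y": {"s": {"t": {"e": {"m": {}}}, "a": {"d": {"m": {"i": {"n": {}}}}}}}}}
print(compress(trie))
# {'sys': {'tem': {}, 'admin': {}}}
```

Eight single-child nodes collapse into three labeled edges, which is where the pointer savings come from.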
12. Key Takeaways
- Pre-computation: Never sort at query time. Sort at write time.
- Prefix Sharding: Hashing is messy for ranges. Use prefix-based sharding with careful heat-balancing.
- Browser Caching: The fastest query is the one served from localStorage.
- Privacy: Typeahead leaks intent. Encrypt the channel (HTTPS) and anonymize the logs.
FAQ
How does a typeahead system return results so fast?
Instead of traversing the entire subtree at query time, each trie node stores a pre-sorted list of the top-K most popular completions that pass through it. This means looking up suggestions for any prefix is just a pointer chase to the right node followed by returning the cached list, achieving O(1) time complexity regardless of the total dictionary size.
How do you handle trending searches in a typeahead system?
A hybrid architecture merges a static trie built from weekly offline aggregation with a small dynamic trie (limited to around 200MB) that ingests high-velocity queries from the last 15 minutes via a stream processor like Flink or Storm. The final result is a merge of both sources, ensuring breaking news appears within minutes.
What data structure optimizations reduce trie memory usage?
Ternary Search Trees and Radix Trees compress nodes with single children by merging path segments like “sys” and “tem” into a single node. This eliminates intermediate pointer overhead and reduces RAM consumption by roughly 60%, making it feasible to store billions of queries in memory.
How is a typeahead system sharded across multiple servers?
Prefix-based sharding assigns query ranges to different servers (e.g., A-C on Node 1, D-F on Node 2), though careful heat-balancing is needed since prefixes like “S” see far more traffic than “X”. Hash partitioning on the first 2-3 characters is an alternative that distributes load more evenly but complicates range queries.
Originally published at: arunbaby.com/ml-system-design/0048-trie-based-search