Word Ladder (BFS)
“Transforming ‘cold’ to ‘warm’ one letter at a time.”
TL;DR
Word Ladder is a shortest-path problem on an implicit graph where each word is a node and edges connect words differing by one letter. Standard BFS guarantees the shortest transformation sequence in O(M^2N) time. Bidirectional BFS is the key optimization, reducing search space from b^d to 2b^(d/2) by searching from both start and end words simultaneously. Generate neighbors by trying all 26 characters at each position rather than comparing against the full word list. See also Clone Graph for more BFS graph traversal patterns, or explore graph-based retrieval in ML systems.
1. Problem Statement
A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:
- Every adjacent pair of words differs by a single letter.
- Every
sifor1 <= i <= kis inwordList. Note thatbeginWorddoes not need to be inwordList. sk == endWord.
Given two words, beginWord and endWord, and a dictionary wordList, return the number of words in the shortest transformation sequence from beginWord to endWord, or 0 if no such sequence exists.
Example 1:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: hit -> hot -> dot -> dog -> cog
Example 2:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, therefore no valid transformation sequence exists.
2. BFS Solution (Shortest Path in Unweighted Graph)
Intuition:
- Treat each word as a node in a graph.
- An edge exists between two nodes if they differ by exactly one letter.
- The problem becomes finding the shortest path from
beginWordtoendWord. - BFS (Breadth-First Search) is the standard algorithm for finding the shortest path in an unweighted graph because it explores nodes layer by layer.
Graph Construction Strategy: There are two main ways to find neighbors of a word:
- Compare with all other words: For a word
W, iterate through the entirewordListand check if the difference is 1 character. This takes O(N \cdot M), whereNis the number of words andMis the word length. Doing this for every word in BFS gives O(N^2 \cdot M). This is too slow ifNis large (e.g., 5000 words). - Generate all possible neighbors: For a word
W, change each of its characters to ‘a’ through ‘z’. This generates26 \cdot Mpotential words. Check if each potential word exists inwordSet. This takes O(26 \cdot M \cdot M) (hashing takes O(M)). This is much faster whenNis large butMis small (e.g.,M=5).
Algorithm:
- Convert
wordListto a setwordSetfor O(1) lookups. - Initialize a queue with
(beginWord, 1). The level starts at 1 because the sequence length includes the start word. - Initialize a
visitedset to avoid cycles and redundant processing. - While the queue is not empty:
- Dequeue the current word and its level.
- If the word is
endWord, return the level. - Generate all possible neighbors by changing one character at a time.
- If a neighbor is in
wordSetand not visited, add it to the queue and mark as visited.
from collections import deque
from typing import List
class Solution:
def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
wordSet = set(wordList)
if endWord not in wordSet:
return 0
queue = deque([(beginWord, 1)]) # (current_word, level)
visited = set([beginWord])
while queue:
word, level = queue.popleft()
if word == endWord:
return level
# Generate all possible neighbors
for i in range(len(word)):
original_char = word[i]
for char_code in range(ord('a'), ord('z') + 1):
char = chr(char_code)
if char == original_char:
continue
# Create new word: hit -> ait, bit, ... hat, hbt ...
new_word = word[:i] + char + word[i+1:]
if new_word in wordSet and new_word not in visited:
visited.add(new_word)
queue.append((new_word, level + 1))
return 0
Time Complexity Analysis:
- Preprocessing: Converting list to set takes O(N \cdot M).
- BFS Traversal: In the worst case, we visit every word in the
wordList. - Neighbor Generation: For each word, we iterate through its length
M. For each position, we try 26 characters. Creating the new string takes O(M). Checking existence in the set takes O(M) (average). - Total Complexity: O(N \cdot M^2).
- If
N = 5000andM = 5,N \cdot M^2 = 5000 \cdot 25 = 125,000operations. - Comparing all pairs would be
N^2 \cdot M = 25,000,000 \cdot 5 = 125,000,000. - The generation approach is significantly faster for typical constraints.
Space Complexity:
- O(N \cdot M) to store
wordSet,visitedset, and the queue.
3. Bi-directional BFS (Optimization)
Intuition:
Standard BFS searches a tree that grows exponentially. If the branching factor is b and the distance to the target is d, BFS visits roughly b^d nodes.
Bi-directional BFS runs two simultaneous searches: one from the start and one from the end. They meet in the middle.
- Search 1: Start -> Middle (distance
d/2) - Search 2: End -> Middle (distance
d/2) - Total nodes visited:
b^{d/2} + b^{d/2} = 2 \cdot b^{d/2}. - This is exponentially smaller than
b^d.
Algorithm:
- Maintain two sets:
beginSet(initially{beginWord}) andendSet(initially{endWord}). - Maintain a
visitedset containing all words visited by either search. - In each step, always expand the smaller set. This keeps the search balanced and minimizes the number of generated neighbors.
- For each word in the current set, generate neighbors.
- If a neighbor is found in the opposite set, the two searches have met! Return
level + 1. - Otherwise, if the neighbor is valid (in
wordSetand not visited), add it to the next layer.
class Solution:
def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
wordSet = set(wordList)
if endWord not in wordSet:
return 0
beginSet = {beginWord}
endSet = {endWord}
visited = {beginWord, endWord}
length = 1
while beginSet and endSet:
# Always expand the smaller set to minimize work
if len(beginSet) > len(endSet):
beginSet, endSet = endSet, beginSet
newSet = set()
for word in beginSet:
for i in range(len(word)):
original_char = word[i]
for char_code in range(ord('a'), ord('z') + 1):
char = chr(char_code)
if char == original_char:
continue
new_word = word[:i] + char + word[i+1:]
# If the neighbor is in the opposite set, we connected the paths
if new_word in endSet:
return length + 1
if new_word in wordSet and new_word not in visited:
visited.add(new_word)
newSet.add(new_word)
beginSet = newSet
length += 1
return 0
Performance Comparison:
Imagine a graph where each node has 10 neighbors (b=10) and the shortest path is 6 steps (d=6).
- Standard BFS: Visits
\approx 10^6 = 1,000,000nodes. - Bi-directional BFS: Visits
\approx 2 \times 10^3 = 2,000nodes. - Speedup: 500x faster!
4. Word Ladder II (Return All Shortest Paths)
Problem: Instead of just the length, return all shortest transformation sequences.
Example: hit -> hot -> dot -> dog -> cog AND hit -> hot -> lot -> log -> cog.
Challenge:
- We need to store the path structure.
- Standard BFS only stores the distance.
- Storing full paths in the queue consumes exponential memory.
Optimized Approach:
- BFS for Graph Building: Run BFS from
beginWordto find the shortest distance toendWord. While doing this, build a DAG (Directed Acyclic Graph) or aparentsmap whereparents[child]contains allparentsthat lead tochildwith the shortest distance.- Crucial: Only add edges from level
Lto levelL+1. Do not add edges within the same level or back to previous levels.
- Crucial: Only add edges from level
- DFS for Path Reconstruction: Run DFS (Backtracking) from
endWordback tobeginWordusing theparentsmap to reconstruct all paths.
from collections import defaultdict
class Solution:
def findLadders(self, beginWord: str, endWord: str, wordList: List[str]) -> List[List[str]]:
wordSet = set(wordList)
if endWord not in wordSet:
return []
# BFS initialization
layer = {beginWord}
parents = defaultdict(set) # word -> set of parents
wordSet.discard(beginWord) # Remove start word to avoid cycles
found = False
while layer and not found:
next_layer = set()
# Remove words in current layer from wordSet to prevent visiting them again in future layers
# But allow visiting them multiple times in the *current* layer (for multiple paths)
for word in layer:
if word in wordSet:
wordSet.remove(word)
# Actually, we need to remove words visited in the *current* layer from wordSet *after* processing the layer
# Let's refine the logic:
# 1. Generate next layer
# 2. If endWord found, stop after this layer
# 3. Remove next_layer words from wordSet
current_layer_visited = set()
for word in layer:
for i in range(len(word)):
for char_code in range(ord('a'), ord('z') + 1):
new_word = word[:i] + chr(char_code) + word[i+1:]
if new_word == endWord:
parents[endWord].add(word)
found = True
elif new_word in wordSet:
next_layer.add(new_word)
parents[new_word].add(word)
current_layer_visited.add(new_word)
wordSet -= current_layer_visited
layer = next_layer
if not found:
return []
# DFS Backtracking to reconstruct paths
res = []
def backtrack(current_word, path):
if current_word == beginWord:
res.append(path[::-1]) # Reverse path to get begin -> end
return
for parent in parents[current_word]:
path.append(parent)
backtrack(parent, path)
path.pop()
backtrack(endWord, [endWord])
return res
Complexity of Word Ladder II:
- Time: The number of shortest paths can be exponential. In the worst case, we might traverse a huge number of paths. However, the BFS part is still polynomial. The DFS part depends on the output size.
- Space: Storing the
parentsmap is proportional to the number of edges in the shortest-path DAG.
5. Deep Dive: Pre-processing for Faster Neighbor Generation
If wordList is very sparse (e.g., words are 10 chars long, but only 1000 words exist), the 26 \cdot M generation strategy might generate many invalid words.
We can pre-process the dictionary using a wildcard map.
Concept:
- Word:
hot - Intermediate states:
*ot,h*t,ho* - Map:
*ot->[hot, dot, lot]h*t->[hot, hit, hat]ho*->[hot, how]
Algorithm:
- Build the
all_combo_dict. - In BFS, for a word
hot, look up*ot,h*t,ho*in the dictionary. - The values are the direct neighbors.
from collections import defaultdict
class Solution:
def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
if endWord not in wordList:
return 0
# Pre-processing
L = len(beginWord)
all_combo_dict = defaultdict(list)
for word in wordList:
for i in range(L):
all_combo_dict[word[:i] + "*" + word[i+1:]].append(word)
# BFS
queue = deque([(beginWord, 1)])
visited = {beginWord}
while queue:
current_word, level = queue.popleft()
for i in range(L):
intermediate_word = current_word[:i] + "*" + current_word[i+1:]
for neighbor in all_combo_dict[intermediate_word]:
if neighbor == endWord:
return level + 1
if neighbor not in visited:
visited.add(neighbor)
queue.append((neighbor, level + 1))
# Optimization: Clear the list to prevent reprocessing
# all_combo_dict[intermediate_word] = []
return 0
Trade-offs:
- Pros: Very fast neighbor lookup O(M) instead of O(26 \cdot M).
- Cons: High memory usage to store the dictionary. If words are dense (many words match
*ot), the adjacency lists become long.
6. Deep Dive: A* Search (Heuristic Search)
BFS is “blind”, it explores in all directions equally. A* Search uses a heuristic to prioritize paths that seem closer to the target.
Heuristic Function h(n):
We need an admissible heuristic (never overestimates the cost).
- Hamming Distance: The number of positions where characters differ.
hitvscog: 3 differences.hotvscog: 2 differences.hotis “closer” tocogthanhit.
Algorithm:
Use a Priority Queue instead of a standard Queue. Priority = g(n) + h(n), where:
g(n): Cost from start to current node (current level).h(n): Estimated cost from current node to end.
import heapq
class Solution:
def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
wordSet = set(wordList)
if endWord not in wordSet:
return 0
def heuristic(word):
return sum(c1 != c2 for c1, c2 in zip(word, endWord))
# Priority Queue: (estimated_total_cost, current_cost, word)
pq = [(heuristic(beginWord) + 1, 1, beginWord)]
visited = {beginWord: 1} # word -> cost
while pq:
_, cost, word = heapq.heappop(pq)
if word == endWord:
return cost
if cost > visited.get(word, float('inf')):
continue
for i in range(len(word)):
original_char = word[i]
for char_code in range(ord('a'), ord('z') + 1):
char = chr(char_code)
if char == original_char:
continue
new_word = word[:i] + char + word[i+1:]
if new_word in wordSet:
new_cost = cost + 1
if new_cost < visited.get(new_word, float('inf')):
visited[new_word] = new_cost
priority = new_cost + heuristic(new_word)
heapq.heappush(pq, (priority, new_cost, new_word))
return 0
Why A* isn’t always better here: In high-dimensional spaces like this (where branching factor is ~25), the heuristic (Hamming distance) is often not strong enough to prune the search space significantly compared to Bi-directional BFS. Bi-directional BFS is usually the winner for this specific problem.
7. Deep Dive: Real-world Applications
While transforming “hit” to “cog” is a puzzle, the underlying concepts have serious applications:
- Genetic Sequencing (Edit Distance):
- DNA sequences can be modeled as strings.
- Mutations are single-character changes.
- Finding the shortest path between two DNA sequences helps trace evolutionary history.
- Spell Checking:
- If a user types “wrd”, we want to find the closest valid word.
- This is a BFS of depth 1 or 2 from the typo.
- Error Correction Codes:
- Hamming codes allow detecting and correcting single-bit errors.
- This is essentially finding the nearest valid “codeword” in the graph of all possible binary strings.
- Semantic Word Embeddings:
- In NLP, we often want to move from one concept to another.
- “King” - “Man” + “Woman” = “Queen”.
- This is a continuous version of a word ladder in vector space.
8. Deep Dive: Variations and Constraints
Variation 1: Variable Word Lengths
- Rule: You can insert, delete, or replace a character.
- Graph: Neighbors include
hot->ho(delete),hot->shot(insert),hot->hat(replace). - Complexity: Branching factor increases significantly (
26 \cdot Mreplacements +Mdeletions +26 \cdot (M+1)insertions).
Variation 2: Weighted Edges
- Rule: Changing a vowel costs 2, consonant costs 1.
- Algorithm: Use Dijkstra’s Algorithm instead of BFS.
Variation 3: Constraint Satisfaction
- Rule: Must pass through a specific word (e.g.,
hit-> … ->dot-> … ->cog). - Algorithm: Run BFS twice:
hit->dotANDdot->cog. Sum the lengths.
Comparison Table
| Approach | Time Complexity | Space Complexity | Pros | Cons |
|---|---|---|---|---|
| Simple BFS | O(M^2 N) | O(MN) | Simple, guarantees shortest path | Slow for large graphs |
| Bi-directional BFS | O(M^2 N^{0.5}) | O(MN^{0.5}) | Fastest for 2-point search | More code, needs start/end known |
| Word Ladder II | Exponential | Exponential | Finds ALL paths | Very memory intensive |
| A* Search | Depends on heuristic | O(MN) | Good for single path | Heuristic overhead, PQ overhead |
| Pre-processed Map | O(M^2 N) | O(M^2 N) | Fast neighbor lookup | High memory usage |
Implementation in Other Languages
C++:
class Solution {
public:
int ladderLength(string beginWord, string endWord, vector<string>& wordList) {
unordered_set<string> wordSet(wordList.begin(), wordList.end());
if (wordSet.find(endWord) == wordSet.end()) return 0;
queue<pair<string, int>> q;
q.push({beginWord, 1});
while (!q.empty()) {
auto [word, level] = q.front();
q.pop();
if (word == endWord) return level;
for (int i = 0; i < word.size(); ++i) {
char original = word[i];
for (char c = 'a'; c <= 'z'; ++c) {
if (c == original) continue;
word[i] = c;
if (wordSet.count(word)) {
q.push({word, level + 1});
wordSet.erase(word); // Mark as visited
}
}
word[i] = original;
}
}
return 0;
}
};
Java:
class Solution {
public int ladderLength(String beginWord, String endWord, List<String> wordList) {
Set<String> wordSet = new HashSet<>(wordList);
if (!wordSet.contains(endWord)) return 0;
Queue<String> queue = new LinkedList<>();
queue.offer(beginWord);
int level = 1;
while (!queue.isEmpty()) {
int size = queue.size();
for (int i = 0; i < size; i++) {
String currentWord = queue.poll();
char[] wordChars = currentWord.toCharArray();
for (int j = 0; j < wordChars.length; j++) {
char originalChar = wordChars[j];
for (char c = 'a'; c <= 'z'; c++) {
if (c == originalChar) continue;
wordChars[j] = c;
String newWord = new String(wordChars);
if (newWord.equals(endWord)) return level + 1;
if (wordSet.contains(newWord)) {
queue.offer(newWord);
wordSet.remove(newWord);
}
}
wordChars[j] = originalChar;
}
}
level++;
}
return 0;
}
}
Top Interview Questions
Q1: What if the word list is too large to fit in memory? Answer: If the list is on disk, we can’t use a hash set for O(1) lookups.
- Bloom Filter: Load a Bloom Filter into memory. It can tell us if a word is definitely not in the set or probably in the set. If probably, check disk.
- Trie (Prefix Tree): Store words in a Trie to save space (shared prefixes).
- Distributed Search: Partition the words across multiple machines (sharding).
Q2: How would you handle words of different lengths? Answer: The problem definition implies equal lengths. If we allow insertions/deletions (Edit Distance = 1), we generate neighbors by:
- Substitution:
hit->hat - Insertion:
hit->hits,chit - Deletion:
hit->hi,itWe must check if these new words exist in the dictionary.
Q3: Can we use DFS? Answer: DFS is not suitable for finding the shortest path in an unweighted graph.
- DFS dives deep. It might find a path of length 100 before finding the optimal path of length 5.
- To find the shortest path with DFS, you’d have to explore all paths and compare them, which is O(N!). BFS finds the shortest path as soon as it reaches the target.
Q4: What is the maximum number of edges from a word?
Answer:
For a word of length M and an alphabet size of 26:
- Each of the
Mpositions can be changed to 25 other characters. - Total potential neighbors =
M \times 25. - However, the number of valid edges depends on how many of these potential neighbors actually exist in
wordList.
Q5: How does Bi-directional BFS handle the meeting point? Answer: The meeting point is not necessarily a single node. It’s an edge.
- Search A reaches node
U. - Search B reaches node
V. - If
Vis a neighbor ofU, the paths connect. - The total length is
level(U) + level(V). In my implementation, I checkif new_word in endSet, which effectively checks for this connection.
Key Takeaways
- Graph Modeling: The core skill is recognizing that words are nodes and edits are edges.
- BFS for Shortest Path: Always the first choice for unweighted shortest path problems.
- Bi-directional BFS: A critical optimization for search problems where start and end are known. It reduces the search space from
b^dto2 \cdot b^{d/2}. - Neighbor Generation: Iterating ‘a’-‘z’ (
26M) is usually faster than iterating the word list (N) whenNis large. - Word Ladder II: Requires a two-phase approach: BFS to build the shortest-path graph, then DFS to extract paths.
Summary
| Aspect | Insight |
|---|---|
| Core Problem | Shortest path in a graph of words |
| Best Algorithm | Bi-directional BFS |
| Key Optimization | Generate neighbors by swapping chars, not iterating list |
| Applications | Spell checking, genetic sequencing, puzzle solving |
FAQ
Why is BFS the correct algorithm for Word Ladder instead of DFS?
BFS explores nodes level by level, which guarantees that the first time we reach the target word, we have found the shortest transformation sequence. DFS dives deep along one path and may find a much longer path before discovering the optimal one. To find the shortest path with DFS, you would need to explore all possible paths and compare their lengths, which is exponentially slower.
How does bidirectional BFS improve performance over standard BFS?
Standard BFS explores roughly b^d nodes where b is the branching factor and d is the distance to the target. Bidirectional BFS searches from both the start word and the end word simultaneously, each exploring b^(d/2) nodes. The total work is 2*b^(d/2) instead of b^d. For a branching factor of 10 and distance 6, this means visiting 2,000 nodes instead of 1,000,000.
What is the wildcard pattern optimization for neighbor generation?
Instead of comparing every word pair O(N^2), create intermediate wildcard states like *ot for hot. Build a map from each wildcard pattern to all matching words. During BFS, look up a word’s wildcard patterns to find neighbors in O(M) time per word, where M is the word length. This trades memory for speed and is especially effective with large dictionaries.
What are real-world applications of the Word Ladder algorithm?
Word Ladder concepts apply to spell checking (finding closest valid words), genetic sequence alignment (mutations as single-character changes), error correction codes (Hamming codes for detecting and correcting bit errors), and semantic word embeddings in NLP where moving between concepts in vector space is a continuous version of word ladders.
Originally published at: arunbaby.com/dsa/0032-word-ladder