
“Transforming ‘cold’ to ‘warm’ one letter at a time.”

TL;DR

Word Ladder is a shortest-path problem on an implicit graph where each word is a node and edges connect words differing by one letter. Standard BFS guarantees the shortest transformation sequence in O(M^2N) time. Bidirectional BFS is the key optimization, reducing search space from b^d to 2b^(d/2) by searching from both start and end words simultaneously. Generate neighbors by trying all 26 characters at each position rather than comparing against the full word list. See also Clone Graph for more BFS graph traversal patterns, or explore graph-based retrieval in ML systems.

1. Problem Statement

A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:

  1. Every adjacent pair of words differs by a single letter.
  2. Every si for 1 <= i <= k is in wordList. Note that beginWord does not need to be in wordList.
  3. sk == endWord.

Given two words, beginWord and endWord, and a dictionary wordList, return the number of words in the shortest transformation sequence from beginWord to endWord, or 0 if no such sequence exists.

Example 1:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: hit -> hot -> dot -> dog -> cog

Example 2:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, therefore no valid transformation sequence exists.

2. BFS Solution (Shortest Path in Unweighted Graph)

Intuition:

  • Treat each word as a node in a graph.
  • An edge exists between two nodes if they differ by exactly one letter.
  • The problem becomes finding the shortest path from beginWord to endWord.
  • BFS (Breadth-First Search) is the standard algorithm for finding the shortest path in an unweighted graph because it explores nodes layer by layer.

Graph Construction Strategy: There are two main ways to find neighbors of a word:

  1. Compare with all other words: For a word W, iterate through the entire wordList and check whether the difference is exactly one character. This takes O(N·M), where N is the number of words and M is the word length. Doing this for every word in BFS gives O(N²·M), which is too slow when N is large (e.g., 5000 words).
  2. Generate all possible neighbors: For a word W, change each of its characters to ‘a’ through ‘z’. This generates 26·M potential words, and checking whether each one exists in wordSet costs O(M) per hash, for O(26·M²) total. This is much faster when N is large but M is small (e.g., M = 5).

Algorithm:

  1. Convert wordList to a set wordSet for O(1) lookups.
  2. Initialize a queue with (beginWord, 1). The level starts at 1 because the sequence length includes the start word.
  3. Initialize a visited set to avoid cycles and redundant processing.
  4. While the queue is not empty:
    • Dequeue the current word and its level.
    • If the word is endWord, return the level.
    • Generate all possible neighbors by changing one character at a time.
    • If a neighbor is in wordSet and not visited, add it to the queue and mark as visited.
from collections import deque
from typing import List

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        wordSet = set(wordList)
        if endWord not in wordSet:
            return 0

        queue = deque([(beginWord, 1)])  # (current_word, level)
        visited = set([beginWord])

        while queue:
            word, level = queue.popleft()

            if word == endWord:
                return level

            # Generate all possible neighbors
            for i in range(len(word)):
                original_char = word[i]
                for char_code in range(ord('a'), ord('z') + 1):
                    char = chr(char_code)
                    if char == original_char:
                        continue

                    # Create new word: hit -> ait, bit, ... hat, hbt ...
                    new_word = word[:i] + char + word[i+1:]

                    if new_word in wordSet and new_word not in visited:
                        visited.add(new_word)
                        queue.append((new_word, level + 1))

        return 0

Time Complexity Analysis:

  • Preprocessing: Converting the list to a set takes O(N·M).
  • BFS Traversal: In the worst case, we visit every word in the wordList.
  • Neighbor Generation: For each word, we iterate through its length M. For each position, we try 26 characters; creating the new string takes O(M), and checking existence in the set takes O(M) on average.
  • Total Complexity: O(N·M²).
  • If N = 5000 and M = 5, N·M² = 5000·25 = 125,000 operations.
  • Comparing all pairs would cost N²·M = 25,000,000·5 = 125,000,000.
  • The generation approach is significantly faster for typical constraints.

Space Complexity:

  • O(N·M) to store wordSet, the visited set, and the queue.

3. Bi-directional BFS (Optimization)

Intuition: Standard BFS searches a tree that grows exponentially. If the branching factor is b and the distance to the target is d, BFS visits roughly b^d nodes. Bi-directional BFS runs two simultaneous searches: one from the start and one from the end. They meet in the middle.

  • Search 1: Start -> Middle (distance d/2)
  • Search 2: End -> Middle (distance d/2)
  • Total nodes visited: b^(d/2) + b^(d/2) = 2·b^(d/2).
  • This is exponentially smaller than b^d.

Algorithm:

  1. Maintain two sets: beginSet (initially {beginWord}) and endSet (initially {endWord}).
  2. Maintain a visited set containing all words visited by either search.
  3. In each step, always expand the smaller set. This keeps the search balanced and minimizes the number of generated neighbors.
  4. For each word in the current set, generate neighbors.
  5. If a neighbor is found in the opposite set, the two searches have met! Return level + 1.
  6. Otherwise, if the neighbor is valid (in wordSet and not visited), add it to the next layer.
class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        wordSet = set(wordList)
        if endWord not in wordSet:
            return 0

        beginSet = {beginWord}
        endSet = {endWord}
        visited = {beginWord, endWord}
        length = 1

        while beginSet and endSet:
            # Always expand the smaller set to minimize work
            if len(beginSet) > len(endSet):
                beginSet, endSet = endSet, beginSet

            newSet = set()
            for word in beginSet:
                for i in range(len(word)):
                    original_char = word[i]
                    for char_code in range(ord('a'), ord('z') + 1):
                        char = chr(char_code)
                        if char == original_char:
                            continue

                        new_word = word[:i] + char + word[i+1:]

                        # If the neighbor is in the opposite set, we connected the paths
                        if new_word in endSet:
                            return length + 1

                        if new_word in wordSet and new_word not in visited:
                            visited.add(new_word)
                            newSet.add(new_word)

            beginSet = newSet
            length += 1

        return 0

Performance Comparison: Imagine a graph where each node has 10 neighbors (b=10) and the shortest path is 6 steps (d=6).

  • Standard BFS: visits ≈ 10^6 = 1,000,000 nodes.
  • Bi-directional BFS: visits ≈ 2·10^3 = 2,000 nodes.
  • Speedup: 500× faster!

4. Word Ladder II (Return All Shortest Paths)

Problem: Instead of just the length, return all shortest transformation sequences. Example: hit -> hot -> dot -> dog -> cog AND hit -> hot -> lot -> log -> cog.

Challenge:

  • We need to store the path structure.
  • Standard BFS only stores the distance.
  • Storing full paths in the queue consumes exponential memory.

Optimized Approach:

  1. BFS for Graph Building: Run BFS from beginWord to find the shortest distance to endWord. While doing this, build a DAG (Directed Acyclic Graph) or a parents map where parents[child] contains all parents that lead to child with the shortest distance.
    • Crucial: Only add edges from level L to level L+1. Do not add edges within the same level or back to previous levels.
  2. DFS for Path Reconstruction: Run DFS (Backtracking) from endWord back to beginWord using the parents map to reconstruct all paths.
from collections import defaultdict

class Solution:
    def findLadders(self, beginWord: str, endWord: str, wordList: List[str]) -> List[List[str]]:
        wordSet = set(wordList)
        if endWord not in wordSet:
            return []

        # BFS initialization
        layer = {beginWord}
        parents = defaultdict(set)  # word -> set of parents
        wordSet.discard(beginWord)  # the start word is never revisited

        found = False
        while layer and not found:
            # Words reached in this layer are removed from wordSet only *after*
            # the whole layer is processed, so that several words in the same
            # layer can all become parents of the same child (multiple paths).
            next_layer = set()

            for word in layer:
                for i in range(len(word)):
                    for char_code in range(ord('a'), ord('z') + 1):
                        new_word = word[:i] + chr(char_code) + word[i+1:]

                        if new_word == endWord:
                            parents[endWord].add(word)
                            found = True
                        elif new_word in wordSet:
                            next_layer.add(new_word)
                            parents[new_word].add(word)

            wordSet -= next_layer
            layer = next_layer

        if not found:
            return []

        # DFS backtracking to reconstruct all shortest paths
        res = []

        def backtrack(current_word, path):
            if current_word == beginWord:
                res.append(path[::-1])  # reverse path to get begin -> end
                return

            for parent in parents[current_word]:
                path.append(parent)
                backtrack(parent, path)
                path.pop()

        backtrack(endWord, [endWord])
        return res

Complexity of Word Ladder II:

  • Time: The number of shortest paths can be exponential. In the worst case, we might traverse a huge number of paths. However, the BFS part is still polynomial. The DFS part depends on the output size.
  • Space: Storing the parents map is proportional to the number of edges in the shortest-path DAG.

5. Deep Dive: Pre-processing for Faster Neighbor Generation

If wordList is very sparse (e.g., words are 10 chars long, but only 1000 words exist), the 26·M generation strategy might generate many invalid words. We can pre-process the dictionary using a wildcard map.

Concept:

  • Word: hot
  • Intermediate states: *ot, h*t, ho*
  • Map:
  • *ot -> [hot, dot, lot]
  • h*t -> [hot, hit, hat]
  • ho* -> [hot, how]

Algorithm:

  1. Build the all_combo_dict.
  2. In BFS, for a word hot, look up *ot, h*t, ho* in the dictionary.
  3. The values are the direct neighbors.
from collections import defaultdict, deque

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        if endWord not in wordList:
            return 0

        # Pre-processing
        L = len(beginWord)
        all_combo_dict = defaultdict(list)
        for word in wordList:
            for i in range(L):
                all_combo_dict[word[:i] + "*" + word[i+1:]].append(word)

        # BFS
        queue = deque([(beginWord, 1)])
        visited = {beginWord}

        while queue:
            current_word, level = queue.popleft()
            for i in range(L):
                intermediate_word = current_word[:i] + "*" + current_word[i+1:]

                for neighbor in all_combo_dict[intermediate_word]:
                    if neighbor == endWord:
                        return level + 1
                    if neighbor not in visited:
                        visited.add(neighbor)
                        queue.append((neighbor, level + 1))

                # Optimization: clear the list to prevent reprocessing
                # all_combo_dict[intermediate_word] = []

        return 0

Trade-offs:

  • Pros: Fast neighbor lookup: M wildcard patterns per word instead of 26·M generated candidates.
  • Cons: High memory usage to store the map. If words are dense (many words match *ot), the adjacency lists become long.

6. Deep Dive: A* Search

BFS is “blind”: it explores in all directions equally. A* Search uses a heuristic to prioritize paths that seem closer to the target.

Heuristic Function h(n): We need an admissible heuristic (never overestimates the cost).

  • Hamming Distance: The number of positions where characters differ.
  • hit vs cog: 3 differences.
  • hot vs cog: 2 differences.
  • hot is “closer” to cog than hit.

Algorithm: Use a Priority Queue instead of a standard Queue. Priority = g(n) + h(n), where:

  • g(n): Cost from start to current node (current level).
  • h(n): Estimated cost from current node to end.
import heapq

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        wordSet = set(wordList)
        if endWord not in wordSet:
            return 0

        def heuristic(word):
            # Hamming distance: each differing position needs at least one
            # substitution, so this never overestimates (admissible)
            return sum(c1 != c2 for c1, c2 in zip(word, endWord))

        # Priority queue: (estimated_total_cost, current_cost, word)
        pq = [(heuristic(beginWord) + 1, 1, beginWord)]
        visited = {beginWord: 1}  # word -> best known cost

        while pq:
            _, cost, word = heapq.heappop(pq)

            if word == endWord:
                return cost

            if cost > visited.get(word, float('inf')):
                continue  # stale queue entry

            for i in range(len(word)):
                original_char = word[i]
                for char_code in range(ord('a'), ord('z') + 1):
                    char = chr(char_code)
                    if char == original_char:
                        continue

                    new_word = word[:i] + char + word[i+1:]

                    if new_word in wordSet:
                        new_cost = cost + 1
                        if new_cost < visited.get(new_word, float('inf')):
                            visited[new_word] = new_cost
                            priority = new_cost + heuristic(new_word)
                            heapq.heappush(pq, (priority, new_cost, new_word))

        return 0

Why A* isn’t always better here: In high-dimensional spaces like this (where branching factor is ~25), the heuristic (Hamming distance) is often not strong enough to prune the search space significantly compared to Bi-directional BFS. Bi-directional BFS is usually the winner for this specific problem.

7. Deep Dive: Real-world Applications

While transforming “hit” to “cog” is a puzzle, the underlying concepts have serious applications:

  1. Genetic Sequencing (Edit Distance):
    • DNA sequences can be modeled as strings.
    • Mutations are single-character changes.
    • Finding the shortest path between two DNA sequences helps trace evolutionary history.
  2. Spell Checking:
    • If a user types “wrd”, we want to find the closest valid word.
    • This is a BFS of depth 1 or 2 from the typo.
  3. Error Correction Codes:
    • Hamming codes allow detecting and correcting single-bit errors.
    • This is essentially finding the nearest valid “codeword” in the graph of all possible binary strings.
  4. Semantic Word Embeddings:
    • In NLP, we often want to move from one concept to another.
    • “King” - “Man” + “Woman” = “Queen”.
    • This is a continuous version of a word ladder in vector space.

8. Deep Dive: Variations and Constraints

Variation 1: Variable Word Lengths

  • Rule: You can insert, delete, or replace a character.
  • Graph: Neighbors include hot -> ho (delete), hot -> shot (insert), hot -> hat (replace).
  • Complexity: The branching factor increases significantly (25·M replacements + M deletions + 26·(M+1) insertions).
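The variable-length neighbor set described above can be sketched as a helper that intersects all replace/insert/delete candidates with the dictionary (the name `edit1_neighbors` is illustrative, not from the original problem):

```python
import string

def edit1_neighbors(word: str, word_set: set) -> set:
    """Dictionary words within edit distance 1 (replace, insert, or delete)."""
    candidates = set()
    for i in range(len(word)):
        # Deletion: drop the character at position i
        candidates.add(word[:i] + word[i+1:])
        # Replacement: swap the character at position i
        for c in string.ascii_lowercase:
            candidates.add(word[:i] + c + word[i+1:])
    for i in range(len(word) + 1):
        # Insertion: place a new character before position i
        for c in string.ascii_lowercase:
            candidates.add(word[:i] + c + word[i:])
    candidates.discard(word)  # a word is not its own neighbor
    return candidates & word_set
```

Plugging this into the BFS from section 2 in place of the fixed-length generation loop handles the variable-length variant directly.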

Variation 2: Weighted Edges

  • Rule: Changing a vowel costs 2, consonant costs 1.
  • Algorithm: Use Dijkstra’s Algorithm instead of BFS.
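A sketch of the weighted variant under an assumed cost model (replacing a vowel costs 2, a consonant 1; the name `weighted_ladder` and the -1 not-found value are illustrative). Dijkstra's algorithm replaces BFS's FIFO queue with a priority queue keyed on accumulated cost:

```python
import heapq
import string

def weighted_ladder(begin: str, end: str, word_list: list) -> int:
    # Assumed cost model: replacing a vowel costs 2, a consonant costs 1.
    vowels = set("aeiou")
    word_set = set(word_list)
    if end not in word_set:
        return -1
    dist = {begin: 0}      # best known cost per word
    pq = [(0, begin)]      # (accumulated cost, word)
    while pq:
        cost, word = heapq.heappop(pq)
        if word == end:
            return cost
        if cost > dist.get(word, float("inf")):
            continue  # stale queue entry
        for i, original in enumerate(word):
            step = 2 if original in vowels else 1
            for c in string.ascii_lowercase:
                if c == original:
                    continue
                nxt = word[:i] + c + word[i + 1:]
                if nxt in word_set and cost + step < dist.get(nxt, float("inf")):
                    dist[nxt] = cost + step
                    heapq.heappush(pq, (cost + step, nxt))
    return -1
```

Because edge weights are positive, the first time the end word is popped its cost is final, just as the first BFS visit is final in the unweighted case.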

Variation 3: Constraint Satisfaction

  • Rule: Must pass through a specific word (e.g., hit -> … -> dot -> … -> cog).
  • Algorithm: Run BFS twice (hit -> dot, then dot -> cog) and combine the lengths, subtracting 1 so the via word is not counted twice.
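A minimal sketch of the via-word composition (helper names are illustrative); note the -1 when joining, since the via word appears at the end of the first half and the start of the second:

```python
from collections import deque
import string

def bfs_length(begin: str, end: str, word_set: set) -> int:
    """Word-count of the shortest ladder from begin to end, or 0 if none."""
    if begin == end:
        return 1
    queue = deque([(begin, 1)])
    visited = {begin}
    while queue:
        word, level = queue.popleft()
        for i, original in enumerate(word):
            for c in string.ascii_lowercase:
                if c == original:
                    continue
                nxt = word[:i] + c + word[i + 1:]
                if nxt == end:
                    return level + 1
                if nxt in word_set and nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, level + 1))
    return 0

def ladder_via(begin: str, via: str, end: str, word_list: list) -> int:
    word_set = set(word_list)
    first = bfs_length(begin, via, word_set)
    second = bfs_length(via, end, word_set)
    if first == 0 or second == 0:
        return 0
    return first + second - 1  # `via` is counted in both halves
```

On Example 1 with via word "dot", the halves have lengths 3 (hit, hot, dot) and 3 (dot, dog, cog), combining to the familiar answer of 5.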

Comparison Table

| Approach | Time Complexity | Space Complexity | Pros | Cons |
| --- | --- | --- | --- | --- |
| Simple BFS | O(M²·N) | O(M·N) | Simple, guarantees shortest path | Slow for large graphs |
| Bi-directional BFS | O(M²·N) worst case; ≈2·b^(d/2) nodes in practice | O(M·N) | Fastest for two-point search | More code, needs start/end known |
| Word Ladder II | Exponential (output-sensitive) | Exponential | Finds ALL paths | Very memory intensive |
| A* Search | Depends on heuristic | O(M·N) | Good for single path | Heuristic overhead, PQ overhead |
| Pre-processed Map | O(M²·N) | O(M²·N) | Fast neighbor lookup | High memory usage |

Implementation in Other Languages

C++:

class Solution {
public:
    int ladderLength(string beginWord, string endWord, vector<string>& wordList) {
        unordered_set<string> wordSet(wordList.begin(), wordList.end());
        if (wordSet.find(endWord) == wordSet.end()) return 0;

        queue<pair<string, int>> q;
        q.push({beginWord, 1});

        while (!q.empty()) {
            auto [word, level] = q.front();
            q.pop();

            if (word == endWord) return level;

            for (int i = 0; i < word.size(); ++i) {
                char original = word[i];
                for (char c = 'a'; c <= 'z'; ++c) {
                    if (c == original) continue;
                    word[i] = c;
                    if (wordSet.count(word)) {
                        q.push({word, level + 1});
                        wordSet.erase(word); // mark as visited
                    }
                }
                word[i] = original;
            }
        }
        return 0;
    }
};

Java:

class Solution {
    public int ladderLength(String beginWord, String endWord, List<String> wordList) {
        Set<String> wordSet = new HashSet<>(wordList);
        if (!wordSet.contains(endWord)) return 0;

        Queue<String> queue = new LinkedList<>();
        queue.offer(beginWord);
        int level = 1;

        while (!queue.isEmpty()) {
            int size = queue.size();
            for (int i = 0; i < size; i++) {
                String currentWord = queue.poll();
                char[] wordChars = currentWord.toCharArray();

                for (int j = 0; j < wordChars.length; j++) {
                    char originalChar = wordChars[j];
                    for (char c = 'a'; c <= 'z'; c++) {
                        if (c == originalChar) continue;
                        wordChars[j] = c;
                        String newWord = new String(wordChars);

                        if (newWord.equals(endWord)) return level + 1;

                        if (wordSet.contains(newWord)) {
                            queue.offer(newWord);
                            wordSet.remove(newWord);
                        }
                    }
                    wordChars[j] = originalChar;
                }
            }
            level++;
        }
        return 0;
    }
}

Top Interview Questions

Q1: What if the word list is too large to fit in memory? Answer: If the list is on disk, we can’t use a hash set for O(1) lookups.

  1. Bloom Filter: Load a Bloom Filter into memory. It can tell us if a word is definitely not in the set or probably in the set. If probably, check disk.
  2. Trie (Prefix Tree): Store words in a Trie to save space (shared prefixes).
  3. Distributed Search: Partition the words across multiple machines (sharding).
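A minimal Bloom filter sketch illustrating idea 1 (the class and its parameters are illustrative; a production filter would size the bit array from the expected word count and target false-positive rate):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: 'definitely absent' or 'probably present'."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a big int used as a bit array

    def _positions(self, word: str):
        # Derive k independent positions by salting one hash function
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, word: str):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word: str) -> bool:
        # False -> definitely not in the set; True -> check disk to confirm
        return all(self.bits >> pos & 1 for pos in self._positions(word))
```

During BFS, each generated neighbor is first tested against the filter; only the rare "probably present" hits pay the cost of a disk lookup.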

Q2: How would you handle words of different lengths? Answer: The problem definition implies equal lengths. If we allow insertions/deletions (Edit Distance = 1), we generate neighbors by:

  1. Substitution: hit -> hat
  2. Insertion: hit -> hits, chit
  3. Deletion: hit -> hi, it

We must check if these new words exist in the dictionary.

Q3: Can we use DFS? Answer: DFS is not suitable for finding the shortest path in an unweighted graph.

  • DFS dives deep. It might find a path of length 100 before finding the optimal path of length 5.
  • To find the shortest path with DFS, you’d have to explore all paths and compare them, which is O(N!). BFS finds the shortest path as soon as it reaches the target.

Q4: What is the maximum number of edges from a word? Answer: For a word of length M and an alphabet size of 26:

  • Each of the M positions can be changed to 25 other characters.
  • Total potential neighbors = M × 25.
  • However, the number of valid edges depends on how many of these potential neighbors actually exist in wordList.

Q5: How does Bi-directional BFS handle the meeting point? Answer: The meeting point is not necessarily a single node. It’s an edge.

  • Search A reaches node U.
  • Search B reaches node V.
  • If V is a neighbor of U, the paths connect.
  • The total length is level(U) + level(V). In my implementation, I check if new_word in endSet, which effectively checks for this connection.

Key Takeaways

  1. Graph Modeling: The core skill is recognizing that words are nodes and edits are edges.
  2. BFS for Shortest Path: Always the first choice for unweighted shortest path problems.
  3. Bi-directional BFS: A critical optimization for search problems where start and end are known. It reduces the search space from b^d to 2·b^(d/2).
  4. Neighbor Generation: Iterating ‘a’-‘z’ (26M) is usually faster than iterating the word list (N) when N is large.
  5. Word Ladder II: Requires a two-phase approach: BFS to build the shortest-path graph, then DFS to extract paths.

Summary

| Aspect | Insight |
| --- | --- |
| Core Problem | Shortest path in a graph of words |
| Best Algorithm | Bi-directional BFS |
| Key Optimization | Generate neighbors by swapping chars, not iterating the list |
| Applications | Spell checking, genetic sequencing, puzzle solving |

FAQ

Why is BFS the correct algorithm for Word Ladder instead of DFS?

BFS explores nodes level by level, which guarantees that the first time we reach the target word, we have found the shortest transformation sequence. DFS dives deep along one path and may find a much longer path before discovering the optimal one. To find the shortest path with DFS, you would need to explore all possible paths and compare their lengths, which is exponentially slower.

How does bidirectional BFS improve performance over standard BFS?

Standard BFS explores roughly b^d nodes where b is the branching factor and d is the distance to the target. Bidirectional BFS searches from both the start word and the end word simultaneously, each exploring b^(d/2) nodes. The total work is 2*b^(d/2) instead of b^d. For a branching factor of 10 and distance 6, this means visiting 2,000 nodes instead of 1,000,000.

What is the wildcard pattern optimization for neighbor generation?

Instead of comparing every word pair O(N^2), create intermediate wildcard states like *ot for hot. Build a map from each wildcard pattern to all matching words. During BFS, look up a word’s wildcard patterns to find neighbors in O(M) time per word, where M is the word length. This trades memory for speed and is especially effective with large dictionaries.

What are real-world applications of the Word Ladder algorithm?

Word Ladder concepts apply to spell checking (finding closest valid words), genetic sequence alignment (mutations as single-character changes), error correction codes (Hamming codes for detecting and correcting bit errors), and semantic word embeddings in NLP where moving between concepts in vector space is a continuous version of word ladders.


Originally published at: arunbaby.com/dsa/0032-word-ladder