Serialize and Deserialize Binary Tree
“Data is only useful if it can survive the journey from RAM to Disk and back again.”
1. Problem Statement
This is one of the most practical “System Design” questions disguised as a coding problem. Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be reconstructed later in the same or another computer environment.
Design an algorithm to serialize and deserialize a binary tree. There is no restriction on how your serialization/deserialization algorithm should work. You just need to ensure that a binary tree can be serialized to a string and this string can be deserialized to the original tree structure.
Example:
1
/ \
2 3
/ \
4 5
Input: root = [1,2,3,null,null,4,5]
Output: [1,2,3,null,null,4,5]
Constraints:
- The number of nodes in the tree is in the range
[0, 10^4]. - Value range:
-1000 <= Node.val <= 1000.
2. Understanding the Problem
2.1 The Challenge of Ambiguity
If I tell you a tree has nodes [1, 2, 3], that is ambiguous.
- Is 2 left or right of 1?
- Is 3 a child of 1 or 2?
Standard traversals (Inorder, Preorder) are not unique on their own.
- Preorder
[1, 2]could be1->left(2)or1->right(2). - Key Insight: To make a traversal unique, we must record the Null Pointers.
If we record [1, 2, #, #, #] (where # is null), we know exactly that 1 has a left child 2, and 2 has no children.
2.2 System Design Context
In a real system (like DOM tree serialization or formatting a JSON response), we care about:
- Compactness: The string shouldn’t be 10x larger than the data.
- Streaming: Can we start processing the root before we receive the leaves? (DFS allows this).
- Human Readability: JSON is readable, Binary Protobuf is compact.
3. Approach 1: Preorder Traversal (DFS) – The “Streaming” Way
This is the standard recursive solution. We visit the root, then recursively serialize the left subtree, then the right subtree.
Serialization Strategy:
- If Node is null: Append
"N,"(or any sentinel). - If Node exists: Append
str(val) + ",". - Recurse Left.
- Recurse Right.
Example for the tree above: 1,2,N,N,3,4,N,N,5,N,N,
Deserialization Strategy:
- Split the string into a Queue (values list).
- Pop the first value.
- If “N”: return None.
- Else: Create Node(val).
node.left= recurse()node.right= recurse()
- Return Node.
Since the list is built in Preorder, the recursion naturally consumes items in the exact order they are needed to rebuild the tree top-down.
4. Approach 2: Level Order Traversal (BFS) – The “Layered” Way
This is how LeetCode visualizes trees (e.g., [1,2,3,null,null,4,5]). It saves space if the tree is dense (complete binary tree) but wastes space if the tree is sparse/skewed.
Strategy:
- Use a Queue.
- Push Root.
- While Queue:
- Pop Node.
- If Node: Append val. Push Left, Push Right.
- Else: Append “N”.
Deserialization:
- This is trickier. You need a Queue for the parent nodes waiting for children.
- Root = List[0]. Push to Queue.
- Pointer
i = 1. - While Queue:
- Parent = Pop().
- Left = List[i]. If not “N”, attach to Parent.left, Push Left.
- Right = List[i+1]. If not “N”, attach to Parent.right, Push Right.
Verdict: DFS (Approach 1) is generally cleaner to implement recursively. BFS is better if you want to verify the top of the tree without reading the whole string.
5. Implementation: Preorder DFS (Python)
We will implement the DFS approach because it handles skewed trees gracefully and the code is elegant.
class Codec:
def serialize(self, root):
"""Encodes a tree to a single string.
:type root: TreeNode
:rtype: str
"""
path = []
def dfs(node):
if not node:
path.append("#")
return
# Preorder: Root -> Left -> Right
path.append(str(node.val))
dfs(node.left)
dfs(node.right)
dfs(root)
# Join with comma to handle multi-digit numbers
return ",".join(path)
def deserialize(self, data):
"""Decodes your encoded data to tree.
:type data: str
:rtype: TreeNode
"""
# Create an iterator (or queue) for O(1) popping
values = iter(data.split(","))
def build():
try:
val = next(values)
except StopIteration:
return None
# Base Case: Null pointer
if val == "#":
return None
# Recursive Step
node = TreeNode(int(val))
node.left = build() # Consumes the next chunk of the stream
node.right = build() # Consumes the remainder
return node
return build()
6. Testing Strategy
Test Case 1: Skewed Tree (Linked List)
1 -> 2 -> 3
- Serialize:
1,2,3,N,N,N,N - Deserialize:
- Pop 1 (Root)
- Left = Pop 2
- Left of 2 = Pop 3
- Left of 3 = Pop N
- Right of 3 = Pop N
- Right of 2 = Pop N…
- Works perfectly.
Test Case 2: Full Null Tree
root = None
- Serialize:
# - Deserialize: Returns None immediately.
Test Case 3: Negative Values
Tree values can be negative (-55). Using , delimiter ensures we parse -55 correctly and not split it as - and 55.
7. Complexity Analysis
Time Complexity
- Serialize: $O(N)$. We visit every node once. String concatenation (with
join) is linear. - Deserialize: $O(N)$. We process every value in the string exactly once.
Space Complexity
- Serialize: $O(N)$ for the recursion stack (skewed tree) and the output string.
- Deserialize: $O(N)$ for the recursion stack and splitting the string.
8. Production Considerations
- Format Selection:
- CSV/String: Human readable, easy to debug. Used in this solution.
- JSON: Standard, but verbose (
{"val":1, "left":...}). huge overhead. - Protobuf/Binary: Ideal for production. Uses Varints (variable integers) to store small numbers in 1 byte. No commas needed. 10x smaller.
- Recursion Limit:
- Python’s default is 1000. For a deep tree (skewed), this crashes. In production, use an Iterative DFS with an explicit stack to handle trees with depth > 1000.
9. Connections to ML Systems
This problem is the exact algorithmic counterpart to Model Serialization (ML System Design).
- Tree: Neural Network Graph (Layers are nodes).
- Weights: Node Values.
- Topology: Left/Right pointers.
- ONNX Format: A standardized “string” representation of the computation graph so it can be moved from PyTorch (Python) to C++ Runtime.
Also relates to Speech Model Export (Speech Tech), where we serialize the state of streaming Transducers.
10. Interview Strategy
- Clarify Format: Ask “Can I use JSON?” or “Do I need a binary format?”. Usually, they want the logic, not the library.
- Mention Traversal Choice: “I will use Preorder DFS because it serializes the root first, making deserialization straightforward since we always know what node we are building.”
- Handle Ambiguity: Explicitly state “I will use a sentinel character ‘#’ to denote nulls to preserve uniqueness.”
11. Key Takeaways
- Sentinels are Mandatory: You cannot serialize a structure uniquely without representing the “absence” of data (nulls).
- Streaming Nature: Preorder traversal is “stream-friendly”. You can start building the tree as soon as the first byte arrives.
- State Management: Deserialization is just “replaying” the construction steps recorded during serialization.
Originally published at: arunbaby.com/dsa/0047-serialize-deserialize-tree
If you found this helpful, consider sharing it with others who might benefit.