Binary Trees

A tree is a data structure which can be used to represent data which falls into a natural hierarchy. These notes primarily deal with binary trees -- see the definitions below.

Vocabulary -- trees in general

A tree is made of nodes. Each node stores some kind of data for us.

One special node, at the top of the tree, is the root.

Each node may have child nodes. If node A has a child node B, we refer to A as the parent node.

Each node except the root node has exactly one parent node. The root node has no parent.

Two nodes are siblings if they have the same parent.

One node is an ancestor of another node if it is the parent of the second node, parent of the parent of the second node, etc.

Nodes that have no children are called leaf nodes.

Any node can be viewed as the root of a subtree: its children, their children, etc.

The depth of a node is the number of steps between it and the root. The root is at depth 0, and the children of a node at depth N are at depth N+1. Confusingly, this is sometimes called the height instead.

The depth (or height) of the tree as a whole is the maximum depth of any node.

Vocabulary -- binary trees in particular

If we limit the number of child nodes to 2 per parent, a left child and a right child, we have what is called a binary tree.

In a binary tree, a node has a left subtree (the subtree starting with its left child) and a right subtree (the subtree starting with its right child).

A binary tree is full if all leaves are at the same depth and every node has either 2 children or none.

A binary tree is complete if it can be viewed as a full tree to which we have added leaf nodes at the bottom starting at the left.

A complete binary tree does not have to be full. A full binary tree is complete.

Operations on a tree

There are various operations we might want to perform on a tree:

Create a new empty tree.
Delete an existing tree.
Copy a tree.
Add a node to a tree in some location. (This will normally require that we have a pointer to the parent of the node in question.)
Remove a specific node from a tree. (This will normally require that we have a pointer to the parent of the node in question.)
Search a tree to locate a specific node or learn it doesn't exist.
Traverse the tree, visiting every node, in some order.
Find the height of the tree.
Find the height of a specific node.

There are lots of other possibilities.

Many of these operations can be done using recursion: do something with the data in a node, and call the function recursively to act on the subtrees.

Implementation of a binary tree

A common way to implement a binary tree is to use dynamically allocated links:

     class Node
     {
      private:
       int    Value;  // Of course, we could store any kind of data.
       Node * Left;
       Node * Right;  
      public:
       // lots of stuff
     };

The root of the binary tree is simply a variable of type Node *.

An advantage of this implementation is that we only use as many nodes as we need.

One disadvantage is that it is hard to locate the parent of a given node.

Another way to implement a binary tree is to use an array of nodes:

The root is at index 0.
The left and right children of a node at index N are at index 2*N+1 and 2*N+2, respectively.

An advantage here is that it is easy to identify the parent of a node.

A disadvantages are that there may be space wasted in the array for nodes that don't yet exist. We will need some way to mark the array entries that are not in use (a sentinel or flag value of some sort).

However we implement the tree, we will usually make it a class. If we are using the version with nodes and pointers, the Tree class will be a friend of the Node class. The main program that uses the tree will use only the public data members described in the Tree class's header file, and it will not necessarily know anything about the internal structure of the tree.

Traversing a tree

Sometimes we want to visit each node in a binary tree in some order and do something such as print the data in each node. There are several standard ways to do this using recursion:

Pre-order traversal

Print the data in the root.
If the root has a left subtree, process the left subtree.
If the root has a right subtree, process the right subtree.

In-order traversal

If the root has a left subtree, process the left subtree.
Print the data in the root.
If the root has a right subtree, process the right subtree.

Post-order traversal

If the root has a left subtree, process the left subtree.

If the root has a right subtree, process the right subtree.

Print the data in the root.

In each of these, most of the program code is devoted to handling the root node and to making sure the recursion eventually stops.

Instead of simply printing the data, we could be performing some other function on each node. For instance, a binary tree can be used to help parse an arithmetic expression full of binary operations; each node could contain either an operator or a number. We could build the tree and then traverse it to evaluate the overall expression.

Binary Search Trees

Often we use a binary tree to store data in some sort of order. We will assume here that we want data to be in ascending order.

A binary tree is a binary search tree if:

The data in each node is >= the data in all nodes in the left subtree.
The data in each node is <= the data in all nodes in the right subtree.

By using these properties, we can easily search the tree, decide where to add a new node, etc.

Examples (as pseudocode)

Some of these examples are for binary trees in general and some are for binary search trees in particular.

Example

Search a binary search tree for a node with a given value, Target. The return value is a pointer to a node. We do this recursively:

     if (Root == 0)
       return a value indicating failure
     else
       if (Root->Value == Target)
         return the root's location    
       else
         if (Root->Value is >= tTarget)
           search Root->Left
         else
           search Root->Right
         endif
       endif
     endif

Example

Search a binary search tree for the parent of a node with a given value, Target. We return two values: pointer PParent to the parent and P to the child. A special case occurs when the root node contains the target value; in this case, we return two copies of the root pointer. If the target value is not found, we return P = 0 and PParent = its last value. This is not recursive.

     if (Root->Value == Target)
       return P = Root and PParent = Root
     else
       P = Root
       while (P not = 0) and (P->Value not = Target)
         if (P->Value >= Target)
           PParent = P
           P = P->Left
         else
           PParent = P
           P = P->Right
         endif
       endwhile
       if (P == 0)
         PParent = 0
       endif
       return P and PParent
     endif

Example

In a binary tree, suppose we want to find the parent of the in-order successor of a node P (which has a right child). We return two values: pointer SuccParent to the parent, and Succ to the child.

     SuccParent = P
     Succ = P->Right
     while (Succ->Left not = 0)
       SuccParent = Succ
       Succ = Succ->Left
     endwhile 
     return Succ and SuccParent

Example

Suppose we want to count all the leaves on a binary tree. We do this by calling a function LeafCount which takes one argument P, a pointer to a node. We pass it the Root pointer as its argument P.

     if (P not = 0)
      if (P->Left == 0 and P->Right == 0)
       return 1;
      else
       return LeafCount(P->Left) + LeafCount(P->Right)
     else
      return 0

Notice this uses recursion. We start by checking for the simple case which will end the recursion. It is not easy to write this without using recursion.