Union-Find Data structure

Open the UnionFind.java file, where you will find a Java interface representing the UnionFind ADT. You must implement this ADT in RasterUnionFind.java file. This implementation will generally follow the Quick-Union strategy described in the lecture (visit the corresponding notes).

In Quick-Union, each disjoint set is maintained as a general tree.

The union-find data structure comprises a forest (a collection of trees), in which each item is initially the root of its own tree; then, trees are merged by union operations.

Caution

You can find various implementations of the Union-Find data structure online. However, what is asked of you here is different! The specification is highly specialized for the image processing task at hand. I have done this intentionally so that the solution to this take-home exam would not be a Google search away! I strongly advise against looking up implementations of Union-Find. It will likely result in a skew understanding of how it is specified here.

Let's now consider the core operations of the RasterUnionFind:

The find operation will take a pixel ID and return an integer. The returned value shall be interpreted as the "label" (or ID) of the set (component or region) the pixel belongs to. If find(pixelID1) == find(pixelID2), then the two pixels (with IDs of pixelID1 and pixelID2) are part of the same region.
The union operation takes the IDs of two pixels (pixelID1 and pixelID2), merges the sets containing these pixels, and returns true. However, if the two pixels are already part of the same region, the union operation does nothing but to return false.

Notice the RasterUnionFind class has two fields:

count tracks the number of disjoint sets (connected components or regions). At the start, the number of regions in an image equals the number of pixels in it. As you employ the union method to connect the pixels and make larger regions, the count decrements. Eventually, the count must reflect the number of regions in the image.
parentID is a 2D array that stores the pixel IDs at construction. That is, at construction, parentID[y][x] == pixelID for the pixel at $(x,y)$. As you apply the union operations, the parentID[y][x] must be updated to store the pixelID value of the "parent" of the pixel at $(x,y)$.

You should write a pair of (private) helper methods getX and getY that will return the $x$ and $y$ coordinates of the pixel having a given pixelID value. Then, it will be possible to follow the path from any pixel to the root of its "tree" by repeatedly getting the parent node's pixelID from the parentID array until the root of the tree is reached. You should implement the find method to perform this strategy and return the pixelID of the root.. Your implementation must be recursive.

The union operation should first use find to determine the roots of the trees containing pixelID1 and pixelID2. There is nothing more to do if the roots are the same (other than returning false). Otherwise, the two trees must be merged. The merging shall be done by setting the root having the smaller pixelID value be the parent of the other root.

Notice the Quick-Union strategy employed here does not incorporate union-by-size nor path compression!

In the next section, we will work through an example to clarify how the union-find data structures will be used for region labeling.

Resources

These resources are provided for completeness; you do not need to consult any of these for this take-home exam!

Wikipedia's entry on Disjoint-set/Union-find data structure.

CS226 Homepage

Union-Find Data structure