Union-Find Data structure
Open the UnionFind.java
file, where you will find a Java interface representing the UnionFind ADT. You must implement this ADT in RasterUnionFind.java
file. This implementation will generally follow the Quick-Union strategy described in the lecture (visit the corresponding notes).
In Quick-Union, each disjoint set is maintained as a general tree.
The union-find data structure comprises a forest (a collection of trees), in which each item is initially the root of its own tree; then, trees are merged by union operations.
Caution
You can find various implementations of the Union-Find data structure online. However, what is asked of you here is different! The specification is highly specialized for the image processing task at hand. I have done this intentionally so that the solution to this take-home exam would not be a Google search away! I strongly advise against looking up implementations of Union-Find. It will likely result in a skew understanding of how it is specified here.
Let's now consider the core operations of the RasterUnionFind
:
-
The
find
operation will take a pixel ID and return an integer. The returned value shall be interpreted as the "label" (or ID) of the set (component or region) the pixel belongs to. Iffind(pixelID1) == find(pixelID2)
, then the two pixels (with IDs ofpixelID1
andpixelID2
) are part of the same region. -
The
union
operation takes the IDs of two pixels (pixelID1
andpixelID2
), merges the sets containing these pixels, and returnstrue
. However, if the two pixels are already part of the same region, theunion
operation does nothing but to returnfalse
.
Notice the RasterUnionFind
class has two fields:
-
count
tracks the number of disjoint sets (connected components or regions). At the start, the number of regions in an image equals the number of pixels in it. As you employ theunion
method to connect the pixels and make larger regions, thecount
decrements. Eventually, thecount
must reflect the number of regions in the image. -
parentID
is a 2D array that stores the pixel IDs at construction. That is, at construction,parentID[y][x] == pixelID
for the pixel at $(x,y)$. As you apply the union operations, theparentID[y][x]
must be updated to store the pixelID value of the "parent" of the pixel at $(x,y)$.
You should write a pair of (private) helper methods getX
and getY
that will return the $x$ and $y$ coordinates of the pixel having a given pixelID
value. Then, it will be possible to follow the path from any pixel to the root of its "tree" by repeatedly getting the parent node's pixelID
from the parentID
array until the root of the tree is reached. You should implement the find
method to perform this strategy and return the pixelID of the root.. Your implementation must be recursive.
The union
operation should first use find
to determine the roots of the trees containing pixelID1
and pixelID2
. There is nothing more to do if the roots are the same (other than returning false
). Otherwise, the two trees must be merged. The merging shall be done by setting the root having the smaller pixelID
value be the parent of the other root.
Notice the Quick-Union strategy employed here does not incorporate union-by-size nor path compression!
In the next section, we will work through an example to clarify how the union-find data structures will be used for region labeling.
Resources
These resources are provided for completeness; you do not need to consult any of these for this take-home exam!
- Wikipedia's entry on Disjoint-set/Union-find data structure.