The union-find structure and semantic parsing
In mathematics, a set is a collection of objects, regardless of their nature. For example, we can have a set of sets. Two sets are disjoint if they have no elements in common. A disjoint-set (data) structure (also called union-find structure) is a set of mutually disjoint (i.e. non-overlapping) sets.
A disjoint-set structure is, for example, {{1}, {2}, {3}, {4}, {5}}
, whose elements are called singletons because they each have exactly one element (a natural number). Two sets can be merged via the so-called union operation. For example, we can merge {1}
and {2}
, getting a new disjoint-set structure, namely {{1, 2}, {3}, {4}, {5}}
.
The simplest way of implementing a disjoint-set structure is like this:
function makeSet(x) x.parent = x function find(x) if (x.parent == x) return x else return find(x.parent) function union(x, y) xRoot = Find(x) yRoot = Find(y) xRoot.parent = yRoot
Each set is implemented as a tree and represented by its root. What are disjoint-set structures good for? The can be used, for example, to semantically analyse sentences. Consider the sentence
Brutus stabbed Caesar.
The words occurring in it can be characterised as follows:
Word | Syntax | Semantics |
---|---|---|
Brutus | NP | Brutus(x1) |
stabbed | V | stab(e1, x2, x3) |
Caesar | NP | Caesar(x4) |
Brutus and Caesar are noun phrases (NP) and stabbed is a verb (V). Semantically, stab is a ternary predicate, stab(e1, x2, x3)
, where e1
is an event representing a stabbing action, x2
is the agent of the action (the stabber) and x3
the undergoer (the stabbee).
We can use a chart (a data structure invented by Alain Colmerauer for efficient parsing) to represent the sentence:
Brutus(x1) stabbed(e1,x2,x3) Caesar(x4) +—————————————————+—————————————————+—————————————————+
The objects figuring in the sentence (or the complex event) can be represented as a disjoint-set structure: {{e1}, {x1}, {x2}, {x3}, {x4}}
.
Applying the syntactic rule VP → V NP
, we can add a new edge to the chart:
stabbed(e1,x2,x3) & Caesar(x4) +———————————————————————————————————+ Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) | +—————————————————+—————————————————+—————————————————+
However, the application of this rule also unifies x3
and x4
so we need to merge the two singletons:
stabbed(e1,x2,x3) & Caesar(x4) x3 = x4 +———————————————————————————————————+ Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) | +—————————————————+—————————————————+—————————————————+
We now apply the rule S → NP VP
, which adds another edge to the chart spanning the whole sentence:
Brutus(x1) & stabbed(e1,x2,x3) & Caesar(x4) x1 = x2 & x3 = x4 +—————————————————————————————————————————————————————+ | stabbed(e1,x2,x3) & Caesar(x4) | | x3 = x4 | | +———————————————————————————————————+ | Brutus(x1) |stabbed(e1,x2,x3) Caesar(x4) | +—————————————————+—————————————————+—————————————————+
This rule application unifies x1
and x2
. The disjoint-set structure now is {{e1}, {x1, x2}, {x3, x4}}
and if we rename the objects that are equal, the sentence is semantically represented by the formula Brutus(x1) & stab(e1,x1,x3) & Caesar(x3)
.