The union-find structure and semantic parsing

In mathematics, a set is a collection of objects, regardless of their nature. For example, we can have a set of sets. Two sets are disjoint if they have no elements in common. A disjoint-set (data) structure (also called union-find structure) is a set of mutually disjoint (i.e. non-overlapping) sets.

A disjoint-set structure is, for example, {{1}, {2}, {3}, {4}, {5}}, whose elements are called singletons because they each have exactly one element (a natural number). Two sets can be merged via the so-called union operation. For example, we can merge {1} and {2}, getting a new disjoint-set structure, namely {{1, 2}, {3}, {4}, {5}}.

The simplest way of implementing a disjoint-set structure is like this:

function makeSet(x)
    x.parent = x
 
function find(x)
    if (x.parent == x)
        return x
    else
        return find(x.parent)
 
function union(x, y)
    xRoot = Find(x)
    yRoot = Find(y)
    xRoot.parent = yRoot

Each set is implemented as a tree and represented by its root. What are disjoint-set structures good for? The can be used, for example, to semantically analyse sentences. Consider the sentence

Brutus stabbed Caesar.

The words occurring in it can be characterised as follows:

WordSyntaxSemantics
BrutusNPBrutus(x1)
stabbedVstab(e1, x2, x3)
CaesarNPCaesar(x4)

Brutus and Caesar are noun phrases (NP) and stabbed is a verb (V). Semantically, stab is a ternary predicate, stab(e1, x2, x3), where e1 is an event representing a stabbing action, x2 is the agent of the action (the stabber) and x3 the undergoer (the stabbee).

We can use a chart (a data structure invented by Alain Colmerauer for efficient parsing) to represent the sentence:

     Brutus(x1)    stabbed(e1,x2,x3)    Caesar(x4)
+—————————————————+—————————————————+—————————————————+

The objects figuring in the sentence (or the complex event) can be represented as a disjoint-set structure: {{e1}, {x1}, {x2}, {x3}, {x4}}. Applying the syntactic rule VP → V NP, we can add a new edge to the chart:

                     stabbed(e1,x2,x3) & Caesar(x4)
                  +———————————————————————————————————+
     Brutus(x1)   |stabbed(e1,x2,x3)    Caesar(x4)    |
+—————————————————+—————————————————+—————————————————+

However, the application of this rule also unifies x3 and x4 so we need to merge the two singletons:

                     stabbed(e1,x2,x3) & Caesar(x4)
                                x3 = x4
                  +———————————————————————————————————+
     Brutus(x1)   |stabbed(e1,x2,x3)    Caesar(x4)    |
+—————————————————+—————————————————+—————————————————+

We now apply the rule S → NP VP, which adds another edge to the chart spanning the whole sentence:

     Brutus(x1) & stabbed(e1,x2,x3) & Caesar(x4)
                   x1 = x2 & x3 = x4
+—————————————————————————————————————————————————————+
|                    stabbed(e1,x2,x3) & Caesar(x4)   |
|                               x3 = x4               |
|                 +———————————————————————————————————+
|    Brutus(x1)   |stabbed(e1,x2,x3)    Caesar(x4)    |
+—————————————————+—————————————————+—————————————————+

This rule application unifies x1 and x2. The disjoint-set structure now is {{e1}, {x1, x2}, {x3, x4}} and if we rename the objects that are equal, the sentence is semantically represented by the formula Brutus(x1) & stab(e1,x1,x3) & Caesar(x3).

NlpAlgorithmSemantics
Avatar for Petr Homola

Written by Petr Homola

Studied physics & CS; PhD in NLP; interested in AI, HPC & PLT

Loading

Fetching comments

Hey! 👋

Got something to say?

or to leave a comment.