ciphron / posts


Equivalence Between Integer Comparison and Disjointness Testing of "Prefix" Sets

The idea in this post is due to Dov Gordon who shared with me in late 2016 at George Mason University in connection with some crypto work we were doing. Dov has allowed me to share it here. It might be something that is already out there; I'm not sure, but since I found it particularly interesting and quite relevant to the crypto work we were/are doing, I will describe it here. First I will give some formal definitions, then prove the theorem which is the crux of this post, and finally I will include some Python code that illustrates the idea.

1. The function $\mathsf{bin}_m : \mathbb{N}_0 \to \{0, 1\}^{m}$ indexed by an integer $m$ maps a nonnegative integer to a a binary string of length $m$ corresponding to its $m$-bit binary representation and is defined as $\mathsf{bin}_m(x) = b_{m - 1}b_{m - 2}\cdots b_0$ where $b_i := \lfloor x / 2^i \rfloor \mod{2}$.
2. The function $\mathsf{int} : \{0, 1\}^+ \to \mathbb{N}_0$ maps a non-empty binary string to a nonnegative integer corresponding to the numerical value represented by the binary string and is defined as $\mathsf{int}(b_{k - 1} b_{k - 2} \cdots b_0) \mapsto \sum_{0 \leq i < k} b_i \cdot 2^{i}$.
3. The prefix set associated with a nonnegative integer is the set containing all non-empty substrings of its corresponding binary string. For a nonnegative $n$-bit integer $x \in \mathbb{N}_0$, the prefix set $P_x$ is defined as $$P_x := \{b_{n}\cdots b_{n - i} \in \{0, 1\}^{+} : 0 \leq i \leq n\}$$ where $b_nb_{n - 1}\cdots b_0 = \mathsf{bin}_{n + 1}(x)$. The set of prefix sets is denoted by $\mathbb{P} := \{P_x : x \in \mathbb{N}_0, x < 2^n\}$ and the function $\mathsf{pre} : \mathbb{N}_0 \to \mathbb{P}$ is a function that maps nonnegative integers to their associated prefix set; that is, $\mathsf{pre}(x) = P_x$, where $P_x$ is defined as above for nonnegative integer $x$.
4. The function $\mathsf{inc} : \{0, 1\}^+ \to \{0, 1\}^+$ maps a binary string representing integer $x$ to the binary string representing integer $x + 1$ and is defined as $\mathsf{inc}(s) = \mathsf{bin}_{|s|}(\mathsf{int}(s) + 1)$ where the notation $|\cdot|$ denotes the length of a binary string. By abuse of notation, we write $\mathsf{inc}(P_x) = \{\mathsf{inc}(s) : s \in P_x\}$ to apply $\mathsf{inc}$ to every prefix in a prefix set.
5. The binary relation $\preceq$ is defined on the set of prefix sets $\mathbb{P}$ as follows: $$P_x \preceq P_y := P_x \cap \mathsf{inc}(P_y) = \emptyset$$.
1. The map $\mathsf{pre}$ is a relation-preserving isomorphism between $(\{x \in \mathbb{N}_0 : x < 2^n\}, \leq)$ and $(\mathbb{P}, \preceq)$.

We first prove that for nonnegative integers $x, y < 2^n$: $x \leq y \implies P_x \preceq P_y$. For the case of $x = y$, both prefix sets are equal so adding one to the prefixes in one of the sets results in a disjoint set of prefixes. Now for the case $x < y$. In this case, there exists an $i$ such that bit $i$ is the first bit from left to right where $x$ and $y$ differ and it will be one for $y$ and zero for $x$. The set of prefixes with less than $i$ bits is equal for $P_x$ and $P_y$, so adding one to the prefixes in this subset of $P_y$ will result in a subset disjoint from that of $P_x$. Adding one to the set of prefixes in $P_y$ with $i$ or more bits results in a disjoint set to the set of prefixes in $P_x$ with $i$ or more bits. It remains to prove the other direction; that is, $P_x \preceq P_y \implies x \leq y$. If the prefix sets after adding one to the prefixes in $P_y$ are disjoint, there are two possibilities. First, the original sets $P_x$ and $P_y$ are equal, therefore $x = y$. Secondly, the first bit that $x$ and $y$ differ from left to right, position $i \geq 1$, is one for $y$ and zero for $x$ as otherwise adding one to the prefix with $i$ bits such that bit $i$ is zero would result in a prefix that is equal to that in $P_x$ and therefore we would not have that $P_x \preceq P_y$ (the sets would not be disjoint). Therefore, it follows that $x < y$ and we can conclude the proof.

Here is the code in python to generate the prefix sets and check the $\preceq$ relation between them.

def to_bin(v, nbits):
    bin_rep = [0] * nbits
    i = nbits - 1
    while v != 0 and i >= 0:
        bin_rep[i] = v % 2
        v /= 2
        i -= 1
    return bin_rep

def from_bin_str(bstr):
    bits = map(int, list(bstr))
    i = len(bits) - 1
    v = 0
    power = 1
    while i >= 0:
        v += bits[i] * power
        power *= 2
        i -= 1
    return v

def add_one_to_prefix(prefix):
    nbits = len(prefix)
    v = from_bin_str(prefix) + 1
    if v > 2**nbits - 1:
        nbits += 1
    return ''.join(map(str, to_bin(v, nbits)))

def gen_prefixes(v, nbits):
    bin_rep_s = map(str, to_bin(v, nbits))
    prefixes = set()
    for i in range(nbits):
        prefix = ''.join(bin_rep_s[:i + 1])
        prefixes.add(prefix)
    return prefixes

def lte_prefix(a, b, nbits):
    a_prefixes = gen_prefixes(a, nbits)
    b_prefixes = gen_prefixes(b, nbits)
    new_b_prefixes = set()
    for prefix in b_prefixes:
        new_b_prefixes.add(add_one_to_prefix(prefix))

    return len(a_prefixes & new_b_prefixes) == 0

To support $n$ bits, set the variable $\mathsf{nbits}$ to $n + 1$, call the function $\mathsf{lte\_prefix}$ to evaluate comparision between two integers which is implemented by using the $\preceq$ relation.

Raw form (ExtMarkdown)