Exploring boolean expression minimisation

January 08, 2025 | 11 minutes, 31 seconds

Background

I took some interest, a few years back, in the idea that you can express logic circuits as algebraic expressions. Buuuuut I'm not an electrical engineer, nor do I design hardware, nor do I actually work with circuits in the slightest. I'm a software engineer (Oh No), and invariably the crap I end up seeing is stuff like:

has_meaningful_value = (input_a || input_b) && ((input_a && !input_b) || (!input_a && input_b))

"But Ally", I hear you proclaim, "surely no programmer in the history of programming actually wrote such terrible Ruby." Lest we forget. Obviously, the first || expression is redundant. Secondly, what ever happened to input_a ^ input_b? Thirdly... every programmer should strive to learn first-order logic and a little bit of higher order logic (because this kind of expression won't pass code review without a very good reason). Here's the truth table:

input_a input_b has_meaningful_value
0 0 0
0 1 1
1 0 1
1 1 0
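
If you don't trust the table, a quick Ruby sanity check (a throwaway snippet of mine) confirms the expression is just xor:

[true, false].product([true, false]).all? do |input_a, input_b|
  original = (input_a || input_b) && ((input_a && !input_b) || (!input_a && input_b))
  original == (input_a ^ input_b)
end
# => true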

There's a pretty natural question that arises here though: what would happen if we only had access to not(x), and(x, y) and or(x, y)?

There's something to be said here about SQL, though: this sort of expression is actually more or less necessary when writing "oneof" constraints for tables. Otherwise you're stuck dealing with probably-but-maybe-not-secretly-branchless boolean casts (which won't work with MySQL):

CHECK ( ( ("input_a" IS NOT NULL)::int + ("input_b" IS NOT NULL)::int + ("input_c" IS NOT NULL)::int ) = 1 ),

Here at *checks notes* piecewise.org, we pride ourselves on dealing with some of the most contrived silliness — introducing: piecewise.

Add some piecewise

For now we'll assume that a boolean function is some function \(f:{\{0,1\}}^N\to\mathbb{R}\). This will allow us to use most of the tools we're familiar with.

We begin by building up a representation of a univariate function directly from its truth table:

\(x\) \(\overline{x}\)
0 1
1 0

That is:

\[ \begin{align*} \overline{x} &= \begin{cases} 1 & x=0 \\ 0 & x=1 \end{cases} \\ &= (x-1)\begin{cases} \frac{1}{x-1} & x=0 \\ \star & \star \end{cases} \\ &= (x-1)(-1) = 1 - x \end{align*}\]

Super trivial. You can derive the identity the same way.
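
Spelled out, it's the same factoring trick:

\[ \begin{align*} x &= \begin{cases} 0 & x=0 \\ 1 & x=1 \end{cases} \\ &= x\begin{cases} \star & \star \\ \frac{1}{x} & x=1 \end{cases} \\ &= x(1) = x \end{align*}\]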

It is here we run into our first snag: \(1-x\) doesn't particularly make sense in logic circuits, mostly because you can't subtract power, you can only really add it... so from here on out we'll only talk about variables and their complements.

We now extend to two variables; consider the following truth table:

\(a\) \(b\) \(f(a,b)\)
0 0 0
0 1 1
1 0 1
1 1 1

We can represent this with a piecewise function:

\[ \begin{align*} f(a,b) &= \begin{cases} 0 & a=0,\ b=0 \\ 1 & a=0,\ b=1 \\ 1 & a=1,\ b=0 \\ 1 & a=1,\ b=1 \end{cases} \\ &= \begin{cases} \begin{cases} 0 & b=0 \\ 1 & b=1 \end{cases} & a=0 \\ \begin{cases} 1 & b=0 \\ 1 & b=1 \end{cases} & a=1 \end{cases} \\ &= \begin{cases} b & a=0 \\ 1 & a=1 \end{cases} \\ &= b+\begin{cases} 0 & a=0 \\ 1-b & a=1 \end{cases} \\ &= b + a(1-b) \\ &= b + a\overline{b} \end{align*}\]

This expression is known as a sum-of-products, which is a normal form of a boolean expression. In this instance, notice how we have one disjunction (read: or) of two conjunctive terms, despite the fact that the truth table of \(f(a,b)\) is just the logical disjunction of two variables. This expression, in fact, can also be equivalently written as \(a+b\). The reason we didn't achieve the simpler expression is that we assumed \(1+1=2\). Can you see where above we did that?

In general, there's a sort of analogy between addition and disjunction, and between multiplication and conjunction. The only problem, however, is that circuits don't work under the standard addition and multiplication you see over \(\mathbb{R}\). They work a little differently, per the table below:

\(a\) \(b\) \(a+b\) \(a\cdot b\)
0 0 0 0
0 1 1 0
1 0 1 0
1 1 1 1

In fact, \(1+1\) ends up just being \(1\).
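
To make this concrete, here's a tiny Ruby sketch (the names are mine) of those two operations as saturating arithmetic. It also answers the rhetorical question above: under this addition, \(b + a\overline{b}\) and \(a + b\) agree everywhere, so the subtraction in \(1-b\) was only needed because we insisted on \(1+1=2\).

bool_add = ->(a, b) { [a + b, 1].min } # 1 + 1 saturates to 1
bool_mul = ->(a, b) { a * b }

[0, 1].product([0, 1]).all? do |a, b|
  not_b = 1 - b # fine numerically, even if a circuit can't subtract
  bool_add.(b, bool_mul.(a, not_b)) == bool_add.(a, b)
end
# => true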

To subtract or not to subtract

We're slowly building up some techniques for generating boolean expressions, but there are a few things we need before we can create an algorithm for doing so.

The first of these techniques comes from the drawback we saw above: unnecessary conjunctive terms.

Suppose that we have the function \(f(1)=p\), \(f(0)=q\). We would write it like:

\[ \begin{align*} f(x) &= \begin{cases} p & x=1 \\ q & x=0\ \text{(note $\overline{x}=1$)} \end{cases} \\ &= q + \begin{cases} p - q & x=1 \\ 0 & x=0 \end{cases} \end{align*}\]

iff \(p\geq q\iff \lnot(q\land\lnot p)\iff p\lor\lnot q\) (again, we're trying to avoid subtraction; but notice that one can't subtract \(1\) from \(0\) here). But if this is the case, then \(q+p-q=q+p\) anyway and so we yield the expression:

\[ \begin{align*} f(x) &= q + \begin{cases} p - q & x=1 \\ 0 & x=0 \end{cases} \\ &= q + \begin{cases} p & x=1 \\ 0 & x=0 \end{cases} \\ &= q + px \end{align*}\]

We can do the reverse for when \(q\geq p\iff q\lor\lnot p\) and we yield the expression \(p+q\overline{x}\).

Functions which can be simplified this way are known as unate functions: positive unate when \(p\geq q\) (giving \(q+px\)), and negative unate when \(q\geq p\) (giving \(p+q\overline{x}\)).
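
As a quick numeric check (a throwaway sketch, using the saturating addition from the table above): for every pair with \(p\geq q\), \(q+px\) reproduces the case split.

p_geq_q = [[0, 0], [1, 0], [1, 1]] # every (p, q) in {0,1}^2 with p >= q

p_geq_q.all? do |p, q|
  [0, 1].all? { |x| [q + p * x, 1].min == (x == 1 ? p : q) }
end
# => true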

General function expansions

For functions which are not positive or negative unate, we can make use of the following expansion. I've covered this technique in previous posts as well, and it bears resemblance to the Lagrange polynomial technique:

\[ \begin{align*} f(x) &= \begin{cases} p & x = 1 \\ q & x = 0 \end{cases} \\ &= \begin{cases} p & x = 1 \\ 0 & x = 0 \end{cases} + \begin{cases} 0 & x = 1 \\ q & x = 0 \end{cases} \\ &= \begin{cases} p & x = 1 \\ 0 & x = 0 \end{cases} + \begin{cases} q & \overline{x} = 1 \\ 0 & \overline{x} = 0 \end{cases} \\ &= px + q\overline{x} \end{align*}\]

I've been a bit sloppy with notation here, though: \(f\) isn't actually a function in just \(x\); it's more a function in \(x_1,x_2,\dots,x_n\), and \(p\) and \(q\) are each functions in \(x_1,\cdots,x_{k-1},x_{k+1},\cdots,x_n\), where \(x_k\) is the variable we're expanding on. Something akin to:

\[ f(x_1,\dots,x_n) = \begin{cases} p(x_1,\cdots,x_{k-1},x_{k+1},\cdots,x_n) & x_k = 1 \\ q(x_1,\cdots,x_{k-1},x_{k+1},\cdots,x_n) & x_k = 0 \\ \end{cases}\]

This expansion is known as a Shannon expansion (or Boole's expansion theorem, but I know too many Shannons for that to be a good name).
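
In code, if you store a truth table as a flat array of \(2^n\) bits with \(x_1\) as the most significant index bit (my convention here, not necessarily Bitplop's), the two cofactors are just the halves of the array:

def cofactors(table)
  half = table.length / 2
  [table[0...half], table[half..]] # [x1 = 0 half, x1 = 1 half]
end

cofactors([0, 1, 1, 1]) # => [[0, 1], [1, 1]], the or table from earlier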

Building an algorithm to write an expression

Every boolean function can be expressed as a sum of products, so we attempt to formulate one here recursively. We ingest a truth table in \(n\) variables, i.e. \(2^n\) outputs.

An expression at any stage can be formulated as follows:

\[ f(x_1, \dots, x_n)=\begin{cases} \sum{\{p_k\}} & x_i = 1 \\ \sum{\{q_k\}} & x_i = 0 \end{cases}\]

for some \(i\in\{1,\dots,n\}\). It can thus be simplified by pulling out the common terms:

\[ f(x_1, \dots, x_n)=\sum{\{p_k\}\cap\{q_k\}}+\begin{cases} \sum{\{p_k\}\setminus{\{q_k\}}} & x_i = 1 \\ \sum{\{q_k\}\setminus{\{p_k\}}} & x_i = 0 \end{cases}\]
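
For instance (an example of mine), if the \(x_i=1\) piece is \(ab+c\) and the \(x_i=0\) piece is \(c+d\), the shared term \(c\) is hoisted out:

\[ f = \begin{cases} ab + c & x_i = 1 \\ c + d & x_i = 0 \end{cases} = c + \begin{cases} ab & x_i = 1 \\ d & x_i = 0 \end{cases}\]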

And then the above Shannon expansion can be applied. Therefore:

  1. If \(\displaystyle\sum{\{p_k\}\setminus{\{q_k\}}}=\displaystyle\sum{\{q_k\}\setminus{\{p_k\}}}=0\), then \(f=\displaystyle\sum{\{p_k\}\cap\{q_k\}}\).

    • This handles cases where both pieces are the same (\(0\), \(1\), or \(p=q\)).
  2. If \(\displaystyle\sum{\{q_k\}\setminus{\{p_k\}}}=0\) then \(f=\displaystyle\sum{\{p_k\}\cap\{q_k\}}+\displaystyle\sum_{s\in\{p_k\}\setminus{\{q_k\}}}{x_i\cdot s}\)

  3. If \(\displaystyle\sum{\{p_k\}\setminus{\{q_k\}}}=0\) then \(f=\displaystyle\sum{\{p_k\}\cap\{q_k\}}+\displaystyle\sum_{s\in\{q_k\}\setminus{\{p_k\}}}{\overline{x_i}\cdot s}\)

  4. If \(f\) is positive-unate then \(f=\displaystyle\sum{\{p_k\}\cap\{q_k\}}+\displaystyle\sum{\{q_k\}\setminus{\{p_k\}}}+\displaystyle\sum_{s\in\{p_k\}\setminus{\{q_k\}}}{x_i\cdot s}\)

  5. If \(f\) is negative-unate then \(f=\displaystyle\sum{\{p_k\}\cap\{q_k\}}+\displaystyle\sum{\{p_k\}\setminus{\{q_k\}}}+\displaystyle\sum_{s\in\{q_k\}\setminus{\{p_k\}}}{\overline{x_i}\cdot s}\)

  6. Otherwise, \(f=\displaystyle\sum{\{p_k\}\cap\{q_k\}}+\displaystyle\sum_{s\in\{p_k\}\setminus{\{q_k\}}}{x_i\cdot s}+\displaystyle\sum_{s\in\{q_k\}\setminus{\{p_k\}}}{\overline{x_i}\cdot s}\)

In natural language: remove all the common conjunctive terms from the pieces of the piecewise function. We add them on at the end, as they're invariant terms in \(x_i\). Then we attempt to simplify the cases depending on whether either of the pieces are zero (this is the easiest case), or if the function is unate (by checking each piece).

The only tenable way to check whether \(f\) is unate or not is to read it from the truth table using the rules described in the prior section (otherwise it would involve re-parsing the expression we've already generated... and you can't really feasibly do that).
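
In code, that read is just a pointwise comparison of the two cofactor halves; a minimal sketch:

# q_half and p_half are the x = 0 and x = 1 halves of a truth table.
def positive_unate?(q_half, p_half)
  q_half.zip(p_half).all? { |q, p| q <= p }
end

positive_unate?([0, 1], [1, 1]) # => true: or is positive unate in a
positive_unate?([0, 1], [1, 0]) # => false: xor is not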

Thus the full algorithm goes as such:

  1. If the current truth table has only 2 cases, then perform the above simplification. Otherwise,

  2. partition the current truth table into two on a variable \(x_i=1\) and \(x_i=0\), and recurse on those partitions. Record whether these partitions are positive unate and/or negative unate so the 2-case step can perform the appropriate simplification. (A sketch of the full recursion follows below.)
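
Putting the pieces together, here's a minimal Ruby sketch of the whole recursion. To be clear, this is my own reconstruction from the rules above, not Bitplop itself: it always partitions on the first remaining variable, stores truth tables as flat arrays (most significant bit first, as with the cofactors above), and represents a sum of products as an array of term-arrays of literal strings.

def sum_of_products(table, vars)
  # No variables left: the table is a single constant.
  # [[]] is the sum containing just the empty product (constant 1); [] is constant 0.
  return table.first == 1 ? [[]] : [] if vars.empty?

  x, *rest = vars
  half = table.length / 2
  q_half, p_half = table[0...half], table[half..] # x = 0 half, x = 1 half

  q = sum_of_products(q_half, rest)
  p = sum_of_products(p_half, rest)

  common = p & q # terms shared by both pieces; invariant in x
  pd = p - q     # terms only in the x = 1 piece
  qd = q - p     # terms only in the x = 0 piece

  pos = q_half.zip(p_half).all? { |qv, pv| qv <= pv } # positive unate in x
  neg = q_half.zip(p_half).all? { |qv, pv| qv >= pv } # negative unate in x

  if pd.empty? && qd.empty? then common                                     # case 1
  elsif qd.empty?           then common + pd.map { |s| [x] + s }            # case 2
  elsif pd.empty?           then common + qd.map { |s| ["!#{x}"] + s }      # case 3
  elsif pos                 then common + qd + pd.map { |s| [x] + s }       # case 4
  elsif neg                 then common + pd + qd.map { |s| ["!#{x}"] + s } # case 5
  else common + pd.map { |s| [x] + s } + qd.map { |s| ["!#{x}"] + s }       # case 6
  end
end

def render(terms)
  return "0" if terms.empty?
  terms.map { |t| t.empty? ? "1" : t.join("·") }.join(" + ")
end

render(sum_of_products([0, 1, 1, 1], %w[a b])) # => "b + a"

Run on the \(10110001\) table (which comes up again shortly), this fixed ordering gives b·c + !a·!c + !a·b, one redundant term more than the minimal two-term form; choosing a better partition variable is exactly what comes next.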

Except I lied — how do we know what variable to partition on to give the simplest expression?

It turns out that problem is NP-Complete. Go home kids, we're done here.

Not only that, but the whole field of logic optimisation has yielded a bunch of heuristics to determine the best variable ordering.

My terrible heuristic

In the algorithm above, in the step where we record whether the function is positive or negative unate, we can also record the number of intersections and unions.

My implementation of the algorithm chooses the partitioning with the fewest cross-partition intersections and the fewest additional flags (i.e. positive & negative unate).

This means we can solve truth tables such as \(10110001\), because exactly one of the candidate partitions yields the expression without resorting to unateness (\(bc+\overline{a}\overline{c}\)).

The fatal flaw

The reason why variable ordering is a problem in this solution is the same as why it's a problem in Binary Decision Diagrams (https://en.wikipedia.org/wiki/Binary_decision_diagram).

We also suffer another problem, however: the minimal expression for the truth table \(01111110\), which is symmetric across all 3 input variables, will always yield 4 disjunctive terms; not 3, which a Karnaugh map will show you is the minimum. BDDs suffer from the same problem:

flowchart TD
    a((a))
    b((b))
    b2((b))
    c((c))
    c2((c))
    1[1]
    a -->|1| b
    a -.->|0| b2
    b -->|1| c
    b -.->|0| 1
    c -.->|0| 1
    b2 -->|1| 1
    b2 -.->|0| c2
    c2 -->|1| 1

A super neat BDD visualisation tool can be found at https://nicze.de/philipp/bdds/. BDDs are far and away more efficient than the algorithm I've designed here, though: traversal of the graph yields all disjunctive terms (though not in simplest form).
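
For what it's worth, running the sketch from earlier on \(01111110\) (with my MSB-first table convention) confirms the four terms:

render(sum_of_products([0, 1, 1, 1, 1, 1, 1, 0], %w[a b c]))
# => "a·!c + a·!b + !a·c + !a·b"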

Coding?

I'm not posting my code. I'm so embarrassed to have even written it. It makes total allocations of at least \(n\cdot 2^n\) bits (if optimally packed, which it isn't), which means that a truth table of 30 variables requires 30GB of RAM to compute (although not necessarily all at once: it makes use of enumerators, which evaluate lazily, sacrificing time instead). I'll leave this implementation to my partner.

Here's my code sobs https://github.com/AllyMarthaJ/Bitplop.