The wild world of indeterminates

July 17, 2022 | 13 minutes, 14 seconds

Introduction

Surprise surprise, I'm back with some more party tricks and fun things to share with all, while I work on my notes! You can find my notes written thus far on GitHub, and I'd always appreciate some feedback!

Indeterminates are a point of interest for anyone playing around with piecewise objects and corresponding APOs. Particularly in the context of multivariate or differentiation problems, we quickly run into tedious and inconsistent solution methods. Namely, we tend to deviate from induction for high-level problems; instead, we apply induction to lower-level theorems or ideas and worry about application later. To address this, we can make use of objects called indeterminates, which encode a certain set of properties that let us solve a whole class of problems.

In this post, I'll present the motivation and solution to two issues encountered when working with APOs: multivariate interpolation and interpolation with differentiation (derivatives at a point).

The dual numbers

The dual number system provides what we call a nilpotent: a number that vanishes when raised to some power \(n\in\mathbb{Z}\) or higher. In this instance, we have a nilpotent called epsilon, \(\varepsilon\), such that \(\varepsilon^2=0\). There are several ways of representing this element, as with the imaginary unit, such as via a matrix:

\[ \varepsilon=\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\]

We won't actually worry about this definition, since we'll eventually need some way to extend it to higher powers, which we'll introduce later. Instead, we focus on the property that \(\varepsilon^2=0\), and take multiplication and addition on it to be commutative in the usual sense for ease. It thus remains an indeterminate: we don't know the 'value' of \(\varepsilon\), but we can work with it.

\(\varepsilon\) has a special property: let us take some dual number, \(a+b\varepsilon\). Then \(f(a+b\varepsilon)\) for some analytic function \(f\) is given by \(f(a)+f'(a)b\varepsilon\).

This can be shown quite easily; let us define some function \(F(x,y)=f(x+y\varepsilon)\). Then \(\frac{\partial F}{\partial y}=f'(x+y\varepsilon)\varepsilon\). Repeating this again gives \(\frac{\partial^2 F}{\partial y^2}=f''(x+y\varepsilon)\varepsilon^2=0\).

From here, we can solve this fairly simple PDE: \(F(x,y)=A(x)y+B(x)\) for some functions \(A(x)\), \(B(x)\). Letting \(y=0\) gives us \(B(x)=f(x)\). Furthermore, differentiating \(F(x,y)\) with respect to \(y\) gives \(\frac{\partial F}{\partial y}=A(x)=f'(x+y\varepsilon)\varepsilon\), for all \(y\in\mathbb{R}\). Setting \(y=0\) again gives \(A(x)=f'(x)\varepsilon\). Therefore, \(F(x,y)=f'(x)\varepsilon+f(x)\):

\[ f(x+y\varepsilon)=f'(x)\varepsilon+f(x)\]

Alternatively, one can observe the Taylor expansion of \(f\); all terms of order \(n>1\) vanish, since \(\varepsilon^n=0\) for those \(n\).
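
To make this concrete, here's a minimal sketch of dual-number arithmetic in Python (the `Dual` class and its names are my own illustration, not anything from the notes); pushing \(a+b\varepsilon\) through ordinary arithmetic recovers \(f(a)+f'(a)b\varepsilon\):

```python
# A minimal sketch of dual-number arithmetic: we carry the pair (a, b)
# representing a + b*eps, with eps^2 = 0 baked into multiplication.
class Dual:
    def __init__(self, re, eps=0.0):
        self.re, self.eps = re, eps

    def _coerce(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._coerce(other)
        return Dual(self.re + other.re, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        other = self._coerce(other)
        return Dual(self.re * other.re, self.re * other.eps + self.eps * other.re)

    __rmul__ = __mul__

    def __repr__(self):
        return f"{self.re} + {self.eps}e"

def f(x):
    return 3 * x * x + 2 * x + 1  # f'(x) = 6x + 2

# Evaluating at 2 + eps yields f(2) and f'(2) in one pass: 17.0 + 14.0e
print(f(Dual(2.0, 1.0)))
```

This is exactly forward-mode automatic differentiation, which is the standard computational application of dual numbers.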

First-order differentiation APO problems

In our context, the useful property of dual numbers is the following relationship:

\[ f(a_0)=a_1\land f'(a_0)=a_2\iff (f(a_0)=a_1\land f(a_0+\varepsilon)=a_1+a_2\varepsilon)\]

We can prove this as follows. Going from left to right, we have that \(f(a_0+\varepsilon)=f(a_0)+f'(a_0)\varepsilon=a_1+a_2\varepsilon\). Obviously \(f(a_0)=a_1\) still holds, hence proven.

From right to left, we have that \(f(a_0+\varepsilon)=f(a_0)+f'(a_0)\varepsilon\). Equating this with \(a_1+a_2\varepsilon\) and rearranging, we get that \(f(a_0)-a_1=(a_2-f'(a_0))\varepsilon\). Since \(f(a_0)=a_1\), the left-hand side is zero, giving us \((a_2-f'(a_0))\varepsilon=0\). Furthermore, since \(a_2-f'(a_0)\) is strictly real, we have that \(f'(a_0)=a_2\) as desired.

The reason this relationship is so useful is that APO problems map function inputs directly to outputs, rather than inputs to transformations of outputs (such as the derivative). An example is as follows:

We want to find a function such that \(f(0)=0\), \(f'(0)=0\), \(f(1)=1\) and \(f'(1)=0\). Using our above relationship for the derivatives, we have a function \(F\) such that:

\[ F(\varepsilon)=0 \\ F(1+\varepsilon)=1\]

Representing this as an APO, we get:

\[ F(x)=\begin{cases} 0 & x=0 \\ 0 & x=\varepsilon \\ 1 & x=1 \\ 1 & x=1+\varepsilon \\ \star & \star \end{cases}\]

This gives us:

\[ F(x)=x(x-\varepsilon)\begin{cases} \frac{1}{1-\varepsilon} & x=1 \\ \frac{1}{1+\varepsilon} & x=1+\varepsilon \end{cases}\]

Luckily, we can find the values of these. We multiply and divide by a 'conjugate' in each piece, giving us:

\[ F(x)=x(x-\varepsilon)\begin{cases} 1+\varepsilon & x=1 \\ 1-\varepsilon & x=1+\varepsilon \end{cases}\]

Finally, we get:

\[ F(x)=x(x-\varepsilon)\left(1+\varepsilon-(x-1)\frac{2\varepsilon}{\varepsilon}\right)\]

In this circumstance, we 'let' \(\frac{2\varepsilon}{\varepsilon}=2\), despite \(\frac{1}{\varepsilon}\) not being defined. This gives us:

\[ F(x)=x(x-\varepsilon)(3-2x+\varepsilon)\]

Since we need only the real part, we expand and then take \(f(x)=3x^2-2x^3\), as desired. This only works because the non-real parts vanish when we let \(x=a+b\varepsilon\), and our corresponding inputs/outputs are encoded in the function by definition. Once again, we go into more detail in the notes.
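
As a quick check (sympy here is just my choice of tooling), \(f(x)=3x^2-2x^3\) indeed hits all four conditions:

```python
import sympy as sp

x = sp.symbols('x')
f = 3 * x**2 - 2 * x**3
df = sp.diff(f, x)

# f(0) = 0, f'(0) = 0, f(1) = 1, f'(1) = 0
print(f.subs(x, 0), df.subs(x, 0), f.subs(x, 1), df.subs(x, 1))  # 0 0 1 0
```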

Higher order differentiation APO problems

We extend the dual numbers so that, in an \(n\)-extended dual number system, for each \(i\in\{1,2,\dots,n\}\) we define \(\varepsilon_i\) such that \(\varepsilon_i^{i+1}=0\); we also take the cross terms \(\varepsilon_i\varepsilon_j\) to vanish for \(i\neq j\), which we'll use implicitly below. This notably gives a similar set of relationships as before:

\[ \varepsilon_i^{i+1}=0\iff \frac{\partial^{i+1} F}{\partial y^{i+1}}=0\]

For some function \(F(x,y)=f(x+y\varepsilon_i)\). This is interesting because there is now a direct correspondence between derivatives and our indeterminates:

\[ f(a_0)=b_0\land f'(a_0)=b_1\land\dots\land f^{(n)}(a_0)=b_n\iff\\ f(a_0)=b_0\land f(a_0+\varepsilon_1)=b_0+b_1\varepsilon_1\land\dots\land f(a_0+\varepsilon_n)=b_0+b_1\varepsilon_n+\dots+\frac{b_n}{n!}\varepsilon_n^n\]
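
One way to convince yourself of the right-hand side: treat \(\varepsilon_n\) as a formal variable and discard all powers above \(n\), since they vanish. A small sympy sketch (my own illustration) with \(f=\sin\) and \(n=2\):

```python
import sympy as sp

a, eps = sp.symbols('a epsilon')

# f(a + eps_2) with eps_2^3 = 0: expand in eps and drop everything past eps^2.
n = 2
expansion = sp.series(sp.sin(a + eps), eps, 0, n + 1).removeO()
print(sp.expand(expansion))
# sin(a) + cos(a)*epsilon - sin(a)*epsilon**2/2 (up to term ordering)
```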

We can apply this similarly to before, with far more algebra. When encountering expressions like:

\[ \frac{\varepsilon_2}{\varepsilon_2-\varepsilon_1}\]

We can assign them values which ultimately do not affect our solution; in this instance, we multiply numerator and denominator by \(\varepsilon_2+\varepsilon_1\) (note this is also a nilpotent expression, so this isn't actually really permitted), which gives \(\frac{\varepsilon_2^2}{\varepsilon_2^2}\) since the cross term \(\varepsilon_1\varepsilon_2\) vanishes: we therefore assign the value \(1\) to it.

Let us now work with the example for which we have a function \(f:\mathbb{R}\to\mathbb{R}\) such that \(f(0)=0\), \(f'(0)=0\), \(f''(0)=0\) and \(f(1)=1\).

Then, we have a function \(F\) such that \(F(0)=0\), \(F(\varepsilon_1)=0\), \(F(\varepsilon_2)=0\) and \(F(1)=1\):

\[ F(w)=\begin{cases} 0 & w=0 \\ 0 & w=\varepsilon_1 \\ 0 & w=\varepsilon_2 \\ 1 & w=1 \\ \star & \star \end{cases}\]

Finding a solution, we get:

\[ F(w)=\frac{1}{1(1-\varepsilon_1)(1-\varepsilon_2)}w(w-\varepsilon_1)(w-\varepsilon_2)\]

Dealing with the leading coefficient, notice that expanding the denominator gives \(1-\varepsilon_1-\varepsilon_2\). We can multiply and divide by \(1+\varepsilon_1-\varepsilon_2\), making the denominator \((1-\varepsilon_2)^2\). Doing the same with \((1+\varepsilon_2)^2\) leaves a denominator of \(1-2\varepsilon_2^2\), and one final pass with \(1+2\varepsilon_2^2\) makes the denominator \(1\) (since \(\varepsilon_2^4=0\)); the numerator becomes \((1+\varepsilon_2)^2(1+\varepsilon_1-\varepsilon_2)(1+2\varepsilon_2^2)\), which has real part \(1\).

Recall that we only wish to obtain the real part; so, from the coefficient, we take only the \(1\), leaving us to deal with the remaining \(w(w-\varepsilon_1)(w-\varepsilon_2)\). Expanding, and once again taking the real part, leaves us with the solution \(f(x)=x^3\). Verify this for yourself for a nice little sanity check.
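
For instance, in the same sympy style as before:

```python
import sympy as sp

x = sp.symbols('x')
f = x**3

# f(0) = 0, f'(0) = 0, f''(0) = 0, f(1) = 1
print([sp.diff(f, x, k).subs(x, 0) for k in range(3)] + [f.subs(x, 1)])
# [0, 0, 0, 1]
```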

We've played it pretty loose here by taking the real part; one could fairly easily do this rigorously, but it would be tedious. The reason we take the real part is that our desired corresponding output values are strictly real; it stands to reason, therefore, that the real component of the function should give us a function that is not only strictly real but corresponds to our desired values.

Interestingly, we find that Lagrange-style interpolation typically fails on this style of problem, as it frequently requires dividing by nilpotent values. We therefore have to rely on a Newton-style process to achieve what we want.

Combining with exponential extraction

We can also perform the same process as above but with exponential extraction; we can make use of the very handy Taylor expansion to wrangle exponential functions into working with our derivative-based interpolation.

Let us denote a function \(f:\mathbb{R}\to\mathbb{R}\) such that \(f(0)=1\) and \(f'(0)=1\).

Then we can define a function \(F\) corresponding to \(f\) such that \(F(0)=1\) and \(F(\varepsilon_1)=1+\varepsilon_1\), giving us:

\[ \begin{align*} F(w) &= \begin{cases} 1 & w=0 \\ 1+\varepsilon_1 & w=\varepsilon_1 \\ \star & \star \end{cases} \\ &= \left(\begin{cases} (1+\varepsilon_1)^{\frac{1}{\varepsilon_1}} & w=\varepsilon_1 \\ \star & \star \end{cases}\right)^{w} \\ \end{align*}\]

Note that we can define \((1+\varepsilon_1)^{\frac{1}{\varepsilon_1}}\) by exponentiating and applying the logarithm simultaneously: we get \(\exp\left(\frac{1}{\varepsilon_1}\ln(1+\varepsilon_1)\right)\). We note that, by the Taylor expansion, \(\ln(1+\varepsilon_1)=\varepsilon_1\), and so we simply get \(e\) as our value. Continuing:

\[ \begin{align*} F(w) &= \left(\begin{cases} e & w=\varepsilon_1 \\ \star & \star \end{cases}\right)^{w} \\ &= e^w \end{align*}\]

It's a little bit more difficult to extract the real part here, but using \(w=x+y\varepsilon_1\), we get \(e^x\cdot(1+y\varepsilon_1)\) by the same expansion. Therefore, we can take \(f(x)=e^x\) as desired.
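
As a sanity check (again sympy, my choice of tooling), the identity \(e^{x+y\varepsilon_1}=e^x(1+y\varepsilon_1)\) falls straight out of the truncated series:

```python
import sympy as sp

x, y, eps = sp.symbols('x y epsilon')

# e^(x + y*eps) with eps^2 = 0: the series in eps stops after the linear term.
lhs = sp.series(sp.exp(x + y * eps), eps, 0, 2).removeO()
print(sp.simplify(lhs - sp.exp(x) * (1 + y * eps)))  # 0
```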

Multivariate interpolation

The complex numbers

The complex numbers are more or less a natural way to map between two-dimensional vectors and some 'number' which we can algebraically manipulate, with things like multiplicative inverses (division by a complex number) being well-defined.

Notably, in our two-dimensional interpolation problems we can map \((x,y)\mapsto x+iy\), simplify, and then take the real part (as per our reasoning in the differentiation interpolation problems) to obtain our solution. This is a fairly straightforward approach and nothing out of the ordinary occurs, but we quickly run into a question: how do we extend this idea to \(n\) dimensions?
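
As a hypothetical illustration (the data points here are mine, not from the notes), here's interpolating \(f(0,0)=0\) and \(f(1,2)=5\) via the map \((x,y)\mapsto x+iy\):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# Hypothetical data points (mine): f(0,0) = 0 and f(1,2) = 5.
# Map (x, y) -> x + iy, so we need F(0) = 0 and F(1 + 2i) = 5.
w = x + sp.I * y
inv = (1 - 2 * sp.I) / 5             # 1/(1+2i), rationalised by the conjugate
F = sp.expand(5 * inv * w)           # Newton-style: F(w) = w * 5/(1+2i)
f = sp.re(F)
print(f)                             # x + 2*y
print(f.subs({x: 0, y: 0}), f.subs({x: 1, y: 2}))  # 0 and 5
```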

Extending complex numbers

One might have the idea of using, say, the quaternions as an extension to the complex numbers for dealing with these multivariate interpolation problems, but such well-defined extensions are limited. Instead, we make use of indeterminates to construct our own well-defined extension for our own needs.

The first property we need is a well-defined multiplicative inverse for each of our indeterminates. We then need a formula for the reciprocal of an extended complex number. Neither is very straightforward, so let us start in two dimensions once again:

Suppose that we have some \(\omega\not\in\mathbb{R}\) such that \(\omega^2=\lambda\) for some \(\lambda\in\mathbb{R}\). Furthermore, we require that for all \(a,b,c,d\in\mathbb{R}\), \(a+b\omega=c+d\omega\iff a=c\land b=d\).

Then, to define the multiplicative inverse of such a number, let us write \((a+b\omega)^{-1}=x+y\omega\). From here, we use the fact that \((a+b\omega)^{-1}(a+b\omega)=1\) to give us:

\[ 1 = (x+y\omega)(a+b\omega)\]

Solving for \(x,y\) using the above properties gives us:

\[ x=\frac{a}{a^2-\lambda b^2} \\ y=\frac{-b}{a^2-\lambda b^2}\]
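
The solve itself is mechanical; here it is in sympy (a sketch, with \(\lambda\) kept symbolic):

```python
import sympy as sp

a, b, x, y, lam = sp.symbols('a b x y lambda')

# (x + y*w)(a + b*w) with w^2 = lam expands to
# (a*x + lam*b*y) + (b*x + a*y)*w; matching against 1 + 0*w:
sol = sp.solve([a * x + lam * b * y - 1, b * x + a * y], [x, y])
print(sol[x])  # a/(a**2 - b**2*lambda)
print(sol[y])  # -b/(a**2 - b**2*lambda)
```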

We want these values to be defined for all \(a,b\in\mathbb{R}\), except possibly at \(a=b=0\); the only value of \(\lambda\) which gives us this property is \(\lambda=-1\), since the denominator \(a^2+b^2\) then vanishes only when \(a=b=0\). This, not coincidentally, exactly matches the complex numbers.

Where we differ is in extending this same problem to higher dimensions. We instead define \(\omega_1,\omega_2,\dots,\omega_n\) so that \(\omega_i^2=\lambda_i\in\mathbb{R}\) for each \(i\), and do the same for the pairwise products \(\omega_i\omega_j\) for \(i\neq j\).

From here, we inductively find that \(\omega_i^2=-1\) for all \(i\), and \(\omega_i\omega_j=0\) for all \(i\neq j\). Conveniently, this gives, in \(n\) dimensions, the definition of the multiplicative inverse as:

\[ (a_0+a_1\omega_1+\dots+a_n\omega_n)^{-1}=\frac{1}{a_0^2+\dots+a_n^2}(a_0-a_1\omega_1-\dots-a_n\omega_n)\]

And thus we are now equipped to handle higher dimension problems using indeterminates.

So, for example, let us take the function \(f:\mathbb{R}^3\to\mathbb{R}\) so that \(f(0,0,0)=0\) and \(f(1,1,1)=1\). We find an equivalent function \(F\) such that \(F(0)=0\) and \(F(1+\omega_1+\omega_2)=1\). From there we get:

\[ \begin{align*} F(w) &=\begin{cases} 0 & w=0 \\ 1 & w=1+\omega_1+\omega_2 \\ \star & \star \end{cases} \\ &=w\begin{cases} \frac{1}{1+\omega_1+\omega_2} & w=1+\omega_1+\omega_2 \\ \star & \star \end{cases} \\ &=\frac{1}{3}(1-\omega_1-\omega_2)w \end{align*}\]

We now wish to expand and take the real part:

\[ \begin{align*} F(x+y\omega_1+z\omega_2) &=\frac{1}{3}(1-\omega_1-\omega_2)(x+y\omega_1+z\omega_2) \\ &= \frac{1}{3}(x+y+z)+(\text{some non-real part}) \end{align*}\]

So we find a solution to \(f\) is given by \(f(x,y,z)=\frac{1}{3}(x+y+z)\).
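
To check this end to end, here's a minimal sketch of the \(\omega\)-algebra in Python (the `Omega` class and its names are mine); it implements \(\omega_i^2=-1\), \(\omega_i\omega_j=0\) and the inverse formula above, and reproduces \(f(x,y,z)=\frac{1}{3}(x+y+z)\) at a sample point:

```python
# A sketch of the omega-algebra: numbers a0 + a1*w1 + ... + an*wn, with
# wi^2 = -1 and wi*wj = 0 for i != j.
class Omega:
    def __init__(self, re, parts):
        self.re, self.parts = re, list(parts)

    def __mul__(self, other):
        # Real part: a0*b0 - sum(ai*bi), since each wi^2 = -1 and the cross
        # terms wi*wj vanish; the wi part is a0*bi + ai*b0.
        re = self.re * other.re - sum(a * b for a, b in zip(self.parts, other.parts))
        parts = [self.re * b + a * other.re for a, b in zip(self.parts, other.parts)]
        return Omega(re, parts)

    def inverse(self):
        # (a0 + sum ai*wi)^(-1) = (a0 - sum ai*wi) / (a0^2 + sum ai^2)
        norm = self.re ** 2 + sum(a * a for a in self.parts)
        return Omega(self.re / norm, [-a / norm for a in self.parts])

# The worked example: F(w) = w / (1 + w1 + w2), at a sample point (x, y, z).
x, y, z = 0.2, 0.5, 0.8
F = Omega(x, [y, z]) * Omega(1.0, [1.0, 1.0]).inverse()
print(F.re, (x + y + z) / 3)  # both ~0.5
```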

Intuitively, the reason we require a multiplicative inverse from this construction and not from our differentiation problems is that we haven't manipulated the outputs according to the inputs. Rather, we've remapped the inputs without remapping the outputs and so we need some ability to compensate for that.

Conclusion

The technique of creating some structure to rephrase an idea for a problem is incredibly powerful. While it may not necessarily be the most efficient method for such a problem, it allows one to get an idea of the potential outputs to expect, especially if one already has results to sanity check against.

In this instance, we mapped differentiation as an operation to differentiation as more or less a space composed of indeterminates - rather than thinking of differentiation as an operation, we think of it as an extension of real space (we pick a finite number of derivatives and a finite number of points and attempt to connect them).

We also attached the idea of multiplication and inverses to vectors by mapping such vectors to a set of indeterminates. In this way, we're able to manipulate such coordinate systems to come up with ways to connect said coordinates in real space.

The ideas presented here are two major instances of reframing problems we need to solve as problems we are already familiar with, which can be applied and understood fairly easily.