Saturday, March 10, 2012

2D Affine Transformation as Action on a Triangle

Okay, in this post we'll move on from 2D linear transformations to 2D affine transformations.

The main motivation here is that we want a to be working with a transform algebra that has all of the "nice" behaviors of linear transforms (preservation of co-linearity of points and ratios of distances along lines, composition properties, matrix representation, etc.) but dispenses with the inconvenient and anti-intuitive fixed origin. That is, we'd like to include translations as well as the rotations, dilations, shears, and reflections that we already had at hand with linear transformations, per the previous post.

We can proceed algebraically as follows: we get a transform of the desired type (termed affine) by taking a linear transformation matrix \(A\) and following it up with translation by a vector \(\mathbf{b}\):\[\mathbf{T(x)} = A\mathbf{x} + \mathbf{b}\]It turns out that we can accomplish this computationally with single matrix multiplication via the following trick: we augment all vectors with a "1" at the bottom, and augment all matrices with a row of zeros at the bottom and an extra column to the right for the translation vector. This renders the above equation as:\[\left(\begin{array}{cc}\mathbf{T(x)}\\1\end{array}\right) = \left(\begin{array}{cc}A&\mathbf{b}\\0,...,0&1\end{array}\right)\left(\begin{array}{cc}\mathbf{x}\\1\end{array}\right)\]This checks out if you grind through the matrix multiplication; as you run across each row of the augmented matrix and down the augmented column vector, you end up with the usual computation for the corresponding component of \(\mathbf{T(x)}\), plus a final term of \(b_i\cdot1\) which accomplishes the translation. The final row computes as \(0 + 0 ... + 1\), providing the "1" at the bottom of the computed result vector.

All well and good, but there is a deeper geometric interpretation lurking here as well. All of our vectors and matrices have grown by an extra component, so it appears we are now operating in a space of one higher dimension. All the matrix coefficients are still real, so an affine transformation in, say, \(\mathbb{R}^2\) seems to be uniquely identified with a linear transformation in \(\mathbb{R}^3\). Why is this so?

If you consider the "1" added to the bottom of each column vector, what has been done is that the x-y plane in \(\mathbb{R}^2\) has been mapped to the z=1 plane in \(\mathbb{R}^3\).  So our original origin in \(\mathbb{R}^2\) has been moved off the problematic origin of \(\mathbb{R}^3\) to (0, 0, 1) and can now be moved about freely by a linear transform in \(\mathbb{R}^3\); in fact, a z shear in \(\mathbb{R}^3\) will accomplish a translation of the entire z=1 plane, as illustrated here (forgive the odd transparency artifacts; order-independent transparency is a tough problem and three.js doesn't go there in order to keep complexity down).

For those interested, coordinates transformed in this way to a fixed subspace of a higher dimensional space are known as homogeneous coordinates, more generally belonging to the study of projective geometry.  That general study is beyond the scope of what I'd like to cover here, so for now I'm just cherry-picking some useful techniques and points-of-view.

Okay, so, next, let's consider the higher dimensional analogue of the computational technique described in the previous post for divining a transformation matrix from a set of input and output points. In the previous (linear 2D) case, two input points and two output points allowed us to calculate a unique 2x2 transformation matrix. Two points to two points admitted a characterization of 2D linear transforms by their action on a line segment. To do the same trick for a linear transformation matrix in \(\mathbb{R}^3\) we would require 3 input points and 3 output points. Thus, a 3D linear transform may be characterized by its action on a triangle.

Furthermore, if we restrict the points of our input and output triangles to the z=1 plane, the transformation characterized by the resulting calculated matrix will be closed in the z=1 plane (any point in the z=1 plane transformed by the resulting matrix will be carried to some other point in the z=1 plane). This follows from the fact that the transformation implemented by our result matrix must be at least linear in 3D, combined with the co-linearity preservation quality of all linear transformations (to see this, characterize any input point in the z=1 plane as a weighted vector sum of two of the sides of the input triangle).

Is such a calculated transformation actually affine in the z=1 plane, however? As it turns out, yes. This is easiest to see (well, I think!) by examination of the decomposition taxonomy of linear transformations to scales, shears, reflections, stretches, dilations, and rotations. Consider the effects of each of these restricted to cases were the z=1 plane is closed, and we arrive at a decomposition taxonomy equivalent to a linear transformation in the z=1 plane followed by a translation in the z=1 plane.

Okay, so here is a web toy for playing around with 2D affine transformations. You can click in the image to drag out a couple of triangles, then repeatedly apply the transformation whose action maps the first triangle to the second one, operating on one of several synthetic images. Click the "+" to go offsite and investigate the code.

Compare this to the linear transformation web toy in the previous post to get a feel for the difference between linear and affine transformations. With affine transformations at our disposal, we are almost ready for some geodesic dome programs! More on this in an upcoming post.