Global, Markerless, Rigid-Body Point Cloud Registration

Googling "point cloud registration" will return hundreds of results on different algorithms which can be used to take two 'clouds' of points/coordinates (assumed to be representing the same thing) in 2D or 3D space and transform one cloud such that that the points it contains optimally overlap the points in the second cloud, or in other words, the clouds are optimally aligned with one another. An example application in 2D space is 'image registration', which is regularly used in a biomedical context to align/overlap and hence compare scans from different devices, for example an MRI and PET scan of the same patient's head.

MRI and PET scan of the same patient's head, registered.

Unfortunately, the vast majority of these algorithms make two large assumptions, which are not necessarily always true:

1) There are 'markers' in the point clouds, i.e. there are at least three points which are known to be the same physical point in both clouds. For example, in the biomedical image registration context above, it is possible to visually identify features which are common to both images, such as the edges of the skull, and use them to align the images.

2) The transformation is allowed the maximum number of degrees of freedom possible in the context (translation, rotation, scaling and shearing).


CT scan to point cloud

The context within which I required point cloud registration was during the development of a novel approach to 3D strain field calculations from CT scans. An object (a bone sample, in this instance) was placed into the CT machine and a 3D scan was taken. A force was then applied to the object in order to generate strain (i.e. the object was compressed), and a second scan was taken while this load was applied. The object was made from a material containing certain features (homogeneously distributed internal cavities), the positions of which could be found using image processing techniques on the scan data, resulting in a point cloud of cavity positions for each scan. The movement of these individual cavities relative to each other between the two point clouds (one from the scan before the compression, and one from the scan during it) could then be used to estimate the internal strains in the object.

Unfortunately, this posed some problems with regard to the two assumptions above. Firstly, the material in question contained many "markers" which were used to generate the point clouds, but it was impossible to guarantee which of the points were the SAME point between the two data sets, so there were no markers which could be used for the alignment/registration. Secondly, the shear and scaling of the second point cloud relative to the first contained the strain information that I wished to extract - so using an unlimited degree-of-freedom algorithm which allowed the point clouds to be scaled or sheared during the alignment would be of no use, as it would alter the results. As such, a rigid-body, marker-less/"blind" algorithm was required.

After this was established and I began researching suitable solutions, a third problem became apparent. A lot (read: all) of the marker-less solutions I came across were only able to find the locally optimal solution. Most of the algorithms worked by iteratively attempting to minimize an objective function, usually the mean distance between closest-point pairs in the two sets, using gradient descent methods. This always resulted in the algorithm finding a local minimum, which was not necessarily the global minimum (i.e. what we want). In short, these algorithms only worked if the two point clouds were already reasonably close to being aligned at the start - if one cloud was, for example, rotated 90 degrees about one axis, the registration would fail miserably. As such, a "global" solution was also required.

There are presumably many exciting and complicated methods of ensuring that the global minimum is found when minimizing an objective function, but I settled on a simpler solution whereby I split the registration into two stages - a coarse, estimated registration to get a "ballpark" alignment, followed by an algorithm which converges to the local minimum (which should then be the global minimum thanks to the initial coarse registration).

Formulating the problem

A typical registration problem can be outlined as follows:

\large \mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{B}

Where Y is the original "target" point cloud, X is the point cloud to be mapped on to Y, A is a transformation matrix and B is a translation matrix, defined as:

\mathbf{X} = \large \begin{bmatrix}x_1 & \cdots & x_n\\y_1 & \cdots & y_n\\z_1 & \cdots & z_n\\ \end{bmatrix}       \mathbf{Y} = \Large \begin{bmatrix}x'_1 & \cdots & x'_n\\y'_1 & \cdots & y'_n\\z'_1 & \cdots & z'_n\\ \end{bmatrix}

\mathbf{A} = \large \begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\\ \end{bmatrix}        \mathbf{B} = \Large \begin{bmatrix}b_x &\cdots & b_x \\ b_y& \cdots &b_y\\ b_z & \cdots & b_z\\ \end{bmatrix}

Where x_n, y_n and z_n are the coordinates of each point n in the cloud, so \mathbf{X} and \mathbf{Y} are effectively matrices in which each column holds the coordinates of one point in the cloud.

So, in this scenario, the solution of the registration problem is found through the calculation of the \mathbf{A} and \mathbf{B} matrices which define the transformation that aligns the two point clouds.

This calculation appears reasonably trivial. The A and B matrices can be combined into a single matrix (with a corresponding change to our X matrix) like so:

\mathbf{A} = \large \begin{bmatrix}a_{11} & a_{12} & a_{13} & b_x\\ a_{21} & a_{22} & a_{23} & b_y\\ a_{31} & a_{32} & a_{33} & b_z\\ \end{bmatrix}

\mathbf{X} = \large \begin{bmatrix}x_1 & \cdots & x_n\\y_1 & \cdots & y_n\\z_1 & \cdots & z_n\\1 & \cdots & 1\\ \end{bmatrix}

This gives us:

\large Y = AX

Which is trivial to solve for A, right? Well, as long as X is square - that is, the number of points matches the number of rows of our augmented X - then it is, as seen here:


Y and X plotted (left) and Y and AX+b plotted (right)

It is solved easily by multiplying both sides by the inverse of X, giving us YX^{-1} = A. However, it is only possible to determine the inverse of a square matrix (i.e. a matrix with the same number of columns and rows), so if we want to use this technique with more points it will not work out. To do that, we need to extend the technique to ensure the matrix we invert is definitely square.

This is achievable by multiplying our X by its transpose, which is guaranteed to produce a square matrix. Thus, we can multiply both sides of our equation by X^T to get a square matrix, of which we can then find the inverse and ultimately find A, like so:

YX^T\large = (AX)X^T \Rightarrow A = YX^T(XX^T)^{-1}

This leads to an interesting complication which must be kept in mind. When using more than the minimum number of points, there is no longer a guarantee that a transformation matrix A exists which perfectly maps the points from X on to those in Y. Instead, the solution will provide the least-mean-squares fit, meaning the mean-square error (Euclidean distance) between the points occupying the same columns in X and Y is minimized.
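As a quick sanity check, here's a minimal MATLAB sketch of that normal-equations solve (the test data and variable names are just for illustration):

```matlab
% Minimal example of the solve A = Y*X^T*(X*X^T)^-1.
% X is the 4xN augmented cloud (x;y;z;1), Y the 3xN target cloud.
N      = 50;
A_true = [1.1 0.05 0 2; -0.02 0.9 0.1 -1; 0 0 1.05 3];  % some affine transform
X      = [10 * rand(3, N); ones(1, N)];                  % random test points
Y      = A_true * X;                                     % noise-free target cloud

A = (Y * X') / (X * X');    % least-squares solution (same as Y*X'*inv(X*X'))
max(abs(A(:) - A_true(:)))  % should be ~0 for this noise-free example
```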

That was easy! But wait..

This solution to the general registration problem places no restrictions on the degrees of freedom, so there will be scaling and shear factors included in the transformation matrix. This is not acceptable for us, as we are looking for a rigid-body solution only. Luckily, there is a different method which enforces our rigid-body condition using singular value decomposition - the classic closed-form solution described by Arun, Huang and Blostein in "Least-Squares Fitting of Two 3-D Point Sets". The detailed mathematics can be found in the paper, so I'll give a quick run-down of the process instead:

  1. Formulate the problem as such:

Y\large = RX + T

2. Find the "center of mass" of our two point clouds (i.e. the mean of the x, y and z coords of all points):

\large\bar{X}=\frac{1}{N}\sum\limits_{i=1}^{i=N} \begin{bmatrix}X_{1,i}\\X_{2,i}\\X_{3,i}\\\end{bmatrix}

\large\bar{Y}=\frac{1}{N}\sum\limits_{i=1}^{i=N} \begin{bmatrix}Y_{1,i}\\Y_{2,i}\\Y_{3,i}\\\end{bmatrix}

3. Find the 3x3 cross-covariance matrix, H:

\large H =\sum\limits_{i=1}^{i=N} (X_i-\bar{X})(Y_i-\bar{Y})^T

4. Calculate the SVD of H, and from this our rotation matrix, R (the rigid-body equivalent of our transformation matrix A from earlier):

\large [U,S,V]=\text{SVD}(H)

\large R=VU^T

5. Calculate our translation vector, T (equivalent to our B from earlier):

T = -R\bar{X}+\bar{Y}

And that's it! Our full rigid-body transformation to map X on to Y is fully described by R and T, as Y = RX+T. As with our previous method, this will give us the transformation which results in the minimal mean-square-error. Simple!
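A minimal MATLAB sketch of those steps might look like the following (the function name is mine, and the det(R) check is a standard extra guard against ending up with a reflection rather than a rotation - it isn't part of the write-up above):

```matlab
function [R, T] = rigid_transform_svd(X, Y)
% Rigid-body (rotation + translation) least-squares fit mapping X on to Y.
% X, Y are 3xN matrices of corresponding points (column i of X pairs with column i of Y).
    Xbar = mean(X, 2);                 % centres of mass
    Ybar = mean(Y, 2);
    H    = (X - Xbar) * (Y - Ybar)';   % 3x3 cross-covariance matrix
    [U, ~, V] = svd(H);
    R = V * U';
    if det(R) < 0                      % guard against a reflection rather than a rotation
        V(:, 3) = -V(:, 3);
        R = V * U';
    end
    T = -R * Xbar + Ybar;
end
```

Applying the result is then just Y ≈ R*X + T.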

What now?

Our current solution assumes we know exactly which point in Y each of our points in X corresponds to. This is not the case in our problem, as we are looking for a marker-less solution. If we attempt to use the current solution with the order of the columns in our X matrix randomized (in other words, with the corresponding pairs of points scrambled), the registration fails completely:

Not knowing the pairs of corresponding points makes the problem significantly more complicated, although there are numerous algorithms of varying complexity which can tackle it. Probably the simplest (yet fairly effective) algorithm is Iterative Closest Point (ICP), described by Paul J. Besl and Neil D. McKay in "A Method for Registration of 3-D Shapes", and it is what I elected to use. The algorithm works as follows (a MATLAB sketch follows the steps):

  1. For each point in our cloud X, pair it with the nearest point in our cloud Y (i.e. assume the closest points between the two sets correspond to one another). The distance searches should be done using a Kd-tree structure to increase the speed (see my post on Kd-trees!).
  2. Determine the least-mean-squares transformation with our SVD technique, using the points we matched up in step 1 (i.e. the point in any column of X should have its nearest neighbour in Y at the same column index).
  3. Transform X using the determined transformation matrix.
  4. Iterate from step 1 until the solution converges (i.e. re-associate the points and find a new transformation until there is no further transformation required).
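A minimal MATLAB sketch of that loop, using knnsearch from the Statistics and Machine Learning Toolbox for the nearest-neighbour step and the rigid_transform_svd function sketched earlier (the function name, iteration cap and tolerance are illustrative):

```matlab
function [X_aligned, R_total, T_total] = icp_rigid(X, Y, max_iters, tol)
% Iterative Closest Point with a rigid-body (SVD) transform step.
% X, Y are 3xN and 3xM point clouds; correspondences are NOT assumed known.
    X_aligned = X;
    R_total = eye(3);
    T_total = zeros(3, 1);
    for iter = 1:max_iters
        % Step 1: pair every point in X with its nearest neighbour in Y (kd-tree search)
        idx = knnsearch(Y', X_aligned');            % knnsearch expects points as rows
        % Step 2: best rigid transform for the current pairing
        [R, T] = rigid_transform_svd(X_aligned, Y(:, idx));
        % Step 3: apply it
        X_aligned = R * X_aligned + T;
        R_total = R * R_total;
        T_total = R * T_total + T;
        % Step 4: stop once the incremental transform is negligible
        if norm(R - eye(3), 'fro') < tol && norm(T) < tol
            break;
        end
    end
end
```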

This technique will converge on the local minimum - that is, it will only successfully align our point clouds if the initial offsets/rotations are not too large. This brings us on to our last point.

Global registration

We now have a solution which can give us a least-mean-squares optimized transformation to align our two point clouds. The only issue is that it converges on the local minimum, which will not necessarily be the global minimum (the solution we want). However, as luck would have it, we can use the handy tool known as Principal Component Analysis to give us a coarse initial registration, which we can then fine-tune using our main technique (as we then know the local minimum will be the global minimum).

The first step is to determine a rough translation. This is very straightforward - we just calculate the position of the center of mass of each point cloud and then subtract it from every point in that cloud, so that both clouds are centered about (0,0,0).

Following this, we want to yet again calculate a covariance matrix, this time for X and Y separately. From here on the steps are the same for both, so I will just show one. As we have already removed the center of mass from our data, it is slightly simpler to calculate:

\large H_x =\sum\limits_{i=1}^{i=N} X_iX_i^T

Now we need to calculate the eigendecomposition of H, which will give us the eigenvalues and eigenvectors - and these just so happen to have a very useful spatial meaning. The eigenvectors of H give us three orthogonal axes, which point along the three directions in which the point cloud has the highest variance. As our two point clouds should reasonably resemble each other, it is perfectly reasonable to assume the eigenvectors of the two clouds would coincide if the clouds were aligned. As such, we can perform our coarse registration by determining the rotational transformation required to map the eigenvectors of X on to those of Y. The eigenvalues correspond to the variance of the data along the corresponding axes, which can be a good way to check the results (the more similar the two clouds' eigenvalues are, the less shear/scale there is between the data sets, and thus the better the rigid-body registration will work). The transformation is simple to calculate:

\large\begin{bmatrix}X_{eig1}\\X_{eig2}\\X_{eig3}\\ \end{bmatrix}T = \begin{bmatrix}Y_{eig1}\\Y_{eig2}\\Y_{eig3}\\ \end{bmatrix}

Where X_{eig1} is the first eigenvector [X_{eig_x},X_{eig_y},X_{eig_z}] and so on. These matrices are 3x3, and as such T is easy to calculate by simply inverting the X_{eig} matrix: T = X_{eig}^{-1}Y_{eig}.

Applying this transformation to X gives us a good first guess at the registration, regardless of the original translation or orientation, which ensures our local minimum is the global minimum and our registration is successful!
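A minimal MATLAB sketch of this coarse PCA alignment (written with the eigenvectors as columns rather than rows; note that eigenvectors are only defined up to sign, so in practice the sign combinations may need checking against the resulting fit):

```matlab
function [X_coarse, R0] = pca_coarse_align(X, Y)
% Coarse registration via PCA: centre both clouds, then rotate the principal
% axes of X on to those of Y.
    Xc = X - mean(X, 2);
    Yc = Y - mean(Y, 2);
    [Vx, Dx] = eig(Xc * Xc');            % eigenvectors (columns) and eigenvalues
    [Vy, Dy] = eig(Yc * Yc');
    [~, ix] = sort(diag(Dx), 'descend'); % order both sets of axes by variance
    [~, iy] = sort(diag(Dy), 'descend');
    Vx = Vx(:, ix);
    Vy = Vy(:, iy);
    R0 = Vy * Vx';                       % rotation taking X's axes on to Y's
    if det(R0) < 0                       % keep a proper rotation, not a reflection
        Vx(:, 3) = -Vx(:, 3);
        R0 = Vy * Vx';
    end
    X_coarse = R0 * Xc;                  % coarsely aligned and centred at the origin
end
```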

The full MATLAB code to perform all the steps of this algorithm follows this structure (NOTE: I'm using the kd-tree implementation from the Statistics and Machine Learning Toolbox, but this can be replaced with another distance-checking method pretty easily).
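In condensed form, using the illustrative helper functions from the sketches above (pca_coarse_align, icp_rigid and rigid_transform_svd):

```matlab
% X, Y: 3xN and 3xM point clouds extracted from the two CT scans.
[X_coarse, R0] = pca_coarse_align(X, Y);              % global "ballpark" alignment
Yc = Y - mean(Y, 2);                                  % ICP runs on the centred target
[X_fine, R, T] = icp_rigid(X_coarse, Yc, 100, 1e-8);  % fine, rigid-body registration
X_registered = X_fine + mean(Y, 2);                   % shift back into Y's original frame
% The relative movement of matched cavities between X_registered and Y now
% carries the strain information we're after.
```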


Kd-Tree Implementation


Today I'll be writing about the implementation of kd-trees. A kd-tree (short for k-dimensional tree) is a type of spatial partitioning data structure, and a special case of BSP trees. What's the point of it? Well, it can be used to organize a vast number of points in space in such a way that searching for the nearest point to a certain coordinate, for example, is far quicker than with more traditional methods.

It has many applications, but the one of interest to me (and maybe you, if you're following the Ray-Tracer series) is photon mapping. Photon mapping is a special procedure carried out before the main ray tracing, in which the light sources in the scene emit a large number of photons. The end-positions of the emitted photons are stored in a kd-tree, which can be used to quickly determine how many photons hit close to a point in our scene, which in turn tells us a bit about the lighting at that point. A kd-tree is absolutely necessary for this, as trying to do a nearest-point search using more traditional methods on millions of photons, numerous times for each pixel on the screen, would take far too long. This photon mapping procedure allows the rendering of caustics and diffuse inter-reflection, which can greatly increase the realism of a scene.

A 1 Dimensional Kd-Tree

So how does it work? We start with the coordinates of a large number of points in a space of any dimension (2D in the case of this example), stored in a big list or array. From there, we first choose a coordinate to use to split the list - let's start with the x coordinate. We find the median x coordinate of all the points, and then split the list into two smaller lists: one of the points with an x coordinate bigger than the median, and one of the points with an x coordinate smaller than or equal to the median. We then recursively apply this procedure to the two new lists, but swap the coordinate each time (so we'd use the y coordinate for the second level of lists, then return to using x, and so on). When we get to a state where there's only one point left in each list, we stop the recursion, and the tree is built! Let's check out some pseudo code that can build the tree:
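Something along these lines - a minimal MATLAB sketch of the idea (my actual implementation was in C#; the field names are illustrative, and the tree is built with tree = split_tree(points, 0)):

```matlab
function node = split_tree(points, depth)
% Recursively build a 2D kd-tree. points is an Nx2 array of [x y] coordinates.
    if size(points, 1) <= 1
        node.axis   = [];
        node.median = [];
        node.point  = points;    % leaf: a single point (or empty)
        node.left   = [];
        node.right  = [];
        return;
    end
    axis = mod(depth, 2) + 1;                    % alternate between x (1) and y (2)
    [~, order] = sort(points(:, axis));          % sort along the splitting axis
    points = points(order, :);
    m = floor(size(points, 1) / 2);              % position of the median point
    node.axis   = axis;
    node.median = points(m, axis);
    node.point  = [];
    node.left   = split_tree(points(1:m, :),     depth + 1);   % at or below the median
    node.right  = split_tree(points(m+1:end, :), depth + 1);   % above the median
end
```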

This recursive function continues to split the list time and time again, until it is down to a bunch of lists with just a single point in them. At that point, the tree is complete. It's important to note there must be a way of linking each node/list in the tree to its parent node and its child nodes, as this is what makes it possible to traverse/search the tree. In my C# implementation I achieved this with a kd-tree node class: each node held a pointer to the node that created it and pointers to the two child nodes it went on to create, and the splitting function was called from inside the node itself. There are, however, other (and probably better!) ways of doing this.

So now that we have our tree, it's time to use it to search for the nearest point to an arbitrary coordinate. To do this, we simply start at the top of our tree and work our way through it by comparing the coordinate we're searching around with the median at each stage. Here's some pseudo code:
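Here's a matching MATLAB sketch (call it as nearest_point(tree, query, [], inf); the far-branch check near the end is the standard addition that guarantees the true nearest neighbour is found):

```matlab
function [best_pt, best_d] = nearest_point(node, query, best_pt, best_d)
% Recursive nearest-neighbour search of the kd-tree built by split_tree.
% query is a 1x2 [x y] coordinate.
    if isempty(node)
        return;
    end
    if isempty(node.axis)                          % leaf node
        if ~isempty(node.point)
            d = norm(node.point - query);
            if d < best_d
                best_d = d; best_pt = node.point;
            end
        end
        return;
    end
    % Descend first into the side of the split the query point falls on
    if query(node.axis) <= node.median
        [near, far] = deal(node.left, node.right);
    else
        [near, far] = deal(node.right, node.left);
    end
    [best_pt, best_d] = nearest_point(near, query, best_pt, best_d);
    % On the way back up, check the other branch if the splitting line is
    % closer than the best distance found so far (it could hide a closer point)
    if abs(query(node.axis) - node.median) < best_d
        [best_pt, best_d] = nearest_point(far, query, best_pt, best_d);
    end
end
```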

As you can see, it's another recursive function, which calls itself until it reaches the bottom of the tree (and, on the way back up, checks the other branch of a split whenever that branch could still contain a closer point). By comparing the coordinates of the point we're searching for with the median at each node, we make our way down the tree and eventually wind up at the point closest to the point of interest.

And that's all there is to it! Kd-trees are extremely useful as they are lightning fast compared to, say, computing the distance between the point of interest and every point in the list. The larger the number of points gets, the bigger the speed boost will be, so for photon mapping it provides a massive speed increase.

Ray Tracing: Part 3


Today we're going to cover the Phong illumination model, depth checking and reflection. The Phong illumination model is effectively a formula which describes the final colour of each pixel as a linear combination of a number of components:

  • Ambient light
  • Diffuse light
  • Specular light
  • More complex parts (reflections/refractions).

In our current pseudo code from part 2 we already have an ambient lighting term and diffuse lighting (i.e. our Lambertian shading), so now we can look at adding specular lighting. A specular highlight is effectively a reflection of the light source in the object we're looking at. If the material is very reflective, it is a sharp, clear reflection. If the material is matte, the specular highlight will spread out further and blend into the diffuse shading.

To calculate the specular component, we need to yet again use a dot product to calculate the cosine of an angle between two vectors. In this case, we want to know the cosine of the angle between the reflection of the current eye-ray off the surface of the object and the vector from the surface point to the light source of interest (which we already have in the code from part 2, denoted light_vector). If the reflection vector and the vector to the light point in the same direction, we would expect the specular highlight to be strong. This is why the cosine of the angle from the dot product is useful again - cos(0) = 1, so the closer the directions of the two vectors are, the bigger the specular component is.

Firstly, we'll need to calculate the reflection vector. To do this, we can use this handy formula:

\Large R=V-2N(N\cdot V)

Where R is our resultant reflection vector, N is the normal vector (which we derived in part 2), and V is the current ray direction from the "eye". There's a good derivation of this formula available online. Now that we have the reflected vector, finding our specular component is as simple as performing a dot product, then using a little maths to calculate the component based on the shininess of the surface. Here's the updated pseudocode:
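Something like the following MATLAB-style snippet, which slots into the per-light shading loop from part 2 (ray_dir, normal, light_vector, spec_intensity, shininess and the colour variables are assumed to already exist in that loop):

```matlab
% Specular term added to the per-light shading loop
% (ray_dir, normal and light_vector are assumed to be unit vectors).
reflection   = ray_dir - 2 * normal * dot(normal, ray_dir);   % R = V - 2N(N.V)
spec_cos     = max(dot(reflection, light_vector), 0);         % clamp negative values
specular     = spec_intensity * spec_cos ^ shininess;         % intensity 0-1, shininess ~50-150
pixel_colour = pixel_colour + light_colour * specular;
```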


Figure 1. Specularity

There we have it, specularity! Hopefully, if it's implemented correctly, it should look something like Figure 1. You can play about with the values: specular intensity should be from 0-1, and shininess generally looks good from around 50-150.

Now's probably a good time to sort out some kind of depth checking. Objects appearing in front of each other when they are not is not desirable, after all! Thankfully the depth checking is relatively simple, the algorithm being as follows:

In a for loop, call check_ray against each sphere. In each iteration, check whether the distance to the collision point on this sphere is smaller than the smallest found so far. When the loop is complete you're left with the collision point nearest to the screen, and you can then proceed as normal! This can be implemented something like:
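A minimal sketch of that loop (the sphere struct fields and the nearest_hit name are illustrative; check_ray is the intersection function from the first post):

```matlab
function [hit_sphere, hit_t] = nearest_hit(ray_origin, ray_dir, spheres)
% Loop over every sphere and keep the closest intersection along the ray.
% spheres is a struct array with fields 'centre' and 'radius'.
    hit_sphere = 0;                       % 0 means the ray hit nothing
    hit_t = inf;
    for i = 1:numel(spheres)
        t = check_ray(ray_origin, ray_dir, spheres(i).centre, spheres(i).radius);
        if t > 0 && t < hit_t             % closer than anything found so far
            hit_t = t;
            hit_sphere = i;
        end
    end
end
```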

This can be called in our "plane pixel" loop instead of the "for (every sphere)" section, then our actual shading code can be moved into a sphere_lighting function, which returns the colour of our pixel. That's all there is to the depth checking, really.

Now, we've calculated a nice reflection vector for the specularity, so why not put it to further use with some pretty reflections? This can be a little tricky, as it is a recursive operation. What does this mean? Well, imagine two mirrors facing each other. Light, or one of our rays, could theoretically get trapped between them forever, so to avoid this we need to keep track of the "depth" of the ray and terminate it if it goes too high.

Reflection diagram

If your ray shooting functions have been reasonably well structured (unlike mine on my first attempt..), an implementation of the reflections is actually straightforward, like so:
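A sketch of the recursive version (nearest_hit is the depth-checking helper from above, sphere_lighting is the shading function mentioned shortly, and the field names and signatures are illustrative):

```matlab
function colour = shoot_ray(ray_origin, ray_dir, spheres, lights, depth, max_depth)
% Recursive ray shooting: standard Phong shading plus a reflected ray.
    colour = [0 0 0];                                 % background colour
    if depth > max_depth
        return;                                       % stop runaway reflections
    end
    [hit, t] = nearest_hit(ray_origin, ray_dir, spheres);
    if hit == 0
        return;                                       % nothing hit: keep the background
    end
    hit_point = ray_origin + t * ray_dir;
    colour = sphere_lighting(spheres(hit), hit_point, ray_dir, spheres, lights);
    if spheres(hit).reflectivity > 0
        normal  = (hit_point - spheres(hit).centre) / norm(hit_point - spheres(hit).centre);
        reflect = ray_dir - 2 * normal * dot(normal, ray_dir);
        % Recursive call with increased depth; nudge the origin off the surface
        % so the reflected ray doesn't immediately re-hit the same sphere
        reflected = shoot_ray(hit_point + 1e-4 * reflect, reflect, spheres, lights, depth + 1, max_depth);
        colour = colour + spheres(hit).reflectivity * reflected;
    end
end
```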

So, firstly we calculate our standard Phong illumination model. Then, if the sphere is reflective, we get the colour returned by a new ray sent in the reflection vector's direction. This is the recursive part, as the shoot_ray function is calling itself - although the depth is increased with each call, and if it exceeds a maximum the recursion stops.

Figure 2. Reflections

Eventually, after all the recursion, our initial call of the function will receive the total colour (after all the reflections are added up), which is blended with the colour from the normal lighting, and that's that! Reflections really look great, as can be seen in Figure 2, which used a recursion depth of 3. Going higher than 3 is not usually worthwhile as the reflections get very small at that point; I usually stick with 2.


The next post will probably be on distributed ray tracing - multisample anti-aliasing, and depth-of-field.

Ray Tracing: Part 2


So we've now got a basic ray-tracer which should be able to render some coloured circles on a blank background. Exciting, eh? Next up is defining a light source, shading the spheres, and casting shadows.

Lambertian shading

To shade our spheres correctly we'll be applying Lambertian shading - probably the simplest way to go about it, but it has fantastic results. The basic idea is that any surface pointing directly at a light source will receive 100% of the light coming from it. If the surface is at an angle to the light source, it will only receive a smaller percentage of the light; at 90 degrees or more, no light is received at all. By considering the relation between the angle to the light and the amount of light incident on the surface, we can see that it is a cosine function: cos(0 degrees) = 1, cos(60) = 0.5, cos(90) = 0.

To calculate the percentage of light incident on the surface of our sphere, we can use the dot product. The dot product is defined as:

\Large \frac{A \cdot B}{|A||B|} = cos(\theta)

Where \theta happens to be the angle between the two vectors A and B, so if we compute the dot product of two directional/unit vectors, it gives us the cosine of the angle between them. To find the cosine of the angle between our current surface and the light, we just need to calculate the dot product of the sphere normal with a vector from the hit position to the light. We can then either use this as V in an HSV colour system to lighten/darken the colour, or simply multiply all RGB values by it. The normal is a vector which dictates the direction the current point on the surface is facing, which in the case of a sphere is basically a continuation of the radius passing through our point. It is calculated easily by subtracting the position of the center of the sphere from the point we've hit (and normalizing the result). Note that the cosine of the angle can be negative, so it's necessary to take the maximum of the dot product and 0.

It's also fairly simple to have more than one light source. The code must loop through each light, apply the shading logic from above, then sum up every light's contribution to the pixel.

Here's the lambertian shading added onto the pseudo code from the last post:
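A minimal MATLAB-style sketch of that shading step (hit_point, sphere_centre, sphere_colour, ambient and the lights array are assumed to come from the surrounding pixel loop):

```matlab
% Lambertian shading for the hit point found in the pixel loop.
normal = (hit_point - sphere_centre) / norm(hit_point - sphere_centre);
shade  = ambient;                                   % start from the ambient term
for L = 1:numel(lights)
    light_vector = lights(L).position - hit_point;
    light_vector = light_vector / norm(light_vector);
    shade = shade + lights(L).intensity * max(dot(normal, light_vector), 0);
end
pixel_colour = sphere_colour * min(shade, 1);       % scale the sphere's base colour
```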

Shadow rays

Looking good! Although, what if one sphere is between another and the light source? There should be a shadow cast. However, instead of "casting" shadows, we'll do the opposite - check whether the point we're focussing on is in the shadow of any other sphere by sending out a "shadow" ray. This is really simple to do, as we can use our already existing check_ray function, like so:
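A sketch of the shadow test, which slots inside the per-light loop from the shading snippet above, replacing the line that adds the light's contribution (hit is the index of the sphere the eye-ray struck):

```matlab
% Shadow check: see whether any OTHER sphere blocks the path from the hit
% point to the light before adding this light's contribution.
in_shadow = false;
for s = 1:numel(spheres)
    if s == hit                                    % don't test the sphere against itself
        continue;
    end
    t = check_ray(hit_point, light_vector, spheres(s).centre, spheres(s).radius);
    if t > 0 && t < norm(lights(L).position - hit_point)
        in_shadow = true;                          % something sits between us and the light
        break;
    end
end
if ~in_shadow
    shade = shade + lights(L).intensity * max(dot(normal, light_vector), 0);
end
```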

The most important part of the code is to ensure the sphere does not check against itself, otherwise it will always be shadowed because it thinks it is casting a shadow on itself. That's all for now - the next post will probably cover some combination of depth checking, multi-sampling/anti-aliasing, and reflections.

Here's an example of the content of this post in action in my implementation:


Intro to Ray Tracing

I've always wanted to write a ray tracer, and this week seemed like a reasonable time to do it! A ray tracer is a program which generates (hopefully) photo-realistic images by casting "light rays" through a number of points/pixels on an imaginary plane in a 3D world, then using information from each ray to set the corresponding pixel to the correct colour. Figure 1 depicts an example of this.

Figure 1. Basics of Ray Tracing

The reason that ray-tracing can produce such realistic images is that the individual rays can be programmed to behave exactly as light rays would - taking into account reflections, refraction, shadows and so on. The only difference is that they work in reverse.

In real life, light sources produce light rays/particles which propagate outwards. They then bounce off objects, sometimes being absorbed or transmitted/refracted, and eventually make their way to our eyes. It would not be practical to compute ray tracing this way, as billions of rays would have to be radiated in the hope that enough of them would hit our imaginary plane to make a reasonable image. There would be no obvious finite number of rays needed for decent convergence towards a good, final image, and it would be impossible to achieve the same ray density per pixel across the whole image. This would result in certain parts of the generated image being much more detailed than others, which is not acceptable.

Instead, rays are created from the viewer's position and are directed through each of the pixels on the plane. When a ray hits an object, a little maths establishes how much light is reaching that particular point on the object's surface. A second ray, called a shadow ray, can then be cast from the point the ray hit to the light source to establish whether or not that particular point is in the shadow of another object. It's also possible to send out a reflection ray, which works in the same way as the initial ray but is created from the point the first ray struck - the colour taken from the reflection ray can then be added to the total colour for the pixel, and reflections can be generated. Refraction through transparent objects can then be computed in a similar way.

So, where do you start? Well, let's write some pseudo-code to get a super basic ray-tracer up and running:
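Something like this minimal MATLAB sketch (the sphere data and the check_ray function are covered below; the plane size and distance are arbitrary):

```matlab
% A very basic ray-casting loop.
plane_w = 640; plane_h = 480; dist_to_plane = 500;
camera  = [0 0 0];                                   % camera at the origin, facing +x
img     = zeros(plane_h, plane_w, 3);
for px = 1:plane_w
    for py = 1:plane_h
        % "real world" coordinate of this pixel on the imaginary plane
        pixel_pos = [dist_to_plane, px - plane_w/2, py - plane_h/2];
        ray_dir   = (pixel_pos - camera) / norm(pixel_pos - camera);
        for s = 1:numel(spheres)                     % check every sphere in the scene
            t = check_ray(camera, ray_dir, spheres(s).centre, spheres(s).radius);
            if t > 0
                img(py, px, :) = spheres(s).colour;  % hit: colour the pixel
            end
        end
    end
end
```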

This code loops through each pixel on our imaginary plane, then works out the direction of the vector from the viewer's position to the current pixel - this is done by subtracting the coordinates of the camera from the "real world" pixel coordinate, because any vector A between two points B and C can always be defined as A = C - B. It's possible to have the camera facing any direction and then calculate the "plane pixel coordinates" from that (usually using some matrix trickery), but I have opted to skip this step. Instead, our camera is situated at (0,0,0) and faces along the x axis. This means the coordinates of the pixels on the plane are given by (distance to plane, px - plane_width/2, py - plane_height/2), normalized and divided by some constant. If you want to work solely from a FOV (field of view) angle, then from some trigonometry that constant should be: plane_width / (distance_to_plane * tan(FOV angle)).

It then loops through every sphere in the scene, checking whether the light ray defined by this vector intersects the sphere - if it does, it draws a coloured pixel at the current pixel position on the current render target. It is apparent that this method will have some problems, with depth checking for example, but as a first test it works pretty well.

Before you can get this code up and running, you've got to write the check_ray function. Thankfully, it's reasonably straightforward to establish whether a line, or vector, intersects a sphere. First of all we require a way to mathematically represent our light ray and our spheres. The light ray is simple - it can be composed of a starting position, and a parameterized direction vector, like so:

\Large L = p_0+ t Dir

Where L is the light ray vector, p_0 is the starting point of the ray, t is a variable, and Dir is a directional vector. A sphere can also be easily represented in vector form, like so:

\Large (p - c)\cdot(p - c)= Radius^2

Where p is any point on the sphere and c is the center of the sphere (a vector relative to the origin, or 0,0,0). This equation is effectively saying that the distance from any point on the sphere to the center of the sphere squared is equal to the radius squared - which makes sense! (A dot product of something with itself gives the magnitude of that thing squared).

Now, finding whether or not the line intersects the sphere is as easy as substituting the line equation into the sphere equation. After some manipulation, this gives a quadratic equation in t of the form at^2 + bt + c = 0, where:

a = Dir \cdot Dir

b = 2(p_0 - c)\cdot Dir

c =(p_0-c)\cdot(p_0-c)-Radius^2

As expected with any quadratic equation, this leaves us with three possible cases. The discriminant, b^2 - 4ac, tells us which of these we have. If it is less than 0, there are no real solutions for t, which means the ray has not intersected the sphere in question. If it is equal to 0, there is one real solution for t, which means the ray touches the sphere as a tangent, i.e. at only one point. If it is greater than 0, there are two solutions for t, meaning the ray has passed directly through the sphere. If this is the case, we want to return the smaller of the two values of t, as this is where the ray first intersects the sphere. This is illustrated in Figure 2.


Figure 2. Discriminants

Now we can piece the code together, using the general solution to a quadratic equation:

\Large t=\frac{-b\pm\sqrt{b^2 - 4ac}}{2a}

Here is a pseudo-code example:
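A minimal MATLAB sketch of check_ray (returning -1 rather than false when there's no hit, so the caller can simply test t > 0):

```matlab
function t = check_ray(p0, dir, centre, radius)
% Ray-sphere intersection. Returns the smaller positive root t, or -1 if the
% ray misses the sphere.
    a = dot(dir, dir);
    b = 2 * dot(p0 - centre, dir);
    c = dot(p0 - centre, p0 - centre) - radius^2;
    discriminant = b^2 - 4*a*c;
    if discriminant < 0
        t = -1;                       % no real roots: the ray misses the sphere
        return;
    end
    t1 = (-b - sqrt(discriminant)) / (2*a);   % smaller root: first intersection
    t2 = (-b + sqrt(discriminant)) / (2*a);
    if t1 > 0
        t = t1;
    elseif t2 > 0
        t = t2;                       % ray starts inside the sphere
    else
        t = -1;                       % sphere is entirely behind the ray
    end
end
```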

You may notice that instead of true/false, the script actually returns t1. It is possible to go on to multiply t1 by the components of the Dir light ray vector to find the position in the world at which the intersection occurred - this result will be used later on.

And there you have it! Hopefully you can now get a super simple ray-tracer up and running. The next post will cover lighting and shadows.


CNC/X-Y Table Design

I've recently been looking for a fun, useful project to design and build. I wanted something that brings together electronics, mechanics and programming. My idea? A CNC/X-Y table.

An X-Y table is a moving platform which is precisely controlled by a computer or embedded electronics. It is the driving mechanism behind laser cutters, CNC routers and so on. I've decided to design and build a small, 30x30cm table with X-Y axis movement (at least for starters). The table motion will be actuated by two threaded rods powered by stepper motors. The stepper motors themselves will be controlled by an Arduino, which will interpret data fed to it from a CAD program I will write. Simple, right?

Initial Design

I've already made a number of design choices here, which I will explain. Firstly, why threaded rods and stepper motors? Well, the rods I have bought are standard M8 size, which have a pitch of 1mm. This means that for every revolution of the rod, whatever is attached to it will move along the rod by 1mm. The rods are going to be powered by 28BYJ-48 stepper motors, which according to the specs can take a step of just 0.879 degrees (360/0.879 ≈ 410 steps per revolution, so with the 1mm pitch that's roughly 1/410 ≈ 0.0024mm per step). This means the table will be able to move in increments as small as 0.0024mm, at least in theory. In practice, the precision will probably not come close to that value due to the tolerances in the manufacture of the rods and the cheapness of the motors - still, it should be more than good enough for my purposes.

So, why Arduino controlled? The choice was between an Arduino and a Raspberry Pi, but it was narrowed down by the fact that I'm currently without a keyboard for my Raspberry Pi, and I have other ideas for that in the future. The Arduino has just enough pins to control 3 stepper motors (via small controller PCBs), with one to spare. I'm considering writing my own library for the stepper motor control, but for the time being/prototyping I'm going to be using AccelStepper, which seems to do a great job.

Lastly, why re-invent the wheel by designing my own CAD program and numerical control language? Well, the industry-standard NC language is G-code, which is what the vast majority of CAD programs will output, and there's even a G-code interpreter library for Arduino. This is all well and good, but I've decided to invent my own NC language: I don't feel comfortable relying on libraries which I don't fully understand, it's much more fun to start from scratch, and I'll get total control of how it works, with the language tailored to what my table is capable of. Unfortunately, no CAD program will be able to output my NC language, so I'm also going to have to write my own simple one to design the movements of the table.

On to the design! I've drawn up a few concepts and started modelling a few in SolidWorks. You can see the current design here:

As you can see, it's composed of three layers - a base, an X axis, and a Y axis with a flat top. The three layers are supported by runners which allow them to move along their respective axes. The motors/rods are not currently modelled, but will be soon. This design seems fairly good, but there is a lot of "dead space" between the layers which may not be necessary and will probably be revised. I should point out that it's not entirely to scale and looks a lot taller than it will be - the next draft will be done to scale so I can run some motion studies on it.

Inverse Kinematics

What do skeletal animations and robotics have in common? Inverse kinematics! What is inverse kinematics? The opposite of kinematics, unsurprisingly...

Imagine you've got a number of rigid bars each connected by joints, with one end of the structure pinned and one end free to move. Kinematics can be used to find the position of the free end compared to the pinned end by considering the lengths of the bars and the angles of the joints and applying a little trigonometry.

Inverse kinematics is therefore the opposite - given the lengths of the rods and the position of the free end, what angles do the joints need to be at? The solution to this problem is the basis of how robotic arms move. Robotic arms move by changing the angles of their joints - much like real arms - so getting the tip of a robotic arm to a certain position in space requires inverse kinematics.

Here's the inverse kinematics solver program in full swing - just click anywhere and the robotic arm will position its hand at that point.

I'm deriving an inverse kinematic equation for a system with two bodies and two joints, as I'm going to be using it to write a skeletal animation program focused on human bodies. This will make animating easy as it can be applied to the legs or arms of the character. A solution for a system with only two bodies is also much easier to derive - I might look at a formula for a system with n bodies further down the line (animating an octopus anyone?).

To start, here's a diagram of what's going on. Imagine the green line is an upper arm, and the red line a forearm. If we want the person's hand to be at a certain coordinate, we need to solve for a few angles.




First up, it's possible to solve for the angles a and b using the Law of Cosines.

\Large a = \cos^{-1}\left(\frac{A_1^2 + L^2 - A_2^2}{2A_1L}\right)

\Large b = \cos^{-1}\left(\frac{A_1^2 + A_2^2 - L^2}{2A_1A_2}\right)

Now, using these results it's possible to calculate our angles \phi and \theta. If we call the angle the line L makes with the X axis XL, then:

\Large \phi = XL - a

Now we're nearly there! If we imagine drawing a vertical line from the elbow joint down to the X axis, we can calculate the angle this new line makes with the upper arm by using the fact that the interior angles of a triangle sum to 180°. Now we know all of the angles around the elbow joint, so it's simple to calculate \theta:

\Large \theta = 360 - b - 90 - (180 - \phi - 90)

\Large \theta = 180 - b + \phi
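Putting the derivation together, a minimal MATLAB sketch (angles in radians, names illustrative, and it assumes the target is actually reachable, i.e. |A1 - A2| <= L <= A1 + A2):

```matlab
function [phi, theta] = two_link_ik(target, A1, A2)
% Two-link inverse kinematics: A1 and A2 are the upper-arm and forearm lengths,
% target is the [x y] position of the hand relative to the shoulder.
    L  = norm(target);                               % distance from shoulder to hand
    XL = atan2(target(2), target(1));                % angle of the line L to the X axis
    a  = acos((A1^2 + L^2 - A2^2) / (2 * A1 * L));   % law of cosines
    b  = acos((A1^2 + A2^2 - L^2) / (2 * A1 * A2));
    phi   = XL - a;                                  % upper arm angle to the X axis
    theta = pi - b + phi;                            % forearm angle (180 - b + phi in degrees)
end
```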

There you have it! The angles that the joints need to make with the X axis to solve the problem. An interesting point to note is that there are actually two solutions to the problem. Imagine the image above mirrored along the line L - that'd be the second solution! In a 2D problem with 2 joints, there are always 2 solutions. It gets a lot more complicated as further joints are added, as there are more possibilities. In 3D it gets even more complicated - there are infinitely many solutions. This is for the same reason we have two solutions in our scenario: it's possible to flip any solution about the vector between the start and end of the arm, and in 3D there are an infinite number of such "flips", as you can rotate everything around the line L.

The implication of the numerous solutions is that the one you want in a certain scenario will often not be the one you get. As a result, it's necessary to write criteria or conditions to pick between the various solutions so that the most natural pose is attained. I'll look into this further in the future!


Acoustic Source Modelling

I recently decided to try writing a 2d acoustic modelling program. It's still a massive work in progress, but it's already quite interesting to experiment with. First up, some theory! (feel free to skip to the bolded text if you're not so interested in derivations).

The acoustic monopole source is the basic building block of acoustic modelling. The equation for the monopole source is derived by considering a pulsating sphere of radius a. Assuming the velocity on the surface of the sphere is given by \tilde{U}_a, the acoustic pressure fluctuation caused by the sphere as a function of radial distance r is given by:

\Large \tilde{p}(r) = \rho _0 ca(\frac{jka}{1+jka}) \frac{ \tilde{U}_a e^{-jk(r-a)}}{r}

However, if \tilde{U}_a is instead written as a volume velocity \tilde{q} = 4\pi a^2 \tilde{U}_a, it's possible to take the limit a \to 0 whilst keeping \tilde{q} constant. The result? The equation for a monopole source, of course!

\Large\tilde{p}(r) = \frac{j\omega \rho _0 \tilde{q} e^{-jkr}}{4\pi r}

Now, we're nearly ready to implement it into a program. The last step is to take the real part, as it's no use as it is. The final equation we get is:

\Large\tilde{p}(r) = \frac{\omega \rho _0 \tilde{q} sin(k r + \phi)}{4\pi r}

That's the end of the boring theory! On to the program. Written in C#, it uses a 2D array to create a grid in which it can store a value for the pressure at a number of points. The resolution is variable, but it was set at 2 pixels-per-grid-point in the images in this post (meaning for each pressure value in the grid there's a 2x2 square onscreen). The colour corresponding to the grid points on each corner of the squares is interpolated to get a smoother image.


Figure 1. Two monopole sources interacting

Monopole sources can be added by left-clicking in the modelling space. Each source iterates through the grid, adding to the pressure at each point by calculating the distance from the source to that point and then using the equation above.
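As a rough illustration, here's a minimal MATLAB sketch of that per-source grid update (my program is written in C#; the constants, grid size and variable names here are arbitrary):

```matlab
% Pressure-grid update for a single monopole source.
rho0 = 1.21; c = 343; freq = 1000; q = 1e-4; phase = 0;
omega = 2*pi*freq; k = omega / c;
nx = 200; ny = 200; dx = 0.01;                 % 2 m x 2 m grid, 1 cm spacing
src = [1.0, 1.0];                              % source position in metres
p = zeros(ny, nx);
for ix = 1:nx
    for iy = 1:ny
        r = norm([ix, iy] * dx - src);         % distance from the source to this grid point
        r = max(r, dx/2);                      % avoid the singularity at the source itself
        % real part of the monopole equation from above
        p(iy, ix) = p(iy, ix) + omega * rho0 * q * sin(k*r + phase) / (4*pi*r);
    end
end
% imagesc(p); axis equal tight;                % quick visual check
```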

I then went on to add hard boundaries (currently with no absorption, but that would be trivial to add). I utilised the method of images - the reflection from a rigid boundary can be reproduced by a mirrored "image" source - so when a wall is added, each source creates a "phantom" source on the other side of the boundary which generates the reflected wave.

Unfortunately, whilst an interesting use of the properties of reflected waves, this didn't pan out so well when I wanted numerous hard surfaces, as phantom sources would then need to make their own phantom sources for additional walls and so on and so forth, ending in a nightmare.


Figure 2. Monopole source interacting with a rigid boundary

My solution to this was to re-write the monopole code to cast rays, using Bresenham's line algorithm. Now the rays can determine the pressure at any grid point using the same equation from above, but also reflect off any surface any number of times.

The result was a slightly less accurate, grainier image - and much slower processing - but also one that resulted in a lot less headaches.

I want to expand the program to visualize the pressure on a 3D surface plot, and to allow the user to place dipoles, quadrupoles, etc., which can be built either from a number of out-of-phase monopoles or using their relevant equations. Hopefully I'll get a chance to do this soon! It'll probably involve re-writing from scratch, so we will see.