Kd-Tree Implementation


Today I'll be writing about the implementation of kd-trees. A kd-tree (short for k-dimensional tree) is a type of spacial partitioning data structure, and a special case of BSP trees. What's the point of it? Well, it can be used to organize a vast number of points in space in such a way that searching for the nearest point to a certain coordinate, for example, is far quicker than more traditional methods.

It has many applications, but the one of interest to me (and maybe you, if you're following the Ray-Tracer series), is in photon mapping. Photon mapping is a special procedure carried out before the main ray tracing in which the light sources in the scene emit a large number of photons. The end-position of the photons emitted are stored in a kd-tree, which can be used to quickly determine just how many photons hit close to a point in our scene, which in turn tells us a bit about the lighting at the point. A kd-tree is absolutely necessary for this, as trying to do a nearest-point search using more traditional methods on millions of photons, numerous times for each pixel on the screen, would take far too long. This photon mapping procedure allows the rendering of caustics and diffuse inter-reflection, which can greatly increase the realism of a scene.

A 1 Dimensional Kd-Tree

A 1 Dimensional Kd-Tree

So how does it work? We start with the coordinates of a large number of points in any dimensional space (2d in the case of this example), stored in a big list or array. From there, we first choose a coordinate to use to split the list - let's start with the X coordinate. We find the median x coordinate from all the points, and then split the list up into two smaller lists, one of the points with an x coordinate bigger than the median, one of the points with an x coordinate smaller than or equal to the median. We then recursively use this procedure, starting with the two new lists, but swap the coordinate each time (so we'd use the y coordinate for the second lists, then return to using the x, and so on). When we get to a state where there's only one point left in our lists, we stop the recursion, and the tree is built! Let's check out some pseudo code that can build a list:

This recursive function continues to split the list time and time again, until it is down to a bunch of lists with just a single point in them. At this point, the tree is complete. It's important to note there must be a way of linking each node/list in the tree to the previous nodes, and the following nodes, as this is what makes it possible to traverse/search the tree. In my C# implementation I achieved this by having a Kd-Tree node class which was created at the end of the split_tree function, which held some variables such as a pointer to the instance of the class that instantiated it, as well as pointers to the two instances of the class it then went on to create, and it then called the split_tree function from inside itself. There are however other (and probably better!) ways of doing this.

So no that we have our tree, it's time to use it so search for the nearest point to an arbitrary coordinate. To do this, we simply start at the top of our tree, and work our way through it by comparing the coordinate we're looking for the nearest point to with the median at each stage. Here's some pseudo code:

As you can see, it's another recursive function, which calls itself until it reaches the bottom of the tree. By comparing the coordinates point we're searching for with the median at each node, we make our way down the tree and will eventually wind up at the point closest to the point of interest.

And that's all there is to it! Kd-Trees are extremely useful as they are lightning fast compared to, say, comparing the distances between each point in the list at the point of interest. The larger the number points get, the bigger the speed boost will be, so for photon mapping it will provide a massive speed increase.

Ray Tracing: Part 3


Today we're going to cover the phong illumination model, depth checking and reflection. The phong illumination model is effectively a formula which describes the final colour of each pixel as a linear combination of a number of components:

  • Ambient light
  • Diffuse light
  • Specular light
  • More complex parts (reflections/refractions).

specularityIn our current pseudo code from part 2 we already have ambient lighting terms and diffuse lighting (i.e. our lambertian shading), so now we can look at adding specular lighting. A specular light is effectively a reflection of the light source in the object we're looking at. If the material is very reflective, it is a sharp, clear reflection. If the material is matte, the specular highlight will spread out further and blend into the diffuse shading.

To calculate the specular component, we need to yet again use a dot product to calculate the cosine of an angle between two vectors. In this case, we want to know the cosine of the angle between the reflection of the current eye-ray from the surface of the object and the vector from the surface point to the light source of interest (which we already have in the code from part 2, denoted light_vector). If the reflection vector and the vector to the light were going the same direction, we would expect the specular highlight to be large. This is why the cosine of the angle from the dot product is useful again - cos(0) = 1, so the closer the direction of the two vectors is, the bigger the specular component is.

Firstly, we'll need to calculate the reflection vector. To do this, we can use this handy formula:

\Large R=V-2N(N\cdot V)

Where R is our resultant reflection vector, N is the normal vector (which we derived in part 2), and V is the current ray direction from the "eye." There's a good derivation of this here. Now that we have the reflected vector, finding our specular component is as simple as performing a dot product, then using some math to calculate the component based on the shinyness of the surface. Here's the updated pseudocode:


Figure 1. Specularity

There we have it, specularity! Hopefully, if it's implemented correctly, it should look something like Figure 1. You can play about with the values, specular intensity should be from 0-1, and shininess is generally good from around 50-150.

Now's probably a good time to sort out some kind of depth checking. Objects appearing infront of eachother when they are not is not desirable, after all! The depth checking is relatively simple thankfully, the algorithm being as follows:

In a for loop, check_ray against each sphere. In each iteration, check if the distance to the collision point on this sphere is smaller than the last one, and then so on. When the loop is complete you're left with the collision point nearest to the screen, and you can then proceed as normal! This can be implemented something like:

This can be called in our "plane pixel" loop instead of the "for (every sphere)" section, then our actual shading code can be moved into a sphere_lighting function, which returns the colour of our pixel. That's all there is to the depth checking, really.

Now, we've calculated a nice reflection vector for the specularity, so why not put it to further use with some pretty reflections? This can be a little tricky, as it is a recursive operation. What does this mean? Well, imagine two mirrors facing eachother. Light, or one of our rays, could theoretically get trapped between them forever, so to fix this we need to keep track of the "depth" of the ray and terminate it if it goes too high.

reflection diagram

If your ray shooting functions have been reasonably well structured (unlike mine on my first attempt..), an implementation of the reflections is actually straightforward, like so:

So, firstly we calculate our standard phong illumination model. Then, if the sphere is reflective, we get the colour returned by a new ray sent in the reflection vector direction. This is the recursive part, as the shoot_ray script is calling itself - although the depth is increased with each call, and if it's above a maximum amount the recursion stops.

Figure 2. Reflections

Figure 2. Reflections

Eventually after all the recursion, our initial call of the script will receive the total colour (after all the reflections are added up), which is blended with the colour from the normal lighting, and that's that! Reflections really look great, as can be seen in Figure 2, which used a recursion depth of 3. Any higher than 3 is not usually worthwhile as the reflections get very small at that point, I usually stick with 2.


The next post will probably be on distributed ray tracing - multisample anti-aliasing, and depth-of-field.

Ray Tracing: Part 2


So we've now got a basic ray-tracer which should be able to render some coloured circles on a blank background. Exciting, eh? Next up is defining a light source, shading the spheres, and casting shadows.

lambertian shading

Lambertian shading

To shade our spheres correctly we'll be applying lambertian shading - probably the simplest way to go about doing it, but it has fantastic results. The basic idea that any surface pointing directly at a light source will receive 100% of the light coming from it. If the surface is at an angle to the light source, it will only receive a smaller percentage of the light. 90 degrees or more, and no light will be received. By considering the relation between the angle to the light and the amount of light incident on the surface, we can see that it is a cosine function. cos(0 degrees) = 1, cos(60) = 0.5. cos(90) = 0.

To calculate the percentage of light incident on the surface of our sphere, we can use the dot product. The dot product is defined as:

\Large \frac{A \cdot B}{|A||B|} = cos(\theta)

Where \theta happens to be the angle between the two vectors, A and B, so if we were to compute the dot product of two directional/unit vectors, it would give us the cosine of the angle between them. To find out the cosine of the angle between our current surface and the light, we just need to calculate the dot product of the sphere normal with a vector from the hit position to the light. We can then either use this as V in a HSV colour system to lighten/darken the colour, or simply multiply all RGB values by it. The  normal is a vector which dictates the direction the current point on the surface is facing, which in the case of a sphere is basically a continuation of the radius passing through our point. It is calculated easily by subtracting the position of the center of the sphere from the point we've hit. Note that the cosine of the angle can be negative, so it's necessary to take the maximum of the dot product or 0.

It's also fairly simple to have more than one light source. The code must loop through each light, apply the shading logic from above, then sum up every light's contribution to the pixel.

Here's the lambertian shading added onto the pseudo code from the last post:

shadow rays

Shadow rays

Looking good! Although, what if one sphere is between another and the light source? There should be a shadow cast. However, instead of "casting" shadows, we'll do the opposite - check whether the point we're focussing on is in the shadow of any other sphere by sending out a "shadow" ray. This is really simple to do, as we can use our already existing check_ray function, like so:

The most important part of the code is to ensure the sphere does not check against itself, or all of it will be always shadowed because it thinks that it is casting a shadow on itself. That's all for now - the next post will probably some combination of depth checking, multi-sampling/anti-aliasing, and reflections.

Here's an example of the content of this post in action in my implementation:


Intro to Ray Tracing

I've always wanted to write a ray tracer and this week seemed like a reasonable time to do it! A ray tracer is a program which generates (hopefully) photo-realistic images by casting "light rays" through a number of points/pixels on an imaginary plane in a 3d world, then using information from this ray to set a corresponding point on that ray to the correct colour. Figure 1 depicts and example of this.

ray tracing pic

Figure 1. Basics of Ray Tracing

The reason that ray-tracing can produce such realistic images is that the individual rays can be programmed to behave exactly as light rays would - taking into account reflections, refraction, shadows and so on. The only difference is that they work in reverse.

In real life, light sources produces light rays/particles which propagate outwards. They then bounce off of objects, sometimes being absorbed or transmitted/refracted, and eventually make their way to our eyes. It would not be possible to calculate ray-tracing this way, as billions of rays would have to be radiated in the hope that enough of them would hit our imaginary plane to make a reasonable image. There would be no obvious finite number of rays needed for decent convergence towards a good, final image and it would be impossible to achieve the same ray density per pixel across the whole image. This would result in certain parts of the generated image being much more detailed than others, which is not acceptable.

Instead, rays are created from the viewers' position and are directed through each of the pixels on the plane. The ray can then hit an object, and use some maths to establish how much light is reaching that particular point on the object's surface. A second ray, called a shadow ray, can then be cast from the point the ray hit to the light source to establish whether or not that particular point is in a shadow of another object. It's also possible to send out a reflection ray, which works in the same way as the initial ray but is created from the point the first ray struck - the colour taken from the reflection ray can then be added to the total colour for the pixel and reflections can be generated. Refraction through transparent objects can then be computed in a similar way.

So, where do you start? Well, let's write some pseudo-code to get a super basic ray-tracer up and running:

This code loops through each pixel on our imaginary plane, then works out the direction of the vector from the viewer's position to the current pixel - this is done by subtracting the coordinates of the camera from the "real world" pixel coordinate. This is because any vector, A, between two points B and C, can always be defined as: A = C - B. It's possible to have the camera facing any direction, and then calculate the "plane pixel coordinates" from that (usually using some matrix trickery), but I have opted to skip this step. Instead, our camera is situated at (0,0,0), and facing along the x axis. This means the coordinates of the pixels on the plane are given by (distance to place, px - plane_width/2, py - plane_height/2), normalized and divded by some constant. If you want to work using solely a FOV (field of view) angle, then from some trigonometry that constant should be: plane_width / (distance_to_plane * tan(FOV angle));

It then loops through every sphere in the scene, checking whether the light ray defined by this vector intersects the sphere - if it does, it draws a coloured pixel at the current pixel position on the current render target. It is apparent that this method will have some problems, with depth checking for example, but as a first test it works pretty well.

Before you can get this code up and running, you've got to write the check_ray function. Thankfully, it's reasonably straightforward to establish whether a line, or vector, intersects a sphere. First of all we require a way to mathematically represent our light ray and our spheres. The light ray is simple - it can be composed of a starting position, and a parameterized direction vector, like so:

\Large L = p_0+ t Dir

Where L is the light ray vector, p_0 is the starting point of the ray, t is a variable, and Dir is a directional vector. A sphere can also easily represented in vector form like so:

\Large (p - c)\cdot(p - c)= Radius^2

Where p is any point on the sphere and c is the center of the sphere (a vector relative to the origin, or 0,0,0). This equation is effectively saying that the distance from any point on the sphere to the center of the sphere squared is equal to the radius squared - which makes sense! (A dot product of something with itself gives the magnitude of that thing squared).

Now finding whether or not the line intersects the sphere is as easy as substituting the line equation in to the sphere equation.  After some manipulation, this gives a quadratic equation of the form ax^2 + bx +c = 0, where:

a = Dir \cdot Dir

b = 2(p_0 - c)\cdot Dir

c =(p_0-c)\cdot(p_0-c)-Radius^2

As expected with all quadratic equations, this leaves us with three possible results. The determinant, b^2 - 4ac, will allow us to determine which of these results we'll get. If it is less than 0, there will be no real results for t, which means the ray has not intersected the sphere in question. If it is equal to 0, it means there will be one real result for t, which means that the ray intersects the sphere as a tangent, ie at only one point. If it is greater than 0, there will be two results for t, meaning the ray has passed directly through the sphere. If this is the case, we want to return the smaller of the two values for t, as this is where the ray first intersects the sphere. This is illustrated in Figure 2.


Figure 2. Determinants

Now we can piece the code together, using the general solution to a quadratic equation:

\Large t=\frac{-b\pm\sqrt{b^2 - 4ac}}{2a}

Here is a pseudo-code example:

You may notice that instead of true/false, the script actually returns t1. It is possible to go on to multiply t1 by the components of the Dir light ray vector to find the position in the world at which the intersection occurred - this result will be used later on.

And there you have it! Hopefully you can now get a super simple ray-tracer up and running. The next post will cover lighting and shadows.


CNC/X-Y Table Design

I've recently been looking for a fun, useful project to design and build. I wanted something that brings together electronics, mechanics and programming. My idea? A CNC/X-Y table.

An X-Y Table is a moving platform which is precisely controlled by a computer or embedded electronics. It is the driving mechanism behind laser cutters, CNC routers and so on.  I've decided to design and build a small, 30x30cm table, with X-Y axis movement (at least for starters). The table motion will be actuated by two threaded rods powered by stepper motors. The stepper motors themselves will be controlled by an arduino, which will interpret data fed to it from a CAD program I will write. Simple, right?

Initial Design

Initial Design

I've already made a number of design choices here, which I will explain. Firstly, why threaded rods and stepper motors? Well, the rods I have bought are standard M8 size, which have a pitch of 1mm. This means that for every revolution of the rod, whatever is attached to it will move along the rod by 1mm. The rods are going to be powered by 28BYJ-48 stepper mottors, which from the specs it can be seen that they can take a step of just 0.879 degrees. This means that the table will be able to move distances as small as 0.0024mm, at least in theory. In practice, the precision will probably not come close to that value due to the tolerances in the creation of the rods and the cheapness of the motors - still, it should be more than good enough for my purposes.

So, why arduino controlled? The choice was between an arduino and a raspberry pi, but it was narrowed down by the fact I'm currently without a keyboard for my raspberry pi, and I have more ideas for that in the future. The arduino has just enough pins to control 3 stepper motors (via small controller pcbs), and with one to spare. I'm considering writing my own library for the stepper motor control, but for the time being/prototyping I'm going to be using AccelStepper, which seems to do a great job.

Lastly, why re-invent the wheel by designing my own CAD program and numerical control language? Well, the industry standard NC language is known as G-Code, which is what the vast majority of CAD programs will output. There's also a G-Code interpreter library for arduino. This is all well and good, but I've decided to invent my own NC language as I don't feel comfortable relying on libraries which I don't fully understand, plus it's much more fun to start from scratch. Plus, I'll get total control of how it works, and the language will be tailored to what my table is capable of. Unfortunately, no CAD programs will be able to output my NC language, so I'm also going to have to program my own simple one to design the movements of the table.

On to the design! I've draw up a few concepts, and started modelling a few in Solidworks. You can see the current design here:

As you can see, it's composed of three layers - a base, and X axis and a Y axis with a flat top. The three layers are support by runners which allow them to move along their respective axes. The motors/rods are not currently modelled, but will be soon. This design seems fairly good, but there is a lot of "dead space" in between the layers which may not be necessary and will probably be revised. I should point out - it's not entirely to scale, and looks a lot taller than it will be - the next draft will be done to scale so I can do some motion studies on it.

Inverse Kinematics

What do skeletal animations and robotics have in common? Inverse kinematics! What is inverse kinematics? The opposite of kinematics, unsurprisingly...

Imagine you've got a number of rigid bars each connected by joints, with one end of the structure pinned and one end free to move. Kinematics can be used to find the position of the free end compared to the pinned end by considering the lengths of the bars and the angles of the joints and applying a little trigonometry.

Inverse kinematics is therefore the opposite - given the lengths of the rods and the position of the free end, what angles do the joints need to be at? The solution to this problem is actually the basis of the motion robotic arms make. The way in which robotic arms move is by changing the angle of their joints - much like real arms - so being able to get the tip of a robotic arm at a certain position in space requires inverse kinematics.

Here's the inverse kinematics solver program in full swing, just click anywhere and the robotic arm will position it's hand at that point.

I'm deriving an inverse kinematic equation for a system with two bodies and two joints, as I'm going to be using it to write a skeletal animation program focused on human bodies. This will make animating easy as it can be applied to the legs or arms of the character. A solution for a system with only two bodies is also much easier to derive - I might look at a formula for a system with n bodies further down the line (animating an octopus anyone?).

To start, here's a diagram of what's going on. Imagine the green line is an upper arm, and the red line a forearm. If we want the person's hand to be at a certain coordinate, we need to solve for a few angles.




First up, it's possible to solve for the angles a and b using the Law of Cosines.

\Large a = cos^-1(\frac{A_1^2 + L^2 - A_2^2}{2A_1L})

\Large b = cos^-1(\frac{A_1^2 + A_2^2 - L^2}{2A_1A_2})

Now, using these results it's possible to calculate our angles \phi and \theta. If we call the angle the line L makes with the X axis XL, then:

\Large \phi = XL - a

Now we're nearly there! If we now imagine drawing a vertical line from the elbow joint to the X axis, we can calculate the angle the new line makes with the upper arm by knowing the interior angles of a triangle are equal to 180°. Now we know all of the angles around the elbow joint, so it's simple to calculate \theta:

\Large \theta = 360 - b - 90 - (180 - \phi - 90)

\Large \theta = 180 - b + \phi

There you have it! The angles that the joints need to make with the X axis to solve the problem. An interesting point to note is that there are actually two solutions to the problem. Imagine the image above was mirrored along the line L - that'd be the second solution! In a 2d problem with 2 joints, there are always 2 solutions. It gets a lot more complicated as further joints are added, as there are more possibilities. In 3D it gets even more complicated - there are infinite solutions. This is due to the same reason we have two solutions in our scenario, it's possible to flip any solution along the vector between the start and end of the arm. In 3D, there are an infinite number of "flips" you can make, as you can just rotate everything around the line L.

The implications of the numerous solutions is that often the one you wish for in a certain scenario will not be the one you attain. As a result, it's necessary to write criteria or conditions for the various solutions so that the most natural pose is attained. I'll look into this further in the future!


Acoustic Source Modelling

I recently decided to try writing a 2d acoustic modelling program. It's still a massive work in progress, but it's already quite interesting to experiment with. First up, some theory! (feel free to skip to the bolded text if you're not so interested in derivations).

The acoustic monopole source is the basic building block of acoustic modelling. The equation for the monopole source is derived by considering a pulsating sphere of radius a. Assuming the velocity on the surface of the sphere is given by \tilde{U_a} the equation for the acoustic pressure fluctuations caused by the sphere as a function of radial distance r is given by:

\Large \tilde{p}(r) = \rho _0 ca(\frac{jka}{1+jka}) \frac{ \tilde{U}_a e^{-jk(r-a)}}{r}

However, if  \tilde{U_a} is instead written as a volume velocity \tilde{q} = 4\pi a^2 \tilde{U}_a it's possible to take the limit a \xrightarrow{}{} 0 whilst keeping \tilde{q}constant. The result? The equation for a monopole source, of course!

\Large\tilde{p}(r) = \frac{j\omega \rho _0 \tilde{q} e^{-jkr}}{4\pi r}

Now, we're nearly ready to implement it into a program. The last step is to take the real part, as it's no use as it is. The final equation we get is:

\Large\tilde{p}(r) = \frac{\omega \rho _0 \tilde{q} sin(k r + \phi)}{4\pi r}

That's the end of the boring theory! On to the program. Written in C#, it uses a 2D array to create a grid in which it can store a value for the pressure at a number of points. The resolution is variable, but it was set at 2 pixels-per-grid-point in the images in this post (meaning for each pressure value in the grid there's a 2x2 square onscreen). The colour corresponding to the grid points on each corner of the squares is interpolated to get a smoother image.


Figure 1. Two monopole sources interacting

Monopole sources can  be added by left clicking in the modelling space. Each source iterates through the grid, adding to the pressure at each point by calculating the distance to the point and then using the equation above.

I then went on to add hard boundaries (currently with no absorption, but that would be trivial to add). I utilised the fact that a reflected wave is equivalent to an equal wave propagating in the opposite direction (but 180 degrees out of phase), so when a wall is added each source creates a "phantom" source the other side of the boundary which generates the reflected wave.

Unfortunately, whilst an interesting use of the properties of reflected waves, this didn't pan out so well when I wanted numerous hard surfaces, as phantom sources would then need to make their own phantom sources for additional walls and so on and so forth, ending in a nightmare.


Figure 2. Monopole source interacting with a rigid boundary

My solution to this was to re-write the monopole code to cast rays, using Bresenham's line algorithm. Now the rays can determine the pressure at any grid point using the same equation from above, but also reflect off any surface any number of times.

The result was a slightly less accurate, grainier image - and much slower processing - but also one that resulted in a lot less headaches.

I want to expand the program to visualize the pressure on a 3D surface plot, and allow the user to plot dipoles, quadrapoles, etc which can be made either as a number of out-of--phase dipoles, or using their relevant equations. Hopefully I'll get a chance to do this soon! It'll probably involve re-writing from scratch, so we will see.