DepthKit beta 006


Version 005 for Mac OS X and Windows Released on April 16th 2013

1) Getting started

The current manual was written by Jack Armitage and edited by the team.

To use the DepthKit you'll need:

The DepthKit is compatible with the following operating system and sensor combinations

Mac OS X

We recommend the PrimeSense Carmine 1.08 sensor over the Kinect: it has the same resolution but is smaller and doesn't require wall power, which makes it more portable.

Microsoft Windows 7

External Video Camera

The DepthKit can use any HD video camera, including digital SLRs, GoPros, even newer iPhones. We'll be using the Canon 5D MkII DSLR for the purposes of the tutorials, but feel free to use what you have.

Lens Choice

In order to match the field of view between your camera and sensor so that the color information adequately covers the depth information we recommend a ~24mm lens on a full frame sensor. A ~16mm lens is adequate coverage for an APS-C sensor like an entry-level Canon DSLR. If you aren't familiar with these terms you should be able to find your crop factor in the specifications section of your camera manual and you can look up the conversion to full-frame here. When in doubt, set your zoom lens to its widest setting. Use a wide prime lens for best results.
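As a rough illustration of the crop-factor conversion described above, the sketch below converts a lens's focal length to its full-frame equivalent and estimates the horizontal field of view with a simple rectilinear lens model. The function names are invented for this example, 1.6 is the usual crop factor for Canon APS-C bodies, and 36 mm is the standard full-frame sensor width; check your own camera's specifications before relying on the numbers.

```python
import math

FULL_FRAME_WIDTH_MM = 36.0  # standard full-frame sensor width


def full_frame_equivalent(focal_length_mm, crop_factor):
    """Focal length that gives the same framing on a full-frame sensor."""
    return focal_length_mm * crop_factor


def horizontal_fov_degrees(focal_length_mm, crop_factor=1.0):
    """Horizontal field of view for an ideal (distortion-free) lens."""
    sensor_width_mm = FULL_FRAME_WIDTH_MM / crop_factor
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


if __name__ == "__main__":
    # A 16mm lens on a 1.6x crop body frames like a ~25.6mm lens on full frame.
    print(round(full_frame_equivalent(16, 1.6), 1))
    print(round(horizontal_fov_degrees(24), 1))        # 24mm on full frame
    print(round(horizontal_fov_degrees(16, 1.6), 1))   # 16mm on APS-C
```

The two lens choices recommended above come out within a few degrees of each other, which is why either is adequate for covering the depth image.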


mounts with camera

The workflow requires the use of hardware to bind the video camera and the depth sensor together. There are a few ways to do this.

Please add your own mounting solutions and designs - however simple or rudimentary - to Instructables or your 3D printable models to Thingiverse and tag them with "DepthKit" or "RGBDToolkit." For recommendations about designing your own mounts see the FAQ.

If you'd rather just buy a kit, we have a stockpile of fabricated mounts and are happy to sell them to you. We ship anywhere!

Mount kits include one aluminum base and interchangeable arms that can accommodate the Asus, PrimeSense or Kinect. Please note we typically only ship mounts on Mondays. Don't forget to allow time for customs on international orders. If you're on a tight timeframe or have any questions about mounts, contact us at DepthKit[at]gmail[dot]com.


Print out the A4 or A3 calibration checkerboard PDF in black and white on matte paper. Glue or otherwise mount it to something flat and rigid like wood or foamcore. This can be done easily at most print shops - if you do it at home look out for bubbling or warping. It helps to attach a bracket or some other way to fix it to a stand or the wall.

Once you have all the items in the checklist, you're ready to calibrate the cameras!


The next step is to determine mathematically the physical position of the two cameras relative to one another. This allows the software to combine the two data streams into one 3D scene.

Pick the right environment to calibrate

Calibration requires some ambient infrared light in the room. The sun is a good source but can be too bright if it's direct. The best setting is a living room or studio with large windows where you can get filtered sunlight without it being direct. Bear in mind that windows in newer buildings are often treated with IR-blocking coatings. If neither of those is an option, 'hot lights' that emit heat, such as halogen or tungsten, will work. We've also had good luck with IR lights.

Attach the cameras together

Mount the cameras. Using the mounting solution you chose above, affix the HD camera to the depth sensor. Shooting requires that the two cameras be securely bound together and not subject to any movement in relation to each other. Make sure everything is locked down tight, including the safety catch on your quick release!

Get an IR diffuser. You'll need something to diffuse the depth sensor's IR laser projector for one step during calibration. We often use a square of tissue paper, cigarette rolling paper or a handkerchief.

Lock your zoom to the widest setting and put a piece of tape over both the zoom ring and the lens body to prevent accidental zooming later. Zooming after you've calibrated will invalidate the calibration.

DepthKitCapture: Calibrate Lenses

Plug the sensor into your computer and open the DepthKitCapture application for your sensor: either DepthKitCaptureKinect or DepthKitCaptureXtionPro.

Set your working directory to a convenient place on your hard drive. All your footage will be stored here. Change it by clicking the text at the top of the DepthKitCapture window. The default directory is depthframes, which is inside your application folder. You'll definitely want to change this. Observe that the software creates a '_calibration' folder for you inside the project directory. Files are autosaved as you go - so relax, your work will be saved.

Select the Calibrate Lenses tab, the first of the four views in the capture application. It is split into two panes side by side; your depth camera stream, if connected via USB, should display on the left, and the right pane should be empty to begin with. If you don't see the depth camera view, see the troubleshooting page.

Note about Kinect model number. There are two versions of the DepthKitCaptureKinect application on OS X, one for model #1414 and one for model #1473. Check the bottom of your Kinect to find the model number and open the corresponding capture application.

Kinect Model No.

Capture Lens Properties. In order to accurately calibrate the two cameras, DepthKit needs to understand the subtleties of the camera lenses - imperfect manufacturing processes mean that every lens will be slightly different. These values are called lens intrinsic parameters and describe image size, field of view, optical center of the lens, and any distortion found in the lens. To determine these values we capture and analyze images from both cameras.

Calibrate Depth Camera. Aim your rig at an area of space which has plenty of 'visible information' - featuring different colors, contrasts and depths. Hit the Self Calibrate Depth Camera button at the bottom of the left-hand pane. This will automatically analyze the incoming video stream (great!), and once complete should display results similar to the following:

Self Calibrated Depth

Note that the fields of view are symmetrical, and that the principal point is at the center of the depth camera's fixed 640x480 frame.
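To make the relationship between field of view and principal point concrete, here is a minimal pinhole-camera sketch. It assumes an ideal lens with no distortion; the 57 degree horizontal field of view is an illustrative value only, not a figure reported by DepthKit, and the function names are invented for this example.

```python
import math


def focal_length_pixels(image_size_px, fov_degrees):
    """Pinhole model: focal length in pixels for a given field of view."""
    return (image_size_px / 2) / math.tan(math.radians(fov_degrees) / 2)


def principal_point(width_px, height_px):
    """For a symmetric field of view the optical center is the frame center."""
    return (width_px / 2, height_px / 2)


if __name__ == "__main__":
    width, height = 640, 480                   # the depth camera's fixed frame size
    print(principal_point(width, height))      # (320.0, 240.0)
    print(focal_length_pixels(width, 57.0))    # illustrative FoV only
```

If the self-calibration reports a principal point far from (320, 240), that is a hint that something went wrong during the analysis.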

Capturing the HD camera's lens properties takes a bit more effort and patience, since we don't have a direct software connection to the camera. First, set your camera rig up on a tripod and place your checkerboard on a stand in front of it, at a distance from the camera where it occupies approximately 1/4 of the frame. Place the board in the top left quadrant, focus, and record a short video from this perspective. Don't worry if the checkerboard is not exactly horizontal or vertical, but do ensure that the entire checkerboard is in the frame, including the white border around the outside black squares. Make sure the board is exposed well, evenly lit, and that the lens is focused on it so the corners are crisp. Record a 1-3 second video of this, mindful of keeping the camera very still.

Repeat this for each of the other three quadrants, giving four images. Then repeat the process at a distance where the checkerboard occupies around 1/9th of the frame, taking 9 more images, making 13 in total.

Four Up

Download the clips onto your computer into your project's working directory, wherever you set it in the first step. It is helpful to add them to a new folder inside '_calibration', called 'slrIntrinsics' or something similarly explanatory.

Set the Square Size (cm) of the checkerboard inside the application. For reference, use 3.38 if you used the A3-sized checkerboard and 2.54 if you used the A4-sized board. If yours is a different size, measure one square precisely and use that width.

Drag all of the video clips into the 'Capture Lenses' tab's right-hand window pane. This should automatically start the calibration process. You may need to wait for a few seconds while this takes place; the application selects the middle frame from each video and converts it into a black and white .png, which is stored in your working folder's _calibration directory. It uses the OpenCV library to detect the checkerboard corners and create a model of the lens.

Once the analysis is complete, the software will display a 'Total Error' figure below the checkerboard images. This is the average error across all the calibration images. Alongside this, you can view the individual error margins for each image by scrubbing the mouse from left to right across the calibration image window. A 'Total Error' of < 0.200 is desirable. If your calibration has resulted in a larger average error than this, scrub through your image set and look for any outlier images which have an error of > 0.300. Note the filename of any outliers. You can re-perform the analysis at any time, simply by dragging the videos onto the window pane again - this time excluding the erroneous clips. This should improve your Total Error.
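If you jot down the per-image errors while scrubbing, the review described above can be sketched in a few lines. This is only bookkeeping on numbers you have noted by hand; the application computes the real figures itself, and the clip names and error values below are made up for illustration.

```python
TOTAL_ERROR_TARGET = 0.200   # desirable average error, from the text above
OUTLIER_THRESHOLD = 0.300    # per-image error that marks an outlier


def review_calibration(errors_by_clip):
    """Return (average error, sorted outlier clip names) for per-image errors."""
    average = sum(errors_by_clip.values()) / len(errors_by_clip)
    outliers = sorted(name for name, err in errors_by_clip.items()
                      if err > OUTLIER_THRESHOLD)
    return average, outliers


if __name__ == "__main__":
    # Hypothetical clip names and error values, for illustration only.
    errors = {"MVI_0001.MOV": 0.12, "MVI_0002.MOV": 0.15, "MVI_0003.MOV": 0.45}
    average, outliers = review_calibration(errors)
    if average >= TOTAL_ERROR_TARGET:
        print("Average error %.3f is too high; try excluding: %s"
              % (average, ", ".join(outliers) or "none found"))
```

Dropping the listed clips and dragging the remaining videos back onto the pane is then the re-run step the text describes.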

good intrinsics

If nearly all of your images have high error, you will need to reshoot them. Before you do this, look for elements in your environment which could have caused the error. Is there light streaking across your checkerboard? Check the troubleshooting section for more reasons why you may be getting high error.

Congratulations, you've now sensed the actual structure of your camera lenses to create a model. With this we can now determine the relationship between the two lenses.

DepthKitCapture: Calibrate Correspondence

Navigate to the second tab, labeled Calibrate Correspondence. Now that we have the lens models from the first tab, we can determine the spatial relationship between the cameras.

If you are using the laser cut mount, you can pivot the sensor up and down in order to match the field of view (FoV) to the video camera's lens. Ideally the video camera will be able to see everything the depth sensor can see, with a little bit of margin on the top and bottom.

FoV Adjust

Now that we've matched the views, we need to take corresponding images of the checkerboard from the two cameras to determine how they sit. Looking back at the capture page, with the checkerboard in each quadrant, you need to capture three images: one short video clip from the video camera, one depth impression from the sensor, and one infrared view of the checkerboard from the sensor. This is where the IR light diffuser is important, so make sure it is handy before beginning. A second pair of hands is helpful at this step too.

IR diffuse

Repeat this process with the checkerboard at four different depths away from the cameras, making sure to refocus at every plane. The idea is to fill up an imaginary box of checkerboard points in the 3D space in front of the camera. This helps to best interpret the relationship between the two cameras that will work at all distances from the lens. Once you've captured all four sets, download the video clips from the camera and drop them into a new folder in the working directory you set before. One at a time, drag the video files into their corresponding rectangular tiles in the application next to the corresponding depth and IR thumbnails taken from the device.

With four sets of three images complete, click 'Generate RGB/Depth Correspondence'. If you get an error, it means the algorithm was unable to find an exact fit for the selected checkerboard pairs. Just like before, excluding images may help in this situation, since 'bad apples' may be throwing off the model calculation. Try clicking 'Ignore' to exclude a few of the image sets. If that doesn't work, click 'Ignore' on all but one of the image sets and attempt to Generate RGB/Depth Correspondence again. When you find an image set that allows the process to complete successfully, try combining it with other sets. There is some randomness in the algorithm, so it helps to try the same combinations a few times just to see if it “guesses” a better starting place.

Good Calibration

By pressing the left and right arrows you can cycle through previews of the four checkerboard calibration sets. If the calibration is correct, you'll see the checkerboard image pixels (in black and white) mapped cleanly onto the depth model of the same image. You'll also see colored dots floating in space, corresponding to the checkerboard depth planes. Some dots are missing from the grid pattern, as they were removed as outliers while generating the calibration. An ideal calibration will contain dots from at least three different sets of colors. As you cycle through all the checkerboard sets, the checkerboard image should be visibly well aligned to the depth data.

The camera is set up as a video game style WASD first-person camera, using the following controls:

Move Forward: w
Move Backward: s
Move Left: a
Move Right: d
Move Up: e
Move Down: c
Rotate Counterclockwise: q
Rotate Clockwise: r

Once you have a calibration where the checkerboard depth and image data match up at all the depths, you can move on to recording! As long as your camera and depth sensor lenses stay in position, you won't have to go through the painstaking process again. Phew!


Pre-filming checklist. Ready to roll? Navigate to the Record tab in the DepthKitCapture application. If connected properly, you will be able to see a preview of your depth camera on the left. Your takes will show on the right side in the format 'TAKE_MO_DD_HH_MI_SS', with the numbers corresponding to the time of capture (there will be none before you've saved your first take). If it's still there from the last tutorial, remove the IR projector cover from your depth camera.

Tethered. When planning your shoot, be aware that your camera operator will not be able to move further from the computer than the depth camera's USB cable will allow. This distance can be extended with an active USB extender if needed. If you are shooting with an Asus Xtion and a laptop you can go mobile!

Clap. Each take requires a visual signal to sync the two data streams together. Have someone (or yourself) ready to stand in front of the DepthKit rig and clap at the beginning of each take. It may feel silly at the time, but it is important - it allows you to fine tune the temporal alignment between the video and depth streams later on.

Rolling. We follow this convention when on set:

Warning: if you see the 'Toggle Record' button starting to fill with a red bar, it means that some images are failing to save. This usually occurs when the hard drive is nearly full. If this happens, stop recording as soon as possible and wait for the red bar to go down, ensuring that all the footage is written to the drive. If you're recording to an external drive, make sure it's FireWire, USB 3.0, or Thunderbolt.
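A back-of-the-envelope estimate shows why a fast drive matters. The sketch below assumes uncompressed 640x480 depth frames at 2 bytes per pixel and 30 frames per second; the frame size comes from this manual, but the bytes per pixel and frame rate are assumptions for illustration, and the application may write frames in a different format.

```python
def raw_depth_rate_mb_per_s(width=640, height=480, bytes_per_pixel=2, fps=30):
    """Uncompressed depth data rate in megabytes per second (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


if __name__ == "__main__":
    # bytes_per_pixel and fps are assumed values, not taken from the manual.
    print(raw_depth_rate_mb_per_s())
```

With these assumed values the raw stream is roughly 18.4 MB/s sustained, which a slow external connection or a nearly full disk may struggle to keep up with.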

unsaved frames

When you stop recording you should see your Take appear in the right side of the Record tab. As soon as you finish recording, the application will begin compressing the raw depth data in the background. The progress of this process is shown as a gray bar overlaid on the TAKE description on the top right. It will move all the way to the right when finished. In the meantime, open your working directory. You should see that a TAKE folder has been created with the same timestamp as in the application. Navigate inside this folder and download your DSLR footage into the 'color' folder.

It's possible to preview the recorded depth data inside of the capture application. In the playback tab, select a Take and hit space bar to play the timeline. Use the same controls from the calibration step to navigate the point cloud as it plays back your capture.

Media Bin Preparation

After a shoot, you'll need to copy the video files into the working directory. For each file you recorded, find the corresponding TAKE folder. Having the camera's clock set correctly is helpful so you can match the time stamps. Within the take directory, put the video file into the color/ folder.
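Matching clips to takes by timestamp can be sketched as follows. The take name carries no year, so the caller supplies one; the sketch also assumes each field of 'TAKE_MO_DD_HH_MI_SS' is a plain number and that the camera clock roughly agrees with the computer's. The function names and example takes are hypothetical.

```python
from datetime import datetime


def parse_take_name(name, year):
    """Parse 'TAKE_MO_DD_HH_MI_SS' into a datetime; the caller supplies the year."""
    parts = name.split("_")
    if len(parts) != 6 or parts[0] != "TAKE":
        raise ValueError("not a take folder name: %r" % name)
    month, day, hour, minute, second = (int(p) for p in parts[1:])
    return datetime(year, month, day, hour, minute, second)


def closest_take(clip_time, take_names, year):
    """Return the take whose timestamp is nearest to the clip's recording time."""
    return min(take_names,
               key=lambda name: abs(parse_take_name(name, year) - clip_time))


if __name__ == "__main__":
    takes = ["TAKE_04_16_14_05_30", "TAKE_04_16_14_20_10"]   # example names
    clip_recorded_at = datetime(2013, 4, 16, 14, 20, 25)      # from the camera clock
    print(closest_take(clip_recorded_at, takes, year=2013))   # TAKE_04_16_14_20_10
```

The nearest take is only a suggestion; when two takes were recorded close together, check the footage before copying it into a color folder.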

The folder structure for the media bin looks like this:

            6x .yml files generated from the calibration step
            frame_000_millis_00000.png //compressed depth frame, first number is frame # second is millisecond
   //the movie clip that corresponds to this Take
   //optional small version that will be used for offline editing

The toolkit allows for the option of having a smaller preview, sometimes referred to as 'offline', version of the video to make visualization fast while keeping export quality top notch. To create an offline video we recommend MPEG Streamclip or Quicktime Pro 7 to create a 640x360 (assuming you shot in 16:9 aspect ratio) MJPEG @50% speed and remember to include sound. Add to the end of the filename and put in the Color folder. This is the clip that DepthKitVisualize will load for preview, it will be swapped out automatically when rendering.

Once your files are in place inside their color folders, you're ready to Visualize.


Launch DepthKitVisualize. You should see the list of takes you just recorded. Only takes with color folders show up, so if you don't see one make sure your folders are complete. Select the take you'd like to visualize and click 'Create new Composition with this Scene.'
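If a take is missing from the list, a quick check of the working directory can help. The sketch below lists TAKE folders whose 'color' folder is absent or empty; it is a hypothetical helper, and the application's own test for a complete take may be stricter than this.

```python
from pathlib import Path


def incomplete_takes(working_dir):
    """List TAKE_* folders that have no files in their 'color' folder."""
    missing = []
    for take in sorted(Path(working_dir).glob("TAKE_*")):
        if not take.is_dir():
            continue
        color = take / "color"
        has_clip = color.is_dir() and any(p.is_file() for p in color.iterdir())
        if not has_clip:
            missing.append(take.name)
    return missing


if __name__ == "__main__":
    import sys
    for name in incomplete_takes(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("No color video found for", name)
```

Run it with the path to your working directory as the only argument.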

Assuming it's the first time loading this scene, we need to create a temporal alignment by synchronizing the two data streams. You'll be glad you clapped while shooting! On the timeline at the bottom you should see two rows of thumbnails, color and depth. Scrub through each of them by clicking on the line of thumbnails and dragging left or right. You'll see previews of the depth data and the RGB video as thumbnails on the top right. With your mouse over the respective timeline you can use the arrow keys to step until you find the clap in both Depth and Color. Once you have the precise moment of the clap selected in both, press "Set Color-Depth Time" on the left. Scrub through the timeline by clicking on the bar with the yellow tickers at the bottom of the main preview pane. You should see the footage playing back in sync.
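Conceptually, setting the color-depth time amounts to storing one offset between the two timelines. The sketch below illustrates the idea under the assumption that both streams run at the same rate, so a single offset is enough; it is not the application's actual code, and the clap positions are made-up numbers.

```python
def color_depth_offset_ms(color_clap_ms, depth_clap_ms):
    """Offset to add to a depth timestamp to get the matching video time."""
    return color_clap_ms - depth_clap_ms


def depth_to_color_ms(depth_ms, offset_ms):
    """Map a depth-frame timestamp onto the video timeline."""
    return depth_ms + offset_ms


if __name__ == "__main__":
    # Hypothetical clap positions: 2.5 s into the video, 1.2 s into the depth stream.
    offset = color_depth_offset_ms(2500, 1200)
    print(offset)                            # 1300
    print(depth_to_color_ms(4000, offset))   # 5300
```

This is also why picking the exact clap frame in both streams matters: any error in either position shifts every frame by the same amount.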

If you make a mistake and find that the time alignment is off, you can highlight the blue marker on the 3rd track of the 'Time Alignment' tab and press DELETE to start over. It's never necessary to set multiple time alignments, so delete the existing ones before the

time align

Now navigate to the Texture Alignment tab and tweak the X and Y Shift & Scale until the depth and color information line up.

Just like in the calibrate step, clicking and dragging the mouse rotates the camera. WASD moves you like a first person video game. E and C boom up and down in space while R+Q rotate. Use the "Reset view" button when you get lost in space.


Each Take can have any number of compositions. A composition lets you render the scene in a different way, with a different in and out point, changing camera perspectives and different rendering styles.

A few tips

To create a camera track, move the camera to where you'd like to look at the scene and scrub to the point in time you'd like it to be there. Press "Set camera point" on the GUI or press SHIFT+T. You'll notice a blue marker appear on the Camera timeline. You can then move the timeline and the camera to a new location and press SHIFT+T again. By selecting "Lock to Track" or pressing SHIFT+L the camera will travel along this path using your first camera point as an 'in point' and your last as an 'out point.'

You can click and drag the camera points to change their location in time. You can delete them with the Delete or backspace key. By clicking on the triangles to the left or right of the marker and pressing the up or down arrows you can have the camera movement ease in or out represented by a curved arc, or cut (blocky steps.)
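The difference between a straight move, an eased move and a cut can be illustrated with a small interpolation sketch. It is a simplification under stated assumptions: it handles camera position only (no rotation), applies one mode to the whole track rather than per marker side, and uses a smoothstep curve for easing, which is a common choice and not necessarily the one DepthKitVisualize uses.

```python
def smoothstep(t):
    """Ease in and out: slow at both ends, faster in the middle."""
    return t * t * (3 - 2 * t)


def camera_position(points, time, mode="ease"):
    """Interpolate a position from sorted (time, (x, y, z)) camera points.

    mode is 'linear', 'ease' (curved) or 'cut' (hold until the next point).
    """
    if time <= points[0][0]:
        return points[0][1]      # before the in point
    if time >= points[-1][0]:
        return points[-1][1]     # after the out point
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if t0 <= time < t1:
            if mode == "cut":
                return p0
            t = (time - t0) / (t1 - t0)
            if mode == "ease":
                t = smoothstep(t)
            return tuple(a + (b - a) * t for a, b in zip(p0, p1))


if __name__ == "__main__":
    track = [(0.0, (0.0, 0.0, 0.0)), (2.0, (10.0, 0.0, 0.0))]
    print(camera_position(track, 0.5, "linear"))  # (2.5, 0.0, 0.0)
    print(camera_position(track, 0.5, "ease"))
    print(camera_position(track, 0.5, "cut"))     # (0.0, 0.0, 0.0)
```

A quarter of the way through the move, the eased camera has covered less distance than the linear one, which is what gives the motion its gentle start.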

The Depth of Field effect can be convincing for simulating a camera language. To use it, select Draw DOF in the GUI. By setting the Depth Range you can select where the focus plane is in relation to the camera - think of this like turning the focus ring on your camera. Tweak the range and blur to achieve the desired effect. Depth of field keyframes matched to camera moves can create really convincing effects, simulating racking or tracking focus. The white rectangle on the top left acts as a focus assist by marking the area in focus in black.

Save the composition when you are ready to export or want to experiment with something else. To switch compositions, click the name of the composition at the top and select a new scene.


Currently you can export out of DepthKitVisualize as PNG image sequences, with or without alpha transparency, or as a series of textured .OBJ files for use in other programs like Maya, Blender or Cinema 4D.

To export, click on the composition name to view all the scenes. Selecting a scene will show all the compositions in that scene, each of which has a small 'R' button next to it. This stands for Render, and by clicking it you will add this composition to the render queue. If you make changes to that comp you will have to re-add the comp to the queue by clicking the Take in the render queue and re-adding the comp.

Once you've selected all the compositions you wish to render, click the "Start Render Batch >>" button and sit back and relax as all the compositions you've queued up begin to render. Be careful not to press the spacebar - it cancels your render.

Export as Image Sequence

The application exports an image sequence by default. Exporting works by rendering a given composition from the perspective chosen in the camera track (what you see in "Lock to Track" mode) into a series of PNG frames. The renderer uses the first camera point as an 'in point' and the last as an 'out point.'

Render frames are 1920x1080 by default but can be changed on the left-hand side of the app by changing the Frame Width & Frame Height Sliders. There are presets for 720P & 1080P built in below the sliders. Frames are saved into the _RenderBin/ folder underneath the main MediaBin/ (right next to _calibration). Each folder is stamped with the scene, composition and date of export so nothing will ever be overwritten.

The image sequences are rendered with the black background made transparent, using an alpha channel for compositing. The exception is when "Draw DoF" is checked, in which case the background remains black.

Export as OBJ Sequence

In order to export textured OBJ files, toggle Export .obj Files on the left-hand panel. This will render out a sequential series of .OBJ files and matching .PNG frames to the _RenderBin/ folder. No accompanying .mtl files are included, but the sequences can still be imported into Maya. There is a tutorial and import script here. Keep in mind that there is no camera tracking or perspective information embedded in these sequences as they are simply texturable 3D files, but the export does still use your first and last camera points as in & out points for the render.
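For programs that expect a material file, one possible workaround is to write a minimal .mtl per frame yourself. The sketch below is not the import script mentioned above; it assumes the exported OBJ already contains texture coordinates and that you know which PNG belongs to which OBJ. The helper name and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def add_material(obj_path, texture_path):
    """Write a minimal .mtl next to an OBJ and reference it from the OBJ."""
    obj_path, texture_path = Path(obj_path), Path(texture_path)
    mtl_path = obj_path.with_suffix(".mtl")
    material = obj_path.stem

    # One white diffuse material whose color comes from the PNG frame.
    mtl_path.write_text(
        "newmtl %s\n"
        "Kd 1.0 1.0 1.0\n"
        "map_Kd %s\n" % (material, texture_path.name)
    )

    body = obj_path.read_text()
    if "mtllib" not in body:  # avoid adding the header twice
        header = "mtllib %s\nusemtl %s\n" % (mtl_path.name, material)
        obj_path.write_text(header + body)
    return mtl_path
```

Calling `add_material("frame_0001.obj", "frame_0001.png")` on a copy of one exported pair is a reasonable first test before processing a whole sequence, since the function rewrites the OBJ file in place.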


The application will not launch / crashes / crashes my computer

My depth camera feed is not showing up in the Calibrate Lenses tab

My depth camera's field of view / principal point don't match the figures given in the video

I'm getting high error numbers on the calibrate lenses page

My correspondences are not detecting or way off

My recordings are dropping frames

The application crashes when I try to load movie files from the HD camera

My texture alignment is way off

My time alignment is way off