Mateo de Mayo

SLAM for Monado

Mateo de Mayo

30 October 2021

Summary

After some time working on the integration of SLAM systems with Monado, there is now a starting point for trying out SLAM and VIO tracking. Kimera-VIO, ORB-SLAM3, Basalt and Monado itself were adapted to work together in a modular way. It is worth noting that this is one of the first iterations and there is still a lot of room for improvement.

Overview

Over the last few months I've been testing out different SLAM/VIO systems (Kimera, ORB-SLAM3 and Basalt) and integrating them into Monado by proposing a header [1] for this kind of tracking and implementing it in a fork of each system (Kimera [2] [3] [4], ORB-SLAM3 [5] [6] and Basalt [7] [8]). Then, in Monado, the necessary modifications were made to add a tracker class [9] that communicates through that header and that drivers can use.
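To give an idea of the shape of such an interface, here is a minimal sketch of what a SLAM tracking header could look like. Every name and signature below is illustrative only; it is not the actual header [1], just an assumption of how a system-agnostic tracker API might be laid out, together with a toy implementation so the flow is visible end to end.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical IMU sample as a SLAM system might receive it.
struct imu_sample {
  std::int64_t timestamp; // nanoseconds
  double ax, ay, az;      // accelerometer, m/s^2
  double wx, wy, wz;      // gyroscope, rad/s
};

// Hypothetical tracked pose as a SLAM system might emit it.
struct pose {
  std::int64_t timestamp;
  float px, py, pz;     // position
  float rx, ry, rz, rw; // orientation quaternion
};

// Sketch of the abstract interface: each SLAM system implements it behind a
// dynamic library, and the Monado tracker class only ever sees these calls.
class slam_tracker {
public:
  virtual ~slam_tracker() = default;
  virtual void start() = 0;
  virtual void stop() = 0;
  virtual void push_imu_sample(const imu_sample &s) = 0;
  // As noted below, current systems mostly just hand back the last tracked
  // pose, with no prediction.
  virtual bool try_dequeue_pose(pose &out) = 0;
};

// Toy implementation for illustration only: queues one identity pose per
// IMU sample instead of doing any actual tracking.
class dummy_tracker : public slam_tracker {
  std::queue<pose> poses;
public:
  void start() override {}
  void stop() override {}
  void push_imu_sample(const imu_sample &s) override {
    poses.push(pose{s.timestamp, 0, 0, 0, 0, 0, 0, 1});
  }
  bool try_dequeue_pose(pose &out) override {
    if (poses.empty()) return false;
    out = poses.front();
    poses.pop();
    return true;
  }
};
```

The point of an interface like this is that Monado's tracker class can be written once against the abstract class, while each fork (Kimera, ORB-SLAM3, Basalt) provides its own concrete implementation.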

Trying it Out

For starters, if you want to get some SLAM working with Monado, you should pick one of these three systems, clone my fork of it, and read the installation section of its README. You do not need a hardware device to try it out, as you can use a dataset instead, as indicated in the READMEs. Here are the links for each fork: Kimera [18], ORB-SLAM3 [19], Basalt [20]. While I wasn't able to get Kimera to work properly, ORB-SLAM3 and Basalt feel pretty decent with the RealSense D455 I've been working with. Of those two, I would recommend starting with Basalt, as it is a bit easier to set up.

The work done in the libraries and the interface is still very rough; this has just been an initial iteration. Right now the systems mostly just return the last tracked pose and don't support any kind of prediction; I've mostly done the bare minimum to expose them as dynamic libraries that Monado can use. The configuration is still pretty cumbersome, as you need to provide configuration data for Monado and for the systems separately, and each system's configuration format is different from the others. The systems' numerical errors tend to get pretty bad and can end up in crashes when you make harsh movements. Also, many quality-of-life improvements, like being able to specify where the floor is, are not implemented yet. I'll be working on improving aspects like these in the following months.

Writing a New Driver

Regarding drivers, there are currently two that can get tracked through SLAM in Monado: euroc and realsense. EuRoC is a standard dataset used for comparing SLAM systems [10], but it is also a format for storing these kinds of datasets (basically a directory with a bunch of PNGs and CSVs with IMU data and timestamps). The euroc driver has a "player" [11] to play these datasets; it also exposes a basic device [12] so you can see how the dataset gets tracked. The realsense driver, on the other hand, previously focused on working with the T265's on-device SLAM, and is now extended to (supposedly) work with any RealSense device that has video and IMU streams (I only tested it on a D455). It does this by providing an rs_source [13] of video and IMU data, and a very basic device rs_hdev [14] which gets tracked through (host) SLAM with this source of data.
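As a concrete example of the EuRoC format, the IMU data lives in a CSV where each row is a timestamp in nanoseconds followed by three gyroscope values (rad/s) and three accelerometer values (m/s^2). Here is a small sketch of parsing one such row; the struct and function names are mine, not from the euroc driver. One detail worth noting: the nanosecond timestamps are large enough to lose precision in a double, so they should be parsed as integers.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One row of a EuRoC imu0/data.csv file:
// timestamp [ns], gyro x/y/z [rad/s], accel x/y/z [m/s^2]
struct euroc_imu_row {
  long long timestamp_ns;
  double w[3]; // angular velocity
  double a[3]; // linear acceleration
};

// Parse a single CSV line into a row; returns false if fields are missing.
bool parse_euroc_imu_line(const std::string &line, euroc_imu_row &out) {
  std::stringstream ss(line);
  std::string field;

  // Parse the timestamp as an integer: ~1.4e18 ns does not fit exactly
  // in a double, which only has 53 bits of mantissa.
  if (!std::getline(ss, field, ',')) return false;
  out.timestamp_ns = std::stoll(field);

  double vals[6];
  for (int i = 0; i < 6; i++) {
    if (!std::getline(ss, field, ',')) return false;
    vals[i] = std::stod(field);
  }
  for (int i = 0; i < 3; i++) {
    out.w[i] = vals[i];     // gyroscope comes first in EuRoC
    out.a[i] = vals[i + 3]; // then the accelerometer
  }
  return true;
}
```

A player like the one in the euroc driver essentially reads rows like these (plus the camera CSVs and PNGs) and replays them with their original timing.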

The basic idea, if you want to write a driver that uses SLAM, is to expose the source of your video and IMU data as a class derived from xrt_fs, similar to how the euroc_player and rs_source do it. The fs in xrt_fs means frame server, but the concept is extended a bit here because it will also be an "IMU server", or, as I usually call the whole thing, a SLAM source. Your SLAM source will need to implement most of the methods of an xrt_fs, in particular slam_stream_start, which receives an xrt_slam_sinks [15]; that is where you need to send your driver's video and IMU data. Once you have a SLAM source that can feed SLAM sinks, you just need to start the SLAM tracker from the device you want to get tracked, similar to how the euroc_device [16] and the rs_hdev [17] do it.
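The data flow above can be sketched in a few lines. To keep it self-contained, the snippet below uses simplified stand-ins for xrt_slam_sinks and the xrt_fs source; the real Monado structs and signatures differ (they are C structs with function pointers, frames carry actual pixel data, etc.), so treat everything here as an assumption about the shape of the flow, not as the actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Simplified stand-ins for Monado's data types (pixel data omitted).
struct frame { std::int64_t timestamp_ns; };
struct imu_sample { std::int64_t timestamp_ns; double accel[3], gyro[3]; };

// Stand-in for xrt_slam_sinks: the set of sinks the SLAM tracker hands to
// the source when streaming starts.
struct slam_sinks {
  std::function<void(const frame &)> left;      // left camera sink
  std::function<void(const frame &)> right;     // right camera sink
  std::function<void(const imu_sample &)> imu;  // IMU sink
};

// A minimal "SLAM source": once started, it pushes every sample it produces
// into the sinks it was given, like the euroc_player or rs_source do.
class mock_slam_source {
  slam_sinks sinks{};
  bool streaming = false;

public:
  // Stand-in for the slam_stream_start method: remember the sinks and
  // begin streaming into them.
  void slam_stream_start(const slam_sinks &s) {
    sinks = s;
    streaming = true;
  }

  // Called by the driver whenever a stereo pair arrives from the hardware.
  void on_frames(const frame &l, const frame &r) {
    if (!streaming) return;
    sinks.left(l);
    sinks.right(r);
  }

  // Called by the driver whenever an IMU sample arrives.
  void on_imu(const imu_sample &s) {
    if (streaming) sinks.imu(s);
  }
};
```

In Monado the sinks would be provided by the SLAM tracker rather than by the test lambdas used here, but the responsibility split is the same: the driver owns the hardware callbacks, and the source's only job is to forward samples into whatever sinks it was started with.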

How is it Looking

These are videos of how the implementations are looking.

Kimera-VIO

Kimera was the first system I tested and integrated, but unfortunately I wasn't able to get good performance out of it. In the video, you can see how low its FPS is and how the tracking starts flying away very easily when the accelerometer is excited. The video was taken with a D455 camera in stereo mode, at 640x360 resolution and 30hz, with IMU samples at 250hz.

ORB-SLAM3

ORB-SLAM3 works very well and is the only system of the three that does not just visual-inertial odometry but also builds a map and can store it persistently. It provides good FPS and robust tracking. Towards the end of the video, you can see it withstanding very fast movements, but also diverging on some particularly abrupt ones. Something to keep in mind is that ORB-SLAM3 is GPL-licensed and, to be safe, it is not referenced from the main Monado repository; instead you will need to use a fork of Monado that might be a bit out of sync. This fork contains the two small commits needed to make ORB-SLAM3 work. This video also used a D455 camera with the same configuration as in the Kimera video.

Basalt

Basalt is extremely fast and has a BSD-3 license. This paper [21] from a proprietary solution shows how fast Basalt is. It is probably the system with the best software engineering practices of the three I integrated. While it can be less robust than ORB-SLAM3, you can compensate by increasing the resolution and sample frequency. In this particular video, I'm using almost twice as many pixels, with an 848x480 resolution for each camera, and twice the FPS at 60hz, with IMU samples still at 250hz.