@nerovalerius
Last active June 11, 2025 14:45
Instructions on how to create a LiDAR dataset with semantic labels in the Semantic KITTI format

Prerequisites

Tested on Ubuntu 22.04 with an Nvidia RTX 3090

  1. Install ROS2 Humble: LINK

  2. Set up a ROS2 workspace: LINK
    2a. prepare automatic sourcing of the ROS2 installation and workspace:

    echo "source /opt/ros/humble/setup.bash" >> ~/.bashrc
    echo "source $HOME/ros2_ws/install/setup.bash" >> ~/.bashrc
    

    where the second line must be adapted to your ROS2 workspace path. Afterwards, source the bashrc file (source ~/.bashrc) or simply close your terminal and open a new one (so that .bashrc is loaded again).

  3. Clone the pointcloud merge node: LINK

  4. Set up a second workspace for Point Labeler and SuMa -> see instructions at: LINK

  5. Install Point Labeler: LINK

  6. Install Surfel Based Mapping (SuMa): LINK
    6a. One dependency of SuMa is GTSAM. On Ubuntu 22.04 there is currently no pre-built GTSAM package. After installing GTSAM's dependencies, build GTSAM from source with:

    cd ~
    git clone https://github.com/borglab/gtsam.git
    cd gtsam
    mkdir build && cd build
    cmake ..
    make -j"$(nproc)"
    sudo make install
    
  7. Install pcd2bin: LINK

Usage

  1. Create a folder structure which meets the KITTI standard LINK:

    mkdir -p ~/dataset/pointclouds
    mkdir -p ~/dataset/rosbags
    mkdir -p ~/dataset/sequences/00/velodyne
    mkdir -p ~/dataset/sequences/00/labels
    
  2. Record a ROSBAG of your LiDAR data. In our case, the data comes from five Livox Horizon LiDARs:
    ros2 bag record <params>

  3. Start the pointcloud merge node inside ~/dataset/pointclouds (the .pcd files are stored in the folder in which the node is started):
    https://github.com/nerovalerius/pointcloud_merge_and_kitti

  4. Start playback of the ROSBAG within 5 seconds, otherwise the pointcloud merge node quits:
    ros2 bag play rosbag2_2022_06_21-12_56_43_0.db3 -s sqlite3 --rate 0.5

  5. Convert the .pcd files with the pcd2bin node and store the .bin files inside ~/dataset/sequences/00/velodyne:
    ros2 run pcd2bin_kitti pcd2bin
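
After the conversion, a quick sanity check is to compare the number of recorded .pcd files with the number of converted .bin files. This is only a sketch: the `PCD_DIR` and `BIN_DIR` defaults are assumptions based on the folder layout from step 1, so adapt them if your paths differ.

```shell
# Compare the number of recorded .pcd scans with the converted .bin scans.
# PCD_DIR and BIN_DIR are assumed defaults; adjust them to your setup.
PCD_DIR="${PCD_DIR:-$HOME/dataset/pointclouds}"
BIN_DIR="${BIN_DIR:-$HOME/dataset/sequences/00/velodyne}"
pcd_count=$(find "$PCD_DIR" -maxdepth 1 -name '*.pcd' 2>/dev/null | wc -l)
bin_count=$(find "$BIN_DIR" -maxdepth 1 -name '*.bin' 2>/dev/null | wc -l)
echo "pcd: $pcd_count  bin: $bin_count"
```

If the two counts differ, some scans were dropped during merging or conversion and the sequence should be re-generated before labeling.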

  6. Create a dummy calibration file inside your created sequence (nano ~/dataset/sequences/00/calib.txt) and fill it with:

    P0: 1 0 0 0 0 1 0 0 0 0 1 0
    P1: 1 0 0 0 0 1 0 0 0 0 1 0
    P2: 1 0 0 0 0 1 0 0 0 0 1 0
    P3: 1 0 0 0 0 1 0 0 0 0 1 0
    Tr: 1 0 0 0 0 1 0 0 0 0 1 0
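
    The same dummy file can also be written non-interactively. A small sketch, where `SEQ_DIR` is an assumption matching the layout from step 1:

```shell
# Write the dummy KITTI calibration file without opening an editor.
# SEQ_DIR is an assumed default; adapt it to your sequence folder.
SEQ_DIR="${SEQ_DIR:-$HOME/dataset/sequences/00}"
mkdir -p "$SEQ_DIR"
for key in P0 P1 P2 P3 Tr; do
    echo "$key: 1 0 0 0 0 1 0 0 0 0 1 0"
done > "$SEQ_DIR/calib.txt"
```

All five entries are the same flattened 3x4 identity-like matrix, since the real camera calibration is irrelevant for a LiDAR-only pipeline; the file only needs to exist so that the KITTI tooling can parse the sequence.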
    
  7. Run SuMa with ~/catkin_ws/src/SuMa/bin/visualizer and open the first binary pointcloud of the sequence, e.g. ~/dataset/sequences/00/full_cloud_00001.bin

    Run SuMa by pressing the PLAY button to obtain a poses.txt file for your sequence, then save poses.txt to the ~/dataset/sequences/00/ folder. SuMa creates a pose graph for your pointcloud sequence.

    Each pointcloud then gets an absolute position in a world frame, which is used to load a complete sequence of pointclouds into the Point Labeler. This means that a large number of pointclouds can be labelled at once.
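
    Before loading the sequence into Point Labeler, the poses.txt can be sanity-checked: KITTI-style pose files store one flattened 3x4 pose matrix (12 space-separated values) per scan, so every row should have 12 fields and the row count should match the number of .bin files. A sketch, where the default path is an assumption:

```shell
# Check poses.txt: one row per scan, each row a flattened 3x4 pose (12 values).
# The POSES default is an assumed path; adapt it to your sequence folder.
POSES="${POSES:-$HOME/dataset/sequences/00/poses.txt}"
if [ -f "$POSES" ]; then
    awk 'NF != 12 { bad++ } END { printf "%d poses, %d malformed rows\n", NR, bad + 0 }' "$POSES"
else
    echo "poses.txt not found at $POSES"
fi
```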

  8. Open ~/dataset/sequences/00/ with Point Labeler. Run Point Labeler with:
    ~/catkin_ws/src/point_labeler/bin/labeler
    Point Labeler then generates the labels in the correct Semantic KITTI format.
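
Once labeling is done, a final consistency check is to confirm that every scan got a label file, since Semantic KITTI stores one .label file per .bin scan. Again only a sketch, with `SEQ_DIR` as an assumed path:

```shell
# Every velodyne scan should have a matching label file after labeling.
# SEQ_DIR is an assumed default; adapt it to your sequence folder.
SEQ_DIR="${SEQ_DIR:-$HOME/dataset/sequences/00}"
scans=$(find "$SEQ_DIR/velodyne" -maxdepth 1 -name '*.bin' 2>/dev/null | wc -l)
labels=$(find "$SEQ_DIR/labels" -maxdepth 1 -name '*.label' 2>/dev/null | wc -l)
echo "scans: $scans  labels: $labels"
```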

@opedromartins

I figured out what was going wrong and got it all sorted. Thanks a bunch for your help - it worked like a charm!

I had a little trouble in step 6 (SuMa) because of a GTSAM error, probably a version mismatch. But I fixed it by changing step 6a and installing the pre-built GTSAM package like so:

sudo add-apt-repository ppa:borglab/gtsam-release-4.2 -y
sudo apt install libgtsam-dev libgtsam-unstable-dev -y

Then everything worked fine. Appreciate your help once again.

Best,
Pedro

@nerovalerius
Author

Perfect. Let me know if you need any help.

Best,
Armin

@jalexdz

jalexdz commented Jun 10, 2025

In the SemanticKITTI paper, they mention extracting input/output volumes ahead of the vehicle. Did you do this with a script using the SuMa poses? I'm attempting to generate a dataset like this myself using a mobile robot.

@nerovalerius
Author

I only used the SuMa poses for localization and creation of the pose graph, but didn’t generate input/output volumes from them. Where exactly are you stuck?

best,
Armin

@jalexdz

jalexdz commented Jun 11, 2025

I see. I'm not stuck, just wondering how you extracted these volumes (but I see you didn't) in your pipeline. It'll just take some processing on my end. Thanks!

Alex

@nerovalerius
Author

You're welcome!

best,
Armin
