A New Breakthrough in AI-Generated Scenes: An Introduction to 3D Gaussian Splatting and a Beginner's Training Tutorial

3D Gaussian Splatting is a method for creating a 3D scene from a set of 2D images: all you need is a video or a set of photos of a scene to obtain a high-quality 3D representation of it, which you can then render from any angle. It belongs to the class of Radiance Field methods (like NeRFs), but is simultaneously faster to train (at equal quality), faster to render, and reaches better or similar quality. 3D Gaussian Splatting achieves high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution for unbounded, complete scenes.

https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

# 3D Gaussian Splatting

## A beginner-friendly introduction to 3D Gaussian Splats and a tutorial on how to train them.

3D Gaussian Splatting is a new method for novel-view synthesis of scenes captured with a set of photos or a video. It belongs to the class of Radiance Field methods (like NeRFs) but is simultaneously faster to train (at equal quality), faster to render, and reaches better or similar quality. Gaussian splats are also easier to understand and to postprocess (more on that later). This is a beginner-friendly introduction to 3D Gaussian Splats and how to train them.

## What are 3D Gaussian Splats?

At a high level, 3D Gaussian splats, like NeRFs or photogrammetry methods, are a way to create a 3D scene using a set of 2D images. Practically, this means that all you need is a video or a set of photos of a scene, to obtain a 3D representation of it — enabling you to reshoot it, or render it from any angle.

Here’s an example of a capture I made. As input, I used 750 images of a plush toy, recorded with my phone from different angles.

Once trained, the model is a point cloud of 3D Gaussians. Here is the point cloud visualized as simple points.

But what are 3D Gaussians? They are a generalization of 1D Gaussians (the bell curve) to 3D. Essentially they are ellipsoids in 3D space, with a center, a scale, a rotation, and “softened edges”.
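In the original paper, each Gaussian's shape is stored as a per-axis scale and a rotation quaternion, which are combined into a full 3D covariance matrix as Σ = R S Sᵀ Rᵀ (R is the rotation matrix, S the diagonal scale matrix). Here is a minimal numpy sketch of that construction; the function names are mine, not from the official codebase:

```python
import numpy as np

def quat_to_rot(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(scale, quat):
    """Build the 3D covariance Sigma = R S S^T R^T of one Gaussian.
    `scale` holds the ellipsoid's per-axis extents, `quat` its orientation."""
    R = quat_to_rot(np.asarray(quat, dtype=float))
    S = np.diag(scale)
    return R @ S @ S.T @ R.T
```

Storing scale and rotation separately (rather than the covariance directly) guarantees the matrix stays a valid, positive semi-definite ellipsoid throughout optimization.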

Each 3D Gaussian is optimized along with a (view-dependent) color and opacity. When blended together, here’s the visualization of the full model, rendered from ANY angle. As you can see, 3D Gaussian Splatting captures the fuzzy and soft nature of the plush toy extremely well, something that photogrammetry-based methods struggle to do.
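To render a pixel, the Gaussians overlapping it are sorted by depth and blended front to back: each contributes its color weighted by its opacity and by the transmittance left over from the splats in front of it. A toy single-channel sketch of that blending rule (not the actual CUDA rasterizer):

```python
def composite(colors, alphas):
    """Front-to-back alpha blending of depth-sorted splats:
    C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)"""
    C, T = 0.0, 1.0  # accumulated color, remaining transmittance
    for c, a in zip(colors, alphas):
        C += c * a * T
        T *= 1.0 - a
    return C
```

An opaque splat in front (alpha near 1) leaves almost no transmittance, so everything behind it contributes nothing, which is exactly the occlusion behavior you'd expect.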

## How to train your own models? (Tutorial)

Important: before starting, check the requirements (OS & GPU) for training 3D Gaussian Splats here. In particular, you will need a CUDA-ready GPU with 24 GB of VRAM.

### Step 1: Record the scene

If you want to use the same model as me for testing (the plush toy), I have made all images, intermediate files and outputs available, so you can skip to step 2.

Recording the scene is one of the most important steps because that’s what the model will be trained on. You can either record a video (and extract the frames afterwards) or take individual photos. Be sure to move around the scene, and to capture it from different angles. Generally, the more images you have, the better the model will be. A few tips to keep in mind to get the best results:

• Avoid moving too fast, as it can cause blurry frames (which 3D Gaussian Splats will try to reproduce)
• Try to aim for 200-1000 images. Fewer than 200 images will result in a low-quality model, and more than 1000 images will take a long time to process in step 2.
• Lock the exposure of your camera. If it’s not consistent between frames, it will cause flickering in the final model.

Just for reference, I recorded the plush toy using a turntable and a fixed camera. You can find cheap turntables on Amazon, like here. But you can also record the scene simply by moving around it.
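If you captured a video rather than photos, you still need to extract the individual frames. One common approach is ffmpeg with an fps filter, picking a rate that lands the total frame count in the 200-1000 range above. The small helper below only assembles that command; the file names and frame rate are illustrative:

```python
import subprocess

def extract_frames_cmd(video_path, out_dir, fps=2):
    """Build an ffmpeg command that samples `fps` frames per second
    from `video_path` into numbered JPEGs inside `out_dir`."""
    return ["ffmpeg", "-i", video_path,
            "-vf", f"fps={fps}",
            "-qscale:v", "2",          # high JPEG quality
            f"{out_dir}/%06d.jpg"]

# e.g. a 150 s clip at fps=2 yields ~300 frames, inside the 200-1000 range
cmd = extract_frames_cmd("capture.mp4", "input", fps=2)
# subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```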

Once you’re done, place your images in a folder called input, like this:

📦 $FOLDER_PATH
┣ 📂 input
┃ ┣ 📜 000000.jpg
┃ ┣ 📜 000001.jpg
┃ ┣ 📜 ...

### Step 2: Obtain camera poses

Obtaining camera poses is probably the most finicky step of the entire process for inexperienced users. The goal is to obtain the position and orientation of the camera for each frame. This is called the camera pose. There are several ways to do so:

• Use COLMAP. COLMAP is a free and open-source Structure-from-Motion (SfM) tool. It takes your images as input and outputs the camera poses. It comes with a GUI and is available on Windows, Mac, and Linux.
• Use desktop software. Options include RealityCapture and Metashape (both commercial).
• Use mobile apps, such as Polycam and Record3D. They take advantage of the LiDAR sensor on recent iPhones to obtain the camera poses, but are unfortunately only available on iOS with an iPhone 12 or newer.

Again, if you want to use the same model for testing, download the sample “sparse.zip” and skip to step 3.

Because it is free and open-source, we will show how to use COLMAP to obtain the camera poses. First, install COLMAP by following the official installation guide. From here, there are two ways to obtain the camera poses: with an automated script, or manually with the GUI.

Download the code from the official repo. Make sure to clone it recursively to get the submodules, like this:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive

Then run the following script:

python convert.py -s $FOLDER_PATH

This will automatically run COLMAP and extract the camera poses for you. Be patient, as this can take from a few minutes to a few hours depending on the number of images. The camera poses will be saved in a folder sparse, and undistorted images in a folder images.

To visualize the camera poses, you can open the COLMAP GUI. On Linux, run colmap gui in a terminal. On Windows and Mac, open the COLMAP application.

Then select File > Import model and choose the path to the folder $FOLDER_PATH/sparse/0. The folder structure of your model dataset should now look like this:

📦 $FOLDER_PATH
┣ 📂 (input)
┣ 📂 (distorted)
┣ 📂 images
┣ 📂 sparse
┃ ┣ 📂 0
┃ ┃ ┣ 📜 points3D.bin
┃ ┃ ┣ 📜 images.bin
┃ ┃ ┗ 📜 cameras.bin
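Before moving on to training, it can save time to confirm that this layout matches what the training script expects. Here is a small sanity check; the helper is a convenience written for this tutorial, not part of the official code:

```python
from pathlib import Path

def missing_dataset_files(folder):
    """Return the paths from the expected COLMAP output layout that are absent."""
    expected = [
        "images",
        "sparse/0/cameras.bin",
        "sparse/0/images.bin",
        "sparse/0/points3D.bin",
    ]
    root = Path(folder)
    return [p for p in expected if not (root / p).exists()]
```

An empty list means the dataset is ready for step 3; anything else points at the file COLMAP failed to produce.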

### Step 3: Train the 3D Gaussian Splatting model

If not already done, download the code from the official repo. Make sure to clone it recursively to get the submodules, like this:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive

Installation is extremely easy as the codebase has almost no dependencies. Just follow the instructions in the README. If you already have a Python environment with PyTorch, you can simply run:

pip install plyfile tqdm
pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn

Once installed, you can train the model by running:

python train.py -s $FOLDER_PATH -m $FOLDER_PATH/output

Since my scene has a white background, I’m adding the -w option. This tells the training script that the base background color should be white (instead of the default black).

python train.py -s $FOLDER_PATH -m $FOLDER_PATH/output -w

This will save the model in the $FOLDER_PATH/output folder. The entire training (30,000 steps) will take about 30-40 minutes, but an intermediate model is saved after 7,000 steps and already looks great. You can visualize that model right away by following step 4.

### Step 4: Visualize the model

The folder structure of your model dataset should now look like this:

📦 $FOLDER_PATH
┣ 📂 images
┣ 📂 sparse
┣ 📂 output
┃ ┣ 📜 cameras.json
┃ ┣ 📜 cfg_args
┃ ┣ 📜 input.ply
┃ ┣ 📂 point_cloud
┃ ┃ ┣ 📂 iteration_7000
┃ ┃ ┃ ┗ 📜 point_cloud.ply
┃ ┃ ┗ 📂 iteration_30000
┃ ┃ ┃ ┗ 📜 point_cloud.ply
• If you’re on Windows, download the pre-built binaries for the visualizer here.
• On Ubuntu 22.04, you can build the visualizer yourself by running:
# Dependencies
sudo apt install -y libglew-dev libassimp-dev libboost-all-dev libgtk-3-dev libopencv-dev libglfw3-dev libavdevice-dev libavcodec-dev libeigen3-dev libxxf86vm-dev libembree-dev
# Project setup
cd SIBR_viewers
cmake -Bbuild . -DCMAKE_BUILD_TYPE=Release # add -G Ninja to build faster
cmake --build build -j24 --target install

Once installed, find the SIBR_gaussianViewer_app binary and run it with the path to the model as argument:

SIBR_gaussianViewer_app -m $FOLDER_PATH/output

You get a beautiful visualizer of your trained model! Make sure to select Trackball mode for a better interactive experience.
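As mentioned in the introduction, Gaussian splat models are easy to postprocess: the trained model is just a PLY point cloud whose per-point properties store each Gaussian's position, opacity, scales, rotation quaternion, and spherical-harmonics color coefficients. You can inspect a model with the plyfile package (one of the training dependencies), or even with a few lines of stdlib header parsing, as sketched here:

```python
def ply_properties(path):
    """Read a PLY file's header and list the per-vertex property names."""
    props = []
    with open(path, "rb") as f:        # binary mode: the body may not be text
        for raw in f:
            line = raw.decode("ascii", errors="replace").strip()
            if line.startswith("property"):
                props.append(line.split()[-1])  # last token is the name
            if line == "end_header":
                break
    return props
```

Running this on output/point_cloud/iteration_30000/point_cloud.ply shows every attribute the optimizer learned per Gaussian, which is a good starting point for filtering or editing splats yourself.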