
A Hands-on Guide for Practitioners

Last summer, a non-deep-learning method for novel view synthesis entered the game: 3D Gaussian splatting. It is a technique to represent a scene in 3D and to render images in real time from any viewing direction. Some even say it is replacing NeRFs, currently the predominant method for novel view synthesis and implicit scene representation. I think that's debatable, since NeRFs are far more than image renderers. But that's nothing we care about today… Today we only care about crisp-looking 3D models, and that's where 3D Gaussian splatting shines 🎉
In this post we'll very briefly look into Gaussian splatting, then switch gears and I'll show you how you can turn yourself into a 3D model.
Bonus: At the end I'll show you how you can embed your model in an interactive viewer on any website.
So, let’s go!
- What are Gaussian Splats?
- Let's Turn Ourselves into a 3D Gaussian Splat
- Conclusion and Further Resources
3D Gaussian splatting is a method to represent a scene in 3D. It is just one of many ways: you could also represent a scene as a set of points, a mesh, voxels, or with an implicit representation like Neural Radiance Fields (aka NeRFs).
The inspiration for 3D Gaussian splatting has been around for quite a while, dating back to 2001 and a classical computer graphics approach called surface splatting.
But how does 3D Gaussian Splatting actually represent a scene?
3D Representation
In 3D Gaussian splatting a scene is represented by a set of points. Each point has certain attributes attached to it that parameterize an anisotropic 3D Gaussian. When an image is rendered, these Gaussians overlap to form the image. The actual parameterization happens during the optimization phase, which fits the parameters such that rendered images are as close as possible to the original input images.
A 3D Gaussian is parameterized with:
- its mean µ, which is the x, y, z coordinate in 3D space.
- its covariance matrix Σ, which can be interpreted as the spread of the Gaussian in any 3D direction. Since the Gaussian is anisotropic, it can be stretched in any direction.
- a color, usually represented as spherical harmonics. Spherical harmonics allow the Gaussian splats to have different colors from different viewpoints, which drastically improves the quality of renders. It allows rendering non-Lambertian effects such as the specularities of metallic objects.
- an opacity 𝛼 that determines how transparent the Gaussian is.
The image below shows the influence of a 3D Gaussian splat with respect to a point p. Spoiler: that point p will be the one that matters when we render the image.
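To make this concrete, the original paper defines the influence of Gaussian i at a point p as its density scaled by its opacity, and factors the covariance into a rotation R and a scaling S so it stays a valid covariance matrix during optimization:

\alpha_i \, G_i(p) = \alpha_i \exp\!\left(-\tfrac{1}{2}\,(p - \mu_i)^\top \Sigma_i^{-1} (p - \mu_i)\right), \qquad \Sigma_i = R_i S_i S_i^\top R_i^\top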
How do you get an image out of this representation?
Image Rendering
Like NeRFs, 3D Gaussian splatting uses 𝛼-blending along a ray that is cast from the camera through the image plane and through the scene. This basically means that, through integration along a ray, all intersecting Gaussians contribute to the final pixel's color.
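Written out, the final color C of a pixel is the classic front-to-back alpha compositing over the N Gaussians the ray intersects (sorted by depth), where c_i is the view-dependent color of Gaussian i and 𝛼_i is its opacity multiplied by the projected 2D Gaussian evaluated at that pixel:

C = \sum_{i=1}^{N} c_i \, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)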
The image below shows the conceptual difference between the most basic NeRF (for simplicity) and Gaussian splatting.
While conceptually similar, there is a significant difference in implementation. In Gaussian splatting there is no deep learning model like the multi-layer perceptron (MLP) in NeRFs. Hence we don't need to evaluate the implicit function approximated by the MLP for every sample point (which is relatively time-consuming), but instead overlap many partially transparent Gaussians of varying size and color. We still need to cast at least one ray per pixel to render the final image.
So basically, through the blending of all these Gaussians, the illusion of a seamless image emerges. If you removed the transparency from the splats, you could actually see the individual Gaussians of different size and orientation.
And how is it optimized?
Optimization
The optimization is theoretically straightforward and easy to understand. But of course, as always, the success lies in the details.
To optimize the Gaussian splats, we need an initial set of points and images of the scene. The authors of the paper suggest using a structure-from-motion (SfM) algorithm to obtain the initial point cloud. During training, the scene is rendered with the estimated camera pose and camera intrinsics obtained from SfM. The rendered image and the original image are compared, a loss is calculated, and the parameters of each Gaussian are optimized with stochastic gradient descent (SGD).
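The loss used in the paper is a weighted combination of an L1 term and a D-SSIM (structural dissimilarity) term between the rendered image and the ground-truth image, with λ = 0.2 by default:

\mathcal{L} = (1-\lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}}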
One of the important details worth mentioning is the adaptive densification scheme. SGD can only adjust the parameters of existing Gaussians; it cannot spawn new ones or destroy existing ones. This could lead to holes in the scene or a loss of fine-grained detail if there are too few points, and to unnecessarily large point clouds if there are too many. To overcome this, the adaptive densification method splits points with large gradients and removes points that have converged to very low opacity.
Having covered some theoretical basics, let's now switch gears and jump into the practical part of this post, where I show you how you can create a 3D Gaussian splat of yourself.
Note: The authors suggest using a GPU with at least 24 GB of VRAM, but you can still create your 3D Gaussian splats using some tricks that I'll mention where they need to be applied. I have an RTX 2060 mobile with 6 GB.
These are the steps we’ll cover:
- Installation
- Capture a Video
- Obtain point cloud and camera poses
- Run the Gaussian Splatting Algo
- Post processing
- (Bonus) Embed your model on a website in an interactive viewer
Installation
For the installation you can either hop over to the official 3D Gaussian Splatting repository and follow their instructions, or head over to The NeRF Guru on YouTube, who does an excellent job of showing how to install everything you need. I recommend the latter.
I personally chose to install COLMAP on Windows because I was not able to build COLMAP from source with GPU support in my WSL environment, and for Windows there is a pre-built installer. The optimization for the 3D Gaussian splatting was done on Linux. But it doesn't really matter: the commands I show are the same on Windows and Linux.
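If you go down the repository route, the setup roughly boils down to the following (double-check the repo's README for the current instructions; COLMAP and ffmpeg are installed separately):
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting
conda env create --file environment.yml
conda activate gaussian_splatting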
Capture a Video
Ask someone to capture a video of you. You need to stand as still as possible while the other person walks around you, trying to capture you from every angle.
Some Hints:
- Choose a pose where it is easy for you not to move. E.g. holding your hands up for 1 minute without moving is not that easy 😅
- Choose a high framerate to reduce motion blur, e.g. 60 fps.
- If you have a small GPU, don't film in 4K, otherwise the optimizer is likely to crash with an out-of-memory exception.
- Make sure there is sufficient light, so your recording is crisp and clear.
- If you have a small GPU, prefer indoor scenes over outdoor scenes. Outdoor scenes have a lot of “high-frequency” content, i.e. small things close to one another like grass and leaves, which results in many Gaussians being spawned during adaptive densification.
Once you have recorded your video, move it to your computer and extract single frames using ffmpeg.
ffmpeg -i <input_video> -qscale:v 1 -qmin 1 -vf fps=<fps> <output_folder>/%04d.jpg
This command takes the video and converts it into high-quality JPEG images with low compression (only JPEG works). I usually use between 4 and 10 frames per second. The output files will be named with an incrementing four-digit number.
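For example, with a hypothetical video my_video.mp4, 5 frames per second, and the data folder layout I use later in this post, the call would look like this (create the input folder first):
mkdir -p ./data/me/input
ffmpeg -i my_video.mp4 -qscale:v 1 -qmin 1 -vf fps=5 ./data/me/input/%04d.jpg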
You should then end up with a folder full of single-frame images like so:
Some hints for higher quality:
- Remove blurry images, otherwise they result in a haze around you and spawn “floaters”.
- Remove images where your eyes are closed, otherwise they result in blurry eyes in the final model.
Obtain Point Cloud and Camera Poses
As mentioned earlier, the Gaussian splatting algorithm needs to be initialized. One way is to initialize each Gaussian's mean with the location of a point in 3D space. We can use the tool COLMAP, which implements structure from motion (SfM), to obtain a sparse point cloud from images alone. Luckily, the authors of the 3D Gaussian Splatting paper provide code to simplify the process.
So head over to the Gaussian Splatting repo you cloned, activate your environment, and call the convert.py script.
python convert.py -s <path_to_data> --resize
The root path to your data is the directory that contains the “input” folder with all of the input images. In my case I created a subfolder inside the repo: ./gaussian-splatting/data/. The --resize argument will output additional images with down-sampling factors 2, 4, and 8. This is important in case you run out of memory with high-resolution images, so you can simply switch to a lower resolution.
Note: I had to set the environment variable CUDA_VISIBLE_DEVICES=0 for the GPU to be used with COLMAP.
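For reference, with the hypothetical data folder ./data/me that looks like this on Linux/WSL:
CUDA_VISIBLE_DEVICES=0 python convert.py -s ./data/me --resize
And like this in a Windows command prompt:
set CUDA_VISIBLE_DEVICES=0
python convert.py -s ./data/me --resize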
Depending on the number of images you have, this process might take a while, so either grab a cup of coffee or stare at the progress like I sometimes do, wasting a lot of time 😂
Once COLMAP is done, you can type colmap gui into your command line and inspect the sparse point cloud.
To open the point cloud, click on “File > Import model”, navigate to the sparse reconstruction folder created by convert.py, and open that folder.
The red objects are the cameras the SfM algorithm estimated from the input frames. They represent the position and pose of the camera where each frame was captured. SfM further provides the intrinsic camera calibration, which is important for the 3D Gaussian splatting algorithm so that Gaussians can be rendered into a 2D image during optimization.
Run the Gaussian Splatting Optimizer
Everything up until now has been preparation for the actual 3D Gaussian splatting algorithm.
The script to train the 3D Gaussian splats is train.py. I usually like to wrap such Python scripts in a shell script so I can add comments and easily change the parameters of a run. Here's what I use:
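In essence it is just a thin wrapper around train.py. A minimal version, with everything left at its defaults except data_device as described below, looks like this (the paths are placeholders for your own data and output folders):
#!/bin/bash
# Train 3D Gaussian splats on the scene prepared by convert.py.
# -s: data folder containing the input images and the COLMAP output
# -m: output folder where the trained point_cloud .ply files are written
# --data_device cpu: keep source images in CPU RAM instead of VRAM (helps small GPUs)
python train.py \
  -s ./data/me \
  -m ./output/me \
  --data_device cpu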
Apart from data_device=cpu, all arguments are set to their defaults. If you run into memory issues, you can try tweaking the following arguments (a combined example follows the list):
- resolution: the down-sampling factor of the image resolution. 1 means full resolution and 2 means half resolution. Since we used --resize with convert.py during sparse point cloud generation, you can test with 1, 2, 4, and 8. Before lowering the resolution, I recommend trying to lower sh_degree first.
- sh_degree: sets the maximum degree of the spherical harmonics, with 3 being the maximum. Lowering this value has a significant impact on the memory footprint. Keep in mind that the spherical harmonics control the view-dependent color rendering. In practice, sh_degree=1 usually still looks good in my experience.
- densify_*_iter: controls the span of iterations over which adaptive densification is performed. Tweaking these arguments might result in fewer points being spawned and hence a lower memory footprint. Note that this might have a large impact on quality.
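If the default run crashes with an out-of-memory error, a more aggressive configuration combining these ideas could look like this (again with placeholder paths):
python train.py -s ./data/me -m ./output/me_lowmem --data_device cpu --resolution 2 --sh_degree 1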
If everything goes well, you hopefully end up with a scene as shown below. In the next section we jump into visualization and post-processing.
You can actually see the Gaussian shape of individual splats quite nicely in low-density regions.
Post Processing
Although the Gaussian splatting repo comes with its own visualizer, I prefer to use SuperSplat since it is far more intuitive and you can edit your scene directly.
So to start, head over to the SuperSplat editor and open your ply file, located under ./output/.
I usually start by removing most of the background points using a sphere, as indicated below.