Introduction

The most effective image-to-3D method so far when generation speed, output quality, and vertex count are weighed together. It targets the following problems in previous methods.

Vertex Coloring

Previous methods store appearance as vertex colors, so a detailed result requires a large number of vertices.
Solution: A fast, highly parallelizable UV unwrapping based on box projection, which completes in 0.5 seconds.

Light Bake-in

Shadows in the image are baked into the texture.
Solution: Separates shadows from the texture by modeling illumination with spherical Gaussians (illumination modeling, Light Net).

Lack of Material Properties

Previous methods lack material properties, making them unresponsive to changes in lighting conditions.
Solution: Predicts material properties with Material Net.

Marching Cubes

"Stair-stepping" artifacts occur when volumetric representations are converted to meshes with Marching Cubes.
Solution: Uses Deep Marching Tetrahedra (DMTet) instead of Marching Cubes.
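
Marching tetrahedra, which DMTet builds on, places a surface vertex wherever the signed distance changes sign along a tetrahedron edge. The sketch below shows only that single step, under the assumption of linear interpolation between the two edge endpoints. The function name `edge_zero_crossing` is hypothetical, and the learned parts of DMTet (predicted SDF values and offsets of the grid vertices) are not shown.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def edge_zero_crossing(p_a: Vec3, p_b: Vec3, s_a: float, s_b: float) -> Vec3:
    """Return the point on edge (p_a, p_b) where the linearly
    interpolated signed distance is zero.

    s_a and s_b are the signed distances at p_a and p_b and must
    have opposite signs, otherwise the surface does not cross the edge.
    """
    if s_a * s_b >= 0.0:
        raise ValueError("signed distances must have opposite signs")
    # Solve s_a + t * (s_b - s_a) = 0 for t in (0, 1).
    t = s_a / (s_a - s_b)
    return (
        p_a[0] + t * (p_b[0] - p_a[0]),
        p_a[1] + t * (p_b[1] - p_a[1]),
        p_a[2] + t * (p_b[2] - p_a[2]),
    )


if __name__ == "__main__":
    print(edge_zero_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -0.25, 0.75))
```

Because the crossing point moves continuously with the signed distance values, small changes in the field move the surface smoothly instead of snapping it to grid positions.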

Backbone Model

Material Estimation

Estimates metallic and roughness properties.
Uses a frozen CLIP model as the backbone and trains an MLP on the extracted features to predict material properties (contrastive learning), which improves reflectivity and overall quality.

Illumination Modeling

Estimates the illumination in the input image from the triplanes output by the backbone model.
Uses 2 CNN layers + 3 linear layers to predict spherical Gaussian illumination maps.
Training includes a lighting demodulation loss.
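
For reference, a spherical Gaussian lobe is commonly defined as G(v) = a * exp(lambda * (mu . v - 1)), where mu is the unit lobe axis, lambda the sharpness, and a the RGB amplitude. The sketch below evaluates a sum of such lobes in a given direction. It assumes this standard definition; these notes do not state which parameterization the Light Net actually predicts, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
# One lobe: (axis, sharpness, rgb_amplitude)
Lobe = Tuple[Vec3, float, Vec3]


def normalize(v: Vec3) -> Vec3:
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / n, v[1] / n, v[2] / n)


def eval_lobe(direction: Vec3, lobe: Lobe) -> Vec3:
    """Evaluate a * exp(sharpness * (axis . direction - 1))."""
    axis, sharpness, amplitude = lobe
    d = normalize(direction)
    mu = normalize(axis)
    cos_angle = d[0] * mu[0] + d[1] * mu[1] + d[2] * mu[2]
    weight = math.exp(sharpness * (cos_angle - 1.0))
    return (amplitude[0] * weight, amplitude[1] * weight, amplitude[2] * weight)


def eval_illumination(direction: Vec3, lobes: List[Lobe]) -> Vec3:
    """Sum the RGB contribution of every lobe in the given direction."""
    total = [0.0, 0.0, 0.0]
    for lobe in lobes:
        r, g, b = eval_lobe(direction, lobe)
        total[0] += r
        total[1] += g
        total[2] += b
    return (total[0], total[1], total[2])


if __name__ == "__main__":
    # A single hypothetical lobe pointing straight up.
    sky = [((0.0, 1.0, 0.0), 4.0, (1.0, 0.9, 0.8))]
    print(eval_illumination((0.0, 1.0, 0.0), sky))
    print(eval_illumination((1.0, 0.0, 0.0), sky))
```

A lobe reaches its full amplitude along its axis and falls off smoothly with angle, so a small number of lobes can approximate low-frequency lighting compactly.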

Mesh Extraction and Refinement

Fast UV-Unwrapping and Export

Pipeline:
- Box-projection: Projects the 3D mesh vertices onto cube faces to determine UV coordinates, achieving high efficiency through parallelization.
- Exports the mesh + UV texture as a GLB file.
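
The box-projection step above can be sketched as follows. This is a minimal illustration, not the method's implementation: it assumes the mesh is normalized to the cube [-1, 1]^3, assigns each triangle to the cube face its normal points at most strongly, and packs the six faces into a 3x2 atlas. The atlas layout and all function names are assumptions, and the handling of triangles that overlap after projection onto the same face is omitted.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
UV = Tuple[float, float]


def triangle_normal(a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Unnormalized face normal from the cross product of two edges."""
    e1 = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    e2 = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    return (
        e1[1] * e2[2] - e1[2] * e2[1],
        e1[2] * e2[0] - e1[0] * e2[2],
        e1[0] * e2[1] - e1[1] * e2[0],
    )


def cube_face(normal: Vec3) -> int:
    """Cube face index 0..5 in the order +X, -X, +Y, -Y, +Z, -Z."""
    axis = max(range(3), key=lambda i: abs(normal[i]))
    return 2 * axis + (0 if normal[axis] >= 0.0 else 1)


def project_to_face(p: Vec3, face: int) -> UV:
    """Drop the coordinate along the face axis and place the result
    in that face's cell of a 3x2 atlas (hypothetical layout)."""
    axis = face // 2
    u, v = (p[i] for i in range(3) if i != axis)
    u01 = 0.5 * (u + 1.0)  # map [-1, 1] to [0, 1]
    v01 = 0.5 * (v + 1.0)
    col, row = face % 3, face // 3
    return ((col + u01) / 3.0, (row + v01) / 2.0)


def box_unwrap(
    vertices: Sequence[Vec3], triangles: Sequence[Tuple[int, int, int]]
) -> List[Tuple[UV, ...]]:
    """Return per-corner UVs for every triangle.

    Each triangle is processed independently of the others, which is
    what makes this step easy to parallelize.
    """
    result = []
    for i0, i1, i2 in triangles:
        a, b, c = vertices[i0], vertices[i1], vertices[i2]
        face = cube_face(triangle_normal(a, b, c))
        result.append(tuple(project_to_face(p, face) for p in (a, b, c)))
    return result


if __name__ == "__main__":
    verts = [(-1.0, -1.0, 1.0), (1.0, -1.0, 1.0), (1.0, 1.0, 1.0)]
    print(box_unwrap(verts, [(0, 1, 2)]))
```

UVs are stored per triangle corner rather than per vertex because neighboring triangles can land on different cube faces, which creates a seam at their shared vertices.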