Stable Fast 3D - Notes

Introduction
Arguably the most effective image-to-3D method so far when generation speed, output quality, and vertex count are considered together.
Issues
Vertex Coloring
- Previous methods store color per vertex, so detailed appearance requires a large number of vertices.
- Solution: Proposes a highly parallelizable, fast box-projection-based UV unwrapping step, which helps keep the total generation time at about 0.5 seconds.
Light Bake-in
- Shadows in the image are baked into the texture.
- Solution: Models the illumination with spherical Gaussians (illumination modeling, Light Net) so that shadows and lighting can be separated from the texture.
Lack of Material Properties
- Previous methods lack material properties, making them unresponsive to changes in lighting conditions.
- Solution: Predicts material properties with Material Net.
Marching Cubes
- "Stair-stepping" artifacts occur when volumetric representations are converted to meshes using marching cubes.
- Solution: Uses Deep Marching Tetrahedra (DMTet) instead of marching cubes.
Improvements
Backbone Model
- DINO → DINOv2: A vision transformer that encodes the input image into image features.
- Two-stream transformer: Generates the triplane with high efficiency.
Material Estimation
- Estimates metallic and roughness properties.
- Uses a frozen CLIP image encoder (pretrained with contrastive learning) as the backbone and trains an MLP on the extracted features to predict material properties, improving reflectivity and quality.
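The idea of a small trainable head on top of frozen features can be sketched as follows. This is only an illustration under stated assumptions: random vectors stand in for the CLIP features, the feature size (768), hidden size (256), and the sigmoid output are assumptions, and `init_head` and `predict_material` are hypothetical names, not the paper's code.

```python
import numpy as np

def init_head(rng, in_dim, hidden_dim):
    """Create weights for a two-layer MLP head (hypothetical sizes)."""
    return {
        "w1": rng.normal(0.0, 0.02, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.02, (hidden_dim, 2)),
        "b2": np.zeros(2),
    }

def predict_material(features, params):
    """Map frozen image features (N, in_dim) to metallic and roughness in (0, 1)."""
    hidden = np.maximum(features @ params["w1"] + params["b1"], 0.0)  # ReLU
    logits = hidden @ params["w2"] + params["b2"]
    out = 1.0 / (1.0 + np.exp(-logits))  # sigmoid keeps both values in (0, 1)
    return out[:, 0], out[:, 1]

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 768))  # stand-in for frozen CLIP features (assumption)
params = init_head(rng, 768, 256)
metallic, roughness = predict_material(features, params)
```

Only the head's parameters would be updated during training; the backbone that produces `features` stays fixed.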
Illumination Modeling
- Estimates illumination in the input image using Triplanes output from the backbone model.
- Uses 2 CNN layers + 3 linear layers to predict spherical Gaussian illumination maps.
- Includes lighting demodulation loss.
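For reference, a spherical Gaussian lobe is commonly written as G(v) = a · exp(λ(μ·v − 1)), with lobe axis μ, sharpness λ, and amplitude a. The sketch below evaluates a sum of such lobes for a set of directions. The function name and array shapes are illustrative assumptions and say nothing about how Light Net parameterizes its output.

```python
import numpy as np

def eval_spherical_gaussians(directions, axes, sharpness, amplitudes):
    """Evaluate a sum of spherical Gaussian lobes.

    directions: (N, 3) unit vectors to query
    axes:       (K, 3) unit lobe axes
    sharpness:  (K,)   lobe sharpness values (larger means narrower lobe)
    amplitudes: (K, 3) RGB amplitude per lobe
    returns:    (N, 3) RGB radiance per direction
    """
    directions = np.asarray(directions, dtype=np.float64)
    axes = np.asarray(axes, dtype=np.float64)
    sharpness = np.asarray(sharpness, dtype=np.float64)
    amplitudes = np.asarray(amplitudes, dtype=np.float64)
    cosine = directions @ axes.T                           # (N, K)
    weights = np.exp(sharpness[None, :] * (cosine - 1.0))  # 1 on the axis, decays away from it
    return weights @ amplitudes                            # (N, 3)
```

A lobe contributes its full amplitude along its own axis and falls off smoothly with angle, which is why a handful of lobes can approximate low-frequency environment lighting.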
Mesh Extraction and Refinement
- Generates meshes from Triplane using DMTet.
- Refines the mesh by predicting vertex offsets and normals with an MLP.
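DMTet-style extraction places a surface vertex on each tetrahedron edge whose endpoints have signed distance values of opposite sign, at the linearly interpolated zero crossing. A minimal sketch of that single step follows. `zero_crossing` is a hypothetical helper name, and the learned offsets and normals from the notes above are not modeled here.

```python
import numpy as np

def zero_crossing(p_a, p_b, s_a, s_b):
    """Return the point on edge (p_a, p_b) where a linearly interpolated
    signed distance changes sign. s_a and s_b are the signed distances
    at the two endpoints and must have opposite signs."""
    if s_a * s_b >= 0:
        raise ValueError("edge does not cross the surface")
    p_a = np.asarray(p_a, dtype=np.float64)
    p_b = np.asarray(p_b, dtype=np.float64)
    return (p_a * s_b - p_b * s_a) / (s_b - s_a)

# The crossing sits a quarter of the way along the edge from p_a.
vertex = zero_crossing([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], -0.25, 0.75)
```

Because the vertex position varies continuously with the signed distance values, the surface is not forced onto the grid pattern that causes stair-stepping.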
Fast UV-Unwrapping and Export
- Pipeline:
- Box-projection: Projects the 3D mesh vertices onto cube faces to determine UV coordinates, achieving high efficiency through parallelization.
- Exports the mesh + UV texture as a GLB file.
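The box-projection step can be illustrated with a simplified, fully vectorized sketch. It assumes vertices normalized to [-1, 1]^3, per-vertex normals, and a 3×2 atlas layout. `box_project_uv` and `atlas_uv` are hypothetical names, and handling of overlapping or occluded regions is deliberately left out.

```python
import numpy as np

def box_project_uv(vertices, normals):
    """Assign each vertex to one of six cube faces by its dominant normal
    axis, then project it onto that face.

    vertices: (N, 3) positions in [-1, 1]^3
    normals:  (N, 3) vertex normals
    returns:  face (N,) in 0..5 (+x, -x, +y, -y, +z, -z) and uv (N, 2) in [0, 1]
    """
    vertices = np.asarray(vertices, dtype=np.float64)
    normals = np.asarray(normals, dtype=np.float64)
    idx = np.arange(len(vertices))
    axis = np.argmax(np.abs(normals), axis=1)   # dominant axis: 0, 1 or 2
    negative = normals[idx, axis] < 0           # facing the negative direction?
    face = axis * 2 + negative.astype(np.int64)
    # The two remaining axes become the in-plane u and v coordinates.
    u = vertices[idx, (axis + 1) % 3]
    v = vertices[idx, (axis + 2) % 3]
    uv = (np.stack([u, v], axis=1) + 1.0) / 2.0  # map [-1, 1] to [0, 1]
    return face, uv

def atlas_uv(face, uv, cols=3, rows=2):
    """Pack the six per-face tiles into a single cols x rows texture atlas."""
    col = face % cols
    row = face // cols
    return np.stack([(uv[:, 0] + col) / cols, (uv[:, 1] + row) / rows], axis=1)
```

Every operation is an independent per-vertex array computation with no iterative optimization, which is what makes this kind of unwrapping easy to parallelize.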
Implementation
Input Image Size
- Full: 1024×1024 → Half: 512×512 → Quarter: 256×256.
- Lower resolutions may cause issues.
Paint Light
Super Resolution
- Left → Right: Original → Original with lighting → Original ×2 → Original ×2 with lighting.
Size Estimation
- MiDaS depth estimation → Size estimation (requires a reference object of known size).
- ARKit
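The role of the reference object can be sketched with a pinhole-camera argument: an object's real extent is proportional to its pixel extent times its distance, so an object of known size fixes the unknown scale. The sketch assumes the depth values are proportional to true distance (MiDaS predicts relative inverse depth, so a conversion would be needed first). `estimate_size` is a hypothetical helper.

```python
def estimate_size(ref_size_m, ref_extent_px, ref_depth, obj_extent_px, obj_depth):
    """Estimate an object's real extent in metres from a reference object.

    ref_size_m:    known real extent of the reference object (metres)
    ref_extent_px: extent of the reference object in the image (pixels)
    ref_depth:     relative depth of the reference object
    obj_extent_px: extent of the target object in the image (pixels)
    obj_depth:     relative depth of the target object (same scale as ref_depth)
    """
    values = (ref_size_m, ref_extent_px, ref_depth, obj_extent_px, obj_depth)
    if min(values) <= 0:
        raise ValueError("all inputs must be positive")
    # Metres per (pixel * depth unit); focal length and depth scale cancel out.
    scale = ref_size_m / (ref_extent_px * ref_depth)
    return obj_extent_px * obj_depth * scale

# A 0.1 m reference spanning 50 px; the target spans 100 px at the same depth.
size_m = estimate_size(0.1, 50.0, 2.0, 100.0, 2.0)
```

With ARKit, metric depth is available directly, so the reference object would no longer be needed for the scale.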