Stable Fast 3D - Notes

Introduction

The most effective image-to-3D method so far, judged by generation speed, output quality, vertex count, and similar criteria.

Issues

Vertex Coloring

  • Previous methods store color per vertex (vertex color), so detailed surfaces require a large number of vertices.
  • Solution: Proposes a fast, highly parallelizable box-projection UV unwrapping that completes in 0.5 seconds.

Light Bake-in

  • Shadows in the image are baked into the texture.
  • Solution: Separates shadows using Spherical Gaussians (illumination modeling, Light Net).

Lack of Material Properties

  • Previous methods lack material properties, making them unresponsive to changes in lighting conditions.
  • Solution: Predict material properties with Material Net.

Marching Cubes

  • "Stair-stepping" occurs when converting volumetric representations to meshes with Marching Cubes.
  • Solution: Uses Deep Marching Tetrahedra (DMTet) instead of Marching Cubes.
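One ingredient of tetrahedra-based extraction can be sketched in a few lines: a surface vertex is placed where a signed distance function (SDF) crosses zero along a grid edge, found by linear interpolation between the two endpoint values. The sketch below is illustrative only. The helper names `interpolate_crossing` and `midpoint` are hypothetical and not taken from the SF3D code, and the midpoint variant is included purely as a blocky baseline for comparison.

```python
def interpolate_crossing(p_a, p_b, s_a, s_b):
    """Return the point on edge (p_a, p_b) where a linearly varying SDF is zero.

    p_a, p_b: 3D endpoints of the edge.
    s_a, s_b: SDF values at the endpoints (must have opposite signs).
    """
    if s_a * s_b >= 0:
        raise ValueError("SDF values must have opposite signs on a crossing edge")
    # Solve s_a + t * (s_b - s_a) = 0 for t in (0, 1).
    t = s_a / (s_a - s_b)
    return tuple(a + t * (b - a) for a, b in zip(p_a, p_b))


def midpoint(p_a, p_b):
    """Baseline for comparison: ignore the SDF values and snap to the edge centre."""
    return tuple((a + b) / 2 for a, b in zip(p_a, p_b))


if __name__ == "__main__":
    a, b = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(interpolate_crossing(a, b, -0.25, 0.75))  # follows the SDF values
    print(midpoint(a, b))                           # always the edge centre
```

Because the interpolated vertex moves continuously with the SDF values, small changes in the field shift the surface smoothly instead of in grid-sized steps.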

Improvements

Backbone Model

  • DINO → DINOv2: Generates image features (vision transformer).
  • Two-stream transformer: Generates Triplane with high efficiency.

Material Estimation

  • Estimates metallic and roughness properties.
  • Uses a frozen CLIP image encoder (pretrained with contrastive learning) as the backbone and trains an MLP on the extracted features to predict material properties, improving reflectivity and quality.

Illumination Modeling

  • Estimates illumination in the input image using Triplanes output from the backbone model.
  • Uses 2 CNN layers + 3 linear layers to predict spherical Gaussian illumination maps.
  • Includes lighting demodulation loss.
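For readers unfamiliar with spherical Gaussians, the sketch below evaluates the commonly used form G(v) = a · exp(λ(μ·v − 1)), where μ is the lobe axis, λ the sharpness, and a the RGB amplitude. These notes do not state the exact parameterization SF3D uses, so treat this as an assumption. The function names `eval_sg` and `eval_environment` are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def eval_sg(direction, axis, sharpness, amplitude):
    """Evaluate one spherical Gaussian lobe in the given direction.

    Assumed form: amplitude * exp(sharpness * (dot(axis, direction) - 1)).
    """
    d = normalize(direction)
    mu = normalize(axis)
    cos_angle = sum(a * b for a, b in zip(d, mu))
    falloff = math.exp(sharpness * (cos_angle - 1.0))
    return tuple(c * falloff for c in amplitude)


def eval_environment(direction, lobes):
    """Sum several lobes; each lobe is an (axis, sharpness, amplitude) tuple."""
    total = [0.0, 0.0, 0.0]
    for axis, sharpness, amplitude in lobes:
        rgb = eval_sg(direction, axis, sharpness, amplitude)
        total = [t + c for t, c in zip(total, rgb)]
    return tuple(total)


if __name__ == "__main__":
    lobes = [((0.0, 0.0, 1.0), 10.0, (1.0, 0.5, 0.25))]
    print(eval_environment((0.0, 0.0, 1.0), lobes))   # along the lobe axis
    print(eval_environment((0.0, 0.0, -1.0), lobes))  # opposite direction
```

A lobe peaks at its amplitude along the axis and falls off smoothly with angle, which is why a handful of lobes can approximate low-frequency environment lighting.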

Mesh Extraction and Refinement

  • Generates meshes from Triplane using DMTet.
  • Refines the mesh using MLPs that predict vertex offsets and normals.

Fast UV-Unwrapping and Export

  • Pipeline:
    • Box-projection: Projects the 3D mesh vertices onto cube faces to determine UV coordinates, achieving high efficiency through parallelization.
    • Exports the mesh + UV texture as a GLB file.
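The box-projection idea can be sketched as follows: each triangle is assigned to the cube face its normal points toward most strongly, the dominant axis is dropped to get 2D coordinates, and the six faces are laid out in a 3×2 atlas. This is a minimal illustration under assumptions, not the SF3D implementation: `box_project` is a hypothetical name, the atlas layout is arbitrary, and the sketch does not handle triangles that overlap within the same cube face. Each triangle is processed independently, which is what makes the approach easy to parallelize.

```python
def face_normal(a, b, c):
    """Unnormalized normal of triangle (a, b, c) via the cross product."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def box_project(vertices, faces):
    """Return per-face UVs: one list of three (u, v) pairs per triangle."""
    # Bounding box, used to normalize projected coordinates into [0, 1].
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    extent = [max(hi[i] - lo[i], 1e-12) for i in range(3)]

    uvs = []
    for face in faces:
        a, b, c = (vertices[i] for i in face)
        n = face_normal(a, b, c)
        # Dominant axis and sign of the normal pick one of six cube faces.
        axis = max(range(3), key=lambda i: abs(n[i]))
        side = 0 if n[axis] >= 0 else 1
        cell = axis * 2 + side          # 0..5
        col, row = cell % 3, cell // 3  # position in the 3x2 atlas
        u_axis, v_axis = [i for i in range(3) if i != axis]
        corners = []
        for p in (a, b, c):
            u = (p[u_axis] - lo[u_axis]) / extent[u_axis]
            v = (p[v_axis] - lo[v_axis]) / extent[v_axis]
            corners.append(((col + u) / 3.0, (row + v) / 2.0))
        uvs.append(corners)
    return uvs


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    tris = [(0, 1, 2), (0, 3, 2)]  # one +z-facing and one -x-facing triangle
    for tri_uv in box_project(verts, tris):
        print(tri_uv)
```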

Implementation

Input Image Size

  • Full: 1024×1024 → Half: 512×512 → Quarter: 256×256.
  • Lower resolutions may cause issues.

Paint Light

Super Resolution

  • nunif/waifu2x.

    • Left → Right: Original → Original with lighting → Original ×2 → Original ×2 with lighting.

Size Estimation