Sketch to 3D
Tech demo · 2021
Plenty of people can't draw well, or don't want to spend ages drawing something accurate. So the idea was: scribble a rough doodle, let the model figure out what it's looking at, and hand back a clean 3D version of it. In 2021 this was still an open research problem. The good models mostly spat out point clouds, which you can't really use as assets, and the ones that made actual meshes wanted clean reference images, not a quick scribble. Still not fully solved today, just less embarrassing.
Same trick CLIP uses for text and images, applied to sketches and meshes. Two separate encoders, one for sketches and one for 3D meshes, trained so a doodle and its matching 3D model land at similar coordinates in a shared embedding space. The diagram below shows the mesh autoencoder. The sketch encoder lived in a separate model, with its latents aligned to the mesh autoencoder's.
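The alignment objective isn't spelled out beyond the CLIP comparison, so what follows is not the project's training code. It's a minimal sketch of CLIP's symmetric contrastive loss as it would apply to sketch/mesh pairs. It assumes each encoder already outputs a fixed-length vector and that row i of each batch is a matching pair. `clip_loss` and its helpers are hypothetical names, `temperature=0.07` is just CLIP's starting value, and plain Python lists stand in for tensors.

```python
import math


def l2_normalize(v):
    # Unit-length vector, so dot products below are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(sketch_emb, mesh_emb, temperature=0.07):
    """Symmetric InfoNCE loss (hypothetical helper); pair i = sketch i + mesh i."""
    s = [l2_normalize(v) for v in sketch_emb]
    m = [l2_normalize(v) for v in mesh_emb]
    n = len(s)
    # logits[i][j]: similarity of sketch i and mesh j, sharpened by 1/temperature.
    logits = [[sum(a * b for a, b in zip(s[i], m[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Sketch -> mesh: each sketch should pick out its own mesh (the diagonal).
    loss_s2m = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Mesh -> sketch: the same classification over the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_m2s = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_s2m + loss_m2s) / 2


# Toy 3-D "embeddings": matched pairs point the same way.
sketches = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
meshes = [[0.9, 0.2, 0.0], [0.1, 1.0, 0.1], [0.0, 0.1, 0.9]]
aligned = clip_loss(sketches, meshes)
shuffled = clip_loss(sketches, meshes[1:] + meshes[:1])  # pairs deliberately wrong
print(aligned < shuffled)  # True
```

Every other mesh in the batch serves as a negative for a given sketch, so training pushes the right match away from near misses instead of only pulling pairs together. That's the part of the CLIP recipe that carries over to doodles.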
The hardest part was the data, not the model. Making arbitrary 3D meshes watertight and genus-zero is genuinely not a solved problem (wasn't then, still kind of isn't). And every mesh needed a paired hand-drawn-looking sketch, because no real user gives you a clean reference. Heuristics carried both ends: the mesh cleanup and the sketch generation. They kinda worked. Good enough to train on.
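The cleanup heuristics themselves aren't described, so this isn't them. It's a sketch of the acceptance test a cleaned mesh has to pass, assuming triangle meshes given as vertex-index triples with a consistent winding. Watertight means every edge borders exactly two faces; genus zero then follows from the Euler characteristic, since χ = V − E + F = 2 − 2g for a closed, connected, orientable surface. `genus_if_watertight` and `torus_faces` are hypothetical names, and the check does not catch non-manifold vertices (two surfaces pinched at a single point).

```python
from collections import Counter, defaultdict


def genus_if_watertight(faces):
    """Genus of a closed, consistently wound, connected triangle mesh, else None.

    Hypothetical helper. faces: list of (a, b, c) vertex-index triples.
    Does not detect non-manifold (pinched) vertices.
    """
    if not faces:
        return None
    directed = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            directed[(u, v)] += 1
    # Watertight and consistently oriented: each directed edge occurs once and
    # its reverse occurs too, so every undirected edge borders exactly two faces.
    if any(n != 1 or (v, u) not in directed for (u, v), n in directed.items()):
        return None

    # Require a single connected piece; otherwise V - E + F sums over pieces.
    adj = defaultdict(set)
    for u, v in directed:
        adj[u].add(v)
    verts = list(adj)
    seen, stack = {verts[0]}, [verts[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(verts):
        return None

    V, E, F = len(verts), len(directed) // 2, len(faces)
    chi = V - E + F  # Euler characteristic: chi = 2 - 2g
    return (2 - chi) // 2


def torus_faces(n, m):
    # Triangulated n x m grid with wrap-around in both directions (n, m >= 3).
    def idx(i, j):
        return (i % n) * m + (j % m)

    faces = []
    for i in range(n):
        for j in range(m):
            a, b = idx(i, j), idx(i + 1, j)
            c, d = idx(i + 1, j + 1), idx(i, j + 1)
            faces += [(a, b, c), (a, c, d)]
    return faces


tetra = [(0, 1, 2), (0, 3, 1), (0, 2, 3), (1, 3, 2)]
print(genus_if_watertight(tetra))             # 0: sphere-like, keep
print(genus_if_watertight(tetra[:-1]))        # None: has a hole, reject
print(genus_if_watertight(torus_faces(4, 4)))  # 1: one handle, reject
```

A filter like this only tells you which meshes already qualify; the hard part the paragraph is talking about is repairing the ones that don't.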
Always meant as a tech demo, not a product. A non-technical person doodles something and gets a family of 3D assets back in under two minutes. Sharp enough to help raise $3M.