DreamBooth — How Google hacks diffusion models to generate personalized photos

Trung Thanh Tran
6 min read · Oct 26, 2022
Photo by Jarred Clapperton on Unsplash

One major issue that AI image generation models attempt to address is generating photographs in a controllable manner. By "controllable," I mean the ability of an art generation model to produce output with a pre-defined context, expected subjects, and detailed cues.

While current models can generate stunning photos, they cannot faithfully reproduce the appearance of the subjects in a given reference collection, nor synthesize shots of those same subjects in new circumstances.

Google announced its solution to the problem in a paper titled “DreamBooth — Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation.” I love their work, so I summarize the paper in this article and hope that it will inspire people to investigate their idea.

DreamBooth is remarkable, according to the authors, because:

By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic…
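The "class-specific prior preservation loss" mentioned in that passage is the heart of the trick: the model is fine-tuned on the handful of subject photos while also being trained on images that the frozen, original model generates for the plain class prompt, so the broader class prior is not forgotten. Below is a minimal PyTorch-style sketch of such a training objective; the names `model`, `noise_scheduler`, the call signatures, and the prompts in the comments are my own assumptions for illustration, not Google's actual implementation.

```python
import torch
import torch.nn.functional as F

def dreambooth_loss(model, noise_scheduler, subject_latents, subject_emb,
                    prior_latents, prior_emb, prior_weight=1.0):
    """Sketch of a DreamBooth-style training step with prior preservation.

    Assumptions: `model` is a noise-prediction UNet called as
    model(noisy_latents, timesteps, text_embeddings), and
    `noise_scheduler.add_noise(latents, noise, timesteps)` follows a
    diffusers-style signature. Both are illustrative, not the paper's code.
    """
    def denoising_loss(latents, text_emb):
        # Standard diffusion objective: predict the noise added at a random timestep.
        noise = torch.randn_like(latents)
        t = torch.randint(0, 1000, (latents.shape[0],), device=latents.device)
        noisy = noise_scheduler.add_noise(latents, noise, t)
        pred = model(noisy, t, text_emb)
        return F.mse_loss(pred, noise)

    # Reconstruction term on the few subject photos (e.g. prompt "a [V] dog").
    subject_term = denoising_loss(subject_latents, subject_emb)

    # Prior-preservation term on images the frozen model generated for the
    # bare class prompt (e.g. "a dog"), which keeps the class prior intact.
    prior_term = denoising_loss(prior_latents, prior_emb)

    return subject_term + prior_weight * prior_term
```

In the paper, the prior term is weighted by a hyperparameter (λ) that trades off fidelity to the reference subject against the diversity and correctness of the class prior.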
