Wow, with my very limited understanding of the mechanics of diffusion, I didn't think such a "renoising" was possible (I thought it was a bit like recovering the original content from a hash). This feels like an absolute killer feature...
Your thought wasn't completely wrong. What you're getting here is more like an init image than noise. Even if the image had been a generated one, you'd need the exact same prompt (and some of the other variables) used during generation to get back actual Gaussian noise, or even something close to it.
Since those are not available, and the prompt is guessed, what's happening here can be conceptualized more as (essence of that picture) - (essence of that guessed prompt). So the init image (actually latents) you're left with after this process holds all the concepts of the photo that are not in the prompt "photo of a smiling woman with brown hair", i.e. composition, background, etc.
Now, what that init image looks like once the latents are decoded back into an image, and whether the human brain can even make sense of it, I'm not sure. It would be fascinating to see.
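To make the "(picture) - (guessed prompt)" idea a bit more concrete, here's a toy sketch in plain Python. Big caveats: `toy_denoise` is a hypothetical stand-in that just returns the prompt vector (a real model is a neural network whose prediction also depends on the noisy input, so real inversion is only approximate), and the 4-number "photo", the prompt vectors, and the sigma schedule are all made up for illustration. The only real ingredient is the Euler update `x + (x - denoised) / sigma * (sigma_to - sigma_from)`, stepped upward in sigma to "renoise" and downward to generate again.

```python
# Toy illustration only: a real sampler calls a neural network here.
def toy_denoise(x, sigma, prompt):
    """Hypothetical stand-in model: always predicts the prompt's 'essence'."""
    return list(prompt)

def euler_step(x, sigma_from, sigma_to, prompt):
    """One Euler step of the sampling ODE; works in either direction."""
    denoised = toy_denoise(x, sigma_from, prompt)
    return [xi + (xi - di) / sigma_from * (sigma_to - sigma_from)
            for xi, di in zip(x, denoised)]

def run(x, sigmas, prompt):
    """Walk x along a list of noise levels, one Euler step per pair."""
    for s_from, s_to in zip(sigmas, sigmas[1:]):
        x = euler_step(x, s_from, s_to, prompt)
    return x

sigmas_up = [0.1, 0.5, 1.0, 5.0, 10.0]   # made-up schedule, low to high noise
sigmas_down = sigmas_up[::-1]            # same schedule, high to low noise

photo = [1.0, 2.0, 3.0, 4.0]             # made-up "picture"
guessed = [1.0, 2.0, 0.0, 0.0]           # guessed prompt explains features 0 and 1
new_prompt = [5.0, 6.0, 0.0, 0.0]        # edited prompt

init = run(photo, sigmas_up, guessed)        # "renoise" with the guessed prompt
same = run(init, sigmas_down, guessed)       # regenerate with the same prompt
edited = run(init, sigmas_down, new_prompt)  # regenerate with the new prompt

residual = [i - g for i, g in zip(init, guessed)]
print("residual:", residual)
print("same:    ", same)
print("edited:  ", edited)
```

In this toy, the residual is zero for the two features the guessed prompt already explains and large for the ones it doesn't, `same` comes back as the photo, and `edited` lands near `[4.96, 5.96, 3.0, 4.0]`: new prompt content plus whatever the old prompt didn't cover. With a real model none of this is exact, which is why the guessed prompt matters so much.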
u/HarisTarkos Sep 11 '22