r/StableDiffusion Sep 11 '22

A better (?) way of doing img2img by finding the noise which reconstructs the original image [Img2Img]



u/tinman_inacan Sep 12 '22

Can you provide a bit of a technical explanation of how to apply this technique?

Automatic1111 has implemented your code in the webui project, and I've been trying it out. It works perfectly for recreating the image, but I can't seem to figure out how to actually do anything with it. The output just comes out looking exactly the same - overbaked - no matter how I mess with the settings or prompt.

Still, absolutely incredible that you threw this together, especially without reading the theory behind it first!
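For what it's worth, my rough understanding of the trick (from skimming the thread, so take it with a grain of salt): a deterministic Euler-style sampler takes fixed steps from noise to image, so each step can be undone, walking the image backwards to the noise that generates it; resample from that noise with an edited prompt and you edit the image. A toy numpy sketch of that inversion idea - not the actual webui code, and `f` is just a stand-in for the model:

```python
# Toy sketch of the "find the noise" idea (my understanding, not the
# actual script): a deterministic Euler sampler takes steps
#   x[i+1] = x[i] + dt * f(x[i], t[i])
# and each step can be undone by solving for x[i], so an image can be
# walked back to the noise that generates it.
import numpy as np

def f(x, t):
    # placeholder "denoiser" derivative; in the real thing this is the
    # diffusion model's prediction, not a hand-written formula
    return -0.5 * x + np.sin(t)

def sample(x0, ts):
    """Run the deterministic sampler forward from noise x0."""
    x = x0.copy()
    for i in range(len(ts) - 1):
        x = x + (ts[i + 1] - ts[i]) * f(x, ts[i])
    return x

def invert(xT, ts, iters=10):
    """Recover the starting noise that reproduces xT under sample()."""
    x = xT.copy()
    for i in reversed(range(len(ts) - 1)):
        dt = ts[i + 1] - ts[i]
        # undo one step: solve x_prev = x - dt * f(x_prev, t[i])
        # by fixed-point iteration (converges for small dt)
        x_prev = x.copy()
        for _ in range(iters):
            x_prev = x - dt * f(x_prev, ts[i])
        x = x_prev
    return x

rng = np.random.default_rng(0)
ts = np.linspace(0.0, 1.0, 21)
noise = rng.standard_normal(4)
image = sample(noise, ts)
recovered = invert(image, ts)
# recovered matches the original noise to numerical precision, so
# sample(recovered, ts) reproduces the "image" exactly
```

The real script does this in latent space with the model's noise prediction, which is why the decode CFG scale and decode steps control how faithfully the noise is recovered.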


u/Daralima Sep 12 '22

That's odd, especially that the settings have no effect. Are you perhaps changing the original prompt window? I've found that it has no effect whatsoever, even when left empty; you need to change the regular prompt, if you aren't doing so already. Using your original prompt, a prompt that makes sense given the image, or the CLIP interrogator's output as a base in the normal prompt window seems to work well - I used the exact same prompts as in this post's image along with the original image and got nearly identical results to the author's.

This has been my experience with the overbaking issue, but since you say that changing the settings does nothing, I'm not sure if it'll help in your case:

there seems to be a strong interplay between the decode settings and the regular sampling step count: lowering the decode CFG scale all the way to 0.1 and raising the decode steps to 150 seems to fully fix the overbaking when combined with a somewhat unusually low sampling step count; 10-20 worked in the case I first tried (and seems to hold as a general rule in my other attempts). But these settings do not seem to work universally:

sometimes setting the decode CFG scale too low seems to remove certain details, so experimenting with values between 0.1 and 1 is worthwhile if things are missing or look off (assuming those things are of consequence). And while more decode steps always seem to decrease the level of overbake, they don't always produce something closer to the original; in a couple of cases they made some weird changes instead.
I'd recommend starting with 0.1 decode CFG and 150 decode steps, a low sampling step count, and an empty prompt, to check that the image recreation goes as hoped, until you're really close to the original without much (or any) overbake. Then increase or decrease one setting by a fairly large amount if it doesn't yield good results. Once you've got the image you want, you can either add the whole prompt (as in this post) and edit it, or add keywords, which seems to give a similar effect.
Hope this is coherent enough to be somewhat helpful if you haven't figured it out by now!
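If it helps, the experimenting I describe above amounts to a small sweep like this (the setting names here are just my labels for the webui sliders, not anything internal):

```python
# Illustrative sweep over the settings discussed above: decode CFG
# scale low (0.1 first, nudged upward if details go missing), decode
# steps high, sampling step count unusually low. Names are my own
# labels, not the webui's internal ones.
from itertools import product

decode_cfg_scales = [0.1, 0.3, 0.5, 1.0]  # start at 0.1, raise if details vanish
decode_step_counts = [150]                # high decode steps to reduce overbake
sampling_step_counts = [10, 15, 20]       # unusually low sampling step counts

trials = [
    {"decode_cfg": c, "decode_steps": d, "sampling_steps": s}
    for c, d, s in product(decode_cfg_scales, decode_step_counts, sampling_step_counts)
]
# 4 * 1 * 3 = 12 combinations, ordered with the recommended starting
# point (decode CFG 0.1) first
```

I'd run the first few with an empty prompt to confirm the recreation, then re-add the prompt once one combination looks clean.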

If the author sees this comment, please correct anything that doesn't add up as I've figured all this out through experimentation and know nothing about the underlying code.


u/tinman_inacan Sep 13 '22

Thank you so much for your detailed response! With the help of your advice and a lot of trial and error, I think I have it working now. Still having trouble with overbaking, but at least I have some idea of what's going on. I think I was just confused about which prompts do what, which settings are disabled, how the sliders affect each other, etc.

At least I got some neat trippy pyramids out of it lol.