I’ve made quite a few attempts at editing existing pictures with img2img. However, at low strengths the pictures tend to be modified too little, while at high strengths the picture is modified in undesired ways. /u/bloc97 posted here about a better way of doing img2img that would allow for more precise editing of existing pictures – by finding the noise that will cause SD to reconstruct the original image.
I made a quick attempt at reversing the k_euler sampler, and ended up with the code I posted in a reply to the post by bloc97 linked above. I’ve refined the code a bit and posted it on GitHub here:
If image is a PIL image and model is a LatentDiffusion object, then find_noise_for_image can be called like this:
noise_out = find_noise_for_image(model, image, 'Some prompt that accurately describes the image', steps=50, cond_scale=1.0)
The output noise tensor can then be used for image generation by using it as a “fixed code” (to use a term from the original SD scripts) – in other words, instead of generating a random noise tensor (and possibly adding that noise tensor to an image for img2img), you use the noise tensor generated by find_noise_for_image_model.
This method isn’t perfect – deviate too much from the prompt used when generating the noise tensor, and the generated images are going to start differing from the original image in unexpected ways. Some experimentation with the different parameters and making the prompt precise enough will probably be necessary to get this working. Still, for altering existing images in particular ways I’ve had way more success with this method than with standard img2img. I have yet to combine this with bloc97’s Prompt-to-Prompt Image Editing, but I’m guessing the combination will give even more control.
All suggestions for improvements/fixes are highly appreciated. I still have no idea what the best setting of cond_scale, for example, and in general this is just a hack that I made without reading any of the theory on this topic.
Edit: By the way, the original image used in the example is from here and is the output of one of those old "this person does not exist" networks, I believe. I've tried it on other photos (including of myself :), so this works for "real" pictures as well. The prompt that I used when generating the noise tensor for this was "Photo of a smiling woman with brown hair".
Yeah that's a good distinction to make, I'm trying to make it accessible and less complicated, but it's important to make the distinction that the seed is what is used to produce the initial noise, which is used to diffuse / iterate on to get to a final product
It's a really important distinction because there's a lot more potential entropy in the noise than in the seed. There may be a noise pattern that results in the image, but there probably isn't a seed that makes that specific noise pattern.
There might be countless noise patterns in math but not in reality. The vast majority of those patterns will certainly result in identical result images which is also true for the 2^32 seed variations.
A lot of them are probably going to show the same result.
155
u/Aqwis Sep 11 '22 edited Sep 11 '22
I’ve made quite a few attempts at editing existing pictures with img2img. However, at low strengths the pictures tend to be modified too little, while at high strengths the picture is modified in undesired ways. /u/bloc97 posted here about a better way of doing img2img that would allow for more precise editing of existing pictures – by finding the noise that will cause SD to reconstruct the original image.
I made a quick attempt at reversing the
k_euler
sampler, and ended up with the code I posted in a reply to the post by bloc97 linked above. I’ve refined the code a bit and posted it on GitHub here:link to code
If
image
is a PIL image andmodel
is aLatentDiffusion
object, thenfind_noise_for_image
can be called like this:The output noise tensor can then be used for image generation by using it as a “fixed code” (to use a term from the original SD scripts) – in other words, instead of generating a random noise tensor (and possibly adding that noise tensor to an image for img2img), you use the noise tensor generated by
find_noise_for_image_model
.This method isn’t perfect – deviate too much from the prompt used when generating the noise tensor, and the generated images are going to start differing from the original image in unexpected ways. Some experimentation with the different parameters and making the prompt precise enough will probably be necessary to get this working. Still, for altering existing images in particular ways I’ve had way more success with this method than with standard img2img. I have yet to combine this with bloc97’s Prompt-to-Prompt Image Editing, but I’m guessing the combination will give even more control.
All suggestions for improvements/fixes are highly appreciated. I still have no idea what the best setting of
cond_scale
, for example, and in general this is just a hack that I made without reading any of the theory on this topic.Edit: By the way, the original image used in the example is from here and is the output of one of those old "this person does not exist" networks, I believe. I've tried it on other photos (including of myself :), so this works for "real" pictures as well. The prompt that I used when generating the noise tensor for this was "Photo of a smiling woman with brown hair".