r/StableDiffusion Sep 11 '22

A better (?) way of doing img2img by finding the noise which reconstructs the original image Img2Img

917 Upvotes

156

u/Aqwis Sep 11 '22 edited Sep 11 '22

I’ve made quite a few attempts at editing existing pictures with img2img. However, at low strengths the pictures tend to be modified too little, while at high strengths they are modified in undesired ways. /u/bloc97 posted here about a better way of doing img2img that would allow for more precise editing of existing pictures – by finding the noise that will cause SD to reconstruct the original image.

I made a quick attempt at reversing the k_euler sampler, and ended up with the code I posted in a reply to the post by bloc97 linked above. I’ve refined the code a bit and posted it on GitHub here:

link to code
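The core idea is just the normal Euler update run with the sigma schedule reversed, so each step adds noise along the model's predicted direction instead of removing it. As a rough sketch of that loop (not the exact gist code; it assumes denoiser is k-diffusion's CompVisDenoiser wrapping the LatentDiffusion model, and cond/uncond come from model.get_learned_conditioning):

import torch

@torch.no_grad()
def find_noise_sketch(denoiser, x, cond, uncond, steps=50, cond_scale=1.0):
    sigmas = denoiser.get_sigmas(steps).flip(0)   # walk from low noise to high noise
    for i in range(1, len(sigmas)):
        sigma = sigmas[i] * x.new_ones([x.shape[0]])
        # classifier-free guidance, same as during normal sampling
        denoised_uncond = denoiser(x, sigma, cond=uncond)
        denoised_cond = denoiser(x, sigma, cond=cond)
        denoised = denoised_uncond + (denoised_cond - denoised_uncond) * cond_scale
        d = (x - denoised) / sigmas[i]            # predicted noise direction
        x = x + d * (sigmas[i] - sigmas[i - 1])   # Euler step toward higher noise
    return x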

If image is a PIL image and model is a LatentDiffusion object, then find_noise_for_image can be called like this:

noise_out = find_noise_for_image(model, image, 'Some prompt that accurately describes the image', steps=50, cond_scale=1.0)

The output noise tensor can then be used for image generation by using it as a “fixed code” (to use a term from the original SD scripts) – in other words, instead of generating a random noise tensor (and possibly adding that noise tensor to an image for img2img), you use the noise tensor generated by find_noise_for_image.
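As a minimal sketch of that usage (assuming k-diffusion's Euler sampler and a classifier-free guidance wrapper like the one in the gist; the names here are illustrative rather than the exact script):

import torch
import torch.nn as nn
import k_diffusion as K

class CFGDenoiser(nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, x, sigma, uncond, cond, cond_scale):
        x_in = torch.cat([x] * 2)
        sigma_in = torch.cat([sigma] * 2)
        cond_in = torch.cat([uncond, cond])
        out_uncond, out_cond = self.inner(x_in, sigma_in, cond=cond_in).chunk(2)
        return out_uncond + (out_cond - out_uncond) * cond_scale

model_wrap = K.external.CompVisDenoiser(model)   # model is the LatentDiffusion object
sigmas = model_wrap.get_sigmas(50)

noise = find_noise_for_image(model, image,
                             'Some prompt that accurately describes the image',
                             steps=50, cond_scale=1.0)

# Depending on the gist version, the recovered tensor may already be scaled to the
# largest sigma; a fresh start code would instead be torch.randn_like(noise) * sigmas[0].
samples = K.sampling.sample_euler(
    CFGDenoiser(model_wrap), noise, sigmas,
    extra_args={'cond': model.get_learned_conditioning(['Some prompt, with the desired edit']),
                'uncond': model.get_learned_conditioning(['']),
                'cond_scale': 7.5})
images = model.decode_first_stage(samples)       # latents back to image space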

This method isn’t perfect – deviate too much from the prompt used when generating the noise tensor, and the generated images are going to start differing from the original image in unexpected ways. Some experimentation with the different parameters and making the prompt precise enough will probably be necessary to get this working. Still, for altering existing images in particular ways I’ve had way more success with this method than with standard img2img. I have yet to combine this with bloc97’s Prompt-to-Prompt Image Editing, but I’m guessing the combination will give even more control.

All suggestions for improvements/fixes are highly appreciated. I still have no idea what the best setting of cond_scale is, for example, and in general this is just a hack that I made without reading any of the theory on this topic.

Edit: By the way, the original image used in the example is from here and is the output of one of those old "this person does not exist" networks, I believe. I've tried it on other photos (including of myself :), so this works for "real" pictures as well. The prompt that I used when generating the noise tensor for this was "Photo of a smiling woman with brown hair".

77

u/GuavaDull8974 Sep 11 '22

This is spectacular! I made a feature request for it already on the webui repo. Do you think you can produce an actually working commit for it?

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/291

14

u/hefeglass Sep 12 '22

It's been implemented by AUTOMATIC1111, but I can't seem to figure out how to use it. Anyone able to explain? I am trying to use the alternate img2img script.

23

u/jonesaid Sep 12 '22

You go to the img2img tab, select "img2img alternative test" in the scripts dropdown, put in an "original prompt" that describes the input image and whatever you want to change in the regular prompt, set CFG to 2, Decode CFG to 2, Decode steps to 50, use the Euler sampler, upload an image, and click generate.

2

u/Plopdopdoop Sep 13 '22 edited Sep 14 '22

So when I try those settings the output isn't anything close, like not even recognizable objects in the resulting image (the original being 'picture of a man wearing a red shirt').

9

u/jonesaid Sep 13 '22

It seems to be very sensitive to decode cfg and decode steps. I use decode cfg at about 2, and decode steps from 35-50. Make sure regular cfg is about 2 too.

2

u/BeanieBytes Sep 14 '22

I'm also getting this issue. Does my denoising strength need to be altered?

2

u/Breadisgood4eat Sep 14 '22

I had an older install and just copied this new repo over the top and was getting the same issue. I reinstalled from scratch and now it's working.

1

u/2legsakimbo Sep 13 '22

Hmm, that alternative test isn't showing up, even though I just force-updated by deleting the venv and repository folders.

I must have missed a step

3

u/tobboss1337 Sep 13 '22

You just deleted the additional Python repos and the environment, so you returned to the state of the initial download, not the newest version. Did you pull the changes from Automatic's repo?

1

u/2legsakimbo Sep 13 '22 edited Sep 13 '22

No, thank you for letting me know that I have to do that.

3

u/redboundary Sep 11 '22

Isn't it the same as setting "masked content" to original in the img2img settings?

51

u/animemosquito Sep 11 '22

No, this is basically finding which "seed" would lead to SD generating the original image, so that you are able to modify it in less destructive ways.

22

u/MattRix Sep 11 '22

yep exactly! Though to be somewhat pedantic it’s not the seed, it’s the noise itself.

8

u/animemosquito Sep 11 '22

Yeah, that's a good distinction to make. I'm trying to keep it accessible and less complicated, but it's important to note that the seed is what's used to produce the initial noise, which is then diffused/iterated on to get to a final product.

5

u/Trainraider Sep 12 '22

It's a really important distinction because there's a lot more potential entropy in the noise than in the seed. There may be a noise pattern that results in the image, but there probably isn't a seed that makes that specific noise pattern.
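A tiny illustration of that point, assuming SD's usual 4x64x64 latent noise shape:

import torch

# One integer seed fully determines a latent noise tensor...
torch.manual_seed(1234)
noise_from_seed = torch.randn(1, 4, 64, 64)

# ...but that tensor holds 16384 independent floats, so the space of possible
# noise tensors is astronomically larger than the space of seeds, and a
# recovered noise tensor will in general not correspond to any seed.
print(noise_from_seed.numel())   # 16384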

9

u/[deleted] Sep 12 '22

[removed]

12

u/ldb477 Sep 14 '22

That’s at least double

6

u/almark Sep 12 '22

This means we can keep the subject we like and alter it: move the model, change poses, change different things in the photo.

1

u/[deleted] Sep 12 '22

... make perfecto hands, I'd hazard a guess

3

u/almark Sep 12 '22

hands are floppy things - laughs

I still have nightmares from first glance in SD.

51

u/bloc97 Sep 11 '22

Awesome, I can't wait to combine this with cross attention control. This will actually allow people to edit an image however they want at any diffusion strength! No more of the problem of img2img ignoring the initial image at high strengths. I will take a look at the code tomorrow...

Also I believe (and hope) that inpainting with this method with cross attention control would yield far superior results than simply masking out parts of an image and adding random noise. What a time to be alive!

6

u/enspiralart Sep 12 '22

2 minute papers bump!

5

u/gxcells Sep 11 '22

Then you will probably update your jupyter notebook with k diffusers?

5

u/bloc97 Sep 11 '22

The current version uses k-lms by default.

2

u/gxcells Sep 12 '22

Ok, thanks a lot

9

u/no_witty_username Sep 11 '22

God speed my man. This feature is gonna be massive.

13

u/ethereal_intellect Sep 11 '22 edited Sep 11 '22

The prompt that I used when generating the noise tensor for this was "Photo of a smiling woman with brown hair".

Wait, so it takes the assumed prompt as input? What if you put a wrong prompt, like a photo of a dog with brown hair? Does the learned noise override the prompt and still draw a human face? I see u/JakeWilling asked basically the same thing too. It would/could be interesting if "close enough" descriptions from the BLIP+CLIP system work

Edit: There's also https://github.com/justinpinkney/stable-diffusion this which uses image embeddings instead of text. Wonder if it would make the reconstructions more accurate? Tho at that point you got no variables left to control lol

Edit2: Style transfer with the above might be interesting, get clip image1, get noise seed, get clip image2 and run it on the same seed

2

u/2legsakimbo Sep 13 '22

Edit: There's also https://github.com/justinpinkney/stable-diffusion this which uses image embeddings instead of text. Wonder if it would make the reconstructions more accurate? Tho at that point you got no variables left to control lol

this looks amazing

9

u/AUTOMATIC1111 Sep 12 '22

That last line in the gist where you multiply by sigmas[-1] was completely destroying the picture. I don't know if you added it in jest or something, but it took a lot to discover and fix it.

10

u/[deleted] Sep 11 '22

[deleted]

3

u/ByteArrayInputStream Sep 11 '22

Haven't tried it, but my guess would be that it wouldn't be able to find a seed that accurately resembles the original image

4

u/Doggettx Sep 11 '22 edited Sep 11 '22

Very cool, definitely gonna have to play with this :)

Your example is missing a few things though, like pil_img_to_torch(), the tqdm import, and the collect_and_empty() function

I assume it's something like:

import gc
import torch

def collect_and_empty():
    gc.collect()
    torch.cuda.empty_cache()

6

u/Aqwis Sep 12 '22

Sorry, I went and added pil_img_to_torch to the gist now! I removed collect_and_empty a couple of hours ago as it was slowing things down and the VRAM issue mysteriously vanished.

2

u/rservello Sep 12 '22

Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same

thoughts on this error now?

1

u/Etiennera Sep 12 '22

Did you halve your model to save vram?

1

u/rservello Sep 12 '22

I did. But I tried at full and get the same error.

3

u/Inevitable_Impact_46 Sep 11 '22

I'm guessing:

import numpy as np
import torch

def pil_img_to_torch(img, half=True):
    # PIL image -> (C, H, W) float tensor
    img = img.convert('RGB')
    img = torch.tensor(np.array(img)).permute(2, 0, 1).float()
    if half:
        img = img.half()
    return img

1

u/rservello Sep 11 '22

I'm getting an error, pil_img_to_torch not defined. Do you know how to fix this?

3

u/backafterdeleting Sep 12 '22

I wonder how the prompt you use for reversing the noise affects how you can alter the image by changing the prompt, before getting an unrecognisable image.

E.g: You used "photo of a smiling woman with brown hair"

but if you just used "photo of a smiling woman" and got the noise for that prompt, and then added "with blue hair", would it be a worse result?

Or if you added "in the park on a sunny day" could you then more easily change it to, "on a rainy day"?

3

u/Aqwis Sep 12 '22

Yes, you're exactly right – when I made the examples I first used the noising prompt "photo of a smiling woman" and got inconsistent results when generating images with "...with X hair" added to the prompt. After adding "...with brown hair" to the noising prompt the results improved significantly.

On the other hand, for other pictures I've had the most success noising them with a CFG scale (cond_scale) setting of 0.0, which means that the prompt used when noising should have no impact whatsoever. In those cases I've often been able to use prompts like "photo of a woman with brown hair" in image generation despite that!

It's hard to conclude anything besides this method being quite inconsistent both in terms of how well it works and which settings lead to the best results. As mentioned I hope that combining this with prompt-to-prompt image editing can lead to more consistent results.

2

u/rservello Sep 11 '22 edited Sep 11 '22

What does this return? A seed value? If it produces a latent image or noise sample that needs to be inserted, where is that done? Can you provide more info on how to actually use this?

2

u/dagerdev Sep 11 '22

The output noise tensor can then be used for image generation

This could be an ignorant question, I hope not. But can this output noise tensor be translated back to an image? That would help a lot to visualize it.

2

u/starstruckmon Sep 11 '22

Yes. Just run it through the decoder. I'm pretty curious what it looks like too.
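A sketch of what that could look like, assuming a CompVis-style LatentDiffusion model and the latent-shaped tensor noise_out returned by find_noise_for_image (illustrative only):

import numpy as np
import torch
from PIL import Image

with torch.no_grad():
    decoded = model.decode_first_stage(noise_out)    # (1, 3, H, W), roughly in [-1, 1]
arr = ((decoded[0].float().clamp(-1, 1) + 1) / 2 * 255).permute(1, 2, 0).cpu().numpy()
Image.fromarray(arr.astype(np.uint8)).save('recovered_noise_preview.png')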

1

u/jfoisdfbjc218 Sep 11 '22

I'm trying to run this script after copying it to my script folder, but it keeps telling me there's "No module named 'k_diffusion'". How do I install this module? I'm kind of a noob.

2

u/ParanoidConfidence Sep 11 '22

I don't know the answer, but this has been discussed before, maybe in this link lies the answer for you?

https://www.reddit.com/r/StableDiffusion/comments/ww31wr/modulenotfounderror_no_module_named_k_diffusion/

1

u/WASasquatch Sep 13 '22

Maybe your k_diffusion is under the folder "k-diffusion" like mine. I had to change it to k-diffusion.k_diffusion

1

u/EmbarrassedHelp Sep 12 '22

I love how your example image appears to be a StyleGAN 2 rendering, instead of a real stock photo.

1

u/summerstay Sep 12 '22

This is cool! Once I find the noise vector for a starting image, how do I then generate a new version of the starting image with a revised prompt? I don't see the code for that. Or, if it is a simple modification to txt2img.py or img2img.py, maybe you can just explain what I need to do.

1

u/the_dev_man Sep 12 '22

Where do I get the model variable from? Can someone make a working colab example, just with this feature?

59

u/sassydodo Sep 11 '22

You should summon hlky and automatic in this thread, or do a pull request for this into their webUI repos; that would be much better from a user experience standpoint.

I think I've seen some work in either hlky or auto's repo that mentioned cross attention control

46

u/MarvelsMidnightMoms Sep 11 '22

Automatic1111 has been so on the ball with updates to his fork these past 2 weeks+. Just today he added "Interrogate" in his img2img tab, which is img2prompt.

Yesterday, or the day before, he added prompt "presets" to save time on retyping your most commonly used terms.

Hlky's activity has died down quite a bit which is a bit unfortunate. His was the first webui fork that I tried.

27

u/Itsalwayssummerbitch Sep 11 '22

Hlky's is essentially going through a whole remake with a Streamlit UI. It should be much better than before and easier to add things to in the future, but it's going to take a week or two to get it out of the dev stage.

The gradio version is only getting bugfixes btw, no new features as far as I'm aware.

Either way feel free to add it in the discussion section of the repo 😅

9

u/hsoj95 Sep 11 '22

^ This!

We are still looking for features to add, and I'm gonna send a link to this to the discord for Hlky's fork.

2

u/ImeniSottoITreni Sep 12 '22

Automatic1111 has been so on the ball with updates to his fork these past 2 weeks+. Just today he added "Interrogate" in his img2img tab, which is img2prompt.

Can you please give me some more info and a comparison of hlky and automatic?
I thought they were 2 dead repos. I mean, they put out their thing: hlky with the webui and AUTOMATIC1111 with the outpainting stuff, and that was it.

I went so far as to make a pull request to the neonsecret repo to add the webui, and he accepted merging the hlky webui, which is basically a fork that allows you to make high-res images with low VRAM.

But I'm losing my grip on all the news a bit. Can you please tell me what we have now, and what the news is for hlky and the others?

2

u/matahitam Sep 12 '22

You might want to use the dev branch for bleeding edge in hlky (rebased to sd-webui). There's also a Discord; the link is in the readme if I'm not mistaken.

2

u/matahitam Sep 12 '22

Adding discord link here for reference. https://discord.gg/frwNB7XV

1

u/ImeniSottoITreni Sep 12 '22

Thanks I will!

6

u/sassydodo Sep 11 '22

Yeah, I'd go with auto's version, but hlky has got all the fancy upscalers like GoBIG, and it also doesn't crash as much as auto's. Though I'm still on auto's Friday version, so it might have been fixed already.

6

u/halr9000 Sep 11 '22

Hlky is switching to Streamlit, but it seems features are still going into both branches. GoBig is sweet! I think auto added something similar called sd-upscale, but I haven't tried it yet.

11

u/AUTOMATIC1111 Sep 12 '22

I added sd upscale, and afterwards hlky specifically copied my sd upscale code and added it as gobig

2

u/th3Raziel Sep 12 '22

I just want to say HLKY himself didn't do it, I did. I saw your implementation and used it (and txt2imghd) as the base for GoBig in the hlky fork. I'm not sure why this is so forbidden, as large parts of the hlky fork are already copied code from your repo, so I didn't even think twice about utilizing it.

I also added LDSR to the hlky fork which I modified from the original repo and created the image lab tab etc.

To be clear, I'd rather add stuff to your repo, but I approached you on the SD Discord and you said you'll likely not merge PRs that aren't made by you, and that hlky originally PR'd a feature to your repo which you rejected, which in turn prompted him to make his own fork.

It's too bad there's all this useless drama around the different UIs, it just creates a lot of confusion.

5

u/AUTOMATIC1111 Sep 12 '22

If I'm remembering correctly, I said that I won't accept big reworks unless we decide on them beforehand. I'm accepting a fair amount of code from different people.

The 'feature' I rejected was a change that would save all pictures in jpeg format for everyone.

1

u/StickiStickman Sep 12 '22

But why change the name then? Huh.

2

u/th3Raziel Sep 12 '22

I changed the name to GoBig as it's the original name for this approach.

2

u/AUTOMATIC1111 Sep 12 '22

To make it less obvious to users that he copied it.

1

u/halr9000 Sep 12 '22

Well, if true that's not cool. Should be relatively easy to prove by looking at commits. But the UIs are definitely diverging, so there's original work being done to some extent. Sorry if there's some bad behavior going on though.

2

u/Itsalwayssummerbitch Sep 12 '22

It's funny you mention the commits, they DO seem to tell a different story. The funniest part is that the code Automatic1111 used for sd upscale was originally called "txt2imghd", itself a port of someone else's work, which was called GoBig :)

https://github.com/jquesnelle/txt2imghd The link was literally in Auto's code comments.

Seriously though, ffs, this is open source, can we not just be decent humans and work together? I don't get all this drama, it's not Middle school 🙃

6

u/AUTOMATIC1111 Sep 12 '22

I credit the person who made txt2imghd for the idea, both in comments and in the credits section of the main readme.

I also did not take a single line of his code.

The decision to not work with me was on hlky; he was the one who forked my repo.

You're free to link the different story in commits because I do not see it.

1

u/TiagoTiagoT Sep 11 '22

Are the two projects different enough they can't be merged?

15

u/jansteffen Sep 11 '22

The hlky one actually started as a fork of the automatic1111 UI, but that was basically on day 1 of SD release and they've both changed a ton since then, with major reworks and refactors, sometimes even implementing the same functionality in different ways. I don't think merging them would be possible at this point, it'd be a lot easier to just cherry pick features that one has that the other one doesn't and weave that code into the existing code base.

1

u/ts4m8r Sep 12 '22

How do you install new versions of webui if you already have an old one installed?

3

u/sassydodo Sep 12 '22

I mean "installed" is just put in a folder with moldels placed in, everything else is in virtual environment. You can just download new version, or use git - in this case you just git clone once, and use git pull every time you think there's a worthy update

1

u/matahitam Sep 12 '22

Often it's as simple as performing git pull. Let me know in sd-webui discord if you need more details. https://discord.gg/frwNB7XV

2

u/manueslapera Sep 13 '22

That's a shame. I'd rather manage a Python environment (hlky UI) than have to install .NET just to use automatic's

3

u/Dogmaster Sep 11 '22

And he still hasn't fixed the masking bug that causes deepfrying; the commit is waiting :(

56

u/gxcells Sep 11 '22

That's just incredible, you unlocked the next generation of Photoshop. I can't describe how crazy this last month has been since the SD release. I wish I had studied coding so I could participate in all of this.

21

u/Caldoe Sep 12 '22

Haha, just wait a few weeks, people are already coming out with GUIs for normal people

It won't take long

9

u/ExponentialCookie Sep 12 '22

It's never too late. There are more resources now than ever.

2

u/Still_Jicama1319 Sep 13 '22

Is Python enough to understand all this terminology?

3

u/ExponentialCookie Sep 13 '22

At a high level, it's a must to understand how the applications are built. Beyond that, linear algebra is pretty much a prerequisite for building out the neural networks. Understanding the jargon isn't too hard, but the implementation is the hard part.

28

u/entrep Sep 11 '22

5

u/kaliber91 Sep 11 '22

Is there a simple way to update from the previous version to the newest on PC, or do we need to go through the installation process from the start?

10

u/-takeyourmeds Sep 12 '22

Literally download the repo zip and extract it over the main folder, saying yes to overwrite all

1

u/Limitlez Sep 12 '22

Were you able to figure out how to use the script in webui? I was able to run it, but could never find the seed.

7

u/Dogmaster Sep 11 '22

You can use a tool like Beyond Compare: check both folders and just merge the files changed from the old revision

I use that for "updating" my working repos

2

u/kaliber91 Sep 11 '22

thanks, worked

8

u/ExponentialCookie Sep 11 '22

On Linux, a simple git pull in the install directory works for me. I can't speak on Windows install.

8

u/justhitmidlife Sep 12 '22

Should work on windows as well.

6

u/an0maly33 Sep 12 '22

Yep, just git pull on windows as well, assuming one cloned the repo initially.

3

u/jonesaid Sep 12 '22

I can't wait to try this! Now, just gotta get Automatic's repo working without CUDA OOM errors...

2

u/Scriptman777 Sep 13 '22

You can try to add the --medvram parameter or even the low one. It will be a lot slower, but it works with MUCH less VRAM. Also try to keep the images small.

16

u/Adreitz7 Sep 11 '22

This is great! I like to see these innovations that dive into the inner workings of SD. This looks like a powerful feature. In your example mosaic, is the second image meant to be the base reconstruction, and the following images modifications of it? I’m asking because the second image looks most like the first, but I noticed that it is more vivid — the saturation has increased. It’s a minor thing here, but could cause problems if it is a general effect of your technique. Any idea why this happened?

21

u/Aqwis Sep 11 '22

Yeah, the second image is basically the base reconstruction. In general, converting an image to its latent representation and then back again to an image is going to lose a little bit of information, so that the two images won't be identical, but in most cases they will be very close. However, in this case I think the difference in contrast is caused by what happens at the very end of find_noise_for_image, namely:

return (x / x.std()) * sigmas[-1]

This basically has the effect of increasing the contrast. It shouldn't be necessary, but if I don't do this then in many cases the resulting noise tensor will have a significantly lower standard deviation than a normal noise tensor, and if used to generate an image the generated image will be a blurry mess. It's quite possible the need to do this is caused by some sort of bug that I haven't discovered.
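For anyone poking at the gist, a quick way to see the effect described above (variable names as in that final line; purely illustrative):

# Without the rescale, the recovered tensor's standard deviation often ends up
# well below that of a fresh starting latent at the largest sigma.
print('recovered std:', x.std().item())
print('largest sigma:', sigmas.max().item())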

14

u/Adreitz7 Sep 11 '22

It’s also fascinating how perfect the reconstruction is. The biggest changes I can see are the shape of the right eyebrow and the profile of the left cheek. Your technique reproduced individual strands of hair!

14

u/Aqwis Sep 11 '22

It's very likely that the reconstruction isn't actually as good as it could be – I used 50 sampler steps to create the noise tensor for this example and 50 to generate each of the images from the noise tensor, but I'd previously noticed that the reconstructions seemed to be even better if I used a few hundred sampler steps to create the noise tensor.

11

u/jonesaid Sep 11 '22

Hmm, I wonder if this would have made my work on the Van Gogh photo recreation much easier.

Starting from his 1887 self-portrait as the input image, I struggled: at low denoising strength the result kept the very painted look of the original, while at higher strengths it became a completely different person. I wanted to keep the composition of the person basically the same, while changing just the style of the image. I wanted to tell SD to make this painting input in the style of a studio photograph. Using weights in the prompt helped somewhat (e.g. "studio photograph :0.8").

Would your technique help with that kind of restyling?

12

u/HarisTarkos Sep 11 '22

Wow, with my very little comprehension of the mechanics of diffusion I didn't think it was possible to do such a "renoising" (I thought it was a bit like finding the original content from a hash). This feels like an absolute killer feature...

6

u/starstruckmon Sep 12 '22

Your thought wasn't completely wrong. What you're getting here is more like an init image than noise. Even if the image was a generated one, you'd need the exact same prompt ( and some of the other variables ) used during generation to get actual gaussian noise or even close.

Since those are not available, and the prompt is guessed, what's happening here can be conceptualized more as (essence of that picture) - (essence of that guessed prompt). So the init image (actually latents) you're left with after this process has all the concepts of the photo that aren't in the prompt "photo of a smiling woman with brown hair", i.e. composition, background etc.

Now what that init image (if converted from latents back to an image) looks like, and whether it's even comprehensible to the human brain, I'm not sure. It would be fascinating to see.

2

u/Bitflip01 Sep 14 '22

Am I understanding correctly that in this case the init image replaces the seed?

35

u/no_witty_username Sep 11 '22

This is huge. The ability to find a latent space representation of the original image in the SD model opens up soooo many opportunities. This needs to be implemented in every repo; I see it becoming a standard feature for every repo out there.

3

u/Fazaman Sep 12 '22

The ability to find a latent space representation of the original image in the SD model

So... uh... what does this mean for us that aren't as deep into the weeds as half the people on this sub seem to be?

12

u/Aqwis Sep 12 '22 edited Sep 12 '22

Made a few incremental updates to the Gist over the past few hours. Happy to see that a few SD forks/UIs are implementing something like this – they're better situated than me to make something that's useable by non-coders. :)

It seems that the results are quite often best when cond_scale is set to 0.0 – exactly why this is, I don't know. If anyone has an idea, I would love an explanation. With cond_scale at zero, the given prompt has no effect.

In the meantime, I've got to see my share of extremely creepy pictures while experimenting with other cond_scales. Run this on a portrait with cond_scale set to 5.0 and use the resulting noise to generate a picture (also with scale > 2.0) ... or don't. I wouldn't advise doing so personally, especially if you have a superstitious bent. (Or maybe you're going to get completely different results than I got, who knows?)

4

u/protestor Sep 12 '22

Happy to see that a few SD forks/UIs are implementing something like this – they're better situated than me to make something that's useable by non-coders. :)

There's this https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/9c48383608850a1e151985e814a593291a69196b but shouldn't you be listed as the author? (in that commit, https://github.com/AUTOMATIC1111 is the author)

2

u/NotModusPonens Sep 12 '22

In what way are the pictures creepy?

5

u/Aqwis Sep 12 '22

To be a bit vague: a combination of "photos" of very seriously messed-up human-like figures and "drawings" of symbols that, if they meant anything, would have been the equivalent of these messages for the human psyche.

2

u/NotModusPonens Sep 12 '22

Ooof.

... we'll soon have to disable images in social media and email by default in order to avoid being "trolled" by someone with one of these, won't we?

3

u/Lirezh Sep 15 '22

Anyone with Photoshop has been able to troll you for more than a decade already; it does not seem to be a big concern.

2

u/gxcells Sep 12 '22

I am using the automatic1111 implementation of your code. It is really difficult to get a prompt to have an effect when generating a new image (a hair color change or adding a helmet, for example). Often it changes the whole face, etc.

1

u/Limitlez Sep 12 '22

Are you using it through webui? If so, how do you use it? I can't seem to figure it out

2

u/gxcells Sep 12 '22

You use this colab https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxueGij8phCyluOh Then in the img2img tab, at the bottom, you can find a dropdown menu for scripts; just use the "img2img alternative" script

1

u/thedarkzeno Sep 12 '22

https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxueGij8phCyluOh

got an error:

Loading model [e3b0c442] from /content/stable-diffusion-webui/model.ckpt
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
<ipython-input-3-75bc94f91c1d> in <module>
2 sys.argv = ['webui.py', "--share", "--opt-split-attention"]
3
----> 4 import webui
5 webui.webui()
3 frames
/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
918 "functionality.")
919
--> 920 magic_number = pickle_module.load(f, **pickle_load_args)
921 if magic_number != MAGIC_NUMBER:
922 raise RuntimeError("Invalid magic number; corrupt file?")
EOFError: Ran out of input

11

u/ExponentialCookie Sep 11 '22

This seems to be a very similar method to RePaint.

6

u/LetterRip Sep 11 '22 edited Sep 11 '22

You are right, it does do that for the unmasked part.

9

u/tinman_inacan Sep 12 '22

Can you provide a bit of a technical explanation of how to apply this technique?

Automatic1111 has implemented your code on the webui project, and I've been trying it out. It works perfectly for recreating the image, but I can't seem to figure out how to actually do anything with it. It just comes out looking exactly the same - overbaked - no matter how I mess with the settings or prompt.

Still, absolutely incredible that you threw this together, especially without reading the theory behind it first!

5

u/Daralima Sep 12 '22

That's odd, especially that the settings have no effect. Are you changing the original prompt window perhaps? I've also found that that has no effect whatsoever, even when left empty. You need to change the regular prompt, if you aren't doing so already. However using your original prompt or using a prompt that makes sense given the image (or alternatively using clip interrogator) as a base in the normal prompt window seems to work well, as I used the exact same prompts as in the image of this post along with the original image and got nearly identical results to the author.

This is my experience with the overbaking issue, but since you say that changing the settings does nothing, I am not sure if it'll help in your case:

there seems to be a strong interplay between the decode settings and the regular sampling step count: setting the decode CFG scale to 0.1 and the decode steps to 150 seems to fully fix the overbaking when also combined with a somewhat unusually low step count; 10-20 seemed to work in the case I first tried (and seems to work as a general rule for other attempts I've made). But these settings do not seem to work universally:

sometimes setting the CFG scale too low seems to remove certain details, so experimenting with values between 0.1 and 1 is worthwhile if certain things are missing or look off (assuming those things are of consequence). And while more decode steps seem to always decrease the level of overbake, they do not always result in something closer to the original, and in a couple of cases they made some weird changes instead.

I'd recommend testing with 0.1 and 150 decode CFG/steps at first, with a low sampling count and an empty prompt, to make sure the image recreation goes as hoped, until you're really close to the original without much/any overbake. Then decrease/increase one by a fairly large amount if it doesn't yield good results. Once you've got the image you want, you can either add the whole prompt like in this post and edit that, or add keywords, which seems to give a similar effect.

Hope this is coherent enough to be somewhat helpful if you haven't figured it out by now!

If the author sees this comment, please correct anything that doesn't add up as I've figured all this out through experimentation and know nothing about the underlying code.

2

u/tinman_inacan Sep 13 '22

Thank you so much for your detailed response! With the help of your advice and a lot of trial and error, I think I have it working now. Still having trouble with overbaking, but at least I have some idea of what's going on. I think I was just confused about which prompts do what, which settings are disabled, how the sliders affect each other, etc.

At least I got some neat trippy pyramids out of it lol.

16

u/HorrorExpress Sep 11 '22

I've been following bloc97's posts, while trying (slowly) to learn how this all works.

I just wanted to tip my hat to you both for the work you're doing.

I'm finding Stable Diffusion, as is, isn't remotely able to do what you've both started to do with it. I've had much frustration with how changing the color prompt for one part of the image changes it for other elements. Your example - like bloc's - looks awesome.

Keep up the great work.

5

u/WASasquatch Sep 11 '22

This is pretty awesome, man. I'm wondering if this is possible with regular diffusers? Or is this something special with k-diffusion?

3

u/LetterRip Sep 11 '22

Should likely work for most samplers that are deterministic.

1

u/WASasquatch Sep 11 '22

I guess my real question is "I don't understand the implementation, how do I implement it?" like a newb. Is the noise_out overriding some variable for diffusion?

5

u/AnOnlineHandle Sep 11 '22

This is another incredible development.

4

u/[deleted] Sep 12 '22 edited Sep 12 '22

[deleted]

8

u/borntopz8 Sep 12 '22

I guess the development of this feature is still in an early state, but I managed to get first results:

  • Upload an image in img2img.

  • Interrogate to obtain the prompt (this gives me a low-VRAM error but still generates the prompt, which you'll find on top).

  • In the scripts dropdown, use img2img alternative with the prompt you have obtained (check https://github.com/AUTOMATIC1111/stable-diffusion-webui in the img2imgalt section for the parameters; they are very strict for now).

  • Now generate and you should get an output very similar to your original image.

  • If you change your main prompt now (still running the script with the previously obtained prompt), you should be able to modify the image while keeping most of the details.

3

u/Z3ROCOOL22 Sep 12 '22

I don't understand this part: "interrogate to obtain the prompt". Where do you do that?

6

u/borntopz8 Sep 12 '22 edited Sep 12 '22

Speaking about automatic1111 and his webui, you should see in the img2img tab a button to generate and a button to interrogate. If not, update to the latest version, because they are making changes by the minute.

1

u/Z3ROCOOL22 Sep 13 '22

Yeah, I figured it out now, thx.

1

u/gxcells Sep 12 '22

It works well to regenerate the original. But I could not make a change in the prompt without completely changing the picture (a portrait).

5

u/borntopz8 Sep 12 '22

If you regenerate the original and change the main prompt (keeping the script's original prompt set to the one the interrogation gave you), you should be able to get less "destructive" results.
Applying a style works well, but sometimes (let's say changing the shirt color or hair color) the result is still too similar to or too far from the image.

The implementation is in a very early state; the most I can do is keep my fingers crossed, since I don't know much about coding and I rely heavily on repos and webuis.

1

u/gxcells Sep 12 '22

Thanks, I'll try this and play around also with different source images tonight

4

u/AnOnlineHandle Sep 12 '22

Any idea if this would work with embeddings from textual inversion as part of the prompt?

5

u/use_excalidraw Sep 14 '22

I made a tutorial on how to actually use this locally (with the AUTOMATIC repo) https://youtu.be/_CtguxhezlE

4

u/Dark_Alchemist Sep 17 '22

Try as hard as I could, I never could get this to work. A dog wearing a collar with a bell: it changed the colour of the dog and made its big floppy ears into flowers. If you can't get it to work before adjusting, it will never be right, and at 3 minutes per attempt I can't waste attempts.

3

u/GuavaDull8974 Sep 11 '22

Can you upscale with it somehow? By synthesizing neighbouring pixels?

3

u/crischu Sep 11 '22

Would it be possible to get a seed from the noise?

9

u/Aqwis Sep 12 '22

Probably not; all the possible seeds together can only generate a tiny fraction of the possible noise matrices. If you want to share a noise matrix with someone else, the matrix itself can be saved and shared as a file, though.
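For example, with PyTorch the tensor can be serialized directly (illustrative only):

import torch

torch.save(noise_out, 'noise.pt')     # sender: write the recovered noise tensor to disk
noise_out = torch.load('noise.pt')    # receiver: load it and use it as the fixed code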

3

u/Adreitz7 Sep 12 '22

How large is the noise matrix in comparison with the generated image? If you have to transmit a 512x512x8x8x8 (RGB) matrix to generate a 512x512 image, it would be better just to transmit the final image, especially considering that, for most normal images, lossless compression can reduce the size by a factor of two or more, while the noise matrix will likely be incompressible.

2

u/muchcharles Sep 12 '22

Isn't the noise in latent space? 64x64x3(bytes? floats?)

1

u/Adreitz7 Sep 12 '22

But isn’t the latent space on the order of 800,000,000 parameters? That is even larger than a 512x512 image.

1

u/muchcharles Sep 12 '22

Since latent diffusion operates on a low dimensional space, it greatly reduces the memory and compute requirements compared to pixel-space diffusion models. For example, the autoencoder used in Stable Diffusion has a reduction factor of 8. This means that an image of shape (3, 512, 512) becomes (3, 64, 64) in latent space, which requires 8 × 8 = 64 times less memory.

https://huggingface.co/blog/stable_diffusion

9

u/i_have_chosen_a_name Sep 12 '22

Wait, if it can find the latent space representation of the original image, does that not mean every single combination of 512x512 pixels is present in the data set? How is that possible? Surely the latent space only contains an approximation, no?

Also I'm blown away by the development speed of this after being open sourced. Google's Imagen and OpenAI's DALL-E 2 will never be able to compete with the open source fine tuning you can get from a couple million dev monkeys all fucking around with the code and model.

4

u/StickiStickman Sep 12 '22

Surely the latent space only contains an approximation, no?

Obviously, that's literally what he said though?

You also seem to have a bit of a fundamental misunderstanding how it works:

Wait, if it can find the latent space representation of the original image, does that not mean every single combination of 512x512 pixels is present in the data set?

It wouldn't mean that at all. It's not just copy pasting images from its dataset.

2

u/NerdyRodent Sep 11 '22

Very nice!

2

u/[deleted] Sep 11 '22

[deleted]

7

u/External_Quarter Sep 11 '22

Automatic just got it working in his web UI. I would expect to see it there pretty soon!

2

u/hyperedge Sep 11 '22

Looks great!

2

u/rservello Sep 11 '22

pil_image_to_torch is not defined. Can you please update with a fix?

3

u/Aqwis Sep 12 '22

Added it now.

2

u/rservello Sep 12 '22

Thank you :)

2

u/PTKen Sep 11 '22

Looks like a fantastic tool! I wish I could try it. I still can't run this locally. Is anyone interested in putting this into a Colab Notebook?

6

u/ExponentialCookie Sep 11 '22

It's just been implemented in AUTOMATIC1111's webui. Link here, instructions at this anchor.

3

u/PTKen Sep 11 '22

Thanks for the link, but please correct me if I'm wrong. This is a web UI but you still need to have it installed locally. I cannot install it locally, so I am running it in Colab Notebooks for now.

3

u/cpc2 Sep 12 '22

Colab notebooks are local installs, just on a remote machine that you access through Colab. https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxueGij8phCyluOh this is the colab linked in automatic1111's github.

2

u/ExponentialCookie Sep 11 '22

Sorry for misunderstanding. That is correct, but you can get it to work in a colab notebook if you're willing to set it up.

2

u/PTKen Sep 11 '22

No problem I appreciate the reply.

Well, it's a bit beyond me to figure out how to set up a Colab Notebook right now. That's why I was asking if anyone else was up to the task! :)

1

u/MysteryInc152 Sep 12 '22 edited Sep 12 '22

Hey !

So it's actually pretty easy to set up a colab notebook. Way easier than installing it locally.

A colab is basically text and media + code. Once you realize that, it all comes together. To run a snippet of code, you simply press the play button next to it.

Basically because it's text + code, colab notebooks are made to be ordered.

The only input coming from you is pressing the play button in the correct order. And remember, the order has already been laid out for you. So essentially, press the first one, scroll a bit, press the second one, etc.

This site walks you through it

https://gigazine.net/gsc_news/en/20220907-automatic1111-stable-diffusion-webui#2

Honestly, the only aspect that doesn't go like that is setting up a Hugging Face account, but the site walks you through that as well. And it's something you only do once

2

u/no_witty_username Sep 11 '22

I messed around with it in automatic and couldn't get it to work.

2

u/TheSkyWaver Sep 11 '22

An idea I've had for a long while, but never really thought that much about, is the concept of an image "compression" algorithm that uses some sort of image generation algorithm which takes a specific seed (previously generated from a preexisting image) and recreates that image via only the seed, thereby effectively compressing the image far smaller than would ever be possible through conventional image compression.

This is basically that, minus the compressive effect (due to the size and energy cost of actually running it), but with the added ability to seamlessly edit any aspect of the image.

2

u/Adreitz7 Sep 12 '22

You have to keep in mind that you need to add the size of the generating software to get a good comparison, especially when that software is not widespread compared to, e.g., Zip or JPEG. Since SD is multiple gigabytes, well… But considering that it could conceivably generate most (all?) images this way and that Emad said on Twitter that he thinks the weights could be reduced to about 100MB, this might become more practical, though very compute-intensive.

On that note, I would be interested to see someone throw a whole corpus of images at this technique to see if there is anything that it cannot generate well.

2

u/starstruckmon Sep 12 '22

The encoder and decoder (from pixel space to latent space) used in SD can already be used for this. You're not getting any more compression through this method.

The "noise" generated in this process is not Gaussian noise that you can turn into a seed. It's a whole init image (in the form of latents) that needs to be transmitted.

So unlike the first method, where you only send the latents, in this method you send the latents plus the prompt, and you also have to do a bunch of computation at the receiving end to create the image through diffusion instead of just running it through the decoder.
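A sketch of that first method, assuming a CompVis-style LatentDiffusion model and an image tensor img of shape (1, 3, 512, 512) scaled to [-1, 1] (illustrative, not a full compressor):

import torch

with torch.no_grad():
    posterior = model.encode_first_stage(img)             # VAE encoder
    latents = model.get_first_stage_encoding(posterior)   # (1, 4, 64, 64) latents to "transmit"
    reconstruction = model.decode_first_stage(latents)    # receiver side: VAE decoder only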

1

u/PerryDahlia Sep 12 '22

That's true, but the trade-off works the wrong way given the current resource landscape: storage and bandwidth are cheap compared to GPU time and energy.

1

u/2022_06_15 Sep 12 '22

I think useful variations of that idea are upscaling and in/outpainting.

You could make an image physically smaller in pixels and then seamlessly blow it up at the endpoint in a plausible and reliable way.

You could make an image with gaps and then get an algorithm to fill them in, effectively sending a scaffold for a particular image to be built upon/around. img2img could probably work even better than that: you could just send a low-res source image (or, if you want to be particularly crafty, a vector that can be rasterised) and then fill in all the detail at the client end.

Of course, the part I'm really hanging out for is when this tech is ported to 3D. The requirement for complex and generative geometry is going to explode over the next couple of years, and if we use today's authoring technology the amount of data that will have to be pushed to the endpoints will make your eyes water. We can easily increase processing speed and storage footprint at rates we cannot comparably do for data transmission. That's going to be the next major bottleneck.

2

u/thomasblomquist Sep 12 '22

If I'm to understand this correctly, you found a method to identify the correct "noise" seed that, when used with an "appropriate" prompt, will recreate the image somewhat faithfully. Then, by tweaking the prompt while using the identified seed, it will modify the appropriate attribute that was changed in the prompt?!????!!!!!!

That’s some insanity, and is amazing for what it is able to do. We’re in the future

2

u/Aumanidol Sep 12 '22

Did anyone manage to get good results with the AUTOMATIC implementation? My workflow is as follows:

  • I upload a picture

  • select "img2img alternative test"

  • select Euler (not Euler a)

  • hit interrogate

  • paste the found prompt into the "original prompt" box

  • change something in the prompt (the one on top of the page) and hit generate.

Results so far have been terrible, especially with faces.

I've read that better results were attained lowering "CFG scale" to 0.0 (this UI doesn't allow for that and I have no access to the terminal for a couple of days), but lowering it to 1 doesn't seem to be doing anything good.

I've messed around with the decode parameters but nothing good came out of it either.

1

u/Aumanidol Sep 12 '22

worth mentioning: the prompt produced with the interrogate button on the very same picture used above is the following "a woman smiling and holding a cell phone in her hand and a cell phone in her other hand with a picture of a woman on it, by Adélaïde Labille-Guiard"

Am I using the wrong implementation?

2

u/enspiralart Sep 12 '22

This is exactly what was missing, thanks so much! I am going to include it in my video2video implementation.

2

u/jaywv1981 Sep 12 '22 edited Sep 12 '22

Are you able to use this in the Automatic1111 colab or only locally? I ran the colab but don't see an option for it.

EDIT: Nevermind, I see it now at the bottom under scripts.

1

u/the_dev_man Sep 12 '22

Can I know where you found it?

2

u/RogueStargun Sep 13 '22

What parameters did you set this to in order to prevent the network from altering the original appearance of the woman in the base prompt?

2

u/PervasiveUncertainty Sep 14 '22

I spent the last few hours trying to reproduce this but couldn't get the requested changes to be incorporated into the picture. I used a sculpture of David by Michelangelo; he's looking to his left in the original, and I couldn't get him to look straight into the camera.

Can you share the exact full settings you've used for the picture you've posted? Thanks in advance

2

u/Many-Ad-6225 Sep 15 '22

I have an error when I try to use "img2img alternative" Please help :( the error : "TypeError: expected Tensor as element 0 in argument 0, but got ScheduledPromptBatch"

2

u/kmullinax77 Sep 16 '22

I can't get this to work even a little bit.

I am using Automatic1111's webUI and have followed the explicit settings on his GitHub site as well as u/use_excalidraw's great YouTube video. I get nothing except the original photo, but a little overbaked.

Does anyone have any ideas why this may be happening?

1

u/flamingheads Sep 12 '22

Mad props for figuring this out. It's so incredible to see all the development gushing so rapidly out of the community around this tech.

1

u/Sillainface Sep 11 '22

Really interesting!

1

u/Hoppss Sep 12 '22 edited Sep 12 '22

I've been working on how to do this as well, thank you for your insights!

1

u/IrreverentHippie Sep 12 '22

Being able to use something from my previous generation in my next generation would be awesome

1

u/BrandonSimpsons Sep 12 '22

So this might be a dumb idea, but let's say you have two images (image A and image B).

You use this technique in order to back-form images of random noise (noise A and noise B) which will generate close approximations of image A and image B when given the same prompt (prompt P)

Can we interpolate between noise A and noise B, and feed these intermediate noises into stable diffusion with prompt P, and morph between image A and image B?

1

u/ExponentialCookie Sep 12 '22

I don't see why not. Given a latent representation of an image, you should be able to latent walk through as many of them as you wish.
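A sketch of how such a latent walk might look, using spherical interpolation between the two recovered noise tensors; noise_a and noise_b are assumed to come from find_noise_for_image (illustrative only):

import torch

def slerp(a, b, t):
    # spherical interpolation, treating each tensor as one long vector
    a_n, b_n = a / a.norm(), b / b.norm()
    omega = torch.acos((a_n * b_n).sum().clamp(-1 + 1e-7, 1 - 1e-7))
    return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

frames = [slerp(noise_a, noise_b, t) for t in torch.linspace(0, 1, steps=10)]
# each element of frames is then used as the starting noise with the shared prompt P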

1

u/BrandonSimpsons Sep 12 '22

I guess my question is more 'is the space organized enough for this to work feasibly', which probably can only be found experimentally.

1

u/[deleted] Sep 12 '22

[deleted]

1

u/BrandonSimpsons Sep 13 '22

Oh yeah, Artbreeder is great, and being able to have similar tools with SD would be fantastic

1

u/fransis790 Sep 12 '22

Good, congratulations

1

u/RogueStargun Sep 12 '22

This is incredible. I've been struggling with getting img2img to work to my satisfaction. I've been aiming to turn a self-portrait I painted many years ago into a photograph. I'll look into this!

1

u/tanreb Sep 12 '22

How to execute “image variations” with this?

1

u/GuavaDull8974 Sep 12 '22

This already works in the AUTOMATIC1111 webui! It's under scripts in img2img.

1

u/ChocolateFit9026 Sep 12 '22

I'm eager to try this with video2video. So far, I've done some good ones just with regular img2img and a for loop going through every frame of a video. I wish there was an editable colab for this so I could try it. Do you know of any img2img colab that has a k_euler sampler so I could try this code?
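In case it helps anyone trying the same thing, the frame loop itself is simple; img2img below is a placeholder for whichever img2img entry point you use (a hypothetical helper, not a real API):

import glob
from PIL import Image

for i, path in enumerate(sorted(glob.glob('frames/*.png'))):
    frame = Image.open(path).convert('RGB')
    out = img2img(frame, prompt='your style prompt', strength=0.4)   # hypothetical helper
    out.save(f'out/frame_{i:05d}.png')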