r/StableDiffusion • u/Aqwis • Sep 11 '22
A better (?) way of doing img2img by finding the noise which reconstructs the original image Img2Img
59
u/sassydodo Sep 11 '22
You should summon hlky and automatic in this thread, or do a pull request for this into their webUI repos – that would be much better from a user-experience standpoint.
I think I've seen some work in either hlky or auto's repo that mentioned cross attention control
46
u/MarvelsMidnightMoms Sep 11 '22
Automatic1111 has been so on the ball with updates to his fork these past 2 weeks+. Just today he added "Interrogate" in his img2img tab, which is img2prompt.
Yesterday, or the day before, he added prompt "presets" to save time on retyping your most commonly used terms.
Hlky's activity has died down quite a bit which is a bit unfortunate. His was the first webui fork that I tried.
27
u/Itsalwayssummerbitch Sep 11 '22
Hlky's is essentially going through a whole remake in a Streamlit UI. It should be much better than before and easier to add things to in the future, but it's going to take a week or two to get it out of the dev stage.
The gradio version is only getting bugfixes btw, no new features as far as I'm aware.
Either way feel free to add it in the discussion section of the repo 😅
9
u/hsoj95 Sep 11 '22
^ This!
We are still looking for features to add, and I'm gonna send a link to this to the discord for Hlky's fork.
2
u/ImeniSottoITreni Sep 12 '22
Automatic1111 has been so on the ball with updates to his fork these past 2 weeks+. Just today he added "Interrogate" in his img2img tab, which is img2prompt.
Can you please give me some more info and compare about hlky and automatic?
I thought they were 2 dead repos. I mean, they put out their thing: hlky with the webui and AUTOMATIC1111 with the outpainting stuff, and that was it. I went so far as to make a pull request to the neonsecret repo to add a webui, and he accepted to merge hlky's webui, which is basically a fork that allows you to make high-res images with low VRAM.
But I'm losing a bit of grip on all the news. Can you please tell me what we have now, and what's new for hlky and the others?
2
u/matahitam Sep 12 '22
You might want to use the dev branch for bleeding edge in hlky (rebased to sd-webui). There's also a Discord; the link is in the readme if I'm not mistaken.
2
6
u/sassydodo Sep 11 '22
yeah, I'd go with auto's version, but hlky has got all the fancy upscalers like GoBIG, and it also doesn't crash as much as auto's. Though I'm still on auto's Friday version, so it might have been fixed already.
6
u/halr9000 Sep 11 '22
Hlky is switching to Streamlit, but it seems features are still going into both branches. GoBig is sweet! I think auto added something similar called SD upscale but I haven't tried it yet.
11
u/AUTOMATIC1111 Sep 12 '22
I added SD upscale, and afterwards hlky specifically copied my SD upscale code and added it as GoBig.
2
u/th3Raziel Sep 12 '22
I just want to say HLKY himself didn't do it, I did. I saw your implementation and used it (and txt2imghd) as the base for GoBig in the hlky fork. I'm not sure why this is so forbidden, as large parts of the hlky fork are already copied code from your repo, so I didn't even think twice about utilizing it.
I also added LDSR to the hlky fork which I modified from the original repo and created the image lab tab etc.
To be clear, I'd rather add stuff to your repo, but I approached you on the SD Discord and you said you'd likely not merge PRs that aren't made by you, and that originally hlky PR'd a feature to your repo which you rejected, which in turn prompted him to make his own fork.
It's too bad there's all this useless drama around the different UIs, it just creates a lot of confusion.
5
u/AUTOMATIC1111 Sep 12 '22
If I'm remembering correctly, I said that I won't accept big reworks unless we decide on them beforehand. I'm accepting a fair amount of code from different people.
The 'feature' I rejected was a change that would save all pictures in jpeg format for everyone.
1
u/StickiStickman Sep 12 '22
But why change the name then? Huh.
2
2
u/AUTOMATIC1111 Sep 12 '22
To make it less obvious to the user that he copied it.
1
u/halr9000 Sep 12 '22
Well, if true that's not cool. Should be relatively easy to prove by looking at commits. But the UIs are definitely diverging, so there's original work being done to some extent. Sorry if there's some bad behavior going on though.
2
u/Itsalwayssummerbitch Sep 12 '22
It's funny you mention the commits, they DO seem to tell a different story. The funniest part is that the code Automatic1111 used for SD upscale was originally called "txt2imghd", and was a port of someone else's work, which was called GoBig :)
https://github.com/jquesnelle/txt2imghd The link was literally in the Auto's code's comments.
Seriously though, ffs, this is open source, can we not just be decent humans and work together? I don't get all this drama, it's not Middle school 🙃
6
u/AUTOMATIC1111 Sep 12 '22
I credit the person who made txt2imghd both in the comments and in the main readme's credits section for the idea.
I also did not take a single line of his code.
The decision to not work with me was on hlky, he was the one who forked my repo.
You're free to link the different story in commits because I do not see it.
1
u/TiagoTiagoT Sep 11 '22
Are the two projects different enough they can't be merged?
15
u/jansteffen Sep 11 '22
The hlky one actually started as a fork of the automatic1111 UI, but that was basically on day 1 of SD release and they've both changed a ton since then, with major reworks and refactors, sometimes even implementing the same functionality in different ways. I don't think merging them would be possible at this point, it'd be a lot easier to just cherry pick features that one has that the other one doesn't and weave that code into the existing code base.
1
u/ts4m8r Sep 12 '22
How do you install new versions of webui if you already have an old one installed?
3
u/sassydodo Sep 12 '22
I mean "installed" is just put in a folder with moldels placed in, everything else is in virtual environment. You can just download new version, or use git - in this case you just
git clone
once, and usegit pull
every time you think there's a worthy update1
u/matahitam Sep 12 '22
Often it's as simple as performing git pull. Let me know in sd-webui discord if you need more details. https://discord.gg/frwNB7XV
2
u/manueslapera Sep 13 '22
that's a shame, I'd rather manage a Python environment (hlky UI) than have to install .NET just to use automatic's
3
u/Dogmaster Sep 11 '22
And he still hasn't fixed the masking bug causing deepfrying, the commit is waiting :(
56
u/gxcells Sep 11 '22
That's just incredible, you've unlocked the next generation of Photoshop. I can't describe how crazy this last month has been since the SD release. I wish I had studied coding so I could participate in all of this.
21
u/Caldoe Sep 12 '22
haha just wait a few weeks, people are already coming out with GUIs for normal people
It won't take long
9
u/ExponentialCookie Sep 12 '22
It's never too late. There are more resources now than ever.
2
u/Still_Jicama1319 Sep 13 '22
Is Python enough to understand all this terminology?
3
u/ExponentialCookie Sep 13 '22
At a high level, it's a must to understand how the applications are built. Beyond that, linear algebra is pretty much a prerequisite for building out the neural networks. Understanding the jargon isn't too hard, but the implementation is the hard part.
28
u/entrep Sep 11 '22
Automatic1111 just started to implement this: https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/9c48383608850a1e151985e814a593291a69196b
5
u/kaliber91 Sep 11 '22
Is there a simple way to update from the previous version to the newest on PC, or do we need to go through the installation process from the start?
10
u/-takeyourmeds Sep 12 '22
literally download the repo zip and extract it over the main folder, saying yes to overwrite all
1
u/Limitlez Sep 12 '22
Were you able to figure out how to use the script in webui? I was able to run it, but could never find the seed.
7
u/Dogmaster Sep 11 '22
You can use a tool like Beyond Compare: check both folders and just merge the files changed from the old revision.
I use that for "updating" my working repos
2
8
u/ExponentialCookie Sep 11 '22
On Linux, a simple
git pull
in the install directory works for me. I can't speak to the Windows install.
8
6
u/an0maly33 Sep 12 '22
Yep, just git pull on windows as well, assuming one cloned the repo initially.
3
u/jonesaid Sep 12 '22
I can't wait to try this! Now, just gotta get Automatic's repo working without CUDA OOM errors...
2
u/Scriptman777 Sep 13 '22
You can try adding the --medvram parameter, or even --lowvram. It will be a lot slower, but it works with MUCH less VRAM. Also try to keep the images small.
16
u/Adreitz7 Sep 11 '22
This is great! I like to see these innovations that dive into the inner workings of SD. This looks like a powerful feature. In your example mosaic, is the second image meant to be the base reconstruction, and the following images modifications of it? I’m asking because the second image looks most like the first, but I noticed that it is more vivid — the saturation has increased. It’s a minor thing here, but could cause problems if it is a general effect of your technique. Any idea why this happened?
21
u/Aqwis Sep 11 '22
Yeah, the second image is basically the base reconstruction. In general, converting an image to its latent representation and then back again to an image is going to lose a little bit of information, so that the two images won't be identical, but in most cases they will be very close. However, in this case I think the difference in contrast is caused by what happens at the very end of
find_noise_for_image
, namely:
return (x / x.std()) * sigmas[-1]
This basically has the effect of increasing the contrast. It shouldn't be necessary, but if I don't do this then in many cases the resulting noise tensor will have a significantly lower standard deviation than a normal noise tensor, and if used to generate an image the generated image will be a blurry mess. It's quite possible the need to do this is caused by some sort of bug that I haven't discovered.
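(For illustration, that final rescaling amounts to something like the sketch below – a minimal standalone version, not the exact Gist code:)

    import torch

    def rescale_noise(x: torch.Tensor, target_sigma: float) -> torch.Tensor:
        # target_sigma corresponds to sigmas[-1] in the snippet above. If the
        # reversal were exact, x.std() would already be close to it and this
        # would be nearly a no-op; in practice it often isn't, hence the rescale.
        return (x / x.std()) * target_sigma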
14
u/Adreitz7 Sep 11 '22
It’s also fascinating how perfect the reconstruction is. The biggest changes I can see are the shape of the right eyebrow and the profile of the left cheek. Your technique reproduced individual strands of hair!
14
u/Aqwis Sep 11 '22
It's very likely that the reconstruction isn't actually as good as it could be – I used 50 sampler steps to create the noise tensor for this example and 50 to generate each of the images from the noise tensor, but I'd previously noticed that the reconstructions seemed to be even better if I used a few hundred sampler steps to create the noise tensor.
11
u/jonesaid Sep 11 '22
Hmm, I wonder if this would have made my work on the Van Gogh photo recreation much easier.
Starting from his 1887 self-portrait as the input image, I struggled with getting a very painted look like the original at low denoising strengths, or a completely different person at higher strengths. I wanted to keep the composition of the person basically the same, while changing just the style of the image. I wanted to tell SD to render this painted input in the style of a studio photograph. Using weights in the prompt helped somewhat (e.g. "studio photograph :0.8").
Would your technique help with that kind of restyling?
12
u/HarisTarkos Sep 11 '22
Wow, with my very limited comprehension of the mechanics of diffusion I didn't think it was possible to do such a "renoising" (I thought it was a bit like finding the original content from a hash). This feels like an absolute killer feature...
6
u/starstruckmon Sep 12 '22
Your thought wasn't completely wrong. What you're getting here is more like an init image than noise. Even if the image was a generated one, you'd need the exact same prompt (and some of the other variables) used during generation to get actual Gaussian noise, or even close.
Since those are not available, and the prompt is guessed, what's happening here can be conceptualized more as (essence of that picture) - (essence of that guessed prompt). So the init image (actually latents) you're left with after this process has all the concepts of the photo that aren't in the prompt "photo of a smiling woman with brown hair", i.e. composition, background etc.
Now what that init image (if converted from latents back to an image) looks like, and whether it's even comprehensible as such to the human brain, I'm not sure. It would be fascinating to see.
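(If someone wants to peek, here's a rough sketch of decoding such latents back to pixels, assuming a CompVis-style LatentDiffusion model and its decode_first_stage method; `model` and `latents` are placeholders for objects you'd already have:)

    import torch
    from torchvision.utils import save_image

    # `model` is assumed to be a CompVis-style LatentDiffusion object and `latents`
    # the [1, 4, 64, 64] tensor recovered by the inversion for a 512x512 image.
    with torch.no_grad():
        decoded = model.decode_first_stage(latents)    # roughly in [-1, 1]
        save_image(((decoded + 1.0) / 2.0).clamp(0, 1), "inverted_latents.png")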
2
u/Bitflip01 Sep 14 '22
Am I understanding correctly that in this case the init image replaces the seed?
2
35
u/no_witty_username Sep 11 '22
This is huge. The ability to find a latent space representation of the original image in SD model opens up soooo many opportunities. This needs to be implemented in every repo. I see this being a standard feature for every repo out there.
3
u/Fazaman Sep 12 '22
The ability to find a latent space representation of the original image in SD model
So... uh... what does this mean for us that aren't as deep into the weeds as half the people on this sub seem to be?
12
u/Aqwis Sep 12 '22 edited Sep 12 '22
Made a few incremental updates to the Gist over the past few hours. Happy to see that a few SD forks/UIs are implementing something like this – they're better situated than me to make something that's useable by non-coders. :)
It seems that the results are quite often best when cond_scale is set to 0.0 – exactly why this is, I don't know. If anyone has an idea, I would love an explanation. With cond_scale at zero, the given prompt has no effect.
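(For context, cond_scale typically enters a k-diffusion-style denoiser as a classifier-free-guidance blend along these lines – an illustrative sketch, not the exact Gist code – which is consistent with the prompt having no effect at 0.0:)

    import torch

    def cfg_blend(eps_uncond: torch.Tensor, eps_cond: torch.Tensor,
                  cond_scale: float) -> torch.Tensor:
        # cond_scale = 0.0 returns the unconditional prediction (prompt ignored);
        # cond_scale = 1.0 returns the conditional prediction.
        return eps_uncond + (eps_cond - eps_uncond) * cond_scale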
In the meantime, I've got to see my share of extremely creepy pictures while experimenting with other cond_scales. Run this on a portrait with cond_scale set to 5.0 and use the resulting noise to generate a picture (also with scale > 2.0) ... or don't. I wouldn't advise doing so personally, especially if you have a superstitious bent. (Or maybe you're going to get completely different results than I got, who knows?)
4
u/protestor Sep 12 '22
Happy to see that a few SD forks/UIs are implementing something like this – they're better situated than me to make something that's useable by non-coders. :)
There's this https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/9c48383608850a1e151985e814a593291a69196b but shouldn't you be listed as the author? (in that commit, https://github.com/AUTOMATIC1111 is the author)
2
u/NotModusPonens Sep 12 '22
In what way are the pictures creepy?
5
u/Aqwis Sep 12 '22
To be a bit vague: a combination of "photos" of very seriously messed-up human-like figures, and "drawings" of symbols that, if they meant anything, would have been the equivalent of these messages for the human psyche.
2
u/NotModusPonens Sep 12 '22
Ooof.
... we'll soon have to disable images in social media and email by default in order to avoid being "trolled" by someone with one of these, won't we?
3
u/Lirezh Sep 15 '22
Anyone with Photoshop has been able to troll you like that for more than a decade; it does not seem to be a big concern.
2
u/gxcells Sep 12 '22
I am using the automatic1111 implementation of your code. It is really difficult to get the prompt to have an effect when generating a new image (a hair color change or adding a helmet, for example). Often it changes the whole face, etc.
1
u/Limitlez Sep 12 '22
Are you using it through webui? If so, how do you use it? I can't seem to figure it out
2
u/gxcells Sep 12 '22
You use this colab https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxueGij8phCyluOh Then in img2img tab, at the bottom you can find a dropdown menu for scripts, just use the script "img2imgalternate"
1
u/thedarkzeno Sep 12 '22
https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxueGij8phCyluOh
got an error:
Loading model [e3b0c442] from /content/stable-diffusion-webui/model.ckpt
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
<ipython-input-3-75bc94f91c1d> in <module>
2 sys.argv = ['webui.py', "--share", "--opt-split-attention"]
3
----> 4 import webui
5 webui.webui()
3 frames
/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
918 "functionality.")
919
--> 920 magic_number = pickle_module.load(f, **pickle_load_args)
921 if magic_number != MAGIC_NUMBER:
922 raise RuntimeError("Invalid magic number; corrupt file?")
EOFError: Ran out of input
11
9
u/tinman_inacan Sep 12 '22
Can you provide a bit of a technical explanation of how to apply this technique?
Automatic1111 has implemented your code on the webui project, and I've been trying it out. It works perfectly for recreating the image, but I can't seem to figure out how to actually do anything with it. It just comes out looking exactly the same - overbaked - no matter how I mess with the settings or prompt.
Still, absolutely incredible that you threw this together, especially without reading the theory behind it first!
5
u/Daralima Sep 12 '22
That's odd, especially that the settings have no effect. Are you changing the original prompt window perhaps? I've also found that that has no effect whatsoever, even when left empty. You need to change the regular prompt, if you aren't doing so already. However using your original prompt or using a prompt that makes sense given the image (or alternatively using clip interrogator) as a base in the normal prompt window seems to work well, as I used the exact same prompts as in the image of this post along with the original image and got nearly identical results to the author.
This is my experience with the overbaking issue, but since you say that changing the settings does nothing, I'm not sure if it'll help in your case:
there seems to be a strong interplay between the decode settings and the regular sampling step count: increasing the decode CFG scale and steps all the way to 0.1 and 150 respectively seems to fully fix the overbaking when also combined with a somewhat unusually low step count; 10-20 seemed to work in the case I first tried (and seems to work as a general rule for other attempts I've made). But these settings do not seem to work universally:
sometimes setting the CFG scale too low seems to remove certain details, so experimenting with values between 0.1 and 1 is worthwhile if certain things are missing or look off (assuming those things are of consequence). And while decode steps seem to always decrease the level of overbake, it does not always seem to result in something closer to the original, and in a couple cases it made some weird changes instead.
I'd recommend testing with 0.1 and 150 decode CFG/steps at first, with a low sampling count and an empty prompt to make sure the image recreation goes as hoped, until you're really close to the original without much/any overbake. Then decreasing/increasing one by a fairly large amount if it doesn't yield good results, and once you've got the image you want you can either add the whole prompt like in this post and edit that, or add keywords which seems to give a similar effect.
Hope this is coherent enough to be somewhat helpful if you haven't figured it out by now! If the author sees this comment, please correct anything that doesn't add up, as I've figured all this out through experimentation and know nothing about the underlying code.
2
u/tinman_inacan Sep 13 '22
Thank you so much for your detailed response! With the help of your advice and a lot of trial and error, I think I have it working now. Still having trouble with overbaking, but at least I have some idea of what's going on. I think I was just confused about which prompts do what, which settings are disabled, how the sliders affect each other, etc.
At least I got some neat trippy pyramids out of it lol.
16
u/HorrorExpress Sep 11 '22
I've been following bloc97's posts, while trying (slowly) to learn how this all works.
I just wanted to tip my hat to you both for the work you're doing.
I'm finding Stable Diffusion, as is, isn't remotely able to do what you've both started to do with it. I've had much frustration with how changing the color prompt for one part of the image changes it for other elements. Your example – like bloc's – looks awesome.
Keep up the great work.
5
u/WASasquatch Sep 11 '22
This is pretty awesome, man. I'm wondering if this is possible with regular diffusers? Or is this something special with k-diffusion?
3
u/LetterRip Sep 11 '22
should likely work for most samplers that are deterministic.
1
u/WASasquatch Sep 11 '22
I guess my real question is "I don't understand the implementation, how do I implement it?" like a newb. Is the noise_out overriding some variable for diffusion?
5
4
Sep 12 '22 edited Sep 12 '22
[deleted]
8
u/borntopz8 Sep 12 '22
I guess the development of this feature is still in an early state, but I managed to get the first results.
basically you upload an image in img2img,
interrogate to obtain the prompt ---this gives me a low vram error but still generates the prompt that you'll find on top---
in the scripts you use img2imgalternative with that prompt you have obtained (check https://github.com/AUTOMATIC1111/stable-diffusion-webui in the img2imgalt section for the parameters they are very strict for now)
now generate and you should get an output very similar to your original image
if you change your main prompt now (still running the script with the previously obtained prompt) you should be able to modify the image keeping most of the details
3
u/Z3ROCOOL22 Sep 12 '22
I don't understand this part: "interrogate to obtain the prompt---" Where do you do that?
6
u/borntopz8 Sep 12 '22 edited Sep 12 '22
Speaking about automatic1111 and his webui: you should see in the img2img tab a button to generate and a button to interrogate. If not, update to the latest version, because they are making changes by the minute.
1
1
u/gxcells Sep 12 '22
It works well to regenerate the original. But I could not make a change in the prompt without changing completely the picture (portrait).
5
u/borntopz8 Sep 12 '22
if you regenerate the original and change the main prompt (keeping the script img2imgalt on the original prompt the interrogation gave you) you should be able to have less "destructive" results
Applying a style works well, but sometimes – let's say changing shirt color or hair color – the result is still too similar or too far from the image. The implementation is in a very early state; the most I can do is keep my fingers crossed, since I don't know much about coding and I rely heavily on repos and webuis.
1
u/gxcells Sep 12 '22
Thanks, I'll try this and play around also with different source images tonight
4
u/AnOnlineHandle Sep 12 '22
Any idea if this would work with embeddings from textual inversion as part of the prompt?
5
u/use_excalidraw Sep 14 '22
I made a tutorial on how to actually use this locally (with the AUTOMATIC repo) https://youtu.be/_CtguxhezlE
4
u/Dark_Alchemist Sep 17 '22
Try as I might, I never could get this to work. A dog wearing a collar with a bell: it changed the colour of the dog and made its big floppy ears into flowers. If you can't get it to work before adjusting, it will never be right, and at 3 minutes per attempt I can't waste attempts.
3
3
u/crischu Sep 11 '22
Would it be possible to get a seed from the noise?
9
u/Aqwis Sep 12 '22
Probably not – the possible seeds can only generate a small subset of all possible noise matrices. If you want to share a noise matrix with someone else, though, the matrix itself can be saved and shared as a file.
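(For instance – a minimal sketch with made-up file/function names – torch can serialize the tensor directly:)

    import torch

    def save_noise(noise: torch.Tensor, path: str = "recovered_noise.pt") -> None:
        # Persist the recovered noise tensor so it can be shared instead of a seed.
        torch.save(noise, path)

    def load_noise(path: str = "recovered_noise.pt") -> torch.Tensor:
        return torch.load(path)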
3
u/Adreitz7 Sep 12 '22
How large is the noise matrix in comparison with the generated image? If you have to transmit a 512x512x8x8x8 (RGB) matrix to generate a 512x512 image, it would be better just to transmit the final image, especially considering that, for most normal images, lossless compression can reduce the size by a factor of two or more, while the noise matrix will likely be incompressible.
2
u/muchcharles Sep 12 '22
Isn't the noise in latent space? 64x64x3(bytes? floats?)
1
u/Adreitz7 Sep 12 '22
But isn’t the latent space on the order of 800,000,000 parameters? That is even larger than a 512x512 image.
1
u/muchcharles Sep 12 '22
Since latent diffusion operates on a low dimensional space, it greatly reduces the memory and compute requirements compared to pixel-space diffusion models. For example, the autoencoder used in Stable Diffusion has a reduction factor of 8. This means that an image of shape (3, 512, 512) becomes (4, 64, 64) in latent space, which requires 8 × 8 = 64 times less memory.
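(For scale – a back-of-the-envelope Python sketch, assuming the usual 4-channel float32 SD latent for a 512x512 image – the tensor is far smaller than the picture it reproduces, though still much larger than a seed:)

    # 4 latent channels x 64 x 64 spatial positions x 4 bytes per float32
    latent_bytes = 4 * 64 * 64 * 4
    print(latent_bytes)  # 65536 bytes, i.e. 64 KiB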
9
u/i_have_chosen_a_name Sep 12 '22
Wait, if it can find the latent space representation of the original image, does that not mean every single combination of 512x512 pixels is present in the data set? How is that possible? Surely the latent space only contains an approximation, no?
Also, I'm blown away at the development speed of this after being open sourced. Google's Imagen and OpenAI's DALL-E 2 will never be able to compete with the open source fine tuning you can get from a couple million dev monkeys all fucking around with the code and model.
4
u/StickiStickman Sep 12 '22
Surely the latent space only contains an approximation, no?
Obviously, that's literally what he said though?
You also seem to have a bit of a fundamental misunderstanding how it works:
Wait if it can find the latent space representation of the original image does that not mean every single combination of 512x512 pixel is present in the data set?
It wouldn't mean that at all. It's not just copy pasting images from its dataset.
2
2
Sep 11 '22
[deleted]
7
u/External_Quarter Sep 11 '22
Automatic just got it working in his web UI. I would expect to see it there pretty soon!
2
2
2
u/PTKen Sep 11 '22
Looks like a fantastic tool! I wish I could try it. I still can't run this locally. Is anyone interested in putting this into a Colab Notebook?
6
u/ExponentialCookie Sep 11 '22
3
u/PTKen Sep 11 '22
Thanks for the link, but please correct me if I'm wrong. This is a web UI but you still need to have it installed locally. I cannot install it locally, so I am running it in Colab Notebooks for now.
3
u/cpc2 Sep 12 '22
Colab notebooks are local installs, just in a remote machine that you access through colab. https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxueGij8phCyluOh this is the colab linked in automatic1111's github.
2
u/ExponentialCookie Sep 11 '22
Sorry for misunderstanding. That is correct, but you can get it to work in a colab notebook if you're willing to set it up.
2
u/PTKen Sep 11 '22
No problem I appreciate the reply.
Well, it's a bit beyond me to figure out how to set up a Colab Notebook right now. That's why I was asking if anyone else was up to the task! :)
1
u/MysteryInc152 Sep 12 '22 edited Sep 12 '22
Hey !
So it's actually pretty easy to set up a colab notebook. Way easier than installing it locally.
A colab is basically text and media + code. Once you realize that, it all comes together. To run a snippet of code, you simply press the play button next to it.
Basically because it's text + code, colab notebooks are made to be ordered.
The only input coming from you is pressing the play buttons in the correct order. And remember, the order has already been laid out for you. So essentially: press the first one, scroll a bit, press the second one, etc.
This site walks you through it
https://gigazine.net/gsc_news/en/20220907-automatic1111-stable-diffusion-webui#2
Honestly, the only aspect that doesn't go like that is setting up a Hugging Face account, but the site walks you through that as well. And it's something you only do once.
2
2
u/TheSkyWaver Sep 11 '22
An idea I've had for a long while, but never really thought that much about, is the concept of an image "compression" algorithm that uses some sort of image generation algorithm which takes a specific seed (previously generated from a preexisting image) and recreates that image from only the seed, thereby effectively compressing the image far smaller than would ever be possible through conventional image compression.
This is basically that, except without any real compressive effect due to the size and energy cost of actually running it, but with the added benefit of being able to seamlessly edit any aspect of the image.
3
2
u/Adreitz7 Sep 12 '22
You have to keep in mind that you need to add the size of the generating software to get a good comparison, especially when that software is not widespread compared to, e.g., Zip or JPEG. Since SD is multiple gigabytes, well… But considering that it could conceivably generate most (all?) images this way and that Emad said on Twitter that he thinks the weights could be reduced to about 100MB, this might become more practical, though very compute-intensive.
On that note, I would be interested to see someone throw a whole corpus of images at this technique to see if there is anything that it cannot generate well.
2
u/starstruckmon Sep 12 '22
The encoder and decoder ( from pixel space to latent space ) used in SD can already be used for this. You're not getting any more compression through this method.
The "noise" generated in this process is not gaussian noise that you can turn into a seed. It's a whole init image ( in the form of latents ) that needs to be transmitted.
So unlike the first method, where you only send the latents, in this method you send the latents + the prompt and also have to do a bunch of computation at the receiving end to create the image through diffusion instead of just running it through the decoder.
1
u/PerryDahlia Sep 12 '22
that’s true, but the trade off works the wrong way given the current resource landscape. storage and bandwidth are cheap compared to gpu time and energy.
1
u/2022_06_15 Sep 12 '22
I think useful variations of that idea are upscaling and in/outpainting.
You could make an image physically smaller in pixels and then seamlessly blow it up at the endpoint in a plausible and reliable way.
You could make an image with gaps and then get an algorithm to fill them in, effectively sending a scaffold for a particular image to be built upon/around. img2img could probably work even better than that: you could just send a low-res source image (or, if you want to be particularly crafty, a vector that can be rasterised) and then fill in all the detail at the client end.
Of course, the part I'm really hanging out for is when this tech is ported to 3D. The requirement for complex and generative geometry is going to explode over the next couple of years, and if we use today's authoring technology the amount of data that will have to be pushed to the endpoints will make your eyes water. We can easily increase processing speed and storage footprint at rates we cannot comparably do for data transmission. That's going to be the next major bottleneck.
2
u/thomasblomquist Sep 12 '22
If I’m to understand this correctly, you found a method to identify the correct “noise” seed that, when used with an “appropriate” prompt, will recreate the image somewhat faithfully. Then, by tweaking the prompt while using the identified seed, it will modify the corresponding attribute that was changed in the prompt?!????!!!!!!
That’s some insanity, and is amazing for what it is able to do. We’re in the future
2
u/Aumanidol Sep 12 '22
Did anyone manage to get good results with AUTOMATIC implementation? My workflow is as follows:
- I upload a picture
- select "img2img alternative test"
- select Euler (not Euler a)
- hit interrogate
- paste the found prompt into the "original prompt" box
- change something in the prompt (the one on top of the page) and hit generate.
Results so far have been terrible, especially with faces.
I've read that better results were attained lowering "CFG scale" to 0.0 (this UI doesn't allow for that and I have no access to the terminal for a couple of days), but lowering it to 1 doesn't seem to be doing anything good.
Did anyone manage to get good results with AUTOMATIC implementation?
I've messed around with the decode parameters but nothing good came out of it either.
1
u/Aumanidol Sep 12 '22
worth mentioning: the prompt produced with the interrogate button on the very same picture used above is the following "a woman smiling and holding a cell phone in her hand and a cell phone in her other hand with a picture of a woman on it, by Adélaïde Labille-Guiard"
am I using the wrong implementation?
2
u/enspiralart Sep 12 '22
This is exactly what was missing, thanks so much! I am going to include it in my video2video implementation.
2
u/jaywv1981 Sep 12 '22 edited Sep 12 '22
Are you able to use this in the Automatic1111 colab or only locally? I ran the colab but don't see an option for it.
EDIT: Nevermind, I see it now at the bottom under scripts.
1
2
u/RogueStargun Sep 13 '22
What parameters did you set this to in order to prevent the network from altering the original appearance of the woman in the base prompt?
2
u/PervasiveUncertainty Sep 14 '22
I spent the last few hours trying to reproduce this but couldn't get the requested changes to be incorporated into the picture. I used Michelangelo's sculpture of David – he's looking to his left in the original – and couldn't get him to look straight into the camera.
Can you share the exact full settings you've used for the picture you've posted? Thanks in advance
2
u/Many-Ad-6225 Sep 15 '22
I have an error when I try to use "img2img alternative" Please help :( the error : "TypeError: expected Tensor as element 0 in argument 0, but got ScheduledPromptBatch"
2
u/kmullinax77 Sep 16 '22
I can't get this to work even a little bit.
I am using Automatic1111's webUI and have followed the explicit settings on his GitHub site as well as u/use_excalidraw's great YouTube video. I get nothing except the original photo, but a little overbaked.
Does anyone have any ideas why this may be happening?
1
u/flamingheads Sep 12 '22
Mad props for figuring this out. It's so incredible to see all the development gushing so rapidly out of the community around this tech.
1
1
u/Hoppss Sep 12 '22 edited Sep 12 '22
I've been working on how to do this as well, thank you for your insights!
1
u/IrreverentHippie Sep 12 '22
Being able to use something from my previous generation in my next Generation would be awesome
1
u/BrandonSimpsons Sep 12 '22
So this might be a dumb idea, but let's say you have two images (image A and image B).
You use this technique to back-form the noise (noise A and noise B) which will generate close approximations of image A and image B when given the same prompt (prompt P).
Can we interpolate between noise A and noise B, and feed these intermediate noises into stable diffusion with prompt P, and morph between image A and image B?
1
u/ExponentialCookie Sep 12 '22
I don't see why not. Given a latent representation of an image, you should be able to latent walk through as many of them as you wish.
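(A sketch of one way to try it – spherical interpolation between the two recovered noise tensors, with each intermediate fed to the sampler under prompt P; how smooth the morph actually looks hasn't been tested in this thread:)

    import torch

    def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
        # Spherical interpolation, often preferred over a plain lerp for
        # Gaussian-like noise tensors because it keeps the norm in a sensible range.
        a_f, b_f = a.flatten(), b.flatten()
        cos_omega = torch.clamp(torch.dot(a_f / a_f.norm(), b_f / b_f.norm()), -1.0, 1.0)
        omega = torch.acos(cos_omega)
        if omega.abs() < 1e-6:          # nearly parallel: fall back to lerp
            return (1.0 - t) * a + t * b
        so = torch.sin(omega)
        return (torch.sin((1.0 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b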
1
u/BrandonSimpsons Sep 12 '22
I guess my question is more 'is the space organized enough for this to work feasibly', which probably can only be found experimentally.
1
Sep 12 '22
[deleted]
1
u/BrandonSimpsons Sep 13 '22
oh yeah artbreeder is great, and being able to have similar tools with SD would be fantastic
1
1
u/RogueStargun Sep 12 '22
This is incredible. I've been struggling to get img2img to work to my satisfaction. I've been aiming to turn a self-portrait I painted many years ago into a photograph. I'll look into this!
1
1
1
u/ChocolateFit9026 Sep 12 '22
I'm eager to try this with video2video. So far, I've done some good ones just with regular img2img and a for loop going through every frame of a video. I wish there was an editable colab for this so I could try it. Do you know of any img2img colab that has a k_euler sampler so I could try this code?
156
u/Aqwis Sep 11 '22 edited Sep 11 '22 •
I’ve made quite a few attempts at editing existing pictures with img2img. However, at low strengths the pictures tend to be modified too little, while at high strengths the picture is modified in undesired ways. /u/bloc97 posted here about a better way of doing img2img that would allow for more precise editing of existing pictures – by finding the noise that will cause SD to reconstruct the original image.
I made a quick attempt at reversing the k_euler sampler, and ended up with the code I posted in a reply to the post by bloc97 linked above. I've refined the code a bit and posted it on GitHub here: link to code
If image is a PIL image and model is a LatentDiffusion object, then find_noise_for_image can be called like this:
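(Illustrative call only – the exact argument names and defaults in the Gist may differ:)

    # Illustrative only; check the Gist for the real signature and defaults.
    noise = find_noise_for_image(
        model,                                        # LatentDiffusion object
        image,                                        # PIL image to invert
        "Photo of a smiling woman with brown hair",   # prompt used for the inversion
        cond_scale=0.0,
    )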
The output noise tensor can then be used for image generation by using it as a "fixed code" (to use a term from the original SD scripts) – in other words, instead of generating a random noise tensor (and possibly adding that noise tensor to an image for img2img), you use the noise tensor generated by find_noise_for_image.
This method isn't perfect – deviate too much from the prompt used when generating the noise tensor, and the generated images are going to start differing from the original image in unexpected ways. Some experimentation with the different parameters and making the prompt precise enough will probably be necessary to get this working. Still, for altering existing images in particular ways I've had way more success with this method than with standard img2img. I have yet to combine this with bloc97's Prompt-to-Prompt Image Editing, but I'm guessing the combination will give even more control.
All suggestions for improvements/fixes are highly appreciated. I still have no idea what the best setting of cond_scale is, for example, and in general this is just a hack that I made without reading any of the theory on this topic.
Edit: By the way, the original image used in the example is from here and is the output of one of those old "this person does not exist" networks, I believe. I've tried it on other photos (including of myself :), so this works for "real" pictures as well. The prompt that I used when generating the noise tensor for this was "Photo of a smiling woman with brown hair".