r/StableDiffusion Sep 10 '22

Simple prompt2prompt implementation with prompt parsing (code inside)

Post image

u/Doggettx Sep 10 '22 edited Sep 20 '22

Simple implementation of prompt2prompt using prompt swapping. Got the idea after reading the post from /u/bloc97.

Github branch for changes is here: https://github.com/Doggettx/stable-diffusion/tree/prompt2prompt

or specific commit:
https://github.com/CompVis/stable-diffusion/commit/3b5c504bb0c11a882252c0eb2b1955474913313a

Changes to existing files are minor, so it should be easy to implement in existing forks.

Prompts work the same way as before, but you can swap out text during rendering. Replacing concepts is done by:

[old concept:new concept:step]

where step is a step # or, when < 1, a fraction of all steps (so at 50 steps, .5 and 25 are the same). Inserting new concepts:

[new concept:step]

removing concepts:

[old concept::step]
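
To make the three forms concrete, here is a minimal sketch of how such a prompt could be resolved for a given step. It is not the code from prompt_parser.py: the regex, the function names `resolve_step` and `prompt_at`, the 0-based step counter and the rounding of fractional steps are all assumptions for illustration, and nested brackets are not handled.

```python
import re

# Hypothetical pattern covering [old:new:step], [new:step] and [old::step].
# Not taken from prompt_parser.py; nesting and escaped colons are not handled.
EDIT_RE = re.compile(r"\[([^\[\]:]*):(?:([^\[\]:]*):)?(\d*\.?\d+)\]")


def resolve_step(raw, total_steps):
    """Values < 1 are treated as a fraction of total_steps, others as a step number."""
    value = float(raw)
    return round(value * total_steps) if value < 1 else int(value)


def prompt_at(prompt, step, total_steps):
    """Return the prompt text that is active at a given step (0-based, an assumption)."""
    def pick(match):
        first, second, raw = match.groups()
        # [new:step] has no middle field: nothing before the switch, `first` after it.
        before, after = ("", first) if second is None else (first, second)
        return before if step < resolve_step(raw, total_steps) else after
    return EDIT_RE.sub(pick, prompt)


prompt = "a photo of a [giraffe:horse:.5] in a field"
print(prompt_at(prompt, 0, 50))   # text used before the switch
print(prompt_at(prompt, 25, 50))  # text used from the switch step onward
```

With 50 steps, `.5` and `25` resolve to the same switch step here, matching the description above.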

Only modified the ddim sampler in the example code, but it can be added to any sampler with just a few lines of code. It doesn't increase render time, just slightly higher initialization time due to having to process multiple prompts.
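
A rough sketch of why the swap costs nothing per step: each distinct prompt is encoded once up front, and the sampling loop only looks up which conditioning to use. This is not the actual ddim.py change; `build_schedule`, `cond_for_step`, `sample`, and the `encode` and `denoise_step` callables are hypothetical stand-ins for the text encoder and one sampler step.

```python
from bisect import bisect_right


def build_schedule(prompts_by_step, encode):
    """prompts_by_step: list of (first_step, prompt_text), sorted, starting at step 0.

    Each distinct prompt is encoded only once; this is the extra initialization cost.
    """
    starts = [first_step for first_step, _ in prompts_by_step]
    cache = {}
    conds = []
    for _, text in prompts_by_step:
        if text not in cache:
            cache[text] = encode(text)
        conds.append(cache[text])
    return starts, conds


def cond_for_step(starts, conds, step):
    """Pick the conditioning whose start step is the latest one <= step."""
    return conds[bisect_right(starts, step) - 1]


def sample(total_steps, schedule, denoise_step, x):
    """Generic sampling loop; the only addition is the per-step lookup."""
    starts, conds = schedule
    for step in range(total_steps):
        x = denoise_step(x, cond_for_step(starts, conds, step), step)
    return x
```

In a real sampler, `encode` would be whatever the fork already uses to turn a prompt into conditioning, and the lookup would go where the sampler currently passes its single fixed conditioning.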

See post image for example prompts on how to replace parts of an image

P.S. this is a much simpler method than attention map editing, but it still seems to give good results without sacrificing performance.

Edit: updated version at https://github.com/Doggettx/stable-diffusion/tree/prompt2prompt-v2 or the check-in at https://github.com/CompVis/stable-diffusion/commit/ccb17b55f2e7acbd1a112b55fb8f8415b4862521. It comes with negative prompts and the ability to change guidance scale through the prompt, and it's also much easier to add to existing forks.

5

u/LetterRip Sep 10 '22 edited Sep 10 '22

Excellent, thanks for the contribution. Adding and removing concepts at different steps is really fascinating.

2

u/Acalme-se_Satan Sep 10 '22

I love how you adapted the Python slice syntax for this purpose.

1

u/Doggettx Sep 10 '22

hehe, didn't even realize it, I've been slicing so many arrays lately for the optimization branch that I probably did that unconsciously ;)

2

u/AnOnlineHandle Sep 11 '22

So is this generating the image partway and then changing prompts halfway through the process, so that you can get a foundation that you want and keep it with a new concept?

Because that's amazing. I wonder if this could be used with embeddings to clear up a lot of their introduced artefacts, start with a valid basis and then let the embedded vector start making changes.

Though come to think of it, is this different than say using image2image for a basis with a similar weight? Does the change partway through the process somehow keep the other features of the seed?

6

u/Doggettx Sep 11 '22

Yea, it seems you have much more control over the composition than you do with img2img though.

It allows you to do interesting things, yea. Like if you want a photo in a certain composition that an artist uses, but not in the style of the artist, you can just remove that artist after a few steps (sometimes even 1 step is enough) so that the style of the artist doesn't influence the rest of the generation.

Or the other way around, if you want the style but not the composition, just insert the artist at a later step, since style changes are still easy to do in later parts of the generation.

Or silly stuff: if you want a horse with a long neck, just start it as a giraffe and swap to horse halfway. It's fun to see how the algorithm tries to figure out how to turn a giraffe shape into a horse when swapping at different steps ;)

1

u/AnOnlineHandle Sep 11 '22

Interesting, I didn't realize the AI learned that much about the artist's images but I suppose it makes sense. This sounds very useful

1

u/tmm1 Sep 11 '22

This is really cool!

Were the safety/watermark changes required to make this work?

1

u/Doggettx Sep 11 '22

No, but for some reason the repo wouldn't run with them in it. So I just removed them ;)

1

u/cacus7 Sep 14 '22

Thank you very much!

I've managed to integrate this with AUTOMATIC1111 sd-webui thanks to you.

3

u/blueeyedlion Sep 11 '22

AAAAA, so fucking amazing!

3

u/blueeyedlion Sep 11 '22

This looks like it could be useful to combine with a visualization of how the image changes over the steps, like I saw in this other post: https://www.reddit.com/r/StableDiffusion/comments/xay9ts/druid_princess_step_1_to_101_animation/

3

u/Daralima Sep 11 '22

Wish I knew enough about coding to integrate this into Automatic1111's WebUI haha. Seems really neat!

2

u/RealAstropulse Sep 11 '22

This is awesome. I'm going to try implementing it along with tiling to make some variations of textures.

1

u/thatdude_james Sep 12 '22

In one of your other posts you explained how to add a feature to existing forks by just replacing a couple of files. Is there a similar flow for this feature?

1

u/Doggettx Sep 12 '22

Unfortunately this one requires a bit more work. The prompt_parser.py can just be copied to the scripts folder, but after that you still need to add the initialization code and the swap code in the samplers.

So it requires a little bit of work which might be hard if you're not a coder yourself. But there's an example of an adjusted txt2img.py and ddim.py in the branch.

1

u/thatdude_james Sep 12 '22

I do coding, but I just make games with C#. I haven't dabbled much in python/machine learning. I'll take a look at the examples. Thanks!

1

u/frollard Sep 29 '22

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-editing

I'm trying to get the hang of prompt editing but having a really hard time with the example given with float values 0-1; not sure what variable to put in the x column of the xy matrix. None of the options seem suitable for 'just keep trying'...only thing that really fits is variation strength. Thoughts on that part of the config?

1

u/Doggettx Sep 29 '22

Not entirely sure what you mean, but the float value is just the fraction of the total steps at which the switch is made, so the actual step depends on the total number of steps. If you use an integer value instead, it denotes the step at which the switch happens directly, which is a bit more exact and easier to test with.

It's hard to say at which step you should make a switch though; it depends highly on what you're trying to do. But mostly, when you're trying to replace subjects or parts with other things, it's best to do it early, while style changes can be done later, for example.
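
If you want to compare switch points side by side, one way is to generate the prompt variants yourself and render each with the same seed. A small sketch, where `sweep_prompts` and the `{step}` placeholder are made up for illustration and are not part of any UI:

```python
def sweep_prompts(template, steps):
    """Fill a {step} placeholder with each candidate switch step."""
    return [template.format(step=s) for s in steps]


# Integer steps are used here since they are easier to reason about when testing.
variants = sweep_prompts("a photo of a [giraffe:horse:{step}] in a field", range(2, 12, 2))
for v in variants:
    print(v)
```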

1

u/frollard Sep 29 '22

Thanks for the refactor on the instructions. I did a dumb.