r/StableDiffusion Sep 11 '22

A better (?) way of doing img2img by finding the noise which reconstructs the original image Img2Img



u/TheSkyWaver Sep 11 '22

An idea I've had for a long while, but never really thought that much about, is an image "compression" algorithm built on some sort of image generation model: you take a specific seed (found beforehand from an existing image) and recreate that image from the seed alone, thereby compressing the image far smaller than would ever be possible with conventional image compression.

This is basically that, with the added "benefit" of not having any compressive effect at all, due to the size and energy cost of actually running it, but also with the ability to seamlessly edit any aspect of the image.


u/Adreitz7 Sep 12 '22

You have to keep in mind that you need to add the size of the generating software to get a good comparison, especially when that software is not widespread compared to, e.g., Zip or JPEG. Since SD is multiple gigabytes, well… But considering that it could conceivably generate most (all?) images this way and that Emad said on Twitter that he thinks the weights could be reduced to about 100MB, this might become more practical, though very compute-intensive.
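To put a rough number on the "add the size of the generating software" point, here is a minimal sketch of the amortization arithmetic. Every name and figure in it is a hypothetical placeholder (the function names, the model size, the per-image sizes), not a measurement of SD or of any real codec.

```python
def amortized_bytes_per_image(model_bytes, payload_bytes, num_images):
    """Per-image cost once the one-time model download is spread over num_images."""
    return payload_bytes + model_bytes / num_images


def break_even_images(model_bytes, payload_bytes, baseline_bytes):
    """Smallest image count at which model + payloads beats a baseline codec.

    Solves model_bytes + n * payload_bytes < n * baseline_bytes for integer n.
    Returns None when the per-image payload is not smaller than the baseline,
    because then the model download can never be paid back.
    """
    saving = baseline_bytes - payload_bytes
    if saving <= 0:
        return None
    return model_bytes // saving + 1


if __name__ == "__main__":
    # Hypothetical inputs: a 4 GB model, 32 KiB per image, 100 kB per JPEG.
    print(break_even_images(4_000_000_000, 32_768, 100_000))
    # Same per-image sizes with the speculated ~100 MB weights.
    print(break_even_images(100_000_000, 32_768, 100_000))
```

The smaller the weights, the sooner the fixed cost is paid back, which is why the 100MB figure matters for this argument. The compute cost on the receiving end is not modeled here at all.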

On that note, I would be interested to see someone throw a whole corpus of images at this technique to see if there is anything that it cannot generate well.


u/starstruckmon Sep 12 '22

The encoder and decoder ( from pixel space to latent space ) used in SD can already be used for this. You're not getting any more compression through this method.

The "noise" generated in this process is not Gaussian noise that you can turn into a seed. It's a whole init image (in the form of latents) that needs to be transmitted.

So unlike the first method, where you only send the latents, in this method you send the latents + the prompt and also have to do a bunch of computation at the receiving end to create the image through diffusion instead of just running it through the decoder.
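A back-of-the-envelope sketch of the two payloads being compared, assuming the SD v1 autoencoder layout (8x spatial downsampling, 4 latent channels), float16 storage, and no entropy coding on top. The function names and the example prompt are made up for illustration.

```python
def raw_pixel_bytes(width, height, channels=3):
    """Uncompressed 8-bit RGB image size in bytes."""
    return width * height * channels


def latent_bytes(width, height, downsample=8, latent_channels=4, bytes_per_value=2):
    """Size of the latent tensor in bytes (float16 by default).

    The downsample factor and channel count are assumptions matching SD v1.
    """
    return (width // downsample) * (height // downsample) * latent_channels * bytes_per_value


def payload_sizes(width=512, height=512, prompt="a photo of a cat"):
    """Compare what each approach would have to transmit for one image."""
    latents = latent_bytes(width, height)
    return {
        "raw_pixels": raw_pixel_bytes(width, height),
        # Method 1: send the image latents, receiver runs the decoder once.
        "latents_only": latents,
        # Method 2: send the recovered "noise" (same shape as the latents)
        # plus the prompt, receiver runs the full diffusion process.
        "noise_latents_plus_prompt": latents + len(prompt.encode("utf-8")),
    }


if __name__ == "__main__":
    for name, size in payload_sizes().items():
        print(f"{name}: {size} bytes")
```

Under these assumptions the second payload is never smaller than the first, since the recovered noise has the same shape as the image latents and the prompt is sent on top of it.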


u/PerryDahlia Sep 12 '22

that’s true, but the trade-off works the wrong way given the current resource landscape. storage and bandwidth are cheap compared to gpu time and energy.


u/2022_06_15 Sep 12 '22

I think useful variations of that idea are upscaling and in/outpainting.

You could make an image physically smaller in pixels and then seamlessly blow it up at the endpoint in a plausible and reliable way.

You could make an image with gaps and then get an algorithm to fill them in, effectively sending a scaffold for a particular image to be built upon/around. img2img could probably work even better than that: you could just send a low-res source image (or, if you want to be particularly crafty, a vector that can be rasterised) and then fill in all the detail at the client end.
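As a rough illustration of how much the "send it small, blow it up at the endpoint" idea saves before any real codec is involved, here is a sketch of the raw pixel arithmetic. The function names are hypothetical, and it only counts uncompressed 8-bit RGB bytes; it says nothing about how good the upscaled result looks.

```python
def raw_bytes(width, height, channels=3):
    """Uncompressed 8-bit image size in bytes."""
    return width * height * channels


def downscale_saving(width, height, factor):
    """Return (small_bytes, full_bytes, ratio) for sending a 1/factor-scale image.

    The ratio grows with the square of the scale factor, because both
    width and height shrink.
    """
    small = raw_bytes(width // factor, height // factor)
    full = raw_bytes(width, height)
    return small, full, full / small


if __name__ == "__main__":
    small, full, ratio = downscale_saving(2048, 2048, 4)
    print(f"send {small} bytes instead of {full} ({ratio:.0f}x fewer)")
```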

Of course, the part I'm really hanging out for is when this tech is ported to 3D. The requirement for complex and generative geometry is going to explode over the next couple of years, and if we use today's authoring technology the amount of data that will have to be pushed to the endpoints will make your eyes water. We can easily increase processing speed and storage capacity at rates that we cannot match for data transmission. That's going to be the next major bottleneck.