I think it is important to understand the capabilities of these technologies beyond just making art. What these text-to-image models are capable of is massive compression of visual data, orders of magnitude better than what we currently have. Right now, if I wanted to share an image I created with Stable Diffusion, I would not have to send you the image itself. I could just give you the metadata for the settings used to generate it and save a tremendous amount of bandwidth and hard drive space.
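To make the size difference concrete, here is a minimal sketch. The settings dictionary below is a hypothetical example of the kind of generation metadata Stable Diffusion front-ends record (prompt, seed, sampler, and so on); the field names and the typical PNG size are illustrative assumptions, not exact figures.

```python
import json

# Hypothetical generation settings for a Stable Diffusion image.
# Anyone with the same model checkpoint and these settings can
# re-render essentially the same image locally.
metadata = {
    "prompt": "a lighthouse on a cliff at sunset, oil painting",
    "negative_prompt": "blurry, low quality",
    "seed": 1234567890,
    "steps": 30,
    "cfg_scale": 7.0,
    "sampler": "euler_a",
    "width": 512,
    "height": 512,
    "model_hash": "abc123",  # identifies the exact checkpoint used
}

metadata_bytes = len(json.dumps(metadata).encode("utf-8"))
typical_png_bytes = 400_000  # rough assumed size for a 512x512 PNG

print(f"metadata: {metadata_bytes} bytes")
print(f"roughly {typical_png_bytes // metadata_bytes}x smaller than the image")
```

The point is simply that the settings weigh a few hundred bytes while the rendered image weighs hundreds of kilobytes, so shipping the settings instead of the pixels is where the claimed compression comes from.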
Now you might be thinking: that's great and all, but what's my point? Well, imagine you are running a massive website that hosts generated images. All of those images take up a lot of space on server hard drives, which you have to pay for, and the site moves a massive amount of bandwidth through it, which you also have to pay for. Now imagine embedding each generated image's metadata where the JPEG itself would have been. All you need then is a browser extension that automatically reads the prompt data, runs it through your local machine, and spits the image back out to the browser for you to view. The visitor to the site just viewed the image without you actually hosting it; all of the rendering was done locally on the user's end. This saves you a massive amount of space and bandwidth, since you only need to store the image metadata.
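A rough sketch of what that extension's client-side flow could look like. Everything here is hypothetical: `generate_image` is a placeholder standing in for a real local Stable Diffusion pipeline, and the embedded payload is just the JSON metadata the page would ship in place of an image file.

```python
import json

def generate_image(settings: dict) -> bytes:
    # Hypothetical stand-in for a local Stable Diffusion pipeline.
    # A real extension would hand `settings` to the user's GPU renderer
    # and get actual image bytes back.
    placeholder = f"rendered:{settings['prompt']}@seed={settings['seed']}"
    return placeholder.encode("utf-8")

def fetch_and_render(embedded_payload: str) -> bytes:
    """What the extension would do where an <img> tag used to be:
    parse the embedded metadata and render the image locally."""
    settings = json.loads(embedded_payload)
    return generate_image(settings)

# The page ships a few hundred bytes of JSON instead of a multi-MB image.
payload = json.dumps({"prompt": "a red barn in snow", "seed": 42, "steps": 30})
image_bytes = fetch_and_render(payload)
print(len(payload), "bytes transferred for one image")
```

The server only ever stores and transmits `payload`; the heavy decoding work happens on the visitor's hardware.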
Furthermore, you could run censored websites in plain view on the open internet without anyone knowing what's on them. For example, China censors internet traffic through automated systems that look for specific images and phrases; it's a closed-off, heavily monitored system. With the technology I propose above, if you were to host a dissident website in China, there would be no way for the censors to pick up on the data flowing through them. To outside observers, all of the website's data would look like gibberish, since the prompt metadata could also be encrypted, and only those with the proper seed information and the text-to-image model could regenerate the website's content.
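To illustrate the "looks like gibberish" part, here is a toy symmetric-encryption sketch over the prompt metadata. The keystream construction below (chained SHA-256 blocks XORed over the data) is strictly an illustration of the idea of a shared secret; a real deployment would use an authenticated cipher such as AES-GCM from a vetted library, and the passphrase here is obviously made up.

```python
import hashlib
import json

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream from chained SHA-256 blocks. Illustration only --
    # real systems should use an authenticated cipher like AES-GCM.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the same function encrypts and decrypts.
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

shared_key = hashlib.sha256(b"passphrase known only to readers").digest()
metadata = json.dumps({"prompt": "example dissident content", "seed": 7}).encode()

ciphertext = xor_cipher(metadata, shared_key)   # what the site actually hosts
recovered = xor_cipher(ciphertext, shared_key)  # what a trusted reader decodes

print("hosted bytes look like noise:", ciphertext[:16].hex())
```

An automated filter scanning the wire sees only the ciphertext; without both the key and the model checkpoint, there is nothing recognizable to flag.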
Now you may also be wondering: great, but what about images that were not generated by a text-to-image model? Those can be handled too. We are already capable of finding a representation of an arbitrary image in the model's latent space, along with its appropriate coordinates (the prompt). With this, we can encode any image into its closest counterpart that exists in the latent space. The result is not 100% accurate, but with a large enough model it is indistinguishable from the original to the human eye.
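The idea of "finding an image's coordinates in latent space" can be sketched with a deliberately tiny stand-in: a one-parameter linear "decoder" plays the role of the model, and gradient descent recovers the latent code that best reproduces a target image. Real inversion techniques (textual inversion, DDIM inversion, and the like) optimize prompts or latents against a full diffusion model, but the spirit is the same. Everything here, including the weights and target, is a made-up toy.

```python
def decode(z, weights):
    # "Render" a list of pixels from a single latent scalar z.
    return [w * z for w in weights]

def invert(target, weights, lr=0.01, steps=500):
    # Gradient descent on the squared reconstruction error,
    # searching for the latent z whose decoding matches `target`.
    z = 0.0
    for _ in range(steps):
        pixels = decode(z, weights)
        # d/dz of sum((w*z - t)^2) is sum(2*w*(w*z - t))
        grad = sum(2 * w * (p - t) for w, p, t in zip(weights, pixels, target))
        z -= lr * grad
    return z

weights = [0.5, -1.0, 2.0, 0.25]   # the toy "model"
true_z = 3.7
target = decode(true_z, weights)   # the "image" we want to encode

z_hat = invert(target, weights)
reconstruction = decode(z_hat, weights)
error = max(abs(p - t) for p, t in zip(reconstruction, target))
print(f"recovered z={z_hat:.4f}, max pixel error={error:.6f}")
```

Once `z_hat` is found, you store and transmit that tiny code instead of the pixels; the small residual error is the "not 100% accurate" part of the claim above.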
Now apply this compression technique to every form of visual data we currently have, and you save an absolutely massive amount of bandwidth and storage space while remaining highly secure. All we have to do is create a browser extension that automatically reads the prompt metadata, runs it locally, and spits the result back into the browser in the proper format. As the datasets of these models grow and the models get better with text, you could do the same thing with text if you wish.
Currently the models are in their infancy, so you can't run them efficiently on a machine without a GPU or on a cellphone, but we will get there in very little time; I suspect within this year, actually.
In the meantime, you can also run the models in the cloud if you are a large data center (a temporary hybrid solution). For example: someone visits your website, your local or cloud servers spin up to generate the images, push them to the user, and then delete the generated images from your hard drives. You lose out on the bandwidth, but you win by serving customers who don't have their own machines, and you still avoid storing massive amounts of data.
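A minimal sketch of that hybrid flow, assuming a hypothetical `generate_image` worker: render on demand into a temporary file, push the bytes to the visitor, then delete them, so the only thing stored long-term is the tiny settings string.

```python
import hashlib
import os
import tempfile

def generate_image(settings: str) -> bytes:
    # Hypothetical stand-in for a cloud GPU worker rendering the image.
    return b"IMAGEDATA:" + hashlib.sha256(settings.encode()).digest()

def serve_request(settings: str) -> bytes:
    """Hybrid flow: render on demand, push to the visitor, then delete.
    The server permanently stores only the settings string."""
    fd, path = tempfile.mkstemp(suffix=".png")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(generate_image(settings))
        with open(path, "rb") as f:
            payload = f.read()      # what gets pushed to the visitor
        return payload
    finally:
        os.remove(path)             # nothing bulky stays on disk

data = serve_request('{"prompt": "harbor at dawn", "seed": 99}')
print(len(data), "bytes served; temp file already deleted")
```

The trade-off matches the paragraph above: you pay for outbound bandwidth per view, but your at-rest storage stays proportional to the metadata, not the images.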
Bottom line: I see massive opportunities opening up with this tech that few people are thinking about. There is enormous potential here, no doubt about it.