If you have ever asked yourself whether a computer can be creative, we have just gotten closer to the answer.
While software has long been able to generate artistic work like text, scripts and images, a new system from OpenAI called GLIDE has recently been released which blows these out of the water.
See the video here to understand how it works:
As you can see in the video, the software can take any text command you give it and either produce an image from that text, or even edit existing images based on your words.
Most impressively, you can specify the style that the image should be in. So the software can produce an image which looks like:
- a photo
- a Van Gogh painting
- video game pixel art
- stained glass
- a child’s crayon drawing
In all of these examples, the text below the image was the only input given to the AI, and it produced the image by itself.
The researchers recently released a paper describing how they trained an AI model with 3.5 billion parameters so that it could understand natural-language commands from people.
So not only can you simply type what you want to appear in the image, you can also edit existing images using text commands, producing results that look like they came straight out of Photoshop.
Using the system, you can highlight where in an image you want a change to take place, and write out what the change should be.
For example, look at the second example listed here. A human gave the command “a girl hugging a corgi on a pedestal” and highlighted the existing dog. The software recognised that this was a dog but not a corgi, so it produced a corgi in the same style as the oil painting, and also made sure the girl’s arm was around the dog so that it was being “hugged”.
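The editing workflow described above boils down to two inputs: a binary mask marking the region to repaint, and a text prompt describing the desired change. As a rough illustration (the function name and mask convention here are my own, not the actual GLIDE API), the mask might be built like this:

```python
import numpy as np

def make_edit_mask(height, width, box):
    """Build a binary mask marking the region to repaint.

    `box` is (top, left, bottom, right) in pixel coordinates.
    1.0 = keep the original pixels; 0.0 = let the model repaint.
    """
    mask = np.ones((height, width), dtype=np.float32)
    top, left, bottom, right = box
    mask[top:bottom, left:right] = 0.0
    return mask

# Mark a 64x64 patch (say, around the highlighted dog)
# for repainting in a 256x256 image.
mask = make_edit_mask(256, 256, (96, 96, 160, 160))
prompt = "a girl hugging a corgi on a pedestal"

# A model like GLIDE then conditions on the prompt, the original
# image and the mask, regenerating only the zeroed region while
# keeping the surrounding pixels (and painting style) intact.
```

The key design point is that the model never touches the pixels where the mask is 1.0, which is why the rest of the oil painting stays unchanged in the edited output.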
Because this system can generate images from text input, it has the potential to disrupt graphic design and art: it can produce dozens or hundreds of candidate images for a brief very quickly, where a human designer or artist would need significant time for each image.
You can also easily imagine this system being combined with a system which automatically produces text, like GPT-3, to create entire visual stories with only minimal starting inputs from someone.
So as the work of producing images becomes more and more commoditised through virtually free, instantaneous AI algorithms, companies whose business models were built on producing them, like advertising agencies and freelancers, may need to evolve and find new ways to add value. It is no longer only humans who can create this “creative” output.
The team is aware that their model could make it easier for malicious actors to produce convincing disinformation or deepfakes. To safeguard against such use cases, they have only released a smaller diffusion model and a noised CLIP model trained on filtered datasets. The code and weights for these models are available on the project’s GitHub.