AI software called DALL-E turns your words into pictures

The DALL-E Mini software from a group of open source developers isn't perfect, but sometimes it does produce images that match people's text descriptions.


If you've been scrolling through your social media feeds lately, there's a good chance you've noticed illustrations accompanied by captions. They're popular right now.

The images you're seeing are likely made possible by a text-to-image program called DALL-E. Before posting the illustrations, people type in words, which are then converted into images by artificial intelligence models.

For example, one Twitter user posted a tweet with the text "To be or not to be, rabbi holding avocado, marble sculpture." The attached image, which is quite elegant, shows a marble statue of a bearded man in a robe and a bowler hat, holding an avocado.

The AI models come from Google's Imagen software as well as from OpenAI, a Microsoft-backed startup that developed DALL-E 2. On its website, OpenAI calls DALL-E 2 "a new AI system that can create realistic images and art from a description in natural language."

But most of the activity in this space comes from a relatively small group of people sharing their pictures and, in some cases, generating strong engagement. That's because Google and OpenAI have not made the technology widely available to the public.

Many of OpenAI's early users were friends and relatives of employees. People seeking access must join a waiting list and indicate whether they are a professional artist, developer, academic researcher, journalist or online creator.

"We're working hard to accelerate access, but it will likely take some time to reach everyone; as of June 15, we've invited 10,217 people to try DALL-E," OpenAI's Joanne Jang wrote on a company help page.

One system that is publicly available is DALL-E Mini. It relies on open source code from a loosely organized team of developers and is often overloaded with demand. Attempts to use it can be met with a dialog box saying "Too much traffic, please try again."

It's a bit reminiscent of Google's Gmail service, which lured people in with an unusually large amount of email storage (1 GB) in 2004. At first, early adopters could get in only by invitation, leaving millions of people waiting. Today, Gmail is one of the most popular email services in the world.

Generating images from text may never be as ubiquitous as email. But the technology is definitely having a moment, and part of its appeal is exclusivity.

The private research lab Midjourney asks people to fill out a form if they want to experiment with its image-generating bot in a channel on the Discord chat app. Only a select group of people are using Imagen and posting images from it.

Text-to-image services are sophisticated: they identify the most important parts of a user's prompt and then guess the best way to illustrate those terms. Google trained its Imagen model using hundreds of its in-house AI chips on 460 million internal image-text pairs, in addition to outside data.

The interfaces are simple. There is usually a text box, a button to start the generation process and an area below to display the images. To indicate their source, Google and OpenAI add watermarks to the lower right corner of images from DALL-E 2 and Imagen.

The companies and groups building the software are rightly concerned about having everyone storm the gates at once. Handling the web requests needed to run queries on these AI models can get expensive. More importantly, the models aren't perfect and don't always produce results that accurately represent the world.

The engineers trained the models on large collections of words and images from the web, including photos posted on Flickr.

OpenAI, which is based in San Francisco, recognizes the potential for harm from a model that learned to create images by essentially scouring the web. To try to reduce the risk, employees removed violent content from the training data, and filters prevent DALL-E 2 from generating images when users submit prompts that may violate company policies against nudity, violence, conspiracies or political content.

"There's an ongoing process of improving the safety of these systems," said OpenAI researcher Prafulla Dhariwal.

Biases in the results are also important to understand, and they reflect a broader concern about AI. Boris Dayma, a developer from Texas, and others who worked on DALL-E Mini described the problem in an explanation of their software.

"Occupations demonstrating higher levels of education (such as engineers, doctors, or scientists) or high physical labor (such as in the construction industry) are predominantly represented by white males," they wrote. "On the other hand, nurses, secretaries or assistants are generally women, often white as well."

Google described similar shortcomings of its Imagen model in an academic paper.

Despite the risks, OpenAI is excited about the kinds of things the technology can enable. Dhariwal said it could open up creative opportunities for individuals and help with commercial applications such as interior design or dressing up websites.

The results should continue to improve over time. DALL-E 2, which was introduced in April, produces more realistic images than the initial version OpenAI announced last year, and the company's text-generation model, GPT, has become more sophisticated with each generation.

“You can expect that to happen for a lot of these systems,” Dhariwal said.
