The Road To Artificial General Intelligence

Another month, another breakthrough in AI. This time, Google's DeepMind published a blog post about a new multi-modal AI system capable of a wide variety of tasks, such as playing video games, stacking blocks with a robotic arm, captioning images, and serving as a chatbot.

In fact, the Gato agent is pre-trained to perform up to 604 distinct tasks of “varying modalities, observations, and action specifications.” And in a research paper published last week, the team behind Gato claims that it can outperform humans on many of these tasks.

There’s an AI for that

According to DeepMind, Gato is noteworthy because it paves the way for the creation of a generalist AI system capable of performing many of the tasks that humans can.

In a tweet, DeepMind founder Demis Hassabis lauded this achievement: “Our most general agent yet!! Fantastic work from the team!”

“[This new approach] reduces the need for handcrafting policy models with appropriate inductive biases for each domain. It increases the amount and diversity of training data since the sequence model can ingest any data that can be serialized into a flat sequence,” explained the DeepMind researchers.
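The quote above describes the core idea: any modality, once serialized into a flat sequence of tokens, can feed the same model. Here is a minimal, hypothetical sketch of that idea in Python. The tokenization schemes below are invented for illustration only and do not reflect Gato's actual encoding.

```python
# Hypothetical sketch: disparate modalities serialized into one flat
# token sequence that a single sequence model could consume.
# Encodings here are invented for illustration, not Gato's real scheme.

def serialize_text(text):
    # Toy text tokenization: one token per UTF-8 byte.
    return [("text", b) for b in text.encode("utf-8")]

def serialize_actions(actions, n_bins=1024):
    # Toy continuous-action encoding: discretize each value in [-1, 1]
    # into one of n_bins integer bins.
    return [("action", int((a + 1.0) / 2.0 * (n_bins - 1))) for a in actions]

# Both streams end up as one flat sequence of (modality, token) pairs,
# so one model can be trained on text and robot actions alike.
episode = serialize_text("stack the red block") + serialize_actions([0.5, -0.25])
print(len(episode))
```

The point of the sketch is only that, once everything is a token stream, the training loop no longer cares which domain the data came from.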

In addition, they pointed to a lesson from the history of computing to argue that general methods which leverage computation tend to win out: “Historically, generic models that are better at leveraging computation have also tended to overtake more specialized domain-specific approaches, eventually.”

Not everyone is impressed, however. While there is no question that DeepMind’s cross-modal technique is a feat worth shining the spotlight on, not all observers think that this will bring us closer to achieving artificial general intelligence (AGI).

No closer to artificial general intelligence

But first, what is AGI? Considered the Holy Grail of AI, AGI is the ability of an AI system to make sense of the world as a human would, with similar flexibility and the capacity to perform a wide variety of tasks. In a nutshell, genuine AGI should possess a human-like ability to reason and learn new skills – often with limited inputs or data.

The issue with Gato is that it merely expands on standard ML methods, which rely on training with large amounts of data but involve no actual understanding of what the model is doing. And in some cases, the outcomes are poorer.

As noted on ZDNet, the program can do better than a dedicated ML program at controlling a robotic Sawyer arm to stack blocks. However, the captions for images appear to be quite poor, and it doesn’t do as well as most dedicated ML programs designed to compete in the Arcade Learning Environment.

Moreover, the novel method behind Gato exposes a new weakness in the form of unexpected cross-domain knowledge transfer. As the researchers wrote: “[While] cross-domain knowledge transfer is often a goal in ML research, it could create unexpected and undesired outcomes if certain behaviors (e.g. arcade game fighting) are transferred to the wrong context.”

In short, the development of knowledge transfer will require substantial new research into ethics and safety, lest unexpected actions or malfunctions by machines lead to distrust – and potentially open the door to exploitation by hackers.

More power to AI

As I wrote earlier this year, at least one AI scientist thinks that the current approach to AI might be experiencing diminishing returns. Gary Marcus argues that for all the advances made in deep learning, the current approach to recognizing patterns appears to have hit a wall.

Yet other AI researchers think scale is all that matters now. Indeed, researchers have published a study (see: “Scaling Laws for Neural Language Models”) demonstrating that performance improves predictably as neural networks are fed more data and compute – in other words, larger models give us superior results.
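To see what “predictably” means, consider a sketch of the power-law relationship reported in that scaling-laws paper. The constants below are approximate values from the paper and are shown for illustration only:

```python
# Illustrative sketch of the power law from "Scaling Laws for Neural
# Language Models" (Kaplan et al., 2020). Constants are approximate
# values reported there; treat them as illustrative, not authoritative.

def loss_from_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Predicted cross-entropy loss as model size grows (data not limiting)."""
    return (n_c / n_params) ** alpha_n

# A 10x larger model yields a modest but predictable improvement in loss:
small = loss_from_params(1e9)    # ~1-billion-parameter model
large = loss_from_params(1e10)   # ~10-billion-parameter model
print(small > large)             # loss shrinks as parameters grow
```

The curve flattens slowly, which is exactly why the scale-is-all-you-need camp keeps reaching for bigger models and bigger hardware.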

And they are putting their money where their mouth is. Last week, Google released its Tensor Processing Unit (TPU) v4 chips, each of which delivers 275 teraflops of ML-targeted bf16 (“brain floating point”) performance – more than twice the 123 teraflops of bf16 performance of the TPU v3.

With more TPU v4 chips packed into each rack or pod, we are looking at 1.1 exaflops for a TPU v4 pod versus 126 petaflops for a TPU v3 pod – close to 10 times more power.

Will a brute force approach through larger models bring us closer to general intelligence, or are we on the wrong track? Only time will tell.

For now, you can access the DeepMind paper on its generalist agent here (pdf).

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].​

Image credit: iStockphoto/KENGKAT