Baidu AI Research has announced PLATO-XL, an 11-billion-parameter natural language processing (NLP) model that outperforms existing solutions at maintaining a conversation.
Though today’s NLP systems are good at understanding and responding to single commands, maintaining a clear and engaging conversation has so far eluded AI bots. Yet maintaining a conversation is vital for building good chatbots and next-gen AI-powered systems that can serve as emotional companions or intelligent assistants.
Good at chit-chat
PLATO-XL is implemented on PaddlePaddle, a deep learning platform also developed by Baidu. To make a model of this size trainable, it employs gradient checkpointing and sharded data parallelism, both offered by FleetX, PaddlePaddle's distributed training library. The actual training ran on a high-performance cluster of 256 Nvidia Tesla V100 32 GB GPUs.
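To illustrate the memory-saving idea behind gradient checkpointing (this is a toy sketch in plain numpy, not Baidu's or FleetX's actual implementation): instead of caching every layer's activation for the backward pass, only every k-th activation is stored, and the missing ones are recomputed from the nearest checkpoint when gradients are needed.

```python
import numpy as np

# Toy chain of layers: y = relu(x * w_i). All names are illustrative.
def forward_layer(x, w):
    return np.maximum(x * w, 0.0)

def backward_layer(x, w, grad_out):
    # Gradient of relu(x * w) with respect to x.
    mask = (x * w) > 0
    return grad_out * mask * w

def run_with_checkpoints(x, weights, k=2):
    # Forward pass: store activations only at every k-th boundary.
    checkpoints = {0: x}
    a = x
    for i, w in enumerate(weights):
        a = forward_layer(a, w)
        if (i + 1) % k == 0:
            checkpoints[i + 1] = a
    out = a
    # Backward pass: recompute intermediate activations from the
    # nearest stored checkpoint instead of having cached them all.
    grad = np.ones_like(out)
    for i in range(len(weights) - 1, -1, -1):
        start = (i // k) * k
        a = checkpoints[start]
        for j in range(start, i):  # recompute up to layer i's input
            a = forward_layer(a, weights[j])
        grad = backward_layer(a, weights[i], grad)
    return out, grad
```

The trade-off is deliberate: memory drops from O(layers) activations to O(layers / k) plus one recompute buffer, at the cost of roughly one extra forward pass during backpropagation.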
Baidu outlined the technique used in PLATO-XL: “PLATO-XL is based on a unified transformer design that enables simultaneous modeling of dialogue comprehension and response production... The team used a variable self-attention mask technique to enable bidirectional encoding of dialogue history and unidirectional decoding of responses.”
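The variable mask described above can be sketched in a few lines (an illustration of the general technique, not PLATO-XL's actual code): context tokens attend to each other in both directions, while response tokens see the full context but only earlier response tokens, so one transformer serves both understanding and generation.

```python
import numpy as np

def unified_attention_mask(n_context, n_response):
    """Build a boolean attention mask: True means attention is allowed.

    Rows are queries, columns are keys. Illustrative sketch only.
    """
    n = n_context + n_response
    mask = np.zeros((n, n), dtype=bool)
    # Dialogue history: full bidirectional attention among context tokens.
    mask[:n_context, :n_context] = True
    # Response tokens: may attend to the entire context...
    mask[n_context:, :n_context] = True
    # ...but only causally (left-to-right) within the response itself.
    mask[n_context:, n_context:] = np.tril(
        np.ones((n_response, n_response), dtype=bool)
    )
    return mask
```

For example, with two context tokens and two response tokens, the two context tokens can attend to each other freely, while the second response token sees everything before it and the first response token cannot see the second.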
The majority of the pretraining data is gathered from social media conversations in which multiple users exchange ideas. To keep the learned model from mixing up out-of-context information from different participants, the researchers used multi-party aware pretraining, which helps the model distinguish who said what in the context and maintain consistency when generating dialog.
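One common way to make a model "multi-party aware" is to add a role embedding to each token's embedding, identifying which participant produced it. The following is a hypothetical sketch of that idea; the names, dimensions, and vocabulary are invented for illustration and are not taken from the PLATO-XL paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
vocab = {"hi": 0, "there": 1, "ok": 2}          # toy vocabulary
token_emb = rng.normal(size=(len(vocab), d_model))
role_emb = rng.normal(size=(4, d_model))        # up to 4 distinct speakers

def encode(tokens, roles):
    """Sum token and speaker-role embeddings, so the model can track
    which participant wrote each token in a multi-user thread."""
    ids = np.array([vocab[t] for t in tokens])
    return token_emb[ids] + role_emb[np.array(roles)]

# Two speakers: speaker 0 says "hi there", speaker 1 replies "ok".
x = encode(["hi", "there", "ok"], [0, 0, 1])
```

With this scheme, the same word spoken by different participants produces different input vectors, giving the model a signal for keeping each speaker's statements consistent.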
The current model has 11 billion parameters and two dialogue models, one for Chinese and one for English. Baidu says PLATO-XL outperforms other open-source Chinese and English dialog models, including Blender, DialoGPT, EVA, and Baidu's own earlier PLATO-2. Notably, Baidu claims PLATO-XL offers significantly better performance than current mainstream commercial chatbots.
For now, the team behind PLATO-XL acknowledges that the model suffers from “unfair biases, incorrect information, and the inability to learn continuously,” among other issues. This is presumably attributable to its training data of social media conversations, where exchanges are prone to exaggeration.
Baidu has promised to eventually release the source code for PLATO-XL, along with the English model, on GitHub to facilitate research in dialog generation. For now, a white paper has been published.
See a sample of the PLATO-XL chatbot in action below.
Image credit: iStockphoto/monkeybusinessimages