Ping An keeps flexing its AI muscle. Its latest feat is taking the top spot on the leaderboard of Stanford University's Stanford Question Answering Dataset 2.0 (SQuAD 2.0), for the third time.
SQuAD 2.0 is a test of machine reading comprehension. It is an important benchmark that pits machine learning teams from around the world against one another and compares their results with human performance. Previous winners include Microsoft, Google and Alibaba.
The test involves a reading comprehension dataset, comprising questions on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
SQuAD 2.0 combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions that look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible but also determine when the paragraph supports no answer and abstain from answering.
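One widely used way to abstain, popularized by BERT-style SQuAD 2.0 baselines and not necessarily what Ping An's system does, is to compare the best answer span against a "no answer" score. The simplified Python sketch below assumes a model that outputs start and end logits per token, with index 0 (the [CLS] token) standing for "no answer"; the function name, `threshold` and `max_answer_len` parameters are hypothetical and would normally be tuned on a development set.

```python
from typing import Optional, Sequence, Tuple


def predict_with_abstention(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    threshold: float = 0.0,
    max_answer_len: int = 30,
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) token span of the best answer, or None to abstain.

    Assumption: index 0 is the [CLS] token, whose start+end score is the
    model's score for "no answer"; real systems also mask question tokens.
    """
    null_score = start_logits[0] + end_logits[0]
    best_score = float("-inf")
    best_span = None
    n = len(start_logits)
    # Search every valid span (end >= start, bounded length), skipping index 0.
    for i in range(1, n):
        for j in range(i, min(n, i + max_answer_len)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    # Abstain when "no answer" beats the best span by more than the threshold.
    if best_span is None or null_score - best_score > threshold:
        return None
    return best_span
```

Raising `threshold` makes the model answer more often; lowering it makes it abstain more readily, which is the trade-off a verifier component is meant to manage.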
In this competition, Ping An Technology's ensemble model of ALBERT + DAAF + Verifier achieved an Exact Match (EM) score of 90.386, which measures the share of answers that exactly match a reference answer, and an F1 score of 92.777, which gives partial credit based on word overlap with the reference answer.
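To make the two metrics concrete, here is a minimal Python sketch modeled on how the official SQuAD evaluation script scores a single prediction: answers are normalized (lowercased, punctuation and the articles "a/an/the" removed, whitespace collapsed), EM checks for an identical string, and F1 computes precision and recall over shared tokens. The function names are illustrative; leaderboard figures are these per-question scores averaged over the test set and scaled to 0 to 100.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a prediction and one reference answer."""
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    # Unanswerable questions have an empty reference: credit only an empty prediction.
    if not pred_toks or not gold_toks:
        return float(pred_toks == gold_toks)
    common = collections.Counter(pred_toks) & collections.Counter(gold_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def best_scores(prediction: str, gold_answers: list) -> tuple:
    """Score against every reference answer and keep the best EM and F1."""
    golds = [g for g in gold_answers if normalize_answer(g)] or [""]
    return (
        max(exact_match(prediction, g) for g in golds),
        max(f1_score(prediction, g) for g in golds),
    )
```

For example, predicting "the Eiffel Tower" against the reference "Eiffel Tower in Paris" earns an EM of 0 but an F1 of about 0.67, because two of the four reference tokens are matched.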
DAAF (Data Augmentation and Auxiliary Feature) is a learning framework developed by Ping An, and it played a key role in the result. The framework consists of a forward algorithm and a backward algorithm: the forward algorithm draws additional training data from external sources for augmentation, and the backward algorithm filters out augmented data that hurts performance.
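Ping An has not detailed DAAF's internals here, so the Python sketch below is only a hypothetical illustration of the general "absorb, then prune" idea: a forward pass adds every external example to the training pool, and a backward pass drops any external example whose removal improves a validation score. The function name and the `score` callable (any dataset-to-metric function, higher is better) are assumptions, not Ping An's API.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def forward_backward_augment(
    base: Sequence[T],
    external: Sequence[T],
    score: Callable[[List[T]], float],
) -> List[T]:
    """Hypothetical augment-then-filter loop (not Ping An's actual DAAF)."""

    def build(indices: List[int]) -> List[T]:
        # Original data plus the external examples currently kept.
        return list(base) + [external[i] for i in indices]

    # Forward pass: absorb every external candidate.
    kept = list(range(len(external)))
    current = score(build(kept))
    # Backward pass: remove each external example if doing so raises the score.
    for i in range(len(external)):
        trial = [k for k in kept if k != i]
        trial_score = score(build(trial))
        if trial_score > current:
            kept, current = trial, trial_score
    return build(kept)
```

In practice `score` would mean retraining or fine-tuning and evaluating on held-out data, which is expensive, so real systems typically filter in batches or use cheaper proxy signals.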
Both its EM and F1 results place Ping An first overall among global competitors. Shanghai Jiao Tong University is second, a previous Ping An submission ranks third, and Google and Qianxin share fourth place.
Both Ping An scores beat average human performance on the SQuAD 2.0 leaderboard. Ping An's EM score of 90.386 was 3.56 percentage points higher than the human baseline, and its F1 score of 92.777 was 3.33 points higher.
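The margins are absolute point differences, not relative percentages. Using the human baseline listed on the SQuAD 2.0 leaderboard (EM 86.831, F1 89.452), the arithmetic works out as follows, with the corresponding relative gains shown for comparison:

$$
\begin{aligned}
\Delta_{\text{EM}} &= 90.386 - 86.831 = 3.555 \approx 3.56 \text{ points}, &\quad \frac{3.555}{86.831} &\approx 4.09\% \\
\Delta_{F_1} &= 92.777 - 89.452 = 3.325 \approx 3.33 \text{ points}, &\quad \frac{3.325}{89.452} &\approx 3.72\%
\end{aligned}
$$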
Photo credit: iStockphoto/nicescene