Google plans to release an open-source machine learning model that can answer a direct question written in natural language (for example, “Which country’s president has ruled the longest?”). The model’s creators say it can find answers that require combining several facts from different domains at once.
To do this, the model relies on large amounts of information stored in tables. Thomas Muller of Google Research noted that dedicated databases exist for this kind of data; global financial statistics and sports results, for example, are stored in this form. But such tables often lack an intuitive way to query them, and this is the problem the AI is meant to solve.
To answer such questions, the model encodes the question and then looks up the relevant data in the tables. For each table cell, the model produces a score indicating how likely it is that the cell’s content will be part of the answer. It also outputs a probability for each operation that could be applied to the selected cells to produce the final answer (for example, “AVERAGE”, “SUM” or “COUNT”).
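The cell-scoring and operation-selection step described above can be illustrated with a minimal sketch. This is not Google's implementation: the function `answer_from_scores`, the 0.5 selection threshold, the operation list, and the raw scores used below are all hypothetical, chosen only to show how per-cell probabilities and an operation probability could combine into a final answer.

```python
import math

# Hypothetical operation set; "NONE" means "return the selected cells as-is".
OPERATIONS = ["NONE", "SUM", "AVERAGE", "COUNT"]


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def answer_from_scores(cells, cell_scores, op_scores, threshold=0.5):
    """Select likely cells, pick the most probable operation, apply it.

    cells       -- numeric cell values from one table column
    cell_scores -- one raw score per cell (assumed model output)
    op_scores   -- one raw score per entry in OPERATIONS (assumed model output)
    threshold   -- assumed cut-off for including a cell in the answer
    """
    cell_probs = [sigmoid(s) for s in cell_scores]
    selected = [v for v, p in zip(cells, cell_probs) if p > threshold]

    op_probs = softmax(op_scores)
    operation = OPERATIONS[op_probs.index(max(op_probs))]

    if operation == "SUM":
        result = sum(selected)
    elif operation == "AVERAGE":
        result = sum(selected) / len(selected) if selected else None
    elif operation == "COUNT":
        result = len(selected)
    else:
        result = selected
    return operation, result


# Toy values: the first and third cells score high, and AVERAGE scores highest.
operation, result = answer_from_scores(
    cells=[8, 4, 12, 6],
    cell_scores=[3.0, -2.0, 2.5, -4.0],
    op_scores=[0.1, 0.2, 2.0, 0.3],
)
print(operation, result)  # → AVERAGE 10.0
```

In a real system both sets of scores would come from the trained model; here they are hard-coded so that the selection logic can be read in isolation.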
To pre-train the model, the researchers extracted 6.2 million table-text pairs from the English Wikipedia to serve as a training data set. During pre-training, the model learned, with relatively high accuracy, to recover words that had been removed from both tables and text. The AI was able to answer 71.4% of the researchers’ questions, and they are confident that its accuracy will improve in the future.
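The word-recovery pre-training task can be sketched roughly as follows. This is an illustrative assumption rather than the researchers' actual pipeline: the helpers `flatten` and `mask_tokens`, the `[MASK]` placeholder, whitespace tokenization, and the 15% masking rate are all hypothetical choices made for the example.

```python
import random

MASK = "[MASK]"  # assumed placeholder for a removed word


def flatten(text, table):
    """Join a text snippet and a table (list of rows) into one token list."""
    tokens = text.split()
    for row in table:
        for cell in row:
            tokens.extend(str(cell).split())
    return tokens


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with MASK.

    Returns the masked sequence and a dict {position: original token},
    which is what a model would be trained to predict.
    The 15% rate is an assumption, not a figure from the article.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets


# Toy table-text pair with made-up values.
tokens = flatten(
    "Results of the season",
    [["Team", "Wins"], ["A", 3], ["B", 5]],
)
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)
```

Training would then consist of asking the model to fill each masked position and comparing its guesses with the stored originals.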