Can ML models be developed and improved by a community of like-minded experts who work individually on them before “merging” enhancements back into the initial model? This revolutionary idea was proposed by Colin Raffel, currently an assistant professor in the Computer Science Department at the University of North Carolina.
Raffel made this suggestion in a blog post last week, where he detailed his rationale and explained how such an endeavor might be possible. For the latter, he referenced work already done by researchers, including research he was involved in as a resident at Google’s famed AI Residency Program before joining the University.
Pitfalls of pre-trained models
But first, why does this matter? Much of applied AI today revolves around transfer learning, in which large pre-trained models are adapted to specific ML problems. These models are typically fine-tuned through additional training on a downstream task of interest, and they have become a popular way to implement ML in organizations around the world.
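For readers unfamiliar with the workflow, here is a minimal sketch of what fine-tuning a pre-trained model typically looks like in practice, using the Hugging Face transformers library. The model name and the toy training data are illustrative assumptions, not details from Raffel's post.

```python
# Minimal sketch of the pre-train-then-fine-tune workflow described above,
# using the Hugging Face transformers library. The model name and the toy
# sentiment data are illustrative assumptions, not details from Raffel's post.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Download a frozen, as-released pre-trained model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Fine-tune on a downstream task: note that every parameter is updated.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
batch = tokenizer(
    ["great product", "terrible product"], return_tensors="pt", padding=True
)
labels = torch.tensor([1, 0])

model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The point to note is that fine-tuning updates every parameter of the model, which is part of what makes working with large pre-trained models expensive.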
The elephant in the room is the sheer cost of training the initial model, which essentially puts it out of reach of individual data scientists or AI experts. For instance, retraining the popular GPT-3 language model was estimated to cost around US$4.6 million in computing resources, placing it firmly in the domain of a handful of large, well-funded corporations.
And as Raffel observed, most pre-trained models are never updated and are left as-released until a better model comes along: “To date, there is no standard approach for updating a released model to address these issues – standard practice is instead to leave them indefinitely ‘frozen’ in the state they were released until they are supplanted by a new model.”
The popular Python programming language would never have incorporated variables, Unicode support, or many other widely used features had it been developed under such an approach, Raffel argues.
“We should develop tools that will allow us to build pre-trained models in the same way that we build open-source software. Specifically, models should be developed by a large community of stakeholders who continually update and improve them.”
Community-developed ML models
Drawing on open-source software development, Raffel suggests community-developed models that are continually improved, bringing ideas such as code merging and versioning into the realm of ML models.
He pointed out that models can already be trained effectively without updating every parameter, with updates communicated to a centralized server. Moreover, updates can be restricted to a small subset of parameters and compressed, substantially reducing the cost of storing and transmitting them as models are trained.
Referring to work he was involved in to select a small subset of a model’s parameters, Raffel wrote: “We demonstrate that our approach makes it possible to update a small fraction (as few as 0.5%) of the model's parameters while still attaining similar performance to training all parameters.”
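A simplified sketch of this kind of sparse update follows. For illustration it scores parameters by gradient magnitude on a single batch; the paper Raffel refers to selects the subset using Fisher information, so the helper names (build_masks, sparse_step) and the selection criterion here are assumptions, not the paper's exact method.

```python
# Simplified sketch of sparse fine-tuning: fix a tiny subset of parameters
# to update and zero out every other gradient. Raffel's paper picks the
# subset with Fisher information; here, gradient magnitude on one batch is
# used as a stand-in selection criterion.
import torch

def build_masks(model, loss, keep_fraction=0.005):
    """Pick the top keep_fraction of parameters by gradient magnitude."""
    loss.backward()  # assumes every parameter receives a gradient
    scores = torch.cat([p.grad.abs().flatten() for p in model.parameters()])
    k = max(1, int(keep_fraction * scores.numel()))
    threshold = scores.topk(k).values.min()
    masks = [(p.grad.abs() >= threshold).float() for p in model.parameters()]
    model.zero_grad()
    return masks

def sparse_step(model, loss, masks, optimizer):
    """One training step in which only the masked parameters change."""
    loss.backward()
    for p, m in zip(model.parameters(), masks):
        p.grad.mul_(m)  # zero the gradient outside the fixed mask
    # With a plain SGD optimizer, parameters whose gradients are always
    # zeroed stay exactly fixed throughout training.
    optimizer.step()
    optimizer.zero_grad()
```

Because only the masked entries ever change, a contributor's update can be shipped to a central server as a short list of (index, value) pairs rather than a full copy of the model, which is the storage and transmission saving described above.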
To combat merge conflicts, Raffel suggests strategies such as starting from a strong baseline and averaging updates from individual workers, though he concedes the latter can degrade performance. He highlighted an improved model-merging method that he helped develop, and also referenced work by other researchers on distributed training.
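The naive averaging baseline Raffel mentions can be sketched as follows (the function name average_merge is hypothetical). His improved method additionally weights each parameter by an importance estimate such as Fisher information, which is omitted here for brevity.

```python
# Naive model merging by parameter averaging: the simple baseline mentioned
# above. Raffel's improved method instead weights each parameter by an
# estimate of its importance (Fisher information) before averaging.
import copy
import torch

def average_merge(models):
    """Return a model whose parameters are the element-wise mean of all inputs."""
    merged = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in merged.named_parameters():
            stacked = torch.stack(
                [dict(m.named_parameters())[name] for m in models]
            )
            param.copy_(stacked.mean(dim=0))
    return merged
```

Averaging of this kind only makes sense when all contributors start from the same strong baseline, since the parameters of independently initialized models are not directly comparable.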
The road ahead
Of course, many other barriers remain before the idea of an open-source ML model community can take off. In his post, Raffel also discussed the challenges inherent in vetting community contributions, modularity, and backward compatibility.
“[The] development of these models is still in the dark ages compared to best practices in software development. Well-established concepts from open-source software development provide inspiration for methods for building continually-improved and collaboratively-developed pre-trained models.”
“Undertaking this research program will help shift the power away from large corporate entities working in isolation and allow models to be developed democratically by a distributed community of researchers,” he summed up.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/07LE