As data science continues to gain prominence globally, more enterprises are waking up to its potential. But getting the most out of data science teams calls for a particular set of practices, says Rama Ramakrishnan, a professor at MIT Sloan.
In a post on the MIT Management blog, he notes that these practices don’t require technical knowledge but revolve around business thinking. He offers some tips for managing data science projects; we sum up three of them below.
Identify the problem to solve
For a start, business leaders must take “extraordinary care” in defining the problems they want their data science teams to solve, says Ramakrishnan. Using a hypothetical marketing campaign as an example, he noted that the onus is on the business leader to identify the customers to target. Getting this right is vital because the training data and modelling approach might be completely different depending on the objective.
“To maximize the chance of identifying the right problem, look at what other companies in your industry are doing... pay less attention to how they are solving it, as there are usually many different ways to solve any data science problem, and more attention to what they are solving,” suggested Ramakrishnan.
Getting business leaders involved mirrors advice from data science veteran Nicolas Paris, who CDOTrends spoke to last week. The head of data at CloudCover observed that data science teams can’t operate in a vacuum or magically come up with solutions by themselves. Instead, business leaders and departmental heads must be willing to engage with their data scientists, he explained.
Once the problem is identified, the next step is to establish a common-sense baseline. A common-sense baseline is how the team would solve the problem without the benefit of powerful machine learning models. Beyond giving a good ballpark of the benefits from the project, it also forces the team to get hold of the necessary data, potentially uncovering issues with the data pipeline.
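To make the idea of a common-sense baseline concrete, here is a minimal Python sketch. It is not taken from Ramakrishnan's post: the campaign data, the "visits" field, and the threshold of five visits are hypothetical, chosen only to show how a naive baseline and a hand-written rule can be scored before any machine learning model is built.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always guesses the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predict, examples):
    """Fraction of (features, label) pairs that predict() gets right."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


# Hypothetical campaign data: (customer features, responded to the campaign?)
train = [
    ({"visits": 1}, False),
    ({"visits": 7}, True),
    ({"visits": 2}, False),
    ({"visits": 3}, False),
]
test = [
    ({"visits": 5}, True),
    ({"visits": 1}, False),
    ({"visits": 2}, False),
    ({"visits": 0}, False),
]

# Baseline 1: always predict the majority outcome seen in training.
baseline = majority_baseline([label for _, label in train])
baseline_accuracy = accuracy(baseline, test)

# Baseline 2: a hand-written business rule (assumed threshold, not learned).
rule_accuracy = accuracy(lambda features: features["visits"] >= 5, test)
```

Any model the team later builds would need to beat both numbers on the same test data to justify its added complexity.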
Check for unintended consequences
Just because a machine learning model yielded better outcomes on one metric does not necessarily mean that it is better for the organization, warns Ramakrishnan. This is because improved performance on a selected metric might come at the expense of another metric that could be even more important.
For instance, an algorithm that increases revenue per visitor at an e-commerce website, but which decreases the conversion rate, or vice versa, could hurt the organization in the long term. For this reason, it is vital not to fixate solely on the selected metric but to keep an eye out for unintended consequences.
Of course, whether the trade-off of one metric against another makes sense is not a decision for data scientists to make. This rests ultimately on business leaders, which is another reason why they must stay engaged and involved with ongoing data science initiatives.
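One way a team might surface such trade-offs for business leaders is to compare a candidate model against the current one on several metrics at once. The sketch below is illustrative only: the metric names, the figures, and the 5% tolerance are assumptions, and the tolerance itself is exactly the kind of threshold a business leader, not the data scientist, would set.

```python
def find_regressions(control, candidate, guardrails):
    """Return guardrail metrics where the candidate drops too far below control.

    guardrails maps a metric name to the largest tolerated relative drop,
    for example 0.05 for a 5% drop.
    """
    regressions = {}
    for name, max_drop in guardrails.items():
        relative_change = (candidate[name] - control[name]) / control[name]
        if relative_change < -max_drop:
            regressions[name] = relative_change
    return regressions


# Hypothetical results: the candidate lifts revenue per visitor
# but lowers the conversion rate.
control = {"revenue_per_visitor": 2.00, "conversion_rate": 0.040}
candidate = {"revenue_per_visitor": 2.20, "conversion_rate": 0.036}

# Conversion rate is treated as a guardrail with a 5% tolerance (assumed).
regressions = find_regressions(control, candidate, {"conversion_rate": 0.05})
```

A non-empty result does not mean the model should be rejected; it means the trade-off needs a decision from someone who owns the business outcome.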
Document extensively and retrain periodically
Unexpected results or incorrect predictions can happen even with the best models. While there is no way to eliminate them, proper logging of all inputs and outputs, together with proper documentation, can make identifying and resolving issues much less challenging.
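As a rough illustration of logging every input and output, the sketch below wraps a prediction function so that each call is recorded as one JSON line. The toy model, the "visits" field, and the version label are hypothetical; a real system would write to a file or logging service rather than an in-memory buffer.

```python
import io
import json
from datetime import datetime, timezone


def logged(predict, sink, model_version):
    """Wrap predict() so each call appends one JSON record to sink."""
    def wrapper(features):
        prediction = predict(features)
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model_version": model_version,
            "inputs": features,
            "output": prediction,
        }
        sink.write(json.dumps(record) + "\n")
        return prediction
    return wrapper


def toy_model(features):
    """Stand-in for a trained model (hypothetical rule)."""
    return features["visits"] >= 5


sink = io.StringIO()  # in-memory stand-in for a log file
predict = logged(toy_model, sink, model_version="demo-0.1")
predict({"visits": 7})
predict({"visits": 1})

# Reading the log back makes it possible to replay a suspicious prediction.
records = [json.loads(line) for line in sink.getvalue().splitlines()]
```

Recording the model version alongside each prediction helps trace an odd result back to the exact model that produced it.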
Over time, the nature of the data being fed into the machine learning model will start to drift away from the original data used to train it. Inaction will almost certainly reduce the efficacy of the model, says Ramakrishnan. The solution is to put automated processes in place to track performance and, if necessary, to retrain the model using the latest data.
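A simple automated check for drift might look like the following sketch. It flags retraining when the average of a recent batch of inputs moves far from the training average, measured in training standard deviations. The order values and the threshold of 2.0 are assumptions for illustration, and this checks input drift only; a fuller setup would also track the model's accuracy on fresh labeled data.

```python
from statistics import mean, stdev


def mean_shift(training_values, recent_values):
    """Shift of the recent mean from the training mean, in training std devs."""
    return abs(mean(recent_values) - mean(training_values)) / stdev(training_values)


def needs_retraining(training_values, recent_values, threshold=2.0):
    """Flag retraining when the recent data has shifted beyond the threshold."""
    return mean_shift(training_values, recent_values) > threshold


# Hypothetical order values seen at training time and in two later weeks.
training_order_values = [20, 22, 19, 21, 18, 20, 23, 21]
stable_week = [21, 20, 22, 19]
shifted_week = [35, 38, 33, 36]

stable_flag = needs_retraining(training_order_values, stable_week)
shifted_flag = needs_retraining(training_order_values, shifted_week)
```

Run on a schedule, a check like this could trigger an alert or a retraining job instead of relying on someone noticing that predictions have quietly degraded.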
Finally, it would be a mistake to project manage data science teams like an engineering or IT team, holding them strictly accountable for missed timelines, says Ramakrishnan. The strong element of research in most data science work can make it difficult to predict when a breakthrough will happen, he notes.
Case in point: When Netflix offered a USD1 million grand prize to whoever could beat its in-house movie recommendation system by 10%, it took three years, and entries from 41,305 teams, before someone took the prize home.
Do you have any recommendations or best practices to succeed with data science? Do drop me a note, and I’ll outline them next week.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/Svetlana Mokrova