Researchers at MIT have released a tool that dramatically lowers the bar for making predictive forecasts using existing data in real-time.
Predicting future values
TspDB works with the Postgres DBMS and takes care of the complex modeling under the hood to generate predictions in seconds. This is much simpler and faster than traditional deep learning approaches that require data science knowledge and steep learning curves to train models.
At the heart of tspDB is an algorithm that transforms multiple time-series datasets into a tensor, itself a multi-dimensional array of numbers. According to the authors, the model training and prediction query time is close to that of a standard data insertion or read access from the database.
According to the tspDB page, setting it up is as simple as adding a new index to an existing table series table, and is generated using standard SQL queries.
As reported on Silicon Republic, tspDB can analyze data that has more than one time-dependent variable, such as the temperature of a weather database, the dew point, and cloud cover from previous values.
“One reason I think this works so well is that the model captures a lot of time-series dynamics, but at the end of the day, it is still a simple model,” said Abdullah Alomar, one of three research authors behind the tspDB white paper.
“When you are working with something simple like this, instead of a neural network that can easily overfit the data, you can actually perform better,” he said.
The success wasn’t achieved overnight; Shah and his collaborators say they worked on time-series data for years.
Ability to predict future values aside, the algorithm’s ability to fill in missing data points can serve to advance the field of data science. Data scientists spend a significant proportion of their time with data wrangling tasks such as cleaning up erroneous entries or filling in missing values.
Filling in missing values with predicted values that are rapidly generated would be a significant improvement over blunt approaches such as using mean or median values.
For now, the researchers are working to improve the functionalities and user-friendliness of tspDB, while also looking at new algorithms that can be incorporated.
You can read more about how tspDB works and its performance here.
Image credit: iStockphoto/liseykina