Andy Palmer, TAMR | MIT CDOIQ 2019
Andy Palmer, co-founder and CEO at TAMR, joins theCUBE hosts Dave Vellante (@dvellante) and Paul Gillin (@pgillin) live from MIT CDOIQ in Cambridge, MA.

https://siliconangle.com/2019/08/09/real-big-data-problem-machine-learning-can-fix-mitcdoiq-startupoftheweek/

The real big-data problem and why only machine learning can fix it

Why do so many companies still struggle to build a smooth-running pipeline from data to insights? They invest in heavily hyped machine-learning algorithms to analyze data and make business predictions. But then, inevitably, they realize that algorithms aren’t magic: If they’re fed junk data, their insights won’t be stellar. So they employ data scientists who spend 90% of their time washing and folding in a data-cleaning laundromat, leaving just 10% of their time to do the job for which they were hired.

What’s also flawed about this process is that companies only get excited about machine learning for end-of-the-line algorithms. They should apply machine learning just as liberally in the early cleansing stages instead of relying on people to grapple with gargantuan data sets, according to Andy Palmer, co-founder and chief executive officer of Tamr Inc., which helps organizations use machine learning to unify their data silos.

Lots of companies have spent large amounts of money on systems for big data collection. Their emphasis on data quantity over quality is readily apparent.

“Anybody that’s worked at one of these big companies can tell you that the data that they get from most of their internal systems sucks, plain and simple,” Palmer said.

Palmer and Michael Stonebraker, co-founder and chief technology officer of Tamr, spoke with Dave Vellante and Paul Gillin, co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, which covered the recent MIT CDOIQ Symposium in Cambridge, Massachusetts.
They discussed machine learning in big-data cleansing and why Tamr, not surprisingly, believes startups offer better, more scalable big-data solutions than legacy companies do. This week, theCUBE spotlights Tamr Inc. in its Startup of the Week feature.

Big data? Big whoop

Palmer and Stonebraker have been trying to deflate the big-data hype bubble for years. All the way back in 2007, they predicted that the Apache Hadoop big-data framework wasn’t going to deliver the results so many expected of it.

“Mike actually was really aggressive in saying that it was going to be a disaster,” Palmer said.

It’s not that large data sets are bad. They’re obviously necessary for training analytics models and artificial intelligence. What has left so many companies disillusioned is the notion that as long as data is big, the rest of the analytics or AI pieces will fall into place.

Organizations now realize that data quality is not negligible. They also know that a data scientist shouldn’t have to spend 80% to 90% or more of his or her time cleansing and wrangling data. There has to be a better, faster way to get data ready for use in analytics and AI.

The answer is to start looking at machine learning as a highly practical tool for doing these bulky, unglamorous tasks, according to Palmer. Many vendors use machine learning to make their marketing of software for prediction, recommendation engines and the like more appealing. Tamr is using it for the least glamorous thing there is: cleansing and organizing big data before anyone analyzes, predicts, markets or sells anything with it.