Karthik Ramasamy, Co-Founder, Streamlio, on theCUBE from Data Platforms 2017.
https://siliconangle.com/2017/05/31/real-time-data-is-as-good-as-its-analytics-says-twitter-alumnus-dataplatforms2017/
#theCUBE #Streamlio #Qubole #SiliconANGLE
Real-time data is as good as its analytics, says Twitter alumnus
To get their real-time data acts together, companies might take a tip from a guy who helped build Twitter, a site synonymous with always-on streams.
First off, Apache Hadoop’s big data framework doesn’t have the brains for real-time, according to Karthik Ramasamy, formerly engineering manager at Twitter and now co-founder of Streamlio, an enterprise real-time data project. He is also a member of the faculty in the EECS Department at UC Berkeley.
“It kind of becomes a storage sea where all the data comes and stores there,” Ramasamy said of Hadoop during the Data Platforms event in Litchfield Park, Arizona.
Hadoop’s strength is in sheer capacity for data; its abilities in real-time data and especially real-time analytics are quite limited, he told George Gilbert (@ggilbert41) and Jeff Frick (@JeffFrick), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio. (* Disclosure below.)
The reason is that Hadoop cannot make data visible at every stage, from creation point to landing, Ramasamy stated. “You can kind of dump the data in real-time into Hadoop, but until you close the file, you cannot see the data at all, right?” he said.
In order to gain real-time visibility of data, there must be a distributed log that shows data from its entrance point onward, he explained. “The moment the data comes in, the data is immediately visible within the three to five millisecond time frame,” he said.
Streaming data platform Apache Kafka uses a distributed log in this way, Gilbert noted.
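The contrast between the two visibility models can be sketched in a few lines of Python. This is a minimal single-process toy, not the API of Hadoop, Kafka or any Streamlio product: the class names BatchFile and AppendLog and their methods are hypothetical, and the sketch models neither distribution nor the millisecond latencies quoted above. It only shows when a reader can first see a record.

```python
class BatchFile:
    """Hypothetical stand-in for a file whose records are readable only after close()."""

    def __init__(self):
        self._pending = []    # written, but not yet visible to readers
        self._committed = []  # visible to readers

    def write(self, record):
        self._pending.append(record)

    def close(self):
        # Everything written so far becomes visible in one step.
        self._committed.extend(self._pending)
        self._pending.clear()

    def read(self):
        return list(self._committed)


class AppendLog:
    """Hypothetical append-only log: a record is readable as soon as it is appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just appended

    def read_from(self, offset):
        # Readers keep their own offset and poll for anything newer.
        return self._records[offset:]
```

With the batch file, a reader sees nothing between write() and close(). With the log, a reader that polls read_from() with its last offset picks up each record right after the append returns.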
Model behavior
This highly visible streaming data can help rejigger analytics models on the fly, Ramasamy stated.
“Once the model is built, the model is pre-loaded into the real-time compute environment like Heron [Twitter’s open-source data streaming engine],” he said.
The next step is model enhancement based on analysis of users’ changing behavior as shown through real-time data streams. It will then be possible to look up the model and serve data such as a relevant ad for a user to click on, he added.
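A rough Python sketch of that load, update and look-up cycle follows. It is an illustration under stated assumptions, not Heron's API or Twitter's ad-serving logic: the "model" is reduced to per-user interest counts, and the InterestModel class, its methods and the sample user and category names are all hypothetical.

```python
from collections import Counter, defaultdict


class InterestModel:
    """Hypothetical model: per-user counts of the content categories a user engages with."""

    def __init__(self, prebuilt):
        # prebuilt stands for the model built offline and pre-loaded
        # into the streaming job: {user: {category: count}}.
        self._counts = defaultdict(Counter)
        for user, categories in prebuilt.items():
            self._counts[user].update(categories)

    def update(self, user, category):
        # Called once per event arriving on the real-time stream.
        self._counts[user][category] += 1

    def top_category(self, user):
        # Look-up used at serving time, e.g. to choose which ad to show.
        counts = self._counts.get(user)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


model = InterestModel({"user_a": {"sports": 2}})
for _ in range(3):
    model.update("user_a", "travel")  # behavior shifts in the live stream
```

After the three streamed events, top_category("user_a") returns "travel" rather than the pre-loaded "sports", which is the kind of on-the-fly adjustment described above.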
(* Disclosure: TheCUBE is a paid media partner for Data Platforms 2017. Neither Qubole Inc. nor other sponsors have editorial influence on theCUBE or SiliconANGLE.)