Inhi Cho Suh, IBM, with John Furrier and Jeff Kelly at Hadoop Summit 2014
Thousands of new data jobs will be created in the next couple of years, making “data scientist” one of the hottest emerging job titles. New data jobs will center on “analytics, data architects, data scientists [and] data modelers,” according to Inhi Cho Suh, Vice President and General Manager of Big Data, Integration & Governance at IBM. But what skill sets do you need to develop to become a data scientist?
At the recently concluded Hadoop Summit 2014, IBM’s Suh and other guests joined theCUBE co-hosts Jeff Frick, John Furrier and Jeff Kelly to discuss the skill sets data scientists need and the ongoing talent shortage. “[The] talent gap continues to be a big issue,” summarized John Furrier, founder of SiliconANGLE. “There’s a huge demand for data scientists.”
A data scientist is described in many ways. Josh Wills, Senior Director of Data Science at Cloudera, defined one in a May 3, 2012 tweet (@josh_wills) as a “person who is better at statistics than any software engineer and better at software engineering than any statistician.” Data scientists must also know math, statistics, experiments, causal inference, machine learning, and software, according to a blog post by data scientist Trey Causey.
Data Scientist definition by Josh Wills
Most knowledge workers today already possess some data scientist skills, according to Jeff Kelly, analyst at Wikibon. “It’s interesting that the [U.S.] government points out that…170,000 more data jobs [will be created in the next two years],” said Kelly at Hadoop Summit 2014. “But really, if you think about it, most knowledge workers are becoming data professionals in a lot of ways. You’ve got to understand how to interpret data and how to communicate with data. And that’s one of the softer problems, one of the non-technology problems that I think a lot of organizations run into.”
Data scientists should be able to program, but they don’t need to master a language right away. “It doesn’t matter what language you learn first. Pick a language and learn it,” Causey advised aspiring data scientists in his blog post. “Write bad code that breaks. Just learn it.” By the time you figure out what your language is bad at or can’t do, Causey wrote, you’ll know enough about programming to tell which language you need to learn next to solve your data problem.
As enterprise applications become more data-centric, the roles of the data scientist and the application developer are merging, according to Kelly. “In the short-term, this means the two roles must learn to collaborate more effectively and both must assume new ways of thinking,” Kelly recently wrote. “For data scientists, this means starting to think more about how the insights they uncover can be translated into repeatable form factors consumable by end-users. And application developers need to gain a better understanding of data flows and how analytic requirements impact application performance.”
Gaining data scientist skill sets
Filling the talent gap for data scientists may require efforts from both universities and industry vendors. “Universities are big, slow-moving beasts and they don’t necessarily have in place ‘data science’ schools,” Mark Lowerison, Director of Research and Academics at the University of Calgary, told theCUBE’s Furrier at Hadoop Summit 2014. “They have molecular biology schools and genetic schools and statistics schools and computer science programs. It’s the people who…[attend]…those schools that become the good candidates for working in our industry.”
Vendor-sponsored education programs are another way to help close the talent gap. Cloudera, for one, offers a course leading to a certification called Cloudera Certified Professional: Data Scientist (CCP: DS). According to Cloudera, candidates must prove their abilities under real-world conditions by designing and developing a production-ready data science solution that is peer-evaluated for its accuracy, scalability, and robustness.
For the CCP: DS certification, Cloudera assesses 11 different areas, including the ability to ingest and transform data, run complex math across that data, and deploy machine learning algorithms, “[and to] deploy all of that at scale,” said Brad Johnson, Certification Manager at Cloudera, in a video on the company’s website.
Some vendors, such as IBM, have been working with colleges and universities to help churn out tomorrow’s data scientists. “We’re…actually working with several universities globally to actually put together a curriculum—both in the business school as well as in the technical schools—for certifications and advanced sort of Masters classes around various data type jobs,” said IBM’s Suh at Hadoop Summit 2014.