Lisa Ehrlinger, Senior Researcher at Johannes Kepler University, joins theCUBE hosts Dave Vellante (@dvellante) and Paul Gillin (@pgillin) live from MIT CDOIQ 2019.
#theCUBE #MITCDOIQ #WomenInTech @SiliconANGLE theCUBE
https://siliconangle.com/2019/08/12/do-businesses-run-on-premium-data-new-study-assesses-variables-in-data-quality-tools-mitcdoiq-womenintech/
Do businesses run on premium data? New study assesses variables in data quality tools
Data is a critical resource. Its insights drive operational and strategic decisions not only for big-data behemoths such as Google, Facebook and Amazon, but also for a range of industries, from jet engine manufacturers to major league basketball to agriculturalists who use data to increase crop yield.
Raw data as a resource is often compared to crude oil as a driver of economic change. Like crude oil, data is unusable in its natural state. The value is obtained only after refining the base product into a usable form. And as with oil, the quality of the output can vary.
But unlike petroleum-based products, data has no clear labeling system, meaning businesses are often blind as to whether they are operating on the data equivalent of 100-octane jet fuel or high-sulfur off-road diesel.
Statistics show that 84% of global chief executive officers are concerned about data standards, and flawed data costs U.S. businesses $15 million a year in losses. This has led to a proliferation of software tools to monitor data quality, some of which are of dubious quality themselves. The just-released “Survey of Data Quality Measurement and Monitoring Tools” documents “how data quality measurement and monitoring is implemented in state-of-the-art data quality tools.”
“The main motivation for this study was actually a very practical one,” said Lisa Ehrlinger (pictured), senior researcher at Johannes Kepler University and co-author of the study. “We spent the majority of time in [our] big-data projects on data quality measurement and improvement tasks. So, we [asked] what tools are out there on the market to automate these data quality tasks.”
Ehrlinger spoke with Dave Vellante and Paul Gillin, co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during the MIT CDOIQ Symposium in Cambridge, Massachusetts. They discussed the research methods and the results of the study (see the full interview with transcript here).
This week, theCUBE spotlights Lisa Ehrlinger in its Women in Tech feature.
Automating data quality measurement
Ehrlinger has been at Johannes Kepler University in Linz, Austria, since her undergraduate days and holds both bachelor’s and master’s degrees in computer science from the university. She is currently working on her doctoral thesis on automated continuous data quality measurement under the supervision of Professor Dr. Wolfram Wöß from the Institute of Application-oriented Knowledge Processing at Johannes Kepler.
During her studies, Ehrlinger expanded her experience by working on information-technology projects for diverse employers. These include Oracle, software intelligence company Dynatrace LLC, the Roman Catholic Diocese of Linz, Austria, and most recently the Software Competence Center Hagenberg.
In just the past four years, Ehrlinger has published her master’s thesis on “Data Quality Assessment on Schema-Level for Integrated Information Systems,” co-authored 10 additional research papers, and co-edited the conference proceedings for the Tenth International Conference on Advances in Databases, Knowledge, and Data Applications.
Ehrlinger was a featured speaker at the MIT CDOIQ Symposium, giving a talk inspired by her doctoral research titled “Automating Data Quality Measurement With Tools.”
Not all data quality tools are equal
Ehrlinger and her team identified 667 data quality tools on the market, then narrowed that number to 13 for detailed testing and analysis based on three criteria: domain independence, non-specificity, and availability either free or on a trial basis. Just over half (50.8%) of the tools were excluded because they were domain-specific, meaning they were dedicated to specific data types or were proprietary tools.
“We just really wanted to find tools that are generally applicable for different kinds of data, for structured data, unstructured data, and so on,” Ehrlinger said.
Another 40% were excluded because they were dedicated to a specific management task, such as data visualization, integration or cleansing.
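The exclusion figures above can be turned into rough tool counts. The short Python sketch below uses only the numbers quoted in this article and assumes that both percentages refer to the original 667 tools; because “another 40%” is a rounded figure, the resulting counts are approximations, not figures from the study itself.

```python
# Rough funnel arithmetic based only on the figures quoted in the article.
# Assumption: both percentages are shares of the original 667 tools.
TOTAL_TOOLS = 667
DOMAIN_SPECIFIC_SHARE = 0.508  # "just over half (50.8%)"
TASK_SPECIFIC_SHARE = 0.40     # "another 40%" (rounded in the article)
SELECTED_FOR_TESTING = 13      # tools kept for detailed testing


def approx_count(share, total=TOTAL_TOOLS):
    """Convert a share of the total into an approximate whole number of tools."""
    return round(share * total)


domain_specific = approx_count(DOMAIN_SPECIFIC_SHARE)
task_specific = approx_count(TASK_SPECIFIC_SHARE)
remaining = TOTAL_TOOLS - domain_specific - task_specific

print(f"Excluded as domain-specific: about {domain_specific}")
print(f"Excluded as task-specific:   about {task_specific}")
print(f"Left after both exclusions:  about {remaining}")
print(f"Selected for testing:        {SELECTED_FOR_TESTING}")
```

Under these assumptions, roughly nine in ten tools drop out on the first two criteria alone, before availability is considered.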
The tools selected had to offer the three functionality areas the research team identified as most important: data profiling, quality metrics and quality monitoring. “Data profiling to get a first insight into data quality … data quality management in terms of dimensions, metrics and rules … [and] data quality monitoring over time,” Ehrlinger explained.
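To make the three functionality areas concrete, here is a minimal Python sketch of what each might look like on a toy dataset. It is an illustration only and does not reproduce any tool covered by the survey; the function names (profile, completeness, monitor), the sample records and the 0.9 threshold are all hypothetical.

```python
from datetime import date


def profile(rows, column):
    """Data profiling: basic counts that give a first insight into a column."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


def completeness(rows, column):
    """Quality metric: share of records with a value in the column (0.0 to 1.0)."""
    stats = profile(rows, column)
    if stats["count"] == 0:
        return 1.0  # assumption: an empty dataset counts as complete
    return (stats["count"] - stats["nulls"]) / stats["count"]


def monitor(history, day, value, threshold=0.9):
    """Quality monitoring: record the metric over time and check a threshold."""
    history.append((day, value))
    return value >= threshold


# Hypothetical sample records, invented for this example.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "a@example.com"},
]

history = []
email_profile = profile(rows, "email")
email_completeness = completeness(rows, "email")
passed = monitor(history, date(2019, 8, 12), email_completeness)

print(email_profile)
print(f"completeness={email_completeness:.2f}, passed={passed}")
```

Running the measurement on a schedule and appending each result to the history is what turns a one-off metric into monitoring over time.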
...