Bob Violino
Contributing writer

8 tips for unleashing the power of unstructured data

Feature
Nov 28, 2023 | 10 mins
Data Management, Data Mining, Data Science

For most organizations, data in the form of text, video, audio, and other formats is plentiful but remains untapped. Here’s how to unlock business value from this overlooked data trove.

[Image: Financial business analytics data dashboard. Credit: Andrey_Popov / Shutterstock]

Making the most of enterprise data is a top concern for IT leaders today. With organizations seeking to become more data-driven in their business decisions, IT leaders must devise data strategies geared toward creating value from data no matter where, or in what form, it resides.

For many enterprises, unstructured data, in the form of text, video, audio, social media, imaging, sensor, and other formats, remains elusive and untapped. While industry research estimates that as much as 90% of enterprise data is unstructured, 61% of IT leaders say managing unstructured data is a problem for their organization, with another 24% not even including unstructured data on their data and analytics short list, according to research from Foundry.

Unstructured data resources can be extremely valuable for gaining business insights and solving problems. The key is figuring out how to create that value. Organizations that become skilled in tapping these vast information resources can gain a significant advantage in delivering actionable insights to key business processes.

Here is a look at how inventive enterprises are transforming unstructured data into business value today, along with some tips on how to put unstructured data to work for your organization.

Enhancing the creative process

At mobile game development company RetroStyle Games, unstructured data has proved to be a “goldmine” that directly contributes to growing the business and improving games, says Ivan Konoval, a data analyst at the company.

Among the numerous ways RetroStyle Games uses unstructured data, perhaps the most impactful involve concept art and audio data.

“The creative process of our game developers often begins with a sketch, mood board, or concept art,” Konoval says. “These works, while not structured, capture the essence of what we want to express in the game. To ensure that these works do not get lost among others and can be easily found in the future when working on the game’s sequel, we use advanced image recognition tools.”

These tools categorize and tag various elements of the artwork, whether it’s a character, landscape, or some other element. “This allows our artists and developers to quickly find related artwork, which provides design consistency and speeds up the development process,” Konoval says. “In addition, this system allows us to store information about the development of the company’s artwork, which is very useful when training new employees.”
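As a rough illustration of how tagged artwork becomes searchable, the sketch below builds an inverted index from tags to files. The file names and tags are hypothetical, the tags are assumed to come from an image recognition tool, and nothing here is meant to represent RetroStyle Games' actual system.

```python
from collections import defaultdict

# Hypothetical output of an image recognition tool: file name -> detected tags.
ARTWORK_TAGS = {
    "hero_sketch_v1.png": {"character", "armor"},
    "forest_moodboard.jpg": {"landscape", "forest"},
    "hero_in_forest.png": {"character", "forest"},
}


def build_tag_index(artwork_tags):
    """Map each tag to the set of files that carry it (an inverted index)."""
    index = defaultdict(set)
    for filename, tags in artwork_tags.items():
        for tag in tags:
            index[tag].add(filename)
    return index


def find_artwork(index, *tags):
    """Return the sorted list of files that carry every requested tag."""
    if not tags:
        return []
    # Intersect the file sets of all requested tags; unknown tags match nothing.
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(matches)


index = build_tag_index(ARTWORK_TAGS)
print(find_artwork(index, "character", "forest"))
```

Looking up by several tags at once is what lets an artist narrow a large archive to, say, every forest scene that features a character.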

Regarding audio data, voice acting plays a key role in players’ experience in the game world, Konoval says. “We collect a huge amount of data from in-game dialogs, background sounds, and player voice chats,” he says. “Using voice recognition and sound analysis, we extract nuances such as mood and sentiment.”

For example, if a certain dialog results in players consistently entering voice chats with excitement, developers take note of this. Similarly, anomalies such as background noise that does not match the environment are identified and addressed.

“The insights derived from this audio data have directly contributed to improving the game’s audio experience, ensuring that players are constantly emotionally engaged in the gameplay and interacting with the environment,” Konoval says.

Games are dynamic, and so is the data they generate, Konoval says. Features such as in-game chat sentiment analysis needed real-time processing to filter out inappropriate behavior by players. “We’ve addressed this by leveraging stream processing frameworks like Apache Kafka,” he says. “This allows our game moderators to respond in real-time to any emerging patterns and issues.”

With each game release and update, the amount of unstructured data being processed grows exponentially, Konoval says. “This volume of data poses serious challenges in terms of storage and efficient processing,” he says.

To address this problem, RetroStyle Games invested in data lakes. “This not only allows us to store huge amounts of unstructured data, but also to query and analyze it efficiently, providing immediate access to the information we need for our data scientists and developers,” Konoval says.

Powering generative AI

Workhuman, a provider of employee recognition and experience software, is leveraging unstructured data in multiple ways on its cloud-based platform, says Jesse Harriott, head of analytics and executive director.

“Unstructured data is the most prevalent form of data, yet the most challenging to use effectively,” Harriott says.

The Workhuman cloud contains millions of recognition messages from employees around the world, sharing positive feedback about someone with whom they work.

“They do this in their own words, so each recognition moment is completely unique,” Harriott says. “We use this data to power AI models that help companies better define how employees are collaborating in their organization, what topics come up most frequently in messages, and whether there is equity in recognition awards across the organization.”

The company also uses large language models (LLMs) to summarize recognition trends over time and to suggest language for an effective recognition message.

“One initiative I’m particularly proud of is our tool Inclusion Advisor, an in-the-moment AI-based coaching tool that identifies and suggests corrections for unconscious bias in award language before it is sent to the recipient,” Harriott says.

One of the biggest challenges in getting value out of unstructured data is limited access to reliable and valid training data for the business use cases the organization is focused on.

“You can have large amounts of unstructured data, but without effective training data to create and validate a model, progress and quality will suffer,” Harriott says. “Leveraging LLMs can certainly help in this regard, but many business use cases are not effectively captured by existing LLMs.”

In addition, “in an LLM there can still be the issue of bias in the training data,” Harriott says. Workhuman has a linguistics team that is responsible for data annotation, augmentation, and validation to deal with some of these issues. “We also partner with our large, multinational customers to make sure models yield meaningful and useful results,” Harriott says.

Tips for transforming unstructured data into value

Harriott, Konoval, and other data experts offer advice on how to ensure success when working with unstructured data.

1. Tie initiatives to business outcomes. IT leaders should make sure initiatives to leverage unstructured data are tightly aligned to business needs and have executive sponsorship, Harriott says.

“Too often, a team may have a creative use case for unstructured data, but the connection to a key business outcome is not obvious to others and may lose support,” Harriott says. “It’s the leader’s responsibility to educate the organization on why the use case is important and how it can directly or indirectly drive business benefit.”

2. Recognize the journey. Also, data leaders should set and celebrate initiative milestones as they are met, especially given how difficult the challenge of creating value with unstructured data can be. 

“Making unstructured data actionable may require more time and effort than the business expects,” Harriott says. “By recognizing milestones, leaders give other stakeholders visibility into the progress being made, and also ensure that their team members feel appreciated for the level of effort they are putting in to make unstructured data actionable.”

3. Quality is job one. Another key to success is to prioritize data quality.

“The adage ‘garbage in, garbage out’ couldn’t be more appropriate,” Konoval says. “Going into analysis without ensuring data quality can be counterproductive. We have always taken this approach: Clean the data, remove what is unnecessary, and ensure that it meets quality standards.”

In the gaming industry, “misinformed decisions can result in expensive feature developments that players might not resonate with, or even worse, bugs that could tarnish our reputation,” Konoval says. “Our rigorous data governance framework ensures the foundation of our analyses is rock-solid.”

4. Separate the actionable from the informative. Prioritizing data that business users can act on is also vital. “What’s important is the volume of data and being able to parse what is actionable versus what is informative,” says Joe Minarik, COO at colocation and data services provider DataBank.

To underscore the importance of this, Minarik gives the example of using unstructured data for systems monitoring. “Actionable aspects have to be prioritized and addressed quickly,” he says. “Because so many aspects of systems are monitored, a single issue can generate alarms and information from downstream devices, causing an overabundance of alerts, alarms, and information that needs to be sifted through to identify what single aspect really needs to be addressed.”
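The sifting Minarik describes can be sketched with a simple rule: if a device is alerting and something upstream of it is alerting too, treat the downstream alert as informative rather than actionable. The topology and device names below are hypothetical, and this rule-based filter is only a stand-in for whatever correlation a real monitoring platform performs.

```python
# Hypothetical topology: each device maps to the upstream device it depends on.
UPSTREAM = {
    "rack-switch-1": "core-router",
    "rack-switch-2": "core-router",
    "server-a": "rack-switch-1",
    "server-b": "rack-switch-1",
    "server-c": "rack-switch-2",
}


def root_causes(alerting, upstream):
    """Keep only the alerting devices that have no alerting ancestor."""
    alerting = set(alerting)
    roots = []
    for device in sorted(alerting):
        parent = upstream.get(device)
        seen = set()  # guards against cycles in a malformed topology
        suppressed = False
        while parent is not None and parent not in seen:
            if parent in alerting:
                suppressed = True
                break
            seen.add(parent)
            parent = upstream.get(parent)
        if not suppressed:
            roots.append(device)
    return roots


alerts = ["server-a", "server-b", "rack-switch-1", "server-c"]
print(root_causes(alerts, UPSTREAM))
```

In this example the two servers behind the failing switch are suppressed, leaving the switch itself and one unrelated server as the items that need attention.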

5. Make ample use of AI. Continuing with his example, Minarik points out the valuable role AI and machine learning play in analyzing unstructured data streams over time. “It helps you build system correlation,” he says. “That allows you to drop the noise and get to the root issue immediately.”

For instance, organizations can deploy named entity recognition (NER), a component of natural language processing (NLP) that focuses on identifying and categorizing named entities within unstructured text, with tags such as “person,” “organization,” or “location.”

“In practical terms, entity recognition plays a crucial role in a multitude of applications,” Minarik says. These include information retrieval systems that index and organize content, question-answering systems that locate answers within text, and content recommendation engines that personalize content based on recognized entities.

“By identifying and categorizing named entities, NER empowers data analysts and system engineers to unlock valuable insights from the vast data collected,” Minarik says.
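To make the idea concrete, here is a minimal sketch of entity tagging using a dictionary (gazetteer) lookup. This is far simpler than the statistical models that production NER systems typically use, and the entity names and labels are made up for illustration.

```python
import re

# Hypothetical gazetteer: known entity names and the label each should receive.
GAZETTEER = {
    "Jane Doe": "person",
    "Acme Corp": "organization",
    "Berlin": "location",
}


def tag_entities(text, gazetteer):
    """Return (entity, label) pairs for gazetteer names found in text, in order."""
    if not gazetteer:
        return []
    # Try longer names first so a multi-word name wins over a shorter prefix.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return [(m.group(0), gazetteer[m.group(0)]) for m in pattern.finditer(text)]


print(tag_entities("Jane Doe joined Acme Corp in Berlin.", GAZETTEER))
```

A lookup like this only finds names it already knows, which is exactly the limitation that trained NER models are meant to overcome.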

6. Ensure value with visualizations. The process of making unstructured data usable doesn’t end with analysis, Minarik says. It culminates in the reporting and communication of findings.

“Reports typically involve a structured presentation of key findings, methodologies, and the implications of the analysis,” Minarik says. “Visualizations, such as charts, graphs, and dashboards, are instrumental in conveying complex data in an understandable format. Visual representations not only facilitate comprehension but also make it easier for stakeholders to identify trends, outliers, and critical insights, ensuring that timely data-driven decisions are made.”

7. Monitor as you go. Another key practice that is sometimes overlooked is the need for continuous monitoring and maintenance, Minarik says. “Real-life data is dynamic and ever-evolving,” he says. “Continuous monitoring and maintenance are critical to ensuring that the data remains usable over time.”

The key is to clean the data and run quality checks regularly to maintain accuracy and reliability, Minarik says. Data anomalies, inconsistencies, and duplicates must be identified and rectified promptly to prevent skewed or erroneous analyses.
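A recurring check of this kind might look like the sketch below, which drops empty and duplicate text records and reports how many of each it found. The sample messages and the normalization rule (collapse whitespace, ignore case) are assumptions chosen for illustration; real pipelines would define their own rules.

```python
def clean_records(records):
    """Drop empty and duplicate text records; return (kept, report)."""
    seen = set()
    kept = []
    report = {"empty": 0, "duplicate": 0}
    for text in records:
        # Normalize whitespace and case so near-identical copies compare equal.
        key = " ".join(text.split()).lower()
        if not key:
            report["empty"] += 1
        elif key in seen:
            report["duplicate"] += 1
        else:
            seen.add(key)
            kept.append(text.strip())
    return kept, report


# Hypothetical sample of raw text records.
messages = [
    "Great teamwork on the launch!",
    "great  teamwork on the launch! ",
    "   ",
    "Thanks for the quick fix.",
]
kept, report = clean_records(messages)
print(kept)
print(report)
```

Tracking the counts over time, rather than only discarding bad records, gives a team an early signal when an upstream source starts degrading.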

8. Keep your team’s skills sharp. Finally, it’s a good practice to invest in the development of the right skills — an effort that, given the constant evolution of underlying tools, must be ongoing.

“The world of data analytics, particularly around unstructured data, is dynamic,” Konoval says. “The smallest advantage, such as a team skilled in the latest image recognition technology and analyzing concept art, can be the difference between a game being a hit or a failure. We’ve already seen how the results of advanced technology have impacted the storytelling and design of our games, resulting in positive feedback and increased player engagement.”