
Lay the groundwork now for advanced analytics and AI

Feature
Aug 03, 2023 | 9 mins
Artificial Intelligence | Cloud Management | Data Architecture

Proper data integration, modeling, and maintenance make up the unglamorous but necessary foundation for high-impact analytics and AI applications. Without it, data is too hard to access and, even if it can be analyzed, will deliver inaccurate results.

When global technology company Lenovo started using data analytics, it identified a new market niche for its gaming laptops and powered remote diagnostics that helped customers get the most from their servers and other devices.

Comcast is using data analytics to reduce the cost, and improve the efficacy, of its 10 petabytes of security data so it can better understand attacks, respond more effectively, and improve its ability to predict future threats.

And at First Commerce Bank, EVP and COO Gregory Garcia hopes to leverage unified, real-time data to monitor risks such as worsening vacancy rates that could make it harder for commercial property owners to pay their mortgages.

But reaching all these goals, as well as using enterprise data for generative AI to streamline the business and develop new services, requires a proper foundation. That hard, ongoing work includes integrating siloed data, modeling and understanding it, and maintaining and securing it over time.

Integrate data siloes

At Lenovo, customer-authorized usage logs showed a sizeable number of customers using its consumer-grade IdeaPad laptops for gaming rather than its higher-end gaming notebooks. In response, Lenovo launched a new line of entry-level gaming laptops and desktops, now branded Lenovo LOQ, that caters to gamers making their first foray into gaming, says Girish Hoogar, global head of engineering for Lenovo’s cloud and software business in its Intelligent Devices Group.

It also used device data to develop Lenovo Device Intelligence, which uses AI-driven predictive analytics to help customers understand and proactively prevent and solve potential IT issues. Lenovo Device Intelligence can also help to optimize IT support costs, reduce employee downtime, and improve the user experience, the company says.

But before consolidating the required data, Lenovo had to overcome concerns around sharing potentially sensitive information. Hoogar’s staff helped relieve such fears by educating employees that information included in the solution, such as notices of bug fixes or software updates, was already public.

In the past, First Service Credit Union’s chief data officer Ty Robbins struggled to integrate data from the legacy, non-relational, and often proprietary tabular databases on which many credit unions run. “You had to be an expert in the programming language that interacts with that data, and understand the relationships of each data element within each data source, let alone understand its relation to elements in other data sources,” he says.

Using the metadata-driven Cinchy Data Collaboration Platform reduced a typical modeling and integration effort from 18 months to six weeks, he says. It also helps him democratize credit union data so it can be used to improve customer service, automate the maintenance of such data by making various types of data easier to find, and provide chains of custody and audit controls to help meet regulatory needs.

At Ocean Technologies Group (OTG), CTO Ian Hepworth must integrate maintenance and crew data not only from the 20,000 vessels managed by OTG’s platform but also from the six companies OTG has acquired. As well as keeping its current data accurate and accessible, the company wants to leverage decades of historical data to identify potential risks to ship operations and opportunities for improvement.

Each of the acquired companies had multiple data sets with different primary keys, says Hepworth. “We needed a clever tool to help us efficiently put this data into a data warehouse and enable us to start to build customer views,” he adds. Using SnapLogic’s integration platform freed his developers from manually building APIs (application programming interfaces) for each data source, and helped with cleaning the data and storing it quickly and efficiently in the warehouse, he says. SnapLogic not only cut the workload of his staff, he says, but provided an API allowing OTG’s customers to download data from it.
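
To make the primary-key problem concrete, here is a minimal sketch of harmonizing records from two acquired companies into one warehouse table by way of a cross-reference table. The company labels, column names, and values are hypothetical illustrations, not OTG’s or SnapLogic’s actual implementation.

```python
# Illustrative sketch only: harmonizing records that arrive with different
# primary keys from acquired companies before loading them into one
# warehouse table. All names and values here are hypothetical.
import pandas as pd

# Each acquired company identifies the same vessel differently.
company_a = pd.DataFrame({"vessel_ref": ["A-101"], "crew_count": [18]})
company_b = pd.DataFrame({"imo_number": ["9321483"], "crew_count": [18]})

# A cross-reference table maps every legacy key to one canonical vessel ID.
key_map = pd.DataFrame({
    "legacy_key": ["A-101", "9321483"],
    "canonical_vessel_id": ["V-0001", "V-0001"],
})

def to_canonical(df: pd.DataFrame, key_column: str) -> pd.DataFrame:
    """Replace a company-specific key with the canonical vessel ID."""
    merged = df.merge(key_map, left_on=key_column, right_on="legacy_key")
    return merged.drop(columns=[key_column, "legacy_key"])

# Union the harmonized frames into one warehouse-ready table.
unified = pd.concat([
    to_canonical(company_a, "vessel_ref"),
    to_canonical(company_b, "imo_number"),
], ignore_index=True)
print(unified)
```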

Model, understand, and transform the data

Comcast faced the challenge of collecting large amounts of information about potential security and reliability issues but with no easy way to make sense of it all, says Noopur Davis, corporate EVP, CISO, and chief product privacy officer.  

After moving its expensive, on-premises data lake to the cloud, Comcast created a three-tiered architecture. The first tier keeps a full year of raw data in lower-cost, lower-speed storage for low-frequency use cases, such as forensic analysis. The second stores currently needed data “with metadata, fully normalized, and in a time series,” says Davis, which analysts can use for more immediate retrieval. The third tier, on the most expensive but highest-performing storage, contains data marts and data warehouses configured with the required data links for the most frequent use cases and personas.
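
As a rough illustration of how such tiering decisions can be expressed, the sketch below routes security events to a storage tier by age. The tier names, thresholds, and destinations are assumptions for illustration only, not Comcast’s actual configuration, which also takes use case and persona into account.

```python
# Illustrative sketch only: routing security events to storage tiers by age,
# loosely modeled on the three-tier layout described above. Tier names and
# thresholds are assumptions, not Comcast's configuration.
from datetime import datetime, timedelta, timezone

def choose_tier(event_time: datetime, now: datetime) -> str:
    """Pick a storage tier for an event based on how old it is."""
    age = now - event_time
    if age <= timedelta(days=7):
        return "hot_data_mart"           # highest-performing storage, frequent use cases
    if age <= timedelta(days=90):
        return "normalized_time_series"  # metadata-rich tier for analyst retrieval
    return "raw_archive"                 # low-cost storage for forensic analysis

now = datetime.now(timezone.utc)
print(choose_tier(now - timedelta(days=2), now))    # hot_data_mart
print(choose_tier(now - timedelta(days=30), now))   # normalized_time_series
print(choose_tier(now - timedelta(days=200), now))  # raw_archive
```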

Comcast focuses its predictive analytics on parts of its security infrastructure that are critical to business continuity, such as secure Wi-Fi in its retail stores. Like other enterprises, it’s moving toward data fabrics that allow multiple authorized users to access data from single “sources of truth” rather than make copies for each new user who needs it. The aim, says Davis, is less about cutting data transmission or storage costs than about making it easier for data custodians to govern the data. Comcast also realizes double-digit millions of dollars in cost avoidance, she says, by retiring data management tools whose functions are now served by the data lake.

At Paytronix, which manages customer loyalty, online ordering, and other systems for its customers, director of data science Jesse Marshall wanted to reduce the custom coding of data transformations—the conversion, cleaning, and structuring of data into a form usable for analytics and reports.

To free his staff from maintaining and fixing past transformations so they can focus on new projects, he uses the Coalesce data transformation tool, which he says gives Paytronix a drag-and-drop interface for creating transformations, makes it easier to troubleshoot data transformation problems, and simplifies maintaining those transformations as the company’s infrastructure changes.
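
For readers unfamiliar with the term, the sketch below shows the kind of transformation being described: cleaning, converting, and restructuring raw records into a form usable for reporting. The field names and rules are hypothetical, not Paytronix’s or Coalesce’s code.

```python
# Illustrative sketch only: a simple data transformation that cleans raw
# order records, converts their types, and aggregates them for reporting.
# Field names and rules are hypothetical.
import pandas as pd

raw_orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "order_total": ["12.50", "8.00", "8.00", None],
    "placed_at": ["2023-08-01", "2023-08-01", "2023-08-01", "2023-08-02"],
})

def transform_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Clean duplicates and nulls, convert types, and aggregate by day."""
    cleaned = df.drop_duplicates().dropna(subset=["order_total"])
    cleaned = cleaned.assign(
        order_total=cleaned["order_total"].astype(float),
        placed_at=pd.to_datetime(cleaned["placed_at"]),
    )
    daily = cleaned.groupby(cleaned["placed_at"].dt.date)["order_total"].sum()
    return daily.reset_index(name="daily_revenue")

print(transform_orders(raw_orders))
```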

The ability to easily create new transformations allows the business to try more analytic approaches to find the unexpected, but valuable, winners. “In the old world, if we had 10 ideas for useful analytics, we only had time to work on four of those,” he says. “We wanted the team to try every idea even if 60% of them failed.”

Maintain and secure the data over time

Despite enterprise-wide needs for more and better data, it can be difficult to convince business units or boards of directors to fund the ongoing work to ensure data is accurate, timely, and secure.

Yao Morin, chief technology officer at commercial real estate services firm JLL, compares data maintenance to plumbing, which nobody thinks about until it malfunctions and creates a messy, urgent problem. To get the needed funding, data practitioners must continue to show business leaders the value of data, and how, if it isn’t maintained, it will become useless, she says.

In JLL’s case, such value includes meeting demands from clients (and the renters who occupy their buildings) for new types of information as workers return to the office after COVID-19 lockdowns. This includes whether employees are sitting isolated at desks or meeting in crowded conference rooms, the quality of the air in them, and what amenities, such as restaurants, are open near their offices to lure them back.

While senior management backing is crucial for ongoing data management, Lenovo’s Hoogar calls the work a collective responsibility shared by everyone. One way to build ground-level support, he says, is finding data enthusiasts in each department and building their skills through courses and regular meetings with other data champions or data councils. Continual education, training, and upskilling are also critical to better data management, he says.

“The issue CIOs run into is that many boards and bank CEOs are reluctant to hire data analysts over commercial lenders because they don’t see them as revenue generating resources,” says Garcia at First Commerce Bank. “But a financial institution armed with a dozen data analysts properly weaponized with real-time data can be more effective than a legion of lenders aimlessly trying to grow their portfolios without the proper analytics to guide them.”

Start early

The time to standardize everything from data modeling to its security is when the data is acquired. “We template a lot of our data ingestion processes,” says Morin, requiring the addition of metadata and a data dictionary so business leaders can know what information they can get from the data lake. “Without those templates, it’s hard to add such information after the fact.”
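
A templated ingestion step of the sort Morin describes might, for example, refuse to land a dataset in the lake until the required metadata and a data dictionary are supplied. The sketch below is a hypothetical illustration of that idea; the required fields and dataset names are assumptions, not JLL’s actual template.

```python
# Illustrative sketch only: an ingestion template that rejects datasets
# arriving without required metadata or a data dictionary. The required
# fields and example values are hypothetical.
from dataclasses import dataclass, field

REQUIRED_METADATA = {"owner", "source_system", "refresh_cadence"}

@dataclass
class IngestionRequest:
    dataset_name: str
    metadata: dict = field(default_factory=dict)
    data_dictionary: dict = field(default_factory=dict)  # column -> description

def validate(request: IngestionRequest) -> None:
    """Reject ingestion requests that arrive without the required context."""
    missing = REQUIRED_METADATA - request.metadata.keys()
    if missing:
        raise ValueError(f"{request.dataset_name}: missing metadata {sorted(missing)}")
    if not request.data_dictionary:
        raise ValueError(f"{request.dataset_name}: data dictionary is empty")

validate(IngestionRequest(
    dataset_name="office_occupancy",
    metadata={"owner": "workplace-analytics", "source_system": "badge_swipes",
              "refresh_cadence": "daily"},
    data_dictionary={"desk_id": "Unique desk identifier", "occupied": "Badge-in detected"},
))
print("ingestion request accepted")
```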

Robbins, at First Service Credit Union, urges holistic, up-front data modeling to create well-understood data that can be analyzed easily and in new ways. For example, a query asking how many deposits a credit union received each month will only draw on the elements required to get the data for that report, he says. Generating a related report and asking for the number of new accounts receiving deposits requires starting from scratch, which wastes valuable time. “With a metadata platform, you assemble all the data adjusted to those elements in one view so you can simply do any one of a number of reports on that data,” he says.
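
The payoff Robbins describes can be shown with a small sketch: once deposits and account attributes are assembled into one view, both reports come from the same data rather than being rebuilt from scratch. The tables and columns below are hypothetical, not the credit union’s actual model.

```python
# Illustrative sketch only: one assembled view of deposits joined to account
# attributes supports multiple reports. Tables and columns are hypothetical.
import pandas as pd

deposits = pd.DataFrame({
    "account_id": [101, 102, 101, 103],
    "amount": [500.0, 250.0, 75.0, 1200.0],
    "month": ["2023-07", "2023-07", "2023-08", "2023-08"],
})
accounts = pd.DataFrame({
    "account_id": [101, 102, 103],
    "opened_month": ["2023-01", "2023-07", "2023-08"],
})

# One unified view of deposits joined to account attributes.
view = deposits.merge(accounts, on="account_id")

# Report 1: number of deposits received each month.
print(view.groupby("month").size().rename("deposit_count"))

# Report 2: number of new accounts (opened that month) receiving deposits.
new_account_deposits = view[view["opened_month"] == view["month"]]
print(new_account_deposits.groupby("month")["account_id"].nunique()
      .rename("new_accounts_with_deposits"))
```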

Along with such day-to-day benefits, companies such as Comcast say the right data architecture and infrastructure allow them to develop exciting new generative AI applications much more quickly than they expected. But before reaping such benefits, “you’ve got to get the infrastructure right and the data clean,” says Davis. “It takes a lot of grunt work, but with that work done, one can do amazing things.”