Maria Korolov
Contributing writer

For IT leaders, operationalized gen AI is still a moving target

Feature
Feb 28, 2024 · 15 mins
Artificial Intelligence, Business Operations, CIO

The blistering pace of advancement in generative AI leaves companies struggling to effectively implement and measure the technology, while guarding against bias and risk.

The share of companies that have either deployed generative AI or are actively exploring it is growing so quickly that, combined, very few holdouts remain.

The use of gen AI in the enterprise was nearly nothing in November 2022, when the only tools commonly available were AI image or early text generators. But by May 2023, according to an IDC survey, 65% of companies were using gen AI, and in September, that number rose to 71%, with another 22% planning to implement it in the next 12 months.

Even in its infancy, gen AI has become an accepted part of enterprise operations, and the most common use cases include automation of IT processes, security and threat detection, supply chain intelligence, and automating customer service and network processes, according to a report released by IBM in January. Plus, when you add in cloud-based gen AI tools like ChatGPT, the use of gen AI in one form or another becomes nearly universal.

And this doesn’t include the gen AI that’s now being embedded into platforms like Office 365, Google Docs, and Salesforce.

However, the more difficult types of implementations (fine-tuned models, vector databases that provide context and up-to-date information to the AI systems, and APIs that integrate gen AI into workflows) are where problems might crop up. Building enterprise-grade gen AI platforms is like shooting at a moving target, and AI is advancing at a much faster rate than enterprises can adapt.

“It makes it challenging for organizations to operationalize generative AI,” says Anand Rao, AI professor at Carnegie Mellon University. “There are different tools, models, and vector databases evolving, and new papers coming out, which makes it very challenging for a company. They need stability. Tell me what to do for the next three months; don’t change everything every two weeks.”

Due to the complexity of this challenge, plus the cost involved and the expertise required, only 10% of organizations were actually able to launch gen AI models into production last year, according to findings released by Intel’s cnvrg.io in December.

But that doesn’t mean enterprises should just wait for things to settle down. To help regain some initiative, there are best practices that can be applied now to start building gen AI platforms — practices that will allow them to adapt quickly as the technology changes, including building robust and modern data and API infrastructures, creating an AI abstraction layer between their enterprise applications and the AI models they use, and setting up security and cost policies, usage guardrails, and ethics frameworks to guide how they deploy gen AI.

Data and API infrastructure

“Data still matters,” says Bradley Shimmin, chief analyst for AI platforms, analytics, and data management at London-based independent analyst and consultancy Omdia. Yet according to the IBM survey, data complexity was the second biggest barrier to adoption after lack of expertise, while the cnvrg.io survey said infrastructure was the single biggest challenge for companies looking to productionize large language models (LLMs).

Another setback is that enterprises are unable to keep up with business demands because of inadequate data management capabilities. An overriding issue, though, is that most organizations don’t have a plan, says Nayur Khan, a partner at McKinsey & Company. “They try to do something and see what sticks.” But with gen AI models being delivered as a service, in the form of, say, OpenAI APIs, there are use cases where companies can skip right ahead to deploying AI as a service.

“Now it becomes a service I can call and I don’t have to worry about training,” says Khan. “That’s fine, but language models are great for language. They’re not great for knowledge.” Knowledge sits inside organizations, he says.

A retail company, for example, might have a 360-degree view of customers, which is all fed into analytics engines, machine learning, and other traditional AI to calculate the next best action. Then the gen AI could be used to personalize the messages to those customers. So by using the company’s data, a general-purpose language model becomes a useful business tool. And everyone is trying to build these types of applications.

“I’m seeing it across all industries,” says Khan, “from high tech and banking all the way to agriculture and insurance.” It’s forcing companies to move faster on the digital front, he adds, and fix all the things they said they were going to do but never got around to doing.

And not only do companies have to get all the basics in place to build for analytics and MLOps, but they also need to build new data structures and pipelines specifically for gen AI.

When a company wants to fine-tune a model or create a new one in a particular subject area, it requires data architecture, critical choices about which model or type of model to pursue, and more. “It quickly adds up in complexity,” says Sheldon Monteiro, EVP at Publicis Sapient, a global digital consultancy.

Even a simpler project, like adding an external data source to an existing gen AI model, requires a vector database, the right choice of model, and an industrial-grade pipeline.
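As a rough illustration of what the vector database contributes in such a project, the Python sketch below ranks a few documents against a question by cosine similarity and places the best match into a prompt. The `embed` function is a toy word-count stand-in rather than a real embedding model, and the documents and function names are made up for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents):
    """Put the retrieved context in front of the question for the model."""
    context = "\n".join(top_k(question, documents, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical documents, for illustration only.
DOCS = [
    "refund policy for online orders",
    "store opening hours and holiday schedule",
    "shipping times for international orders",
]

print(build_prompt("what is the refund policy", DOCS))
```

A production pipeline would swap the toy pieces for a real embedding model and a vector store, which is where the model choice and the industrial-grade pipeline come in.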

But it all begins with data, and it’s an area where many companies lag behind. Without a single and holistic strategy, every department will set up its own individual solutions.

“If you do that, you’ll end up making a lot more mistakes and re-learning the same things over and over again,” says Monteiro. “What you have to do as a CIO is take an architectural approach and invest in a common platform.”

Then there’s the hard work of collecting and prepping data. Quality checks and validation are critical to create a solid base, he says, so you don’t introduce bias, which undermines customers and business.

So if a particular data set excludes the highest-value transactions because those are all handled manually, then the resulting model could potentially have a bias toward smaller, less profitable business lines. Garbage in, garbage out applies to the new era of gen AI as much as it did in previous technological periods.

For companies that have already invested in their data infrastructure, those investments will continue to pay off into the future, says Monteiro. “Companies that invested in data foundations have a tremendous head start in what they’re doing with generative AI,” he says.

Still, these traditional data foundations originally designed for advanced analytics and machine learning use cases only go so far.

“If you want to go beyond the basics, you’ll need to understand some of the deeper subtleties of generative AI,” says Omdia’s Shimmin. “What’s the difference between different embedding models, what is chunking, what is overlap? What are all the different methodologies you can use to tokenize data in the most efficient way? Do you want high or low dimensionality to save space in your vector database? The MLOps tools we have weren’t built to do that. It’s all very complicated and you can waste a lot of time and money if you don’t know what you’re doing.”
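To make the chunking and overlap terms concrete, here is a minimal Python sketch that splits text into word-based chunks where each chunk repeats the last few words of the previous one. Splitting on whitespace and the default sizes are assumptions for illustration; real pipelines usually count model tokens rather than words.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks

print(chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1))
```

Larger overlap means more redundancy and more vectors to store, which is one of the cost trade-offs Shimmin is pointing at.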

But MLOps platform vendors are stepping up, he says. “Companies like Dataiku, DataRobot, and Databricks all have retooled to support LLMOps or GenAIOps. All the little pieces are starting to come into place.”

Analyzing the abstraction layer

Last November, OpenAI, the go-to platform for enterprise gen AI, unexpectedly fired its CEO, Sam Altman, which set off a circus-like scramble to find a new CEO, with staff threatening to walk out and Microsoft offering to take everyone in. During those tumultuous days, many companies using OpenAI’s models suddenly realized they had put all their eggs into one unstable basket.

“We saw a lot of OpenAI integrations,” says Dion Hinchcliffe, VP and principal analyst at Constellation Research. “But the whole management issue that happened with OpenAI has made people question their over-commitment.”

Even if a vendor doesn’t go out of business, it might quickly become obsolete. Early last summer, ChatGPT was pretty much the only game in town. Then Meta released Llama 2, free for most enterprise customers, followed by Anthropic’s Claude 2.1, which came out with a context window of 200,000 tokens, enough for users to cut and paste the equivalent of a 600-page book right into a prompt, leaving GPT-4’s 32,000 tokens in the dust. Not to be outdone, Google announced in February that its new Gemini 1.5 model can handle up to 10 million tokens. With that, and greater speed, efficiency, and accuracy across video, audio, and written copy, there are virtually no limits.

Free, open-source models continue to proliferate, as do industry-specific models pre-trained on, say, finance, medicine, or material science.

“You’ve got new announcements every week, it seems,” says Publicis Sapient’s Monteiro.

That’s where a “model garden” comes in, he says. Companies that are disciplined about how they select and manage their models, and architect their systems so models can be easily swapped in and out, will be able to handle the volatility in this space.
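One way to picture models being swapped in and out is a thin registry that applications call by use case rather than by vendor. The Python sketch below is a minimal version of that idea; the `ModelGarden` class, the two stand-in model functions, and the use-case name are all hypothetical and do not correspond to any vendor API.

```python
class ModelGarden:
    """Maps use cases to registered models so callers never name a vendor."""

    def __init__(self):
        self._models = {}   # model name -> function(prompt) -> completion
        self._routes = {}   # use case -> model name

    def register(self, name, model_fn):
        self._models[name] = model_fn

    def route(self, use_case, model_name):
        if model_name not in self._models:
            raise KeyError(f"unknown model: {model_name}")
        self._routes[use_case] = model_name

    def complete(self, use_case, prompt):
        model_name = self._routes[use_case]
        return self._models[model_name](prompt)

# Hypothetical stand-ins for a hosted commercial model and a local open-source one.
def hosted_model(prompt):
    return f"[hosted] {prompt}"

def local_model(prompt):
    return f"[local] {prompt}"

garden = ModelGarden()
garden.register("hosted", hosted_model)
garden.register("local", local_model)

garden.route("support-chat", "hosted")
before = garden.complete("support-chat", "hello")

# Swapping the model is a one-line routing change; the caller is untouched.
garden.route("support-chat", "local")
after = garden.complete("support-chat", "hello")
print(before, "|", after)
```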

But this abstraction layer needs to do more than just allow a company to upgrade models or pick the best one for each particular use case.

It can also be used for observability, metering, and role-based access controls, says Subha Tatavarti, CTO at technology and consulting firm Wipro Technologies.

Wipro, with 245,000 employees, has no choice but to adopt gen AI, she says, because its customers are expecting it to.

“We’re foundationally a technology company,” she says. “We have to do this.”

Broadening perspectives

Observability allows a company to see where data is going, what models and prompts are being used, and how long it takes for responses to come back. It can also include a mechanism to edit or obfuscate sensitive data.
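A minimal Python sketch of that idea follows: a wrapper that obfuscates email addresses before the prompt leaves the company, then records which model was called, what was sent, and how long the response took. The regular expression is deliberately simplistic, and the `observed_call` wrapper and `echo_model` stand-in are hypothetical names used only for illustration.

```python
import re
import time
from dataclasses import dataclass

# Simplistic email pattern, for illustration; real redaction needs broader rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text):
    """Replace anything that looks like an email address."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

@dataclass
class CallRecord:
    model: str
    prompt: str
    latency_s: float

LOG = []  # in a real system this would go to a logging or tracing backend

def observed_call(model_name, model_fn, prompt):
    """Redact the prompt, call the model, and record what happened."""
    safe_prompt = redact(prompt)
    start = time.perf_counter()
    response = model_fn(safe_prompt)
    LOG.append(CallRecord(model_name, safe_prompt, time.perf_counter() - start))
    return response

def echo_model(prompt):
    """Hypothetical stand-in for a real model call."""
    return prompt.upper()

print(observed_call("echo", echo_model, "mail jane.doe@example.com now"))
```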

Once a company knows what’s happening with its models, it can implement metering controls — limits on how much a particular model can be used, for example — to avoid unexpected spikes in costs.

“Right now, the way the metering works is the token consumption model,” Tatavarti says. “And it could get very expensive.”
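Under a token consumption model, a metering control can be as simple as a running budget that refuses further calls once a limit is reached. The Python sketch below shows one such budget; the class name, the limit, and the token counts are assumptions, and actual counts would come from the model provider's usage reporting.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push usage past the configured limit."""

class TokenBudget:
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, prompt_tokens, completion_tokens):
        """Record usage for one call and return the tokens remaining."""
        total = prompt_tokens + completion_tokens
        if self.used + total > self.limit:
            raise BudgetExceeded(
                f"call needs {total} tokens, only {self.limit - self.used} left"
            )
        self.used += total
        return self.limit - self.used

budget = TokenBudget(limit=100)
print(budget.charge(prompt_tokens=30, completion_tokens=20))
```

A rejected call leaves the running total unchanged, so a team can see exactly how much headroom remains for a given model or department.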

In addition, for FAQs, companies can cache responses to save time and money. And for some use cases, an expensive, high-end commercial LLM might not be required, since a locally hosted open-source model might suffice.
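Caching for frequently asked questions can be sketched in a few lines of Python. The version below normalizes case and whitespace so near-identical questions share one cached answer; the `CachedModel` class and the exact-match strategy are assumptions for illustration, and matching questions that are merely similar in meaning would need a different approach.

```python
class CachedModel:
    """Wraps a model function and reuses answers for repeated questions."""

    def __init__(self, model_fn):
        self._model_fn = model_fn
        self._cache = {}
        self.model_calls = 0  # how many times the underlying model was invoked

    @staticmethod
    def _key(prompt):
        # Normalize case and whitespace so trivial variations hit the cache.
        return " ".join(prompt.lower().split())

    def complete(self, prompt):
        key = self._key(prompt)
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._model_fn(prompt)
        return self._cache[key]

# Hypothetical stand-in for an expensive model call.
faq_bot = CachedModel(lambda prompt: "Employees accrue 1.5 days per month.")
faq_bot.complete("How much PTO do I get?")
faq_bot.complete("how much  pto do I get?")
print(faq_bot.model_calls)
```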

“All of that is fascinating to us and my team is definitely working on this,” she adds. “This is imperative for us to do.”

And when it comes to access controls, the fundamental principle should be to never expose native APIs to the organization but instead have a middle layer that checks permissions and handles other security and management tasks.

If, for example, an HR platform uses gen AI to answer questions based on a vector database of policies and other information, an employee should be able to ask questions about their own salary, says Rajat Gupta, chief digital officer at Xebia, an IT consultancy. But they shouldn’t be able to ask questions about those of other employees — unless they’re a manager or work in HR themselves.
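The HR example can be expressed as a permission check that runs in the middle layer before any question reaches the model or the vector database. The Python sketch below is one possible reading of that rule; the `User` type, the role names, and the choice to limit managers to their own direct reports are assumptions that go beyond what Gupta describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    user_id: str
    role: str = "employee"          # assumed roles: employee, manager, hr
    reports: frozenset = frozenset()  # user_ids of direct reports

def can_view_salary(requester, subject_id):
    """Decide whether the requester may see the subject's salary."""
    if requester.user_id == subject_id:
        return True  # anyone may ask about their own salary
    if requester.role == "hr":
        return True
    # Assumption: managers see only their own direct reports.
    return requester.role == "manager" and subject_id in requester.reports

def answer_salary_question(requester, subject_id, lookup):
    """Gate the lookup; the model never sees data the user may not see."""
    if not can_view_salary(requester, subject_id):
        return "You are not permitted to view this information."
    return lookup(subject_id)

alice = User("alice")
bob = User("bob", role="manager", reports=frozenset({"alice"}))
salaries = {"alice": "salary on file", "bob": "salary on file"}

print(answer_salary_question(alice, "bob", salaries.get))
```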

Given how fast gen AI is being adopted in enterprises across all different business units and functions, it would be a nightmare to build these controls from scratch for every use case.

“The work would be enormous,” he says. “There’d be chaos.”

Gupta agrees enterprises that need to build this kind of functionality should do so once and then reuse it. “Take everything they need in common — security, monitoring, access controls — and build it as part of an enterprise-level platform,” he says.

He calls it an AI gateway, with the open source MLflow AI Gateway being one example. Released last May, it’s already been deprecated in favor of the MLflow Deployments Server. Another tool his company is using is Arthur AI’s Arthur Shield, a firewall for LLMs. It filters prompt injection attacks, profanity, and other malicious or dangerous prompts.

And then there’s Ragas, which helps check a gen AI response against the actual information in a vector database in order to improve accuracy and reduce hallucinations.

“There are many such projects both in the open source and the commercial space,” he says.

Third-party AI platforms, startups, and consultants are also rushing in to fill the gaps.

“The way the AI ecosystem is evolving is surprising,” says Gupta. “We thought the pace would slow down but it’s not. It’s rapidly increasing.”

So to get to market faster, Xebia is weaving these different projects together, he says, but it doesn’t help that AI companies keep coming up with new stuff like autonomous AI-powered agents, for example.

“If you’re using autonomous agents, how do you actually measure the efficacy of your overall agents project?” he asks. “It’s a challenge to actually monitor and control.”

Today, Xebia hobbles agents, curtailing their autonomy and allowing them to carry out only very limited and precise tasks. “That’s the only way to do it right now,” he adds. “Limit the skills they have access to, and have a central controller so they’re not talking to each other. We control it until we have more evolved understanding and feedback loops. This is a pretty new area, so it’s interesting to see how this evolves.”
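The pattern Gupta describes, limited skills plus a central controller, might look something like the Python sketch below. This is not Xebia's implementation; the `Controller` class, agent names, and skills are hypothetical, and the point is only that every action passes through one place that checks an allowlist.

```python
class Controller:
    """Central point through which every agent action must pass."""

    def __init__(self):
        self._skills = {}    # skill name -> function
        self._allowed = {}   # agent name -> set of permitted skill names

    def register_skill(self, name, fn):
        self._skills[name] = fn

    def allow(self, agent, skill_names):
        self._allowed[agent] = set(skill_names)

    def invoke(self, agent, skill, *args):
        # Agents cannot call skills, or each other, directly.
        if skill not in self._allowed.get(agent, set()):
            raise PermissionError(f"{agent} may not use {skill}")
        return self._skills[skill](*args)

controller = Controller()
controller.register_skill("lookup_invoice", lambda invoice_id: f"invoice {invoice_id}: paid")
controller.register_skill("send_email", lambda to: f"email sent to {to}")

# The billing agent gets one narrow skill and nothing else.
controller.allow("billing-agent", ["lookup_invoice"])

print(controller.invoke("billing-agent", "lookup_invoice", "INV-1"))
```

Because all calls funnel through `invoke`, the same spot is a natural place to add the logging and metering discussed earlier.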

Building guardrails

According to the cnvrg.io survey, compliance and privacy were top concerns for companies looking to implement gen AI, ahead of reliability, cost, and lack of technical skills.

Similarly, in the IBM survey, for companies not implementing gen AI, data privacy was cited as the barrier by 57% of respondents, and transparency by 43%. In addition, 85% of all respondents said consumers would be more likely to pick companies with transparent and ethical AI practices, but fewer than half are working toward reducing bias, tracking data provenance, working on making AI explainable, or developing ethical AI policies.

It’s easy for technologists to focus on technical solutions. Ethical AI goes beyond the technology to include legal and compliance perspectives, and issues of corporate values and identity. So this is an area where CIOs or chief AI officers can step up and help guide the larger organizations.

And it goes even further than that. Setting up gen AI-friendly data infrastructures, security and management controls, and ethical guardrails can be the first step on the journey to fully operationalize LLMs.

Gen AI will require CIOs to rethink technology, says Matt Barrington, EY Americas emerging technologies leader. Prior to gen AI, software was deterministic, he says.

“You’d design, build, test, and iterate until the software behaved as expected,” he says. “If it didn’t, it was a bug, and you’d go back and fix it. And if it did, you’d deploy it into production.” All the large compute stacks, regardless of software pattern, were deterministic. Now, other than quantum computing, gen AI is the first broadly known non-deterministic software pattern, he says. “The bug is actually the feature. The fact it can generate things on its own is the main selling point.”

That doesn’t mean the old stuff should all be thrown out. MLOps and PyTorch are still important, he says, as is knowing when to use a RAG embedding model or a DAG, or when to go multi-modal, as well as getting data ready for gen AI.

“All those things will remain and be important,” he says. “But you’ll have the emergence of a new non-deterministic platform stack that’ll sit alongside the traditional stack with a whole new area of infrastructure engineering and ops that will emerge to support those capabilities.”

This will change how businesses operate at a core level, and the shift toward becoming a truly AI-powered enterprise will be fast-paced, he says. “Watching this emerge will be very cool,” he adds.