Today, most enterprises use services from more than one Cloud Service Provider (CSP). Getting operational visibility across all vendors is a common pain point for clients. Further, modern architectures such as microservices introduce additional operational complexity.

Figure 1 Hybrid Multicloud and Complexity Evolution

Traditionally, this calls for more manpower, but that approach introduces challenges of its own. As shown in the following diagram, an issue in the environment triggers several events across the full stack of the business solution, which results in an unmanageable event flood. Moreover, full-stack observability often produces duplicate events, and these events end up in data silos.

Figure 2 IT Service Management Complexity
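
To make the duplicate-event problem concrete, here is a minimal Python sketch of one common mitigation: collapsing events that describe the same symptom on the same resource within a short time window. The `Event` fields, the `deduplicate` function and the 60-second window are illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A simplified monitoring event (all field names are hypothetical)."""
    timestamp: float  # seconds since some reference point
    source: str       # tool that emitted the event, e.g. "apm" or "infra"
    resource: str     # affected component
    symptom: str      # normalized description of what went wrong


def deduplicate(events, window_seconds=60.0):
    """Keep one event per (resource, symptom) burst.

    An event is dropped when the same (resource, symptom) pair was seen
    within `window_seconds` before it, regardless of which tool sent it.
    """
    last_seen = {}
    kept = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.resource, event.symptom)
        previous = last_seen.get(key)
        if previous is None or event.timestamp - previous > window_seconds:
            kept.append(event)
        # Track every occurrence so a continuous burst stays collapsed.
        last_seen[key] = event.timestamp
    return kept


events = [
    Event(0, "apm", "db-1", "high_latency"),
    Event(10, "infra", "db-1", "high_latency"),  # duplicate from another tool
    Event(20, "logs", "web-1", "http_500"),
    Event(200, "apm", "db-1", "high_latency"),   # new burst, kept
]
unique_events = deduplicate(events)
```

In this sketch the second event is dropped because a different tool reported the same symptom ten seconds earlier. Real platforms typically use richer correlation keys, such as topology or trace identifiers.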

IT is a critical part of every enterprise today, and even a small service outage directly affects the top line. Consequently, it is not uncommon for clients to ask for a 30-minute resolution commitment when something goes wrong. This is usually not enough time for a human to resolve an issue.

What is the solution?

This is where AIOps comes to the rescue, preventing these issues before they occur. AIOps is the application of artificial intelligence (AI) to enhance IT operations. Specifically, AIOps uses big data, analytics, and machine learning capabilities to do the following:

  • Collect and aggregate the huge and ever-increasing volumes of operations data generated by multiple IT infrastructure components, applications and performance-monitoring tools
  • Identify significant events and patterns related to system performance and availability issues
  • Diagnose root causes and report them to IT for rapid response and remediation, or automatically resolve these issues without human intervention
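
As a rough illustration of the second capability, identifying significant events, the following Python sketch flags metric samples that deviate sharply from a trailing baseline. It is a deliberately simple statistical stand-in for the machine learning models an AIOps platform would use; the function name, window size and threshold are assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples that deviate from the trailing baseline.

    Each sample is compared with the mean and sample standard deviation
    of the `window` samples immediately before it.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: no meaningful score in this sketch
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Response times in milliseconds; the last sample is a spike.
response_times = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
anomalous_indexes = find_anomalies(response_times)
```

Comparing each sample with the window that precedes it, rather than with the whole series, keeps a single large spike from inflating the baseline it is measured against.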

By replacing multiple manual IT operations tools with an intelligent, automated platform, AIOps enables IT operations teams to respond more quickly and proactively to slowdowns and outages, with less effort. It bridges the gap between an increasingly difficult-to-monitor IT landscape and user expectations for little to no interruption in application performance and availability. Most experts consider AIOps the future of IT operations management.

How could we reimagine cloud service management and operations with AI?

Refer to the lower part of the diagram below (box 3: Environment), which represents the environments where the workloads run. Continuous releases and deployments of these applications are typically achieved through the continuous delivery process and tooling that is shown on the left side of the diagram (box 2: Continuous Delivery).

Figure 3 AI Infused DevSecOps and IT Control Tower

The applications continuously send telemetry information into the operational management tooling (box 4: Continuous Operations). Both the continuous delivery tooling and the continuous operations tooling ingest all the data into the AIOps engine shown at the top (box 7: AIOps Engine). The AIOps engine is focused on addressing four key things:

  1. Descriptive analytics to show what happened in an environment
  2. Diagnostics to show why it happened
  3. Predictive analytics to show what will happen next
  4. Prescriptive analytics to show how to achieve or prevent the prediction
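
The Python sketch below illustrates three of these four steps on a single metric, under simplifying assumptions: a summary for the descriptive step, a straight-line extrapolation for the predictive step, and a fixed threshold rule for the prescriptive step. The diagnostic step is omitted because it depends on topology and dependency data that is specific to each environment. All function names, the sample data and the 90 percent limit are hypothetical.

```python
from statistics import mean


def describe(usage):
    """Descriptive: summarize what happened."""
    return {"min": min(usage), "max": max(usage), "mean": mean(usage)}


def predict_next(usage):
    """Predictive: fit a least-squares line and extrapolate one step ahead."""
    n = len(usage)
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(usage)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, usage))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope * n + intercept


def prescribe(predicted, limit=90.0):
    """Prescriptive: recommend an action that would prevent the prediction."""
    return "scale out" if predicted >= limit else "no action"


# Disk usage in percent, sampled at regular intervals.
disk_usage = [70, 75, 80, 85]
summary = describe(disk_usage)
forecast = predict_next(disk_usage)
action = prescribe(forecast)
```

With this sample data the trend is exactly linear, so the forecast for the next interval reaches the assumed limit and the rule recommends scaling out before the limit is breached.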

In addition to this, enterprise-specific data sources such as a shift roster, SME skill matrix or knowledge repository enrich the AIOps engine (box 1: Enterprise specific data).

Additionally, the AIOps engine consumes public domain data such as content from open-source communities, product documentation and sentiment from social networks (box 6: Public domain content). ChatOps and Runbook Automation ingest the insights and the automation that the AI system produces and leverage them to establish the new day in the life of an incident (box 5: Continuous Operations). ChatOps brings humans and chatbots together for conversation-driven collaboration, also known as conversation-driven DevOps. The AIOps engine also dynamically reconfigures the DevSecOps tools that provide continuous delivery and continuous operations, through AI-derived policy ingestion.

Several products in the marketplace have already evolved to provide AIOps capabilities such as anomaly detection. This framework consumes the outcomes provided by these AIOps engines (denoted as edge analytics in Figure 3) and combines multiple sources to provide an enterprise-level view.

IT processes such as incident and problem resolution are ad hoc in nature. They differ greatly from structured business processes such as loan approval or claim settlement. IT processes have stringent SLAs due to the high cost of an outage to the business, and the personas involved collaborate intensely and interact with disparate tools to accomplish their goals. For these reasons, applying business process automation technologies to IT processes will not yield high productivity benefits. ChatOps has transformed the way ITOps teams collaborate to resolve IT incidents, and AIOps and ChatOps together are the appropriate tools to drive productivity in IT processes. ChatOps enhances the collaboration experience of site reliability engineers (SREs) with the other personas participating in IT processes. AIOps delivers insights that help SREs accelerate the incident resolution process.

In a nutshell, as clients undertake large digital transformation programs based on a hybrid cloud (or multicloud) architecture, IT operations needs to be reimagined. With ever-increasing complexity, AIOps is indispensable. To learn more about AI for IT operations and IBM's point of view, refer to IBM Consulting.
