The role of AI in IT Operations

Krish
Published in AI Sutra
5 min read · Feb 21, 2018

Autonomic computing is not new. The idea has been around since 2001, when IBM started talking about it, but it has stayed on the sidelines ever since as a research project with little mainstream attention. Industry conversation has instead centered on automation, with cloud as the underlying fabric. Even though analytics has played a critical role in smoothing IT operations, heavy reliance on humans at the intersection of analytics and operations still leads to outages that cause disruption and financial loss. Think about the AWS S3 outage, where a small human mistake in a command's input caused a large-scale outage of their services, or the British Airways outage due to a human switching off the servers too quickly. The cost of such mistakes is enormous. As long as humans are involved in critical operations tasks, such costly mistakes will keep happening.

I am not arguing for the removal of humans from operations. First, it is not possible with today’s technologies, and human beings are essential to handle the machinery behind capitalism. Second, it is inhumane to replace human beings with machines on a large scale before a credible socio-economic system is in place to support such a switch (a topic beyond this publication). However, in today’s enterprise IT, even with all the automation in place, humans still hold responsibility for critical operations, with machine analytics playing a supporting role. As machine learning and deep learning technologies mature, we may be in a position to put autonomic systems in charge of critical operations, with humans playing supporting roles. Autonomic systems are not a magic pill, and they can have emergent problems of their own (for example, think about some of the catastrophic failures of automated trading on Wall Street), so it is critical for us to build the necessary safety nets to handle such scenarios. But artificial intelligence + automation holds promise for removing humans from critical IT operations in the future.

Let us be clear here. The premise of this argument is not “No humans in operations”; rather, it is about using autonomic systems to let operations teams handle systems at large scale. It is about empowering them to do operations at a scale that is not possible even in today’s automation-driven IT. It is not just about scale but also about injecting resiliency into operations by using a “learning system” as the nerve center of the automation. With the digitization of the world, the need for “operations skills” is not going away, but autonomic systems can help smaller teams manage large-scale distributed systems without talent shortages hurting the organization. Abstracting away complexity through automation and machine intelligence puts human beings in supervisory roles rather than having them woken up at 3 AM to manage an immediate crisis in production systems.

We have a long way to go on the maturity curve before the scenario described above becomes a reality, but the industry is taking baby steps towards this future. Even at this early stage of the evolution, machine learning holds a lot of promise.

  • The insights offered by analytics tools can be personalized rather than generic insights driven by a fixed set of rules
  • Log analysis can uncover patterns missed by human operators or even by rule-based pattern matching
  • Alerting can be more targeted by eliminating the false alerts that impact the health of human operators
  • Security can be proactive rather than reactive
  • Problems can be detected proactively and caught much earlier by making learning systems part of the root cause analysis
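
To make the alerting point concrete, here is a minimal sketch that contrasts a fixed rule with a baseline learned from a metric's own recent history. It is an illustration only: the `BaselineAlerter` class, its parameters, and the sample CPU readings are all hypothetical, and a rolling z-score is used as a simple stand-in for the far richer models a real product would use.

```python
from collections import deque
from statistics import mean, stdev


class BaselineAlerter:
    """Hypothetical alerter: flags a sample only when it deviates
    strongly from the recent history of the same metric."""

    def __init__(self, window=30, z_threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # rolling window of recent samples
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def observe(self, value):
        """Return True if `value` looks anomalous, then record it."""
        alert = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                alert = abs(value - mu) / sigma > self.z_threshold
            else:
                # Perfectly flat history: any change is a deviation.
                alert = value != mu
        self.history.append(value)
        return alert


# Made-up CPU readings (%) for a host that is normally busy at around 85%.
samples = [84, 86, 85, 87, 85, 86, 84, 85, 86, 85, 20, 85]

# Static rule: alert whenever CPU exceeds 80%.
static_alerts = [i for i, v in enumerate(samples) if v > 80]

# Learned baseline: alert only on a departure from normal behavior.
alerter = BaselineAlerter()
learned_alerts = [i for i, v in enumerate(samples) if alerter.observe(v)]

print("static rule fired on samples:", static_alerts)
print("learned baseline fired on samples:", learned_alerts)
```

On this made-up data the static rule fires on every normal reading and stays silent on the one sample that actually matters (the sudden drop to 20%), while the baseline approach flags only that drop. A production system would also need to handle seasonality and permanent level shifts, which this sketch does not attempt.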

We are just scratching the surface. As machine learning is integrated more deeply into existing tools and new ones, the face of operations will differ from what we see today.

In this section, I will highlight efforts by various vendors to fuse AI with automation. This is not an exhaustive list by any means, but it will give readers a flavor of what they can expect in the coming decade. If you are a vendor in this space tapping AI to streamline operations, please contact us to set up a briefing.

These are just some examples of products available in the industry, and most of them are standalone ML- or AI-enabled products. These products do not yet amount to autonomic computing, but more innovation along these product lines will lead the industry towards it. Most of the web-scale providers, like Amazon Web Services, Microsoft Azure, and Google Cloud, are using ML and AI for everything from data center management to systems management and monitoring. Oracle has announced that it will integrate ML and AI capabilities into its cloud platforms. It is still early days, but watch for more innovation and maturity in this field, leading to a day when modern enterprises can use autonomic systems to manage their IT operations with humans playing supervisory roles. Even though this future is still 5–8 years away, it is critical for CIOs to consider it as they plot their modernization strategy.

Originally published on this blog

Disclosure: CloudFabrix is a client of Rishidot Research

Future Asteroid Farmer, Analyst, Modern Enterprise, Startup Dude, Ex-Red Hatter, Rishidot Research, Modern Enterprise Podcast, and a random walker