Shwetha Gopalakrishna
2 min read · Mar 15, 2022

MyShortWrite: AIOps for dynamic alert generation by setting run-time thresholds based on a multitude of direct and indirect alert attributes

The best managed service on the cloud needs two things:
1. A smart alerting system
2. The best business model
In AIOps, one applies smart algorithms (powered by ML and AI) to IT operations, so that engineers can focus on business problems while machines handle operational issues. An alert is influenced by both direct and indirect attributes of these operational issues. Alerts are generally raised against a predefined, static threshold. The system proposed here instead decides the threshold at run time, based on the patterns and attributes that influence an alert.

Here is a proposal for the solution:
A system that decides the threshold at run time based on the patterns and attributes that influence an alert.
A system that enables proactive detection of outages, early warning of relevant alerts, and a reduced actionable workload for DevOps teams. These capabilities of an AIOps platform help customers become more agile and reduce business risk.

The goal is business availability through dynamic alert management, root cause analysis, proactive anomaly detection, and predictive capabilities.

Continuous Service Promise is our only SWAG. Alerts leave us with hiccups, and customer tickets give us goosebumps. When employees are constantly fixing emergencies, agility suffers and innovation is turned upside down. We need seamless operations in order to grow.

Data are obtained from the different incident and alert management DevOps tools integrated with each cloud-managed service.

Direct attributes:

• Time of alert: multiple sub-attributes (time of day, day of week, week of month, month of year)

• Duration of alert

• Virtualization type of machine: multiple sub-attributes (plan, network plan, size)

Indirect attributes:

• Other alerts on the machine around the same time

• Other activities on the machine around the time of the alert (updates, network changes, etc.)

• The same alert on other machines around the same time
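
As a rough illustration of how the attributes above might be turned into model inputs, here is a minimal Python sketch. The `Alert` record, its field names, the 10-minute window, and the `(machine, timestamp)` shape of the activity log are all assumptions made for this example; they are not taken from any particular incident management tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert record; field names are illustrative only.
    name: str
    machine: str
    start: datetime
    duration_s: float
    plan: str
    network_plan: str
    size: str


def direct_features(a: Alert) -> dict:
    """Attributes read straight off the alert itself."""
    return {
        "hour_of_day": a.start.hour,
        "day_of_week": a.start.weekday(),            # Monday = 0
        "week_of_month": (a.start.day - 1) // 7 + 1,
        "month_of_year": a.start.month,
        "duration_s": a.duration_s,
        "plan": a.plan,
        "network_plan": a.network_plan,
        "size": a.size,
    }


def indirect_features(a, others, activities, window=timedelta(minutes=10)):
    """Attributes derived from what else happened around the same time.

    `activities` is assumed to be a list of (machine, timestamp) tuples.
    """
    near = [o for o in others
            if o is not a and abs(o.start - a.start) <= window]
    return {
        "other_alerts_same_machine": sum(
            o.machine == a.machine and o.name != a.name for o in near),
        "activities_same_machine": sum(
            m == a.machine and abs(t - a.start) <= window
            for m, t in activities),
        "same_alert_other_machines": sum(
            o.machine != a.machine and o.name == a.name for o in near),
    }
```

The two dictionaries can then be merged into one feature row per alert. The window size is a tuning choice, not something the proposal fixes.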

This is achieved by:
Dynamic alert generation, that is, setting a run-time threshold based on multiple direct and indirect alert attributes and applying ML to drive AIOps.

First, the system applies ML to classify the inputs as direct or indirect attributes and to find patterns that will influence the threshold.

Next, the system assigns weights to adjust the threshold at run time and decides how often an open alert should be re-triggered to draw more attention to it.
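
One way to picture this step is a weighted adjustment of a base threshold. The sketch below is only an assumption about how it could work: the weights are hand-set stand-ins for whatever a trained model would learn, and the signal names, the clamp range, and the doubling rule for the re-trigger interval are invented for illustration.

```python
def runtime_threshold(base, signals, weights, floor=0.5, ceil=1.5):
    """Scale a base threshold by a weighted sum of signals.

    `signals` and `weights` are dicts keyed by (hypothetical) signal name.
    The scaling factor is clamped to [floor, ceil] so that one noisy
    signal cannot move the threshold arbitrarily far.
    """
    adjustment = sum(weights.get(k, 0.0) * v for k, v in signals.items())
    factor = min(max(1.0 + adjustment, floor), ceil)
    return base * factor


def retrigger_interval_s(base_interval_s, priority, min_interval_s=60.0):
    """Seconds between re-triggers of an open alert.

    Assumes priority 1 is the most urgent; each lower priority level
    doubles the interval.
    """
    return max(min_interval_s, base_interval_s * 2 ** (priority - 1))


# Example: correlated alerts on other machines lower the CPU threshold,
# while an off-peak time of day raises it slightly.
threshold = runtime_threshold(
    base=80.0,
    signals={"off_peak": 1.0, "correlated_alerts": 1.0},
    weights={"off_peak": 0.10, "correlated_alerts": -0.25},
)
```

A negative weight makes the alert fire earlier when that signal is present, and a positive weight makes it fire later.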

Finally, the system overrides the default threshold using the priority or the previous resolution times of similar alerts, raises attention accordingly, and from then on closes similar or duplicate-sounding alerts that were raised under the older threshold.
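
The duplicate-closing part of this step could look something like the following sketch. It uses plain string similarity from Python's standard `difflib` as a stand-in for whatever similarity measure a real system would use; the dictionary keys and the 0.8 cutoff are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, cutoff: float = 0.8) -> bool:
    """Case-insensitive fuzzy match between two alert names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= cutoff


def close_duplicates(open_alerts, cutoff=0.8):
    """Keep the oldest alert of each similar-sounding group per machine.

    `open_alerts` is assumed to be a list of dicts with the keys
    'id', 'machine' and 'name', sorted oldest first.
    Returns (kept_alerts, closed_ids).
    """
    kept, closed = [], []
    for alert in open_alerts:
        is_dup = any(
            k["machine"] == alert["machine"]
            and similar(k["name"], alert["name"], cutoff)
            for k in kept
        )
        if is_dup:
            closed.append(alert["id"])
        else:
            kept.append(alert)
    return kept, closed
```

Alerts on different machines are never merged here, since the same alert on another machine is itself one of the indirect attributes listed earlier.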

Also refer to:
https://www.ibm.com/in-en/cloud/learn/aiops
https://www.ibm.com/cloud/aiops/
