Friday, June 15, 2018

Azure Monitor - Alerting Gets an Upgrade

Earlier this week, Microsoft announced some upgrades to the alerts experience inside Azure Monitor. If you've ever worked with SCOM, a few of these changes will look pretty familiar.


New Alert Enumeration Experience
There's a new Alert Enumeration feature which delivers a centralized view of all the alerts that have occurred across your various Azure deployments. You can query alerts across multiple subscriptions and sort them based on severity, signal type, resource type, and even resolution state. The enhanced alert enumeration feature is a serious upgrade on the previous Azure Monitor Alerts experience shown in the following image...


To upgrade to the new feature, click the purple banner at the top of the old Monitor - Alerts view and you will be presented with the following new enhanced user interface...


When you've upgraded, the first thing you will notice (assuming you've already got a few alerts present across your subscriptions) is that Azure Monitor has gathered all of your alerts into a central view and sorted them by Severity.

Now, if you've used SCOM Alert Rules in the past, you'll be familiar with Microsoft's method of defining severity levels using integers (where Critical = 2, Warning = 1 and Informational = 0). In Azure Monitor, Microsoft use a similar mapping process; however, the lowest-numbered severity is the most important (which is the opposite of SCOM). You can read more about the exact Azure Monitor Alert Severity Mappings in my previous blog post here.
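To make the difference in ordering concrete, here's a minimal Python sketch. The SCOM integers are the ones listed above; the alert names, the `severity` field and the sample values are made up purely for illustration and are not pulled from any real API.

```python
# SCOM: a HIGHER integer means a more important alert.
SCOM_SEVERITY = {"Critical": 2, "Warning": 1, "Informational": 0}


def sort_scom(alerts):
    """Most important first: sort by severity, descending."""
    return sorted(alerts, key=lambda a: a["severity"], reverse=True)


def sort_azure(alerts):
    """Most important first: sort by severity, ascending (lower = worse)."""
    return sorted(alerts, key=lambda a: a["severity"])


# Hypothetical sample data.
scom_alerts = [
    {"name": "disk", "severity": SCOM_SEVERITY["Warning"]},
    {"name": "cpu", "severity": SCOM_SEVERITY["Critical"]},
    {"name": "info", "severity": SCOM_SEVERITY["Informational"]},
]
azure_alerts = [
    {"name": "disk", "severity": 3},
    {"name": "cpu", "severity": 0},
    {"name": "mem", "severity": 2},
]

print([a["name"] for a in sort_scom(scom_alerts)])
print([a["name"] for a in sort_azure(azure_alerts)])
```

The only difference between the two helpers is the sort direction, which is exactly the trap to watch out for if you carry SCOM habits over to Azure Monitor.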

Clicking on any of the Severity links will then pivot you into the All Alerts page with a filter that's scoped to that particular severity.


Additional filters can then be applied to scope the view even further with options such as subscriptions, resource groups, time range and conditions to choose from.
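If it helps to think of this in code, the filtering behaves roughly like the sketch below. This is only an illustration of stacking equality filters on a list of alerts; the field names (`subscription`, `resource_group`, `severity`) are my own hypothetical choices and not the portal's internal schema, and range-style filters such as time range are left out.

```python
def filter_alerts(alerts, **criteria):
    """Keep only the alerts whose fields match every supplied criterion."""
    return [
        alert
        for alert in alerts
        if all(alert.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical sample data.
all_alerts = [
    {"name": "cpu-vm1", "subscription": "prod", "resource_group": "web", "severity": 0},
    {"name": "cpu-vm2", "subscription": "prod", "resource_group": "db", "severity": 0},
    {"name": "disk-vm3", "subscription": "dev", "resource_group": "web", "severity": 3},
]

# Pivot on severity first, then narrow the scope with a further filter.
sev0 = filter_alerts(all_alerts, severity=0)
sev0_web = filter_alerts(sev0, resource_group="web")
print([a["name"] for a in sev0_web])
```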

Alert State Management

The next addition to Azure Monitor alerting is the new Alert State Management feature. Alert states are essentially very similar to SCOM Alert Resolution States, and Azure Monitor currently supports three of them - New, Acknowledged and Closed.

You can manage the alert resolution state by drilling into an alert in the All Alerts view and clicking the Change Alert State button shown in the following image...


From there, you can use the drop-down menu to change the alert resolution state from New to either Acknowledged or Closed as shown here...


After that, you have the option to add a comment as to why you're changing the resolution state before then returning to the All Alerts view - where you should see the new Alert Resolution State assigned to your alert.

If you need to bulk-edit the resolution state of a number of alerts, then Microsoft have made this easy for you too. All you need to do is select each of the alerts that you need to modify, then hit the Change State button as shown in the following image...


Then modify your resolution state, add your comment and hit OK to return to the All Alerts view. Alert resolution states should now be easy to identify for all alerts that you've modified.

Something to keep in mind when working with these new Alert States is that they are completely separate from the Monitoring Condition, which supports two values - Fired and Resolved. The Monitoring Condition indicates whether or not the condition that created a metric alert has subsequently been resolved.

To define the Monitoring Condition, metric alert rules sample a particular metric at regular intervals. If the criteria in the alert rule are met, a new alert is created with a condition of Fired. If the criteria are still met when the metric is sampled again, nothing happens. However, if the criteria are no longer met, the condition of the alert is changed to Resolved. The next time that the criteria are met, a new alert is created with a condition of Fired.
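The sampling behaviour described above can be sketched as a tiny state machine. This is a simplified model and not how Azure Monitor is actually implemented: I'm assuming a single "greater than threshold" criterion, and the function name, field names and sample values are all hypothetical.

```python
def evaluate(samples, threshold):
    """Walk through metric samples and return the alerts that would be raised.

    Each alert is a dict with the sample index at which it fired, the index
    at which it resolved (or None) and its current monitoring condition.
    """
    alerts = []
    active = None  # the alert that is currently in a Fired condition, if any
    for index, value in enumerate(samples):
        criteria_met = value > threshold
        if criteria_met and active is None:
            # Criteria newly met: create a new alert with a condition of Fired.
            active = {"fired_at": index, "resolved_at": None, "condition": "Fired"}
            alerts.append(active)
        elif not criteria_met and active is not None:
            # Criteria no longer met: flip the existing alert to Resolved.
            active["condition"] = "Resolved"
            active["resolved_at"] = index
            active = None
        # Criteria still met while an alert is active: nothing happens.
    return alerts


# Hypothetical CPU percentages sampled at regular intervals.
cpu_alerts = evaluate([50, 95, 97, 60, 92], threshold=90)
for alert in cpu_alerts:
    print(alert)
```

With these samples, the spike at 95 and 97 produces one alert that resolves when the value drops to 60, and the later spike at 92 produces a second, separate alert that is still Fired.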

Putting my SCOM hat back on again, the Monitoring Condition is a similar process to how SCOM Alert Monitors fire when a specific threshold is breached and then auto-close when that threshold is no longer breached.

One gotcha that might catch people out, however, is that even though the system may set the Monitoring Condition to Resolved, the alert state isn't changed until the user changes it manually. The reverse is also true: changing the alert state has no effect on the Monitoring Condition. For example, if I set the resolution state of a number of alerts to Closed, the Monitoring Condition will still show those alerts in a Fired state. The following image shows this exact scenario - where I've set the resolution state of a couple of my alerts to Closed, but as the metric condition that fired the alerts in the first place is still present, the alerts are still displaying a Monitoring Condition of Fired.
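One way to keep this straight is to picture the alert state and the Monitoring Condition as two independent fields on the same alert. The sketch below is my own toy model of that idea; the `Alert` class, `change_state` and `resolve` are hypothetical names for illustration and not part of any Azure SDK.

```python
from dataclasses import dataclass, field

VALID_STATES = {"New", "Acknowledged", "Closed"}


@dataclass
class Alert:
    name: str
    monitoring_condition: str = "Fired"  # set by the platform
    state: str = "New"                   # set by the user
    comments: list = field(default_factory=list)


def change_state(alerts, new_state, comment=""):
    """User action: bulk-change the alert state, optionally with a comment."""
    if new_state not in VALID_STATES:
        raise ValueError(f"Unsupported alert state: {new_state}")
    for alert in alerts:
        alert.state = new_state  # the monitoring condition is left untouched
        if comment:
            alert.comments.append(comment)


def resolve(alert):
    """Platform action: the criteria are no longer met."""
    alert.monitoring_condition = "Resolved"  # the user state is left untouched


cpu = Alert("cpu-vm1")
mem = Alert("mem-vm1")

change_state([cpu], "Closed", comment="Known issue, ticket raised")
resolve(mem)

print(cpu.state, cpu.monitoring_condition)
print(mem.state, mem.monitoring_condition)
```

Here the first alert ends up Closed but still Fired, while the second is Resolved but still New, which is the same mismatch you can run into in the portal.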


Smart Groups

The final new alerting feature that I wanted to post about is Smart Groups. These contain alerts that were automatically grouped together based on either similarity, historical patterns or a combination of both. Smart Groups are automatically created using machine learning algorithms looking for similarity and co-occurrence patterns among alerts originating from a monitor service such as Log Analytics or across the rest of the Azure platform.

There's a couple of ways that you can view/access Smart Groups. The first method is to simply click the Smart Groups button from the All Alerts view in the new Alert Enumeration feature shown here...


The second method is to open the All Alerts view then click the blue banner as shown in this image...


Using Smart Groups, you can significantly reduce the number of alerts to analyze by focusing on only a handful of groups with some handy alert correlation in place.

As an example, if a performance counter such as CPU or RAM spikes on multiple virtual machines in your Azure subscription at the same time, this will generate a lot of alerts in Azure Monitor. When you click the Smart Groups feature, those alerts will get automatically grouped into a single Smart Group - offering up a much clearer picture of a common root cause.
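To give a rough feel for what that grouping does, here's a deliberately naive Python sketch. To be clear, this is not the machine learning that Smart Groups actually use; it's a simple stand-in heuristic that buckets alerts by signal and by a fixed time window, and every name and value in it is hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, window_seconds=300):
    """Group alerts that share a signal and fired within the same time window."""
    groups = defaultdict(list)
    for alert in alerts:
        bucket = alert["fired_at"] // window_seconds
        groups[(alert["signal"], bucket)].append(alert)
    return dict(groups)


# Hypothetical alerts: CPU spikes on three VMs at around the same time,
# plus an unrelated disk alert much later on.
raw_alerts = [
    {"resource": "vm1", "signal": "cpu", "fired_at": 10},
    {"resource": "vm2", "signal": "cpu", "fired_at": 40},
    {"resource": "vm3", "signal": "cpu", "fired_at": 120},
    {"resource": "vm1", "signal": "disk", "fired_at": 900},
]

grouped = group_alerts(raw_alerts)
for key, members in grouped.items():
    print(key, [m["resource"] for m in members])
```

Four alerts collapse into two groups here, which is the same kind of reduction (on a much smaller scale) that makes a common root cause easier to spot.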

In the following image, you can see a Smart Group that Azure Monitor has automatically created in my subscription, where it has correlated 25 alerts together on the basis that they are very similar to other alerts that have fired. From here, I can change the alert resolution state of individual alerts or I can use the Change Smart Group State button to change the resolution state of all alerts contained in the group.


Microsoft kicked the tires with alert correlation in SCOM when they released the Exchange 2010 management pack a few years ago and although it was quite noisy, the event correlation engine it came with was a similar concept to what we now have with Smart Groups. I think this is a pretty handy feature to have in your Azure monitoring toolbox and along with all the other features that have just launched, things are looking good for the next generation of Microsoft monitoring!


