A few months before the Christmas break, I had the pleasure of being invited to the excellent SCOM Day event in Sweden to present a session and hang out with some of my friends from the Scandinavian region.
The event was organised by Approved Consulting in Gothenburg, and the audience was a mix of IT administrators, consultants and senior IT managers. This was my first time visiting Sweden and, from the venue to the food, the craft beers and of course the people, it was a really enjoyable experience.
While I was over there, I had the chance to sit down with Approved CEO Jonas Lenntun and go through some of the solutions they offer to complement System Center and OMS. I was already aware of the free community SCOM Health Check Report they released a couple of years ago (if you haven't tried it out yet, you can download it from here).
Free solutions like this for SCOM are always welcome, and the Health Check Report delivers an excellent overview of the health of your SCOM deployments. It shows you information about the top alerts, events, performance counters, discoveries and even state changes, along with database space usage and grooming history.
IT Service Analytics from Approved
Another cool solution that Jonas and the guys have been working on is their new IT Service Analytics platform. This plug-and-play solution enables organisations to analyse the IT services they monitor with SCOM and forecast potential issues well before they occur. If you've deployed Service Manager (SCSM) or Microsoft's new Operations Management Suite (OMS), the IT Service Analytics platform can pull data from any combination of SCOM, SCSM and OMS to give you an even deeper analysis of your IT estate.
Here’s an overview of how it works, taken from their blog:
By optimizing and combining data from System Center Operations Manager, Microsoft OMS and System Center Service Manager into one holistic data model, you are able to put the IT service in focus. This allows you to extract, correlate and predict information about IT Service Management processes for things like event, capacity, availability, incident and change management.
We utilize most of the Microsoft Business Intelligence tools, such as SQL Server, SSIS, SSAS, R and SSRS. This allows our analytical platform to seamlessly blend with your System Center installation and tap software and hardware resources that are readily available.
Taking it for a Test Drive
Earlier this week I had a chance to take the IT Service Analytics platform for a test drive, and my first impression is that it’s an awesome reporting tool to have in your locker to help with troubleshooting and predictive analysis.
From the home screen, you can choose from a wide range of pre-built reports with information about alerts, capacity management, events, configuration changes and IT service overviews to name just a few.
One of the reports I really like is the Services report. Clicking this tile from the main reports window brings me to the Service Overview shown in the following image:
This report gives me a 30-day availability overview of all the IT services that I have modelled and monitored in my SCOM environment along with information about alerts, change tracking, capacity and predictive event risks.
Here’s a description of what each of the report columns shows:
- Goal – Has the SLA goal been met or not? IT services that have met their SLA are displayed in green instead of red (in this demo environment, I’ve sorted the column to display all the SLAs that haven’t been met).
- Service – The name of the IT service.
- Availability – Displays the last 12 months of the IT service availability.
- Percentage – The SLA percentage that has been reached. An upward arrow means the SLA result is better than the previous month’s.
- Failures – The number of outages for the service during this period.
- Downtime – Displays the number of minutes the service has been unavailable for the month.
- Alerts – The number of alerts generated by the service during the report period. The arrow shows whether this has increased or decreased compared to last month.
- Events – The number of events generated by the service during this period. The arrow shows whether this has increased or decreased compared to last month.
- Change Tracking – The number of changes made to servers or other components of the service.
- Capacity Risks – Shows if there are risks with capacity, such as a server running out of free memory based on the usage.
- Event Risks – Shows if there are any predicted events for the service.
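Approved doesn’t spell out the exact formulas behind the Goal, Percentage, Downtime and arrow columns, so the following is only a minimal sketch of how such values could be derived, assuming availability is simply the share of minutes in a 30-day period that the service was up, compared against an SLA target. The `ServicePeriod` class, the function names and the “Web Shop” service are all hypothetical.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class ServicePeriod:
    """Hypothetical summary of one IT service over a reporting period."""
    name: str
    downtime_minutes: float  # total minutes the service was unavailable
    failures: int            # number of separate outages
    days: int = 30           # length of the reporting period


def availability_percent(period: ServicePeriod) -> float:
    """Share of the period's minutes during which the service was up."""
    total_minutes = period.days * MINUTES_PER_DAY
    return 100.0 * (total_minutes - period.downtime_minutes) / total_minutes


def sla_met(period: ServicePeriod, sla_target_percent: float) -> bool:
    """Goal column (assumed): True (green) if availability reached the target."""
    return availability_percent(period) >= sla_target_percent


def trend_arrow(current: float, previous: float) -> str:
    """Month-over-month arrow for any numeric column."""
    if current > previous:
        return "up"
    if current < previous:
        return "down"
    return "flat"


if __name__ == "__main__":
    this_month = ServicePeriod("Web Shop", downtime_minutes=43.2, failures=2)
    last_month = ServicePeriod("Web Shop", downtime_minutes=90.0, failures=3)
    pct = availability_percent(this_month)
    print(f"{this_month.name}: {pct:.2f}% available, "
          f"99.95% SLA met: {sla_met(this_month, 99.95)}, "
          f"trend: {trend_arrow(pct, availability_percent(last_month))}")
```

Note that an “up” arrow is good news in the Percentage column but bad news in the Alerts and Events columns, which is why the sketch only reports direction and leaves the colouring to the caller.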
Identifying Bottlenecks
When I drill into a particular IT service from the Service Overview report, I get a more targeted Service Details report with a number of informational tiles and a Top N view of common KPIs like % CPU, % Memory and % Disk Space used.
The Bottlenecks tile caught my interest, so I clicked this one first.
This took me to the following view, where I could see that two of the servers in this IT service were showing potential bottlenecks.
Clicking into the server with two potential bottlenecks, I was presented with a performance chart showing a very high percentage of bandwidth used on a new network adapter we had recently installed in the server to support DPM backups. The chart also confirms that although the adapter’s usage has spiked on and off over the past few days (no doubt when backup jobs were running), its overall average performance looks fine and it’s projected to stay around the 10% utilisation mark for the next few months.
The other potential bottleneck relates to the % Free Disk Space of a logical disk on the Hyper-V server. I can see from the chart that over the past year, the free space on this logical disk has fluctuated from approximately 30% down to a low of less than 1%. Looking a few months ahead, the chart predicts that the best I can hope for (assuming I leave things as they are) is no more than 7% free disk space.
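I don’t know which forecasting model Approved use (their blog mentions R and SSAS among the tools), so the following is not their method. It’s a much simpler stand-in, an ordinary least-squares straight-line fit, sketched only to illustrate the general idea of projecting a % Free Disk Space trend forward and flagging when it crosses a threshold. The function names and sample values are hypothetical.

```python
from typing import Optional


def fit_trend(xs: list, ys: list) -> tuple:
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_free_percent(slope: float, intercept: float, day: float) -> float:
    """Projected % free disk space on `day`, clamped to the 0-100 range."""
    return min(100.0, max(0.0, slope * day + intercept))


def days_until(slope: float, intercept: float, threshold: float) -> Optional[float]:
    """Day on which the trend line reaches `threshold`, or None if free space isn't shrinking."""
    if slope >= 0:
        return None
    return (threshold - intercept) / slope


if __name__ == "__main__":
    # Hypothetical samples: day number -> % free space on the logical disk
    days = [0, 30, 60, 90]
    free = [30.0, 25.0, 20.0, 15.0]
    slope, intercept = fit_trend(days, free)
    print(f"Projected free space on day 120: {forecast_free_percent(slope, intercept, 120):.1f}%")
    print(f"Trend reaches 10% free on day {days_until(slope, intercept, 10.0):.0f}")
```

Real disk usage fluctuates a lot (as the chart above shows), so a single straight line is a crude approximation; it’s useful here only to make the “project the trend, compare with a threshold” step concrete.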
Predictive Alerts
Back at the Service Details report, I can click the Events tile shown in the image below to give me an Events Report with a heads-up on the forecasted events and alerts that are likely to occur in my environment within the next 24 hours.
All Alert and Event reports have built-in filters on every chart, giving you a more focused view of what's going on. From the Events Report shown in the image below, I can see there are some predicted alerts and events that I need to pay attention to.
Drilling further into the predicted alert value for a particular monitored object, I’m presented with an ‘IIS 8 Web Server is unavailable’ alert that has been predicted, along with the number of times it has occurred over the last month. I can also see the time of day the alerts usually show up; in this example, they typically occur around 6am every day.
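The report doesn’t say how the prediction is made, so the following is only a minimal sketch of one simple time-of-day heuristic, assuming an alert that has mostly fired in the same hour of the day is expected again at that hour within the next 24 hours. The function names, the `min_share` threshold and the sample history are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def typical_hour(timestamps: List[datetime]) -> Tuple[int, float]:
    """Hour of day the alert fires most often, and the share of alerts in that hour."""
    if not timestamps:
        raise ValueError("no alert history")
    counts = Counter(ts.hour for ts in timestamps)
    hour, hits = counts.most_common(1)[0]
    return hour, hits / len(timestamps)


def next_expected(now: datetime, hour: int) -> datetime:
    """Next `hour`:00 strictly after `now`, which is always within 24 hours."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def predict_within_24h(timestamps: List[datetime], now: datetime,
                       min_share: float = 0.5) -> Optional[datetime]:
    """Predict the next occurrence only if one hour dominates the alert history."""
    hour, share = typical_hour(timestamps)
    if share < min_share:
        return None
    return next_expected(now, hour)


if __name__ == "__main__":
    # Hypothetical history: the alert fired at about 06:05 on ten days, plus one stray at 14:00
    history = [datetime(2024, 1, day, 6, 5) for day in range(1, 11)]
    history.append(datetime(2024, 1, 5, 14, 0))
    print(typical_hour(history))
    print(predict_within_24h(history, datetime(2024, 1, 15, 8, 0)))
```

Bucketing by hour keeps the sketch simple; a real predictor would likely also weigh day of week and how recently the alert last fired.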
If I go back to the previous view and click the Events tile, I can see the report is broken down into three sections.
The first section is a summary showing the top hosts, data channels, rules, management packs and so on that are generating the most events. In the image below, we can see that the server generating the most events is SEGOTSQL01. The grey bar in the middle displays last month's value. You can also see that this server alone has generated 88% of all events for the current period.
The middle section of this report shows the times of day and days of the week when events are generated.
The final section gives us insight into how events have been generated over both the last 30 days and the last 12 months.
Custom Reports
It's easy to create your own custom reports, and you can export them to Power BI or Microsoft Excel in a matter of minutes. Here's a nice example of one such exported custom report:
Licensing
I mentioned earlier that I love free solutions for SCOM, so when I quizzed Jonas on how much this awesome offering costs to license, I was delighted to hear that Approved have decided to release it for free! They do require a one-off nominal setup and training fee, but aside from that, there are no other limitations on the platform.
Summary
If you're interested in deploying these free solutions into your SCOM environment, use the contact info here to get in touch with the team at Approved. For more information on the IT Service Analytics platform, take a read of some blog posts written by well-known SCOM community blogger Daniel Örneling here and here.