Tuesday, February 28, 2017

SCOM 2016 - The Curious Case of the Missing Agent Patch List Property and Static Agent Version Value

Last week Microsoft released the second update rollup (UR2) for SCOM 2016. A common trend I've noticed with these URs is that the Patch List property is missing from the Agents by Version view in the Monitoring workspace of the console.


This is a bug with the SCOM 2016 agent and a bit of an annoyance when deploying update rollups as it's handy to know which agents need to be upgraded and which ones don't.

A quick check in the Agent Managed view of the Administration workspace will show a version for the agent but this version won't update to any new UR versions. The following image shows the default SCOM 2016 agent version even though I've deployed UR1 to this environment months ago...


Now, if you're thinking that after an update, all agents always drop into the Pending Management view of the Administration workspace and patiently wait until you're ready to upgrade them, then you'd be wrong. Unfortunately, depending on how you deploy the update rollup (e.g. non-admin permissions, manually installed etc.), there's a good chance that some if not all of these agents will not appear in Pending Management and you'll end up with something similar to this...


So, now your only option in the console to upgrade the agents is to run a series of bulk Repair jobs from the Agent Managed view on all of them and then hope that all agents have been successfully upgraded. This is not a fun process and I really don't like not having a central view of all my agent versions directly in the console.

Thankfully Microsoft's Kevin Holman (SCOM Deity and all-round awesome community contributor) has created the new SCOM Agent Version Addendum Management Pack to help address this exact problem!

This management pack runs a script that disables the built-in discovery for Microsoft.SystemCenter.DiscoverHealthServiceProperties (which has a display name of 'Discover Health Service Properties') and replaces it with a new discovery that attempts to retrieve the actual update rollup Agent Version value from a DLL file in the agent installation path.

Straight after I import this new MP, the agent version in the Agent Managed view changes to reflect the actual agent versions (8.0.10931.0 is the version of the UR1 agents that I currently have running). After I've deployed UR2, I can select those agents for a Repair job as shown in the image below...


When the Repair job has completed, the version changes to show that my agents have now been updated to UR2 as shown here:


I love this MP as it adds some much needed functionality to the Agent Managed view within the console. An extra bonus is that this MP also works perfectly on SCOM 2012 R2 too!
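Once the Agent Managed view reports real versions, working out which agents still need a Repair job comes down to a version comparison. Here's a minimal sketch of that check in Python, assuming you've exported the agent names and versions from the console yourself. The host names and the older version number in the sample data are hypothetical, and the function names are my own rather than anything from the MP:

```python
def parse_version(text):
    """Turn a dotted version string such as '8.0.10931.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def agents_needing_upgrade(agents, target_version):
    """Return the sorted names of agents whose version is lower than target_version."""
    target = parse_version(target_version)
    return sorted(
        name for name, version in agents.items()
        if parse_version(version) < target
    )


# Hypothetical export: agent name -> version shown in the Agent Managed view.
agents = {
    "SRV01.contoso.local": "8.0.10918.0",  # sample value for an older agent
    "SRV02.contoso.local": "8.0.10931.0",
    "SRV03.contoso.local": "8.0.10931.0",
}

# 8.0.10931.0 is the UR1 agent version mentioned above; pass in whatever
# version your latest update rollup reports instead.
print(agents_needing_upgrade(agents, "8.0.10931.0"))
```

Comparing the versions as tuples of integers rather than as strings matters here, because a plain string comparison would rank "8.0.9999.0" above "8.0.10931.0".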

If you want to know more, check out Kevin Holman's blog post here and you can download it directly from the TechNet Gallery here.

Enjoy!

Wednesday, February 22, 2017

SCOM 2016 Update Rollup 2 (UR2) Now Available

Today, Microsoft released a new Update Rollup (UR2) for SCOM 2016.


This update contains twenty documented fixes with the following few of particular interest to me (based on what I've come across on customer sites so far):

  • When alerts are closed from the Alerts view after you run a Search, the closed Alerts still appear in the View when the Search is cleared.
  • Groups disappear from Group view after they are added to a Distributed Application.
  • When the maintenance mode option for the dependency monitor is set to “Ignore,” and the group (consisting of the server to which this dependency monitor is targeted) is put in Maintenance mode, the state of the monitor changes to critical and does not ignore maintenance mode.
  • Because of a rare scenario of incorrect computation of configuration and overrides, some managed entities may go into an unmonitored state. This behavior is accompanied by 1215 events that are written to the Operations Manager log.

You can see the full list of fixes from the official UR2 knowledge base article here.

To get access to this update, you can choose to either manually download it from the Microsoft Update Catalog here or you can use Windows Update to pull down the update automatically to your SCOM 2016 environment.

Whatever method you choose to deploy this update, make sure to read through the full installation instructions as there are some manual tasks to carry out once the update has been applied to each SCOM role. If you're not confident, I'd always recommend waiting for Microsoft's Kevin Holman to add his walk-through post for this UR to his blog here.

Finally, this update is one part of a larger UR2 release covering other products in the System Center 2016 suite. If you've deployed additional components of the suite alongside SCOM, then you might be interested to check out the updates now available for DPM 2016, SCSM 2016, SPF 2016 and SCVMM 2016.

Full details of all the fixes in the main System Center 2016 UR2 downloads can be viewed at:



Tuesday, February 14, 2017

Scandinavian SCOM Solutions with a Global Reach

A few months before the Christmas break, I had the pleasure of being invited over to the excellent SCOM Day event in Sweden to present a session and hang out with some of my friends from the Scandinavian region.


The event was organised by Approved Consulting in Gothenburg and the audience was a mix of IT administrators, consultants and senior IT managers. This was my first time visiting Sweden and from the venue, to the food, the craft beers and of course, the people, it was a really enjoyable experience.

While I was over there, I had the chance to sit down with Approved CEO Jonas Lenntun and go through some of the solutions they offer to complement System Center and OMS. I was already aware of the free community SCOM Health Check Report they released a couple of years ago (if you haven’t tried this out yet, then download it from here):


Free solutions like this for SCOM are always good and the Health Check Report delivers an excellent overview of the health of your SCOM deployments - showing you information about the top alerts, events, performance counters, discoveries and even state changes along with database space usage and grooming history.

IT Service Analytics from Approved

Another cool solution that Jonas and the guys have been working on is their new IT Service Analytics platform. This plug and play solution enables organisations to analyse their IT services being monitored with SCOM and then forecast potential issues – well before they occur. If you’ve deployed Service Manager (SCSM) or even Microsoft’s new Operations Management Suite (OMS), then the IT Service Analytics platform can pull data from any combination of SCOM, SCSM and OMS to give you an even deeper analysis of your IT estate.

Here’s an overview taken from their blog on how it works:

By optimizing and combining data from System Center Operations Manager, Microsoft OMS and System Center Service Manager into one holistic data model, you are able to put the IT service in focus. This allows you to extract, correlate and predict information about IT Service Management processes for things like event, capacity, availability, incident and change management.

We utilize most of the Microsoft Business Intelligence tools, such as SQL Server, SSIS, SSAS, R and SSRS. This allows our analytical platform to seamlessly blend with your System Center installation and tap software and hardware resources that are readily available.



Taking it for a Test Drive

Earlier this week I had a chance to take the IT Analytics platform for a test drive and my first impression is that it’s an awesome reporting tool to have in your locker to help with troubleshooting and predictive analysis.

From the home screen, you can choose from a wide range of pre-built reports with information about alerts, capacity management, events, configuration changes and IT service overviews to name just a few.


One of the reports I really like is the Services report. Clicking this tile from the main reports window brings me to the Service Overview shown in the following image:


This report gives me a 30-day availability overview of all the IT services that I have modelled and monitored in my SCOM environment along with information about alerts, change tracking, capacity and predictive event risks.

Here’s a description of what the information in each of the report columns means:

  • Goal – Has the SLA goal been met or not? IT Services that have met their SLA will be displayed as green instead of red (in this demo environment, I’ve sorted the column to display all SLAs that haven’t been met).
  • Service – The name of the IT service.
  • Availability – Displays the last 12 months of the IT service availability.
  • Percentage – The SLA percentage that has been reached. The upwards arrow means that the SLA has reached a better result than the previous month.
  • Failures – The number of outages for the service during this period.
  • Downtime – Displays the number of minutes the service has been unavailable for the month.
  • Alerts – The number of alerts that have been generated by the service during this defined report period. The arrow shows decreasing or increasing compared to last month.
  • Events – The number of events that have been generated by the service during this period. The arrow shows decreasing or increasing compared to last month.
  • Change Tracking – The number of changes made to servers or other components of the service.
  • Capacity Risks – Shows if there are risks with capacity, such as a server running out of free memory based on the usage.
  • Event Risks – Shows if there are any predicted events for the service.
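To show how the Downtime, Percentage and Goal columns relate to each other, here's a rough sketch of the usual availability arithmetic in Python. I'm assuming a 30-day reporting period and a simple "uptime divided by total time" calculation. The post doesn't say how the platform actually computes its figures, so treat the function names and the 99.9% target as illustrative only:

```python
def availability_percent(downtime_minutes, days_in_period=30):
    """Percentage of the period during which the service was available."""
    total_minutes = days_in_period * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between 0 and the length of the period")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_met(downtime_minutes, sla_target_percent, days_in_period=30):
    """True when the measured availability reaches the SLA target."""
    return availability_percent(downtime_minutes, days_in_period) >= sla_target_percent


# Hypothetical service with 432 minutes of downtime in a 30-day period,
# measured against an assumed 99.9% SLA target.
print(availability_percent(432))
print(sla_met(432, 99.9))
```

A 30-day period contains 43,200 minutes, so even a few hours of downtime is enough to miss a 99.9% target.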

Identifying Bottlenecks

When I drill into a particular IT Service from the Service Overview report, I get a more targeted Service Details report with a number of informational tiles and a Top N view of common KPIs like % CPU, % Memory and % Disk Space used.

The Bottlenecks tile sparked my interest here so I clicked this one first…


This brought me deeper to the following view – where I could see that two of my servers in this IT service were displaying potential bottlenecks.


Clicking into the server with two potential bottlenecks identified, I was then presented with a performance chart that showed a very high percentage of bandwidth used on a new network adapter we recently installed into the server to support DPM backups. The performance chart also confirms for me that although my network adapter spiked on and off for the past few days (no doubt when backup jobs are running), the overall average performance of it seems fine and it’s projected to stay around the 10% utilisation mark for the next few months.


The other potential bottleneck that was identified relates to the % Free Disk Space of a logical disk on the Hyper-V server. I can see from the chart that in the past year, the free disk space on this logical disk has fluctuated from approx. 30% free to a minimum value of less than 1%. The chart looks ahead a few months and predicts that the best I can hope for (assuming I leave things as they are) is no more than 7% free disk space.
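The post from Approved doesn't describe which forecasting model sits behind these charts, so I can't reproduce it here. To give a feel for the general idea though, here's a small sketch in Python that fits a straight line through past free disk space readings using ordinary least squares and projects it forward. The monthly readings and function names are made up for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matching x/y readings")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_free_percent(months, free_percent, future_month):
    """Project % free disk space at future_month, clamped to the 0-100 range."""
    slope, intercept = fit_line(months, free_percent)
    return max(0.0, min(100.0, slope * future_month + intercept))


# Hypothetical readings: month index -> % free disk space on the logical disk.
months = [0, 1, 2, 3, 4, 5]
free_percent = [30, 26, 22, 18, 14, 10]

print(forecast_free_percent(months, free_percent, future_month=7))
```

A straight line is the simplest possible trend and will miss the kind of fluctuation described above, which is exactly why a purpose-built analytics platform is useful for this sort of prediction.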


Predictive Alerts

Back at the Service Details report, I can click the Events tile shown in the image below to give me an Events Report with a heads-up on the forecasted events and alerts that are likely to occur in my environment within the next 24 hours.


All Alert and Event reports have built-in filters for every chart to give you a more scoped analysis view of what's going on. From the Event Report shown in the image below, I can see there are some predicted alerts and events that I need to pay attention to.


Drilling further into the predicted alert value for a particular monitored object, I’m presented with an ‘IIS 8 Web Server is unavailable’ alert that's been predicted and the number of times it has happened over the last month. I can also see the time of day the alerts usually show up. In this example, these alerts typically occur around 6am every day.
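Spotting a time-of-day pattern like this is something you can approximate yourself from any list of alert timestamps. Here's a minimal Python sketch that counts alerts per hour and returns the busiest one. The timestamps and their format are hypothetical sample data, not an export format from the platform or from SCOM:

```python
from collections import Counter
from datetime import datetime


def busiest_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Return (hour, count) for the hour of day with the most alerts."""
    if not timestamps:
        raise ValueError("no timestamps supplied")
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return hours.most_common(1)[0]


# Hypothetical times at which the same alert was raised.
alert_times = [
    "2017-02-10 06:02",
    "2017-02-11 06:05",
    "2017-02-11 14:30",
    "2017-02-12 05:58",
    "2017-02-13 06:01",
]

hour, count = busiest_hour(alert_times)
print(f"Most alerts were raised in the {hour:02d}:00 hour ({count} alerts)")
```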


If I go back to the previous view and click into the Events tile, I can see it’s broken down into three sections.

The first section is a summary where you can see information on the top hosts, data channels, rules, management packs etc. which are generating the most events. In the image below, we can see that the server generating the most events is SEGOTSQL01. The grey bar in the middle displays last month's value. You can also see that this server alone has generated 88% of all events for the current period.


The middle section of this report displays the time and day of the week that the events are generated.


The final section of this report gives us an insight into both the last 30 days and the last 12 months for how events are being generated.


Custom Reports

It's easy to create your own custom reports and you can export them to Power BI or Microsoft Excel in a matter of minutes. Here's a nice example of one such custom exported report...


Licensing

I mentioned earlier that I love free solutions for SCOM and when I quizzed Jonas on how much this awesome offering costs to license, I was delighted to hear that Approved have decided to release it for free! They do require a one-off nominal setup and training fee but aside from that, there are no other limitations on the platform.

Summary

If you're interested in deploying these free solutions into your SCOM environment, then use the contact info here to get in touch with the team at Approved. For more information on the IT Analytics platform, take a read of some blog posts written by well-known SCOM community blogger Daniel Örneling here and here.