Windows Services monitoring and auto-healing

Why is monitoring Windows Services important?

Windows Services are critical components of many IaaS systems, whether they run in the Azure cloud or on-premises. Processes like SQL Server or IIS run as Windows Services. Moreover, organizations often rely on custom-developed Windows Services to perform important background tasks.

In this case study, we’ll discuss how to set up monitoring and automatically restart crashed Windows Services using CloudMonix.

How to set up Windows Services monitoring

1. Run CloudMonix Setup Wizard to connect to an Azure environment (optional)

If you aren’t using CloudMonix yet, sign up for a free account, then authorize CloudMonix to view your Azure subscription.

Because Windows Services are monitored and restarted through a manually deployed CloudMonix agent, the Azure authorization step is not required. In fact, you can use this same procedure to restart Windows Services on any Windows VM, not just those deployed to Azure.

2. Deploy CloudMonix agent to the VM

You can download CloudMonix Windows Agent from the Resources or Dashboard screens in the CloudMonix portal. The agent is automatically configured to send data to your CloudMonix account.
After downloading, follow the installation instructions in the archive or learn more here. Once the agent is installed, it is automatically registered with CloudMonix. Simply refresh the CloudMonix portal to see it in the dashboard.

How to define a metric tracking Windows Service status

Once the VM resource has been brought into CloudMonix, define a new metric that tracks the status of the chosen Windows Service:

  • open the configuration dialog for the VM resource;
  • in the Metrics tab, add a new metric of type “WindowsServiceState” and select the service from the drop-down;
  • specify the metric name, e.g. “MSDTCstatus”, and save.
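
To make concrete what a service-state metric reports, here is a minimal Python sketch that reads a service's state on a Windows host using the built-in `sc query` command. It is an illustration only and not how the CloudMonix agent is implemented; the function names `parse_service_state` and `query_service_state` are hypothetical. Note that `sc` reports states in upper case (e.g. `RUNNING`), whereas the metric described in this article uses “Running”.

```python
import re
import subprocess

# Matches the line of `sc query` output that looks like:
#         STATE              : 4  RUNNING
_STATE_LINE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_service_state(sc_output: str) -> str:
    """Extract the state name (e.g. 'RUNNING') from `sc query` output."""
    match = _STATE_LINE.search(sc_output)
    if match is None:
        return "UNKNOWN"
    return match.group(2).upper()


def query_service_state(service_name: str) -> str:
    """Run `sc query <service>` (Windows only) and return the parsed state."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
        check=False,  # a missing service yields no STATE line, i.e. 'UNKNOWN'
    )
    return parse_service_state(result.stdout)


if __name__ == "__main__":
    # MSDTC is the example service used in this article.
    print(query_service_state("MSDTC"))
```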

Defining self-healing actions

To define self-healing actions for a Windows Service:

  • in the “Actions” tab, add a new action that executes the “PowershellRestartService” command based on a custom expression. The action should run whenever the value of the previously defined metric is anything other than “Running”;
  • specify a meaningful Suspended period for the action, e.g. 20 min. This prevents the action from being executed again within that period and gives the service status time to stabilize;
  • give the action a name and save.
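
The interplay between the trigger condition and the Suspended period can be sketched in a few lines of Python. This is a conceptual model under stated assumptions, not CloudMonix code: the class name `RestartAction` and its method are hypothetical, and the actual restart is left out so that only the decision logic is shown.

```python
from datetime import datetime, timedelta
from typing import Optional


class RestartAction:
    """Fires when the service is not running, at most once per suspended period."""

    def __init__(self, suspend_for: timedelta) -> None:
        self.suspend_for = suspend_for
        self.last_fired: Optional[datetime] = None

    def should_fire(self, service_state: str, now: datetime) -> bool:
        # Trigger condition: the metric value is anything other than "Running".
        if service_state == "Running":
            return False
        # Suspended period: skip if the action already fired recently.
        if self.last_fired is not None and now - self.last_fired < self.suspend_for:
            return False
        self.last_fired = now
        return True


action = RestartAction(suspend_for=timedelta(minutes=20))
start = datetime(2024, 1, 1, 12, 0)
print(action.should_fire("Stopped", start))                         # fires
print(action.should_fire("Stopped", start + timedelta(minutes=5)))  # suppressed
```

With a 20-minute window, a service that is still starting up five minutes after the first restart is left alone rather than restarted again, which is the stabilization effect the Suspended period is meant to provide.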

How to receive alerts when a particular Windows Service is not running

To receive alerts for a particular Windows Service, complete the following steps:

  • add a new alert to the monitored resource and give it a short, descriptive name. It will show up in notifications and on the dashboard;
  • specify the severity of the alert so that it is properly categorized and filtered for notifications and dashboards;
  • enter an expression that evaluates the value of the previously defined service state metric. When the expression is TRUE, the alert fires.
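
As a rough mental model of the three alert fields (name, severity, expression), consider the Python sketch below. It does not reproduce CloudMonix's expression syntax, which is entered in the portal; the `Alert` dataclass, the `firing_alerts` helper, and the severity label are hypothetical and exist only to show an expression being evaluated against the latest metric values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    name: str       # shown in notifications and on the dashboard
    severity: str   # illustrative label, used for categorizing and filtering
    expression: Callable[[Dict[str, str]], bool]  # alert fires when this is True


def firing_alerts(alerts: List[Alert], metrics: Dict[str, str]) -> List[str]:
    """Return the names of alerts whose expression is true for these metrics."""
    return [alert.name for alert in alerts if alert.expression(metrics)]


alerts = [
    Alert(
        name="MSDTC down",
        severity="Critical",
        # "MSDTCstatus" is the example metric name defined earlier.
        expression=lambda m: m.get("MSDTCstatus") != "Running",
    ),
]

print(firing_alerts(alerts, {"MSDTCstatus": "Stopped"}))
```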

Learn more about Windows Service State and other Windows Server-based metrics here.

Hint: pressing space in the Expression field will trigger an auto-complete drop-down of all metric names that can be used in the expression.

The same approach can be used to automate other recovery procedures or maintenance tasks. CloudMonix can execute actions based on any metric captured anywhere in your Azure environment or according to a schedule. Learn more about CloudMonix automation here.