Windows Services are critical components of many IaaS systems running in the Azure cloud or on-premises. Processes like SQL Server and IIS run as Windows Services, and organizations often rely on custom-developed Windows Services to perform important background tasks.
In this article, we’ll show you how to monitor and automatically restart crashed Windows Services using CloudMonix. If you are not familiar with it, CloudMonix is a cloud monitoring service that provides a set of sophisticated automation and monitoring capabilities for the Azure platform and on-premises Windows Servers.
To start monitoring Windows Services and configure self-healing automation for them, follow these four steps:
1. Run the CloudMonix Setup Wizard to connect to an Azure environment (optional)
Because Windows Services are monitored and restarted through a manually deployed CloudMonix agent, the Azure authorization step is not required. In fact, you can use the same procedure to restart Windows Services on any Windows VM, not just those deployed to Azure.
2. Deploy the CloudMonix agent to the VM
You can download the CloudMonix Windows Agent from the Resources or Dashboard screens in the CloudMonix portal. The agent is automatically configured to send data to your CloudMonix account.
After downloading, follow the installation instructions in the archive or learn more here.
Once the agent is installed, it is automatically registered with CloudMonix. Simply refresh the CloudMonix portal to see it in the dashboard.
3. Define a metric tracking Windows Service status
Once the VM resource has been brought into CloudMonix, define a new metric that tracks the status of the chosen Windows Service:
- Open the configuration dialog for the VM resource.
- In the Metrics tab, add a new metric of type “WindowsServiceState” and select the service from the dropdown.
- Specify a metric name, e.g. “MSDTCstatus”, and save.
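How the CloudMonix agent collects this value internally is not described here. As a rough illustration of what a “WindowsServiceState” metric observes, the Python sketch below parses the text output of the built-in Windows `sc query <service>` command and maps its numeric state codes (the standard Win32 service state values 1 to 7) to the status names PowerShell's `Get-Service` reports, such as “Running”. The function `parse_service_state` is hypothetical and is not part of CloudMonix; it only shows the kind of value the metric compares against.

```python
import re

# Standard Win32 service state codes, mapped to the status names used by
# PowerShell's Get-Service (ServiceControllerStatus).
_STATE_NAMES = {
    1: "Stopped",
    2: "StartPending",
    3: "StopPending",
    4: "Running",
    5: "ContinuePending",
    6: "PausePending",
    7: "Paused",
}


def parse_service_state(sc_output: str) -> str:
    """Hypothetical helper: extract the state from `sc query` text output."""
    # Find the line that looks like "STATE : 4  RUNNING" and capture the code.
    match = re.search(r"^\s*STATE\s*:\s*(\d+)", sc_output, re.MULTILINE)
    if match is None:
        return "Unknown"
    return _STATE_NAMES.get(int(match.group(1)), "Unknown")


# Example output in the format printed by `sc query MSDTC`.
sample = """
SERVICE_NAME: MSDTC
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
        WIN32_EXIT_CODE    : 0  (0x0)
"""
print(parse_service_state(sample))  # Running
```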
4. Define a self-healing action
To define self-healing actions for a Windows Service:
- In the “Actions” tab, add a new action that executes the “PowershellRestartService” command based on a custom expression. The action should run whenever the value of the previously defined metric is anything other than “Running”.
- Specify a meaningful Suspended period for the action, e.g. 20 minutes. This prevents the action from being re-executed within that period and gives the service status time to stabilize.
- Give the action a name and save.
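Taken together, the custom expression and the Suspended period behave like a trigger condition plus a cooldown. The Python sketch below models that logic, assuming the metric is sampled periodically and each sample is passed in with its timestamp. `SelfHealingRule` and `should_fire` are hypothetical names used only for illustration; they are not a CloudMonix API, and the restart command itself is left out.

```python
from datetime import datetime, timedelta
from typing import Optional


class SelfHealingRule:
    """Hypothetical model: fire when the state is not "Running",
    then stay suspended for a fixed period."""

    def __init__(self, suspended_period: timedelta = timedelta(minutes=20)):
        self.suspended_period = suspended_period
        self.last_fired: Optional[datetime] = None

    def should_fire(self, service_state: str, now: datetime) -> bool:
        # Condition from the custom expression: anything other than "Running".
        if service_state == "Running":
            return False
        # Suspension: do not re-run the action within the suspended period.
        if self.last_fired is not None and now - self.last_fired < self.suspended_period:
            return False
        self.last_fired = now
        return True


rule = SelfHealingRule()
t0 = datetime(2024, 1, 1, 12, 0)
print(rule.should_fire("Stopped", t0))                          # True
print(rule.should_fire("Stopped", t0 + timedelta(minutes=5)))   # False (suspended)
print(rule.should_fire("Stopped", t0 + timedelta(minutes=21)))  # True
```

Without the suspension check, a service that takes a few minutes to come back up would be restarted again on every sample, which is the situation the Suspended period is meant to avoid.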
The same approach can be used to automate other recovery procedures or maintenance tasks. CloudMonix can execute actions based on any metric captured anywhere in your Azure environment or according to a schedule.
You can learn more about CloudMonix automation here.