
Why is rebooting Cloud Service Role instances important?
When running Cloud Services, Microsoft recommends at least 2 active instances in each role, so that if one instance is being rebooted or is malfunctioning, the other instance remains operational. This redundancy is what makes auto-rebooting of instances a practical strategy for healing memory leaks, connection leaks, and other similar issues.
In this case study, we’ll discuss a few simple approaches to keeping your Azure Cloud Roles stable, both proactively and reactively. CloudMonix is a tool that helps ensure stability for Azure Cloud Roles as well as for their successor, Azure VM Scale Sets. We’ll also discuss methods to ensure that reboots do not cause issues or outages.
Proactive stability – DAILY SCHEDULED REBOOTS
CloudMonix has built-in functionality that enables proactive daily rebooting of Cloud Role instances, one at a time, with a single checkbox click.
Reactive stability – REBOOTS ON DEMAND
CloudMonix allows for immediate and automatic recovery from events such as a sustained drop in available memory via reactive reboots.
Gracefully handling reboots
How To Do Daily Reboots
The “Daily Reboot” action is disabled by default. To enable it, go to the settings of the specific resource, open the “Actions” tab, and select the “Daily Reboot” action from the default list. Check the “Enabled” checkbox and adjust the parameters as applicable.
CloudMonix triggers the “Daily Reboot” action with a single-line expression:
CheckTimeUtc.Hour == (InstanceIndex % 24)
“CheckTimeUtc” is a variable that represents the current time in UTC. “InstanceIndex” is another variable that CloudMonix tracks when it evaluates the action against every Azure cloud role instance. The expression checks whether the remainder of dividing the instance index by 24 equals the current UTC hour. This is a simple and elegant way to ensure that every instance is restarted once per day, and that instances with consecutive indexes are selected in consecutive hours.
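To make the effect of the expression concrete, here is a minimal Python sketch that mirrors it and prints the resulting schedule. It assumes “InstanceIndex” is a zero-based integer; the function names `should_reboot` and `reboot_schedule` are made up for this illustration and are not part of CloudMonix.

```python
def should_reboot(check_hour_utc: int, instance_index: int) -> bool:
    """Mirror of the expression CheckTimeUtc.Hour == (InstanceIndex % 24)."""
    return check_hour_utc == instance_index % 24


def reboot_schedule(instance_count: int) -> dict:
    """Map each UTC hour (0-23) to the instance indexes selected in that hour.

    Assumes zero-based instance indexes (an assumption of this sketch).
    """
    schedule = {hour: [] for hour in range(24)}
    for index in range(instance_count):
        schedule[index % 24].append(index)
    return schedule


if __name__ == "__main__":
    # Show the hours in which a hypothetical 4-instance role is selected.
    for hour, instances in reboot_schedule(4).items():
        if instances:
            print(f"{hour:02d}:00 UTC -> instance(s) {instances}")
```

One consequence of the modulo is worth noting: each index matches exactly one hour of the day, but with more than 24 instances, indexes that differ by 24 (for example 0 and 24) match the same hour.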
How To Configure Reboots On Demand
Setting up an action that reboots an instance when available memory stays below a threshold for a sustained amount of time is trivial and takes a few seconds. This action is built into the default CloudMonix profiles as “Low Ram Reboot” and is disabled by default. To activate it, select the “Low Ram Reboot” action under the “Actions” tab in the instance settings and check the “Enabled” checkbox. Configure additional parameters as shown in the picture.
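The idea of a threshold that must be breached "for a sustained amount of time" can be sketched as a sliding window over recent memory samples. The Python below is only an illustration of that idea, not CloudMonix's implementation; the class name, the threshold, and the window size are all hypothetical.

```python
from collections import deque


class SustainedLowMemoryRule:
    """Fire only when every sample in the recent window is below the threshold.

    Hypothetical sketch: names and values are illustrative assumptions.
    """

    def __init__(self, threshold_mb: float, required_samples: int):
        self.threshold_mb = threshold_mb
        self.required_samples = required_samples
        # deque with maxlen drops the oldest sample automatically.
        self._recent = deque(maxlen=required_samples)

    def observe(self, available_mb: float) -> bool:
        """Record one sample and report whether a reboot would be triggered."""
        self._recent.append(available_mb)
        window_full = len(self._recent) == self.required_samples
        return window_full and all(s < self.threshold_mb for s in self._recent)
```

With a threshold of 500 MB and a window of 3 samples, feeding the readings 800, 400, 450, 300, 900, 200 triggers only on the fourth reading, when three consecutive samples are below the threshold. A single dip, such as the final 200, does not trigger a reboot on its own.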
Gracefully Handling Reboots
Azure Web Roles:
Most of the time, such reboots are handled by the platform out of the box. The instance is taken out of the load balancer first but is not rebooted right away. This allows it to stop receiving new requests and to finish the web requests that are already in the queue. Rebooting Azure Web Roles is relatively painless if the number of in-flight web requests is small.
Azure Worker Roles and Azure Web Roles with slower response times:
In this case, Azure can be made to hold off on the reboot until all work is complete. This is done by overriding the “OnStop” method in the WorkerRole class and ensuring that work is completed before allowing the method to exit. Keep in mind that Azure will wait for up to 5 minutes before it forces a reboot, so any in-progress work must be cleaned up quickly.
A great article by Rick Anderson at Microsoft that outlines proper handling of the “OnStop” event in the WorkerRole class is available here.
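The pattern behind a graceful “OnStop” (signal the work loop to stop, then wait for it to finish within a deadline) can be sketched in a language-neutral way. The Python below is an analogy only and does not use the Azure WorkerRole API; the `Worker` class, its `on_stop` method, and the timings are hypothetical.

```python
import threading
import time


class Worker:
    """Hypothetical worker; a Python analogy, not the Azure WorkerRole API."""

    def __init__(self):
        self._stop_requested = threading.Event()
        # Daemon thread: if the deadline passes, the process can still exit,
        # loosely mirroring a forced reboot.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.completed_items = 0

    def start(self):
        self._thread.start()

    def _run(self):
        # The stop flag is checked between units of work, so an item that
        # is already in progress is always finished before the loop exits.
        while not self._stop_requested.is_set():
            time.sleep(0.01)  # stands in for one unit of real work
            self.completed_items += 1

    def on_stop(self, timeout_seconds: float) -> bool:
        """Ask the loop to finish and wait up to the deadline.

        Returns True if the worker exited cleanly before the timeout.
        """
        self._stop_requested.set()
        self._thread.join(timeout=timeout_seconds)
        return not self._thread.is_alive()
```

In the Azure case the deadline is imposed by the platform (the 5 minutes mentioned above), so an equivalent implementation would want each unit of work to be short enough that the loop can exit well within that limit.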