Making IT Fly Like a Canary

Photo credit: iStockphoto/Montileo

IT operations (ITOps) teams have been doing their jobs using traditional practices, such as the Information Technology Infrastructure Library (ITIL), for many years. Their focus has been on consistency, reliability, and stability, using precise and standardized approaches to ensure performance and service availability.

On the other hand, DevOps is defined by speed and flexibility. Yet modern IT organizations need both. That’s why there’s been a movement in recent years to merge DevOps principles into IT operations, given the more dynamic, unpredictable nature of IT infrastructure today.

Canary deployments, a widespread DevOps practice for staggered rollouts, are a prime example of how DevOps can positively influence enterprise IT operations. This deployment method sends updates to small groups incrementally to catch issues and fix them quickly rather than deploying to the entire population of users at once.

In a mass deployment, if a significant bug is discovered, it takes a lot more money and pain to find and fix the issue and then redeploy across a large user base. Canary deployments also allow for continual improvements: fine-tuning the release for a better user experience or outcome.

Consider the “canary in the coal mine” analogy. It comes from an actual practice in the mining industry, where canaries were ruthlessly sent into mines to test for poisonous gases and protect the humans. In the same way, canary deployments aim to confine the impact of software defects to a small audience.

Thankfully, mining operations no longer sacrifice innocent birds to maintain worker safety. Still, in software and IT, canaries play a valuable role in reducing the risk of the rapid release and change cycle. And nobody dies.

Use cases in ITOps

Canary deployments have been around for several years, and while common in DevOps and SaaS organizations, they are still rare in enterprise IT environments. When done correctly, they can result in faster, cleaner, and more successful changes. In principle, this practice could be useful in many IT operations change events, but I think the two below will provide the most immediate, measurable benefit.

  • Patching desktops, servers, and operating systems is a routine yet important IT operations task. If something goes wrong during the update, and heaven forbid takes down the network or results in terrible response time for the entire company for a day, it's going to make a lot of people unproductive and unhappy. Instead, you can create logical segments for an incremental canary rollout: this could be by device type, cluster, geography or data center, business unit, or even by the customer, such as at an MSP. Depending upon the kind of release you're doing, you could stagger each rollout by an hour or even a day, monitoring the new environment for any issues. Once clear, you move on to the next batch unless you've got a problem to fix first.
  • Nightly backups are another area where canary deployments can be useful. Let’s say you want to back up the virtual disks (VMDK files) in two vCenters to a cloud service like AWS. In a large environment where you've got 100 or more VMs, having to stop the backup midstream, deal with any issues such as corrupt files, and then repeat the entire backup process is time-consuming. By segmenting that workload into four groups of 25 VMs and then using various tools (including syslog monitoring and cloud service monitoring), you'll know quickly if there have been incomplete transfers or any other problems before starting other backup segments.
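
The batch-then-check loop behind both use cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a drop-in tool: make_batches, canary_rollout, apply_change, and is_healthy are hypothetical names, and the two callbacks stand in for whatever your patching or backup tooling and your monitoring actually expose.

```python
from typing import Callable, Dict, List, Optional, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size])
            for i in range(0, len(items), batch_size)]


def canary_rollout(
    batches: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],   # hypothetical: patch or back up one target
    is_healthy: Callable[[str], bool],     # hypothetical: monitoring check for one target
) -> Dict[str, object]:
    """Apply a change batch by batch, halting at the first unhealthy batch."""
    completed: List[int] = []
    halted_at: Optional[int] = None
    for index, batch in enumerate(batches):
        for target in batch:
            apply_change(target)
        # Verify the whole batch before touching the next one.
        if not all(is_healthy(target) for target in batch):
            halted_at = index
            break
        completed.append(index)
    return {"completed": completed, "halted_at": halted_at}


if __name__ == "__main__":
    # 100 VMs split into four groups of 25, as in the backup example.
    vms = [f"vm-{n:03d}" for n in range(1, 101)]
    batches = make_batches(vms, 25)
    # Pretend vm-030 fails its post-change check.
    result = canary_rollout(batches, lambda t: None, lambda t: t != "vm-030")
    print(result)
```

In this sketch, a failure in the second group stops the run with only 50 of the 100 VMs touched, which is the point of the technique: the remaining groups wait until the problem is fixed. A real rollout would also pause between batches (an hour or a day, as noted above) and would likely add a rollback step.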

A few considerations

While canary deployments look straightforward from the outside, there are a few things to keep in mind when running them:

  • Put in the time in the beginning: Most companies have a patching tool or service like Ansible to automate change events. But many of these IT automation tools don’t natively support segmentation and scheduling. So, while you can use your existing toolsets, you'll likely need to adapt them to canary rollouts. Initially, the configuration will be time-consuming but will get easier with more practice.
  • Consider the tradeoffs: With staggered deployments, not everybody benefits from the new release at once. If your team must fix bugs before releasing it to other groups, it can create tension in the business for anticipated updates. For a mission-critical end-user service, such as an update to the sales management software during the end-of-quarter reporting period, it may not be wise to do a canary release. Your people may need that update ASAP to meet their deadlines.
  • Re-orient toward user experience: As with many changes in IT, mindsets must also shift. The age-old “set and forget” IT management practices are becoming less relevant in today’s distributed, always-changing, hybrid cloud environment. With a DevOps orientation focused on delivering an unforgettable end-user experience, IT operations teams can adopt these new practices swiftly. ITOps teams will need to give up some of their tried-and-true practices to experiment with new, more agile ways of completing tasks.
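
For the first consideration, one way to add segmentation and scheduling on top of an existing tool is to plan the waves separately and hand each wave to the tool as a plain list of targets. The sketch below assumes a simple inventory of dictionaries; plan_waves, segment_key, and the "name" field are hypothetical, and ordering the smallest segment first is a design choice of this example, not a rule.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Wave = Tuple[datetime, str, List[str]]  # (start time, segment label, target names)


def plan_waves(
    devices: List[Dict[str, str]],  # hypothetical inventory records
    segment_key: str,               # e.g. "site", "device_type", "business_unit"
    first_start: datetime,
    gap: timedelta,                 # stagger between waves, e.g. one hour or one day
) -> List[Wave]:
    """Group devices by one attribute and assign staggered start times."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for device in devices:
        groups[device[segment_key]].append(device["name"])
    # Smallest segment goes first so the canary wave affects the fewest users.
    ordered = sorted(groups.items(), key=lambda kv: (len(kv[1]), kv[0]))
    return [(first_start + i * gap, segment, names)
            for i, (segment, names) in enumerate(ordered)]


if __name__ == "__main__":
    inventory = [
        {"name": "pc-01", "site": "london"},
        {"name": "pc-02", "site": "london"},
        {"name": "pc-03", "site": "austin"},
    ]
    for start, segment, names in plan_waves(
        inventory, "site", datetime(2024, 1, 8, 22, 0), timedelta(hours=1)
    ):
        print(start.isoformat(), segment, names)
```

Each planned wave can then be passed to the existing patching job at its start time, with monitoring checked before the next wave is released.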

Ultimately, experimenting with DevOps practices such as canary deployments can help IT (and ITOps) bridge the gap with the business and deliver more value, faster.

Michael Fisher, product manager at OpsRamp, wrote this article. The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends.