’Tis the Season to Avoid Enterprise IT and DevOps Gremlins

by: Brian Baggett / November 27, 2018

The likelihood of dealing with enterprise IT gremlins¹ is heightened during certain times of the year for any DevOps team. My brother, who works in IT Disaster Recovery for a healthcare agency, reminded me of this during our most recent Thanksgiving gathering. He had to address four hours of downtime right before the holiday, after a DevOps-related change was pushed to the production system instead of a test environment. Sound familiar?

Whether it’s a holiday, the close of the quarter, or “go live” day, any number of factors can put extra stress on IT staff and raise the odds that gremlins will plague the enterprise. Unlike mythical gremlins, the usual cause is not mischief but sloppiness, which leads to trouble, difficulties, and unexpected failures that threaten security and contribute to downtime and poor performance.

Self-Service Resources and IT Automation

You can keep gremlins at bay with a solid plan for self-service options and IT automation. End users need access to hardened resources and processes when the people who hold the keys to those resources are on PTO or swamped by other high-priority projects.

Leaving users in the dust while they wait for resources or an update can make them turn to workarounds or shortcuts. You don’t want anyone in your organization going rogue during stressful times. The more self-service IT that enterprise IT and DevOps teams have enabled, the less likely people are to fend for themselves.

Making any DevOps practice or IT process bulletproof against occasional mishaps is nearly impossible, but reducing their likelihood is worth the effort. The following approaches can help:

Eliminate bottlenecks
Walk through a typical workflow from start to finish. Wherever a step depends on manual input, make sure you have an alternative way to reach the same result. One way to do this is to enable administrator access for trusted individuals who can step in if the primary admin is not available. In some cases, this person could be just above or just below the primary admin on the org chart. Get your boss’s boss to intervene when necessary and the bottleneck is likely to move along a little faster.
Automate approvals
Review routine approval processes and automate them whenever you can. That does not mean approving every request automatically. Instead, set up automated checklists so that a request that meets the requirements needs no manual approval. You could also define specific sets of resources that already meet the requirements and can be used without an approval. This is particularly useful when you want self-service resources but not an open faucet. IT automation also cuts down on avoidable manual errors.
Consolidate resources
Another way to reduce mishaps, or what you might call the “who’s on first?” effect, is to centralize resource management in specific teams with defined roles and plans for backup coverage. When resources and roles are scattered throughout the organization and someone with a key role is out on PTO, you’ll be scrambling to figure out where to get the IT resources you need, just like in the old Abbott & Costello skit.
Embed security
Security must be part of the whole process from start to finish. When provisioning IT resources on premises and in private and public cloud environments, give special consideration to containerization and other virtualized environments in the cloud. For a quick reference on security concerns and DevOps resources, see the post Enterprise Hybrid Cloud Containerization and Rugged DevOps and DevSecOps for Security. It drills down into these two manifestos, which are also helpful for hardening security:
- Rugged Software
- DevSecOps
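
To make the "Eliminate bottlenecks" and "Automate approvals" ideas above a little more concrete, here is a minimal Python sketch of an automated approval checklist with a fallback to backup approvers. It is not CloudBolt's API or anyone's production policy: the names (ResourceRequest, failed_checks, route_request) and the policy values (allowed environments, pre-approved sizes, maximum lease) are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
ALLOWED_ENVIRONMENTS = {"dev", "test"}
PREAPPROVED_SIZES = {"small", "medium"}
MAX_LEASE_DAYS = 30


@dataclass(frozen=True)
class ResourceRequest:
    """A self-service request for an IT resource (hypothetical shape)."""
    requester: str
    environment: str
    size: str
    lease_days: int


def failed_checks(request: ResourceRequest) -> list[str]:
    """Return the names of the checklist items the request does not meet."""
    checks = {
        "environment is non-production": request.environment in ALLOWED_ENVIRONMENTS,
        "size is pre-approved": request.size in PREAPPROVED_SIZES,
        "lease is within limit": 0 < request.lease_days <= MAX_LEASE_DAYS,
    }
    return [name for name, passed in checks.items() if not passed]


def route_request(request: ResourceRequest, available_approvers: list[str]) -> str:
    """Auto-approve requests that pass every check; otherwise hand off.

    available_approvers is ordered: primary admin first, then backups.
    """
    if not failed_checks(request):
        return "auto-approved"
    if available_approvers:
        # The first available person takes it, so a primary admin on PTO
        # does not become a bottleneck.
        return f"manual review: {available_approvers[0]}"
    return "escalate: no approver available"


if __name__ == "__main__":
    routine = ResourceRequest("dana", "test", "small", lease_days=14)
    unusual = ResourceRequest("dana", "prod", "xlarge", lease_days=90)
    print(route_request(routine, []))
    print(route_request(unusual, ["backup-admin"]))
    print(failed_checks(unusual))
```

The point of the design is that a request is never approved simply because nobody is around. Routine requests pass the checklist on their own merits, and anything unusual still reaches a human, whether that is the primary admin, a backup, or an escalation.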

A centrally managed platform like CloudBolt can put any IT organization on the right path to avoiding the “gremlin” effect, especially as we approach another holiday season, when schedules and priorities will undoubtedly be different for many enterprises.

¹ Gremlins are unexplained problems or faults.
