A pod disruption occurs when a pod temporarily stops or restarts, typically because a Kubernetes administrator or an automated system initiated it. Disruptions happen during scheduled maintenance, updates, and scaling operations. Pod disruption budgets (PDBs) limit how many pods such operations can take down at once. The goal of a PDB is to protect the availability and performance of applications running in the cluster and avoid significant service interruptions.
There are two types of disruptions: voluntary and involuntary. Involuntary disruptions occur due to unavoidable hardware or system errors, such as VM deletion or a network partition. Voluntary disruptions are intentional actions by an application owner or cluster administrator, such as deleting or updating a deployment or directly deleting a pod. Pods are also evicted from nodes while rolling out software updates, autoscaling nodes, and enforcing priority classes, resulting in disruptions. Disruptions, even voluntary ones, can result in application downtime, so managing them with pod disruption budgets is vital to maintaining the reliability and availability of applications and services.
This article will explain what a PDB is, how it works, the benefits of its usage, and how it functions as a safety net for Kubernetes clusters. We will also look at the process of creating and managing PDBs.
Summary of key pod disruption budget concepts
Concept | Description |
---|---|
What is a Kubernetes pod disruption budget? | A PDB in Kubernetes specifies the minimum number of pods that must remain running and available during a disruption. |
How PDBs work | When creating a PDB, you specify a minimum number of pods to remain available and a selector to identify applicable pods. During voluntary disruptions, Kubernetes allows the eviction of a matching pod only if the budget would still be met, and evicted pods are shut down gracefully to minimize workload disturbance. |
Benefits of using PDBs | PDBs ensure that a minimum number of pods remain running during disruptions and give control over pod evictions during maintenance or scaling operations. |
Creating and managing PDBs | A PodDisruptionBudget Kubernetes resource consists of three main components: a label selector to identify pods, the minimum availability to define the minimum number of pods that must remain operational, and the maximum unavailability to indicate the maximum number of pods that can be non-operational after an eviction. |
How PDBs act as a safety net | A pod disruption budget sets the acceptable level of disruption for an application by defining a minimum number of available pods or a maximum number of unavailable pods in a deployment. PDBs keep node operations, such as upgrades or node pool scale-downs, from reducing an application below that level. |
Important considerations | PDBs minimize downtime during voluntary disruptions in Kubernetes by controlling the number of pods disrupted simultaneously. Despite some limitations, PDBs are essential for maintaining application availability, integrating seamlessly with Kubernetes tools for disruption management. |
Best practices for using PDBs | To ensure application availability, use accurate PDB selectors and favor percentage-based disruption budgets. Regularly monitor PDBs and associated pods, and update them as needed. |
What is a Kubernetes pod disruption budget?
Kubernetes provides a range of features designed to ensure the high availability of applications even when you introduce frequent voluntary disruptions. A pod disruption budget in Kubernetes specifies the minimum number of pods that must remain running and available during a disruption.
PDBs protect application availability by controlling how many pods can be disrupted at any given time. A pod disruption budget does not constrain every voluntary disruption: deleting a deployment or stateful set, for example, bypasses the budget. However, pods that are deleted or unavailable because of a rolling upgrade to an application do count against the disruption budget.
How PDBs work
When creating a PDB, you specify a minimum available replica count and a selector, based on pod labels, that identifies the set of pods to which the PDB applies. During a voluntary disruption, the budget governs only the pods that match the selector: an eviction request for a matching pod is granted only if the budget would still be satisfied afterward, while pods that match no PDB can be evicted without this check.
When an eviction is allowed, the pod is terminated gracefully to minimize disturbance to the workload. The pod is marked as terminating and removed from the endpoints of the services that select it, so it stops receiving new requests, and a SIGTERM signal is sent to the pod's containers to initiate a graceful shutdown. If the pod has not terminated by the end of the specified grace period, Kubernetes sends a SIGKILL signal, forcing the pod to terminate.
The PDB also defines a disruption budget, representing the maximum number of pods that can be evicted at any time. This budget can be an absolute number or a percentage of the total number of replicas for the workload, ensuring that voluntary evictions never take down more pods than the application can tolerate losing.
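The grace period is configured per pod. As a minimal sketch, this is where it lives in a deployment's pod template, assuming a hypothetical application labeled app: my-app and a placeholder image (30 seconds is the Kubernetes default):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app             # label that a PDB selector can match
    spec:
      # Time allowed between SIGTERM and SIGKILL (30 is the default)
      terminationGracePeriodSeconds: 30
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0   # placeholder image
```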
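Eviction-aware tools ask for each eviction by creating an Eviction object on the pod's eviction subresource, and this is the request that gets checked against the budget. A minimal sketch of such a request body, assuming a hypothetical pod named my-app-7d4b9c-abcde in the default namespace:

```yaml
apiVersion: policy/v1
kind: Eviction
metadata:
  name: my-app-7d4b9c-abcde   # hypothetical pod name
  namespace: default
```

If granting the eviction would violate a matching PDB, the API server rejects the request with a 429 Too Many Requests response, and the caller is expected to retry later.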
Benefits of using PDBs
The benefits of using pod disruption budgets include:
- Application availability: PDBs help maintain your application’s availability during disruptions by ensuring that a minimum number of pods remain running. This is crucial for preventing outages and ensuring that your services remain accessible to users.
- Control over pod evictions: PDBs give you greater control over how and when pods are evicted during maintenance events or scaling operations. You can protect critical workloads from being disrupted by specifying a minimum number of pods that must remain available.
- Support for autoscaling: PDBs work in conjunction with Kubernetes autoscaling features. They allow you to safely scale your applications and nodes up or down without risking the disruption of essential services, since the PDB keeps voluntary evictions from reducing the number of running pods below the configured minimum.
Creating and managing PDBs
A PodDisruptionBudget Kubernetes resource consists of three main components:
- Label selector (.spec.selector): This mandatory field identifies the specific set of pods that the budget will cover.
- Minimum availability (.spec.minAvailable): This parameter defines the minimum number of pods that must remain operational after an eviction. You can specify this value as either an absolute number or a percentage.
- Maximum unavailability (.spec.maxUnavailable): Introduced in Kubernetes version 1.7, this field indicates the maximum number of pods that can be non-operational after an eviction. Similar to minAvailable, this value can be expressed as either a fixed number or a percentage.
Shown below is an example YAML of a pod disruption budget. In this case, if there are three replicas of an application, the example-pdb disruption budget allows for the eviction of one pod; two pods will be retained because the minAvailable value is 2. You can check the status of the budget with kubectl get poddisruptionbudgets.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```
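Kubernetes records how much of the budget is currently available in the PDB's status. With three healthy replicas behind example-pdb, the status section of the object might look roughly like this (the values are illustrative, and other fields such as conditions are omitted):

```yaml
status:
  expectedPods: 3         # pods the budget expects for this workload
  currentHealthy: 3       # pods currently healthy
  desiredHealthy: 2       # derived from minAvailable: 2
  disruptionsAllowed: 1   # currentHealthy - desiredHealthy
```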
You can set either maxUnavailable or minAvailable in a single pod disruption budget, not both. Both values can be expressed as integers or percentages. When you use a percentage, Kubernetes rounds up to the nearest whole number. If you specify maxUnavailable as a percentage, Kubernetes rounds up the number of pods that can be disrupted, which may result in a disruption slightly exceeding your defined maxUnavailable percentage. Despite this, using maxUnavailable is generally preferred because it adjusts automatically to changes in the number of replicas the controller manages.
A disruption budget does not ensure that the specified number or percentage of pods will always be available. For instance, if a node hosting a pod from a collection of pods fails while the collection is at the minimum size defined in the budget, the number of available pods could drop below the required threshold. The budget protects against voluntary evictions but not all possible causes of unavailability. If you set maxUnavailable to 0 or 0%, or set minAvailable to 100% or to the total number of replicas, you are requesting zero voluntary evictions, so draining a node that runs one of these pods will not be possible.
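To make the rounding concrete, here is a sketch of a percentage-based budget with the arithmetic worked out in comments. The name example-pdb-percent and the replica counts are assumptions chosen for illustration:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb-percent   # hypothetical name
spec:
  maxUnavailable: "30%"       # percentages are written as strings
  selector:
    matchLabels:
      app: my-app
# With 7 replicas: 30% of 7 = 2.1, rounded up to 3 pods that may be disrupted,
# which is about 43% of the replicas, more than the 30% written in the spec.
# With 10 replicas: 30% of 10 = 3, so exactly 3 pods may be disrupted.
```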
How PDBs act as a safety net
As mentioned earlier, a pod disruption budget (PDB) sets the acceptable level of disruption for your application by defining either a minimum number of available pods (minAvailable) or a maximum number of unavailable pods (maxUnavailable) in a deployment. Kubernetes tracks the number of healthy replicas, which is often controlled by the Horizontal Pod Autoscaler (HPA), against the budget. The PDB typically uses the same pod label selector as the service to determine which pods the rules apply to.
The choice between setting a minAvailable or maxUnavailable value depends on the nature of your application. For example, a distributed system requiring a quorum would need minAvailable set to match the quorum size to avoid service failure. In most cases, setting maxUnavailable to 1 or more is sufficient. A maxUnavailable value of 1 ensures that pods are moved individually from a draining node to an available node. A higher value can be set if faster pod movement is needed, assuming that the replica set size is large enough to tolerate such disruption.
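As a sketch of the quorum case, assume a hypothetical five-member quorum-based store whose pods carry the label app: my-quorum-store. A majority of five members is three, so the budget keeps at least three pods available:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: quorum-pdb            # hypothetical name
spec:
  minAvailable: 3             # quorum for 5 members: floor(5 / 2) + 1 = 3
  selector:
    matchLabels:
      app: my-quorum-store    # hypothetical label
```

With all five members healthy, this budget allows up to two voluntary evictions at a time. If the member count changes, the quorum size and therefore minAvailable need to be recalculated.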
During node operations where one or more nodes may become unavailable, such as upgrades or node pool scale-downs, the drain process evicts a pod only when doing so keeps every matching PDB satisfied. Pods whose eviction would violate a PDB stay in place until conditions allow it, which can prevent a node operation from completing. For instance, if a node is being upgraded and the cluster has insufficient capacity, the drain stalls because replacements for the pods already evicted cannot be scheduled on another node, so the budget never frees up. Without a PDB as a safety net, the node would drain and evict all running pods, potentially reducing the deployment or replica set below a critical threshold for the service to function correctly.
Important considerations
Pod disruption budgets in Kubernetes are crucial to maintaining application availability during voluntary disruptions like maintenance or updates. However, they have limitations. PDBs only manage voluntary disruptions, leaving applications vulnerable to involuntary issues like node failures. Configuring PDBs can be complex, requiring a deep understanding of application needs. They may conflict with autoscaling operations, potentially delaying recovery. Additionally, they require ongoing adjustment as application demands change, and they don’t guarantee full availability—other resilience measures are needed.
Despite these limitations, PDBs are essential for optimal functionality in Kubernetes. They help minimize downtime by controlling the number of pods that can be disrupted simultaneously, ensuring that a minimum level of service remains available. This is vital for maintaining redundancy in high-availability applications and handling disruptions in a controlled manner.
PDBs also integrate seamlessly with Kubernetes tools such as kubectl, controllers, the cluster autoscaler, and node draining, making them essential to managing disruptions within the cluster. By using PDBs, operators can better align maintenance and operational activities with the application’s availability requirements, contributing to overall system stability and resilience.
Best practices for using PDBs
Using PDBs will help you ensure high availability and less downtime in your deployment. The following best practices can help maximize these advantages:
- Understand your applications: Be sure to understand your applications’ availability requirements. Determine how many pod replicas can be lost without impacting functionality.
- Use selectors wisely: Make sure that your pod disruption budget selectors match exactly the pods you intend to protect. This will ensure that the PDB applies to the pods as intended.
- Prefer percentage-based disruption budgets: It is often recommended to use percentages rather than integer values as minAvailable or maxUnavailable values. This provides flexibility as the scale of your deployments increases over time.
- Set an unhealthy pod eviction policy: It is recommended that you set the unhealthy pod eviction policy (.spec.unhealthyPodEvictionPolicy) of your pod disruption budgets to AlwaysAllow to support the eviction of misbehaving applications during a node drain. The default behavior is to wait for the application pods to become healthy before the drain can proceed.
- Use eviction-aware tools: Use tools that respect PDBs by calling the Eviction API instead of directly deleting pods or deployments. When a pod is evicted using the Eviction API, it is gracefully terminated, honoring the terminationGracePeriodSeconds setting in its PodSpec. The eviction requests that kubectl drain submits on your behalf may be temporarily rejected, so the tool periodically retries failed requests until all pods on the target node are terminated or until a configurable timeout is reached.
- Monitor PDBs: Continuously monitor the status and details of your PDBs and associated pods. Use Kubernetes commands or the dashboard to understand the disruptions’ effects and ensure that PDBs are configured correctly.
- Review and update: Using PDBs is an ongoing activity. Periodically review and update PDBs based on any changes made to applications, the Kubernetes cluster, or other organizational needs. Proactively change values to match the needs of your organization.
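Several of these practices can be combined in a single manifest. The following is a minimal sketch, not a recommended configuration for any particular workload: the name my-app-pdb, the 25% value, and the app: my-app label are assumptions, and the unhealthyPodEvictionPolicy field is only available in recent Kubernetes releases, so check your cluster version before relying on it.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb                          # hypothetical name
spec:
  maxUnavailable: "25%"                     # percentage-based budget that scales with replicas
  unhealthyPodEvictionPolicy: AlwaysAllow   # let drains evict pods that are not healthy
  selector:
    matchLabels:
      app: my-app                           # must match the labels on the pods to protect
```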
Conclusion
Managing pod disruptions within Kubernetes is essential for maintaining application availability and performance. Pod disruption budgets ensure that a minimum number of pods remain available during voluntary disruptions, offering control over pod evictions and supporting autoscaling. However, PDBs only manage voluntary disruptions and can sometimes conflict with other Kubernetes features.
Best practices for using PDBs include understanding application needs, using selectors correctly, preferring percentage-based budgets, allowing unhealthy pod evictions, utilizing eviction-aware tools, and regularly reviewing and updating PDBs. By carefully managing PDBs, Kubernetes administrators can minimize downtime and enhance application resilience during cluster operations, making PDBs a crucial tool for maintaining system stability.