The Kubernetes scheduler assigns pods to nodes based on resource availability. In many cases, this default behavior works well. However, it may not meet specific compliance, performance, or high availability requirements in more complex setups.
Kubernetes affinity rules allow you to control pod placement within a cluster. This feature helps optimize resource allocation and application performance in complex environments. Affinity rules provide granular control over pod scheduling by:
- Placing pods on nodes with specific hardware or in certain regions
- Colocating related pods to reduce inter-service latency
- Distributing pods across failure domains to improve reliability
These capabilities become increasingly important as Kubernetes deployments expand. For example, you might use node affinity to ensure that GPU-intensive workloads run on nodes with appropriate hardware, or you might use pod anti-affinity to spread replicas of a stateless service across different nodes for better fault tolerance.
In this article, we cover types of affinity rules, their implementation, and their practical use in production environments. We also explore real-world scenarios with examples and discuss how automation tools can simplify affinity management in large-scale deployments.
Summary of Key Kubernetes Affinity Concepts
Concept | Description |
---|---|
Understanding Kubernetes affinity | Kubernetes affinity rules determine pod placement using node and pod labels. These mechanisms extend the basic scheduler to provide precise control over where workloads run in your cluster. |
Node affinity in detail | Required node affinity must be satisfied for pod scheduling. Preferred node affinity influences scheduling when possible. Node affinity use cases include hardware-specific workloads and geographic distribution of pods. |
Pod affinity and anti-affinity | Pod affinity attracts related pods to the same node or topology domain, while pod anti-affinity prevents similar pods from colocating. These mechanisms optimize network performance, ensure fault tolerance, and balance workload distribution across your cluster. |
Advanced affinity concepts | Multiple affinity rules can layer scheduling requirements for precise pod placement. Affinity interacts with taints/tolerations to create targeted workload isolation. Complex affinity settings impact scheduler performance and can cause resource fragmentation in large clusters. |
Using Affinity in Production | Production environments require balancing strict affinity requirements with scheduling flexibility. Monitoring pod placement patterns helps refine affinity settings over time. Affinity rules can trigger unexpected autoscaling behaviors and cause resource fragmentation that requires ongoing optimization. |
Understanding Kubernetes Affinity
Kubernetes affinity lets you control where pods run in your cluster. When you create a pod, the scheduler analyzes affinity rules to pick the suitable node for that pod. These rules can consider hardware requirements, custom labels, and even where other pods are running. Think of affinity as giving specific instructions to the scheduler about pod placement.
General Operation
By default, the Kubernetes scheduler places pods on a node with enough resources. Affinity rules give you more control over this process by letting you define specific conditions for pod placement. These rules work through Kubernetes labels: key-value pairs you add to nodes and pods. The scheduler uses these labels to match pods with nodes.
For example, consider a scenario with a database requiring high-performance SSD storage:
# Node with SSD storage
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    storage: ssd
---
# Pod that requires SSD storage
apiVersion: v1
kind: Pod
metadata:
  name: database
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: storage
                operator: In
                values:
                  - ssd
In this example, the `database` pod will only run on nodes labeled with `storage: ssd`.
Node Affinity in Detail
Node affinity controls pod placement on specific nodes in your Kubernetes cluster. Going beyond basic node selectors, it lets you set complex rules for the scheduler to match pods with nodes based on their labels.
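For contrast, the simpler `nodeSelector` field handles only exact label matches; node affinity builds on it with operators, multiple terms, and soft preferences. A minimal sketch using the `storage: ssd` label from the earlier example:

apiVersion: v1
kind: Pod
metadata:
  name: database
spec:
  nodeSelector: # Exact match only; no operators, weights, or preferences
    storage: ssd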
Required Node Affinity
Required node affinity (`requiredDuringSchedulingIgnoredDuringExecution`) sets hard rules that must be met. The scheduler will not compromise on these rules: if no nodes match the requirements, the pod remains pending. This behavior ensures that critical requirements like hardware compatibility or compliance regulations are never violated.
This example ensures that a pod runs only on nodes with GPUs:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: gpu
                operator: In
                values:
                  - "true"
Preferred Node Affinity
Preferred node affinity (`preferredDuringSchedulingIgnoredDuringExecution`) works on a scoring system. Each rule carries a weight from 1 to 100, with higher weights indicating stronger preference, and the scheduler scores each node based on how well it matches these preferences. The scheduler will still place the pod even if no nodes match the preferences.
This example shows weighted preferences for pod placement:
apiVersion: v1
kind: Pod
metadata:
  name: high-performance-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80 # Strong preference for high-performance nodes
          preference:
            matchExpressions:
              - key: performance
                operator: In
                values:
                  - high
        - weight: 20 # Weaker preference for power-efficient nodes
          preference:
            matchExpressions:
              - key: power
                operator: In
                values:
                  - efficient
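In this example, a node labeled both `performance: high` and `power: efficient` collects 80 + 20 = 100 points from these preferences, a node with only `performance: high` collects 80, and one with only `power: efficient` collects 20. The scheduler then combines these points (after normalization) with the scores from its other scoring plugins, such as resource balancing, before choosing the highest-scoring node.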
Use Cases for Node Affinity
Node affinity addresses specific operational requirements in Kubernetes clusters:
- Hardware requirements: Some applications need specific hardware capabilities. Node affinity ensures that workloads run on compatible hardware, like placing machine learning pods on GPU nodes or database pods on nodes with SSDs.
- Geographic distribution: Multi-region clusters require careful workload placement. Node affinity helps maintain data sovereignty by keeping pods in specific regions or reducing latency by placing pods closer to users.
- Cost optimization: Cloud environments offer various instance types at different price points. Node affinity can direct noncritical workloads to spot instances while ensuring that production services use reserved instances, optimizing costs without compromising reliability (see the sketch after this list).
- Resource isolation: In multi-tenant clusters, keeping workloads separate improves security and resource management. Node affinity creates clear boundaries between different environments, teams, or application types.
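As a sketch of the cost-optimization case above, a preferred rule can steer noncritical workloads toward spot-backed nodes without blocking scheduling when none are free. The `node-lifecycle` label and its `spot` value are illustrative placeholders; use whatever label your platform applies to spot capacity:

apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 60 # Prefer cheaper spot-backed nodes when available
          preference:
            matchExpressions:
              - key: node-lifecycle # Illustrative label; set it on your nodes
                operator: In
                values:
                  - spot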
Pod Affinity and Anti-Affinity
Pod affinity and anti-affinity introduce a more sophisticated scheduling approach focusing on pod relationships. Unlike node affinity, which considers static node properties, these rules analyze the dynamic state of pod placements in your cluster.
Pod affinity creates colocation rules. When you specify pod affinity, the scheduler looks at running pods, checks their labels, and uses this information to determine where to place new pods. Consider a web application that frequently communicates with its Redis cache. Pod affinity ensures that the cache runs on the same node as the web application, minimizing network latency and improving response times.
Pod anti-affinity establishes repulsion rules. It prevents pods from running near other pods with matching labels. For instance, in a stateful application like MongoDB, you want replica pods to run on different nodes. Pod anti-affinity ensures that if one node fails, your database remains available through replicas on other nodes.
Feature | Pod affinity | Pod anti-affinity |
---|---|---|
Purpose | Colocates pods together | Keeps pods apart |
Use case | Reduce latency between services | Improve availability |
Scope | Works within topology domain | Works within topology domain |
Impact | Can increase resource density | Spreads resource usage |
Failure domain | May increase node failure impact | Reduces node failure impact |
Differences between pod affinity and anti-affinity
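As a minimal sketch of the MongoDB scenario described above (a StatefulSet would be more typical for a database, but the affinity block is identical), required anti-affinity on the pods' own `app: mongodb` label keeps any two replicas off the same node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongodb
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - mongodb
              topologyKey: kubernetes.io/hostname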
Pod affinity and anti-affinity mechanisms work through topology domains, which define the scope of affinity rules in your cluster. Think of topology as layers of your infrastructure, from a single node to an entire geographic region. This could be a node, a rack, a zone, or any other infrastructure subdivision labeled in your cluster.
When you set a topology key, you’re telling the scheduler: “Apply this rule within this boundary.” The scheduler then uses node labels to identify these boundaries.
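These boundaries come from labels that nodes usually already carry. For instance, kubelets and cloud providers commonly set well-known labels like the following, and any of them can serve as a `topologyKey` (values are illustrative):

# Well-known node labels commonly used as topology keys (values illustrative)
apiVersion: v1
kind: Node
metadata:
  name: node-a
  labels:
    kubernetes.io/hostname: node-a            # node-level boundary
    topology.kubernetes.io/zone: us-east-1a   # zone-level boundary
    topology.kubernetes.io/region: us-east-1  # region-level boundary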
Let’s examine an example configuration that combines both pod affinity and anti-affinity:
apiVersion: v1
kind: Pod
metadata:
  name: cache
  labels:
    app: web-store
spec:
  affinity:
    podAffinity: # First rule set
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - web
          topologyKey: kubernetes.io/hostname
    podAntiAffinity: # Second rule set
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - cache
          topologyKey: topology.kubernetes.io/zone
The configuration above tells the scheduler the following:
- This pod is named `cache` and labeled as part of `web-store`.
- For `podAffinity`, it must run on the same node (`topologyKey: kubernetes.io/hostname`) as pods labeled `app=web`.
- For `podAntiAffinity`, it cannot run in the same zone (`topologyKey: topology.kubernetes.io/zone`) as other pods labeled `app=cache`.
- All rules are required (must be satisfied for scheduling).
Common Scenarios for Pod Affinity and Anti-Affinity
Pod affinity use cases:
- Web servers running near their caching layers
- API servers colocated with their authentication services
- Data processing pods placed close to data storage
- Logging agents running alongside specific applications
Pod anti-affinity use cases:
- Database replicas distributed across zones
- Load balancers spread across different nodes
- Critical service instances separated for high availability
- Competing workloads kept apart to prevent resource conflicts
Combining affinity patterns and topology domains gives granular control over workload placement. For example, you can place data processing pods on the same node as their data source for fast local access while spreading these processing units across different availability zones for resilience. If one zone goes down, the application continues running in other zones. This multi-level approach optimizes performance and reliability based on your application’s specific needs.
Advanced Affinity Concepts
Kubernetes affinity becomes more effective when you combine multiple rules and understand how they interact with other scheduling features. This understanding enables you to create sophisticated pod placement strategies that balance performance, availability, and resource utilization.
Combining Different Types of Affinity Rules
Multiple affinity rules in a single pod specification create layered scheduling requirements. Each rule adds constraints or preferences that the scheduler must consider when placing pods. Consider this example:
apiVersion: v1
kind: Pod
metadata:
  name: multi-affinity-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: zone
                operator: In
                values:
                  - us-east-1a
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - cache
          topologyKey: kubernetes.io/hostname
This configuration creates a two-layer scheduling requirement. First, the pod must run in the `us-east-1a` zone through node affinity. Second, it must run on the same node as `cache` pods through pod affinity. The scheduler will only place the pod on nodes that satisfy both conditions.
Affinity and Taint/Toleration Interaction
While affinity rules match pods to specific nodes, taints create repulsion rules that prevent pods from running on nodes unless they have matching tolerations. These mechanisms work together to create precise pod placement control. Here’s how they interact:
# Node with both taint and labels
apiVersion: v1
kind: Node
metadata:
  name: specialized-node
  labels:
    type: gpu
    environment: production
spec:
  taints:
    - key: dedicated
      value: gpu
      effect: NoSchedule
---
# Pod using both affinity and toleration
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  tolerations:
    - key: dedicated
      value: gpu
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: type
                operator: In
                values:
                  - gpu
In this example, the node is tainted to prevent general workloads from running on it. Only pods with the matching toleration can be scheduled there. Additionally, node affinity ensures that only pods specifically requesting GPU resources will be placed on this node. This combination creates a specialized node that runs only GPU workloads.
Scheduling Behaviors and Rule Evaluation
The Kubernetes scheduler evaluates affinity rules in two phases. During the filtering phase, it identifies nodes that satisfy all required rules. These rules act as hard constraints: if a node doesn't meet them, it's immediately excluded from consideration.
After filtering, the scoring phase begins. The scheduler evaluates preferred rules and assigns scores based on their weights. Consider this preference configuration:
preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 90 # High priority for performance
    preference:
      matchExpressions:
        - key: performance
          operator: In
          values:
            - high
  - weight: 10 # Lower priority for cost
    preference:
      matchExpressions:
        - key: cost
          operator: In
          values:
            - low
This configuration strongly emphasizes high-performance nodes (`weight: 90`) while maintaining a minor preference (`weight: 10`) for low-cost nodes. The scheduler will strongly favor high-performance nodes but might choose a low-cost node if other factors make it a better overall fit.
Operator Mechanics
The affinity system provides several operators for label matching, each serving different comparison needs:
- `In`: Label value must match one of the specified values.
- `NotIn`: Label value must not match any of the specified values.
- `Exists`: Label key must exist (no value checking).
- `DoesNotExist`: Label key must not exist.
- `Gt`: Label value must be greater than the specified value.
- `Lt`: Label value must be less than the specified value.
Consider the pod specification below:
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cpu-cores # Numeric comparison
                operator: Gt
                values: ["4"]
              - key: disk-type # Value matching
                operator: In
                values: ["ssd", "nvme"]
              - key: environment # Label existence
                operator: Exists
              - key: maintenance # Label absence
                operator: DoesNotExist
This example demonstrates different operator behaviors:
- `Gt`: Matches nodes with more than 4 CPU cores
- `In`: Matches nodes with either SSD or NVMe disks
- `Exists`: Ensures that the node has an environment label (the value doesn't matter)
- `DoesNotExist`: Ensures that the node is not marked for maintenance
These operators give you control over how affinity rules match labels. When the scheduler evaluates rules, it uses these operators to determine whether each node or pod meets the specified criteria. Understanding these operators helps you create accurate and efficient scheduling rules.
Using Affinity in Production
Setting up affinity rules for production workloads involves balancing control and flexibility. While affinity allows you to control pod placement, you must plan for scale and manage increasing complexity.
Balancing Flexibility and Constraints
Production deployments need to balance strict requirements with flexible scheduling. Using required rules only when necessary helps maintain this balance. For example:
apiVersion: v1
kind: Pod
metadata:
  name: production-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: environment # Strict environment requirement
                operator: In
                values:
                  - production
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 70
          preference:
            matchExpressions:
              - key: instance-type # Flexible hardware preference
                operator: In
                values:
                  - high-memory
This configuration enforces critical requirements while maintaining scheduling flexibility. The pod must run in production but has flexibility in hardware selection.
Monitoring and Adjusting Affinity Rules
Monitor your cluster’s scheduling decisions and pod placement patterns. Key metrics to watch:
- Number of pending pods due to affinity rules
- Pod scheduling latency
- Node resource utilization across zones
- Distribution of pods across topology domains
Adjust affinity rules based on these observations. If pods frequently remain unscheduled, consider relaxing strict requirements to preferred rules, expanding the set of matching labels, or adjusting the topology key scope.
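For example, one common relaxation converts a hard hardware requirement into a weighted preference. A minimal sketch, reusing the illustrative `instance-type` label from earlier:

# Relaxed version: the scheduler favors high-memory nodes but can still
# place the pod elsewhere when no matching node has capacity
preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 50
    preference:
      matchExpressions:
        - key: instance-type # Illustrative label from the earlier example
          operator: In
          values:
            - high-memory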
Integration with Autoscaling
Affinity rules can impact cluster autoscaling behavior. Consider this deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app # Label targeted by the anti-affinity rule below
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - web-app
              topologyKey: kubernetes.io/hostname
The anti-affinity rule forces replicas onto different nodes. Combined with autoscaling, this might trigger node scaling even when total cluster capacity is available but not in the required distribution.
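If spreading replicas is desirable but not worth provisioning extra nodes for, one option is to soften the rule into a preference. A minimal sketch of that variant, using the same labels as the Deployment above:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100 # Spread across nodes when possible, but never block scheduling
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - web-app
          topologyKey: kubernetes.io/hostname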
Challenges and Limitations
Managing affinity in large clusters introduces several challenges:
- Resource fragmentation: Strict affinity rules can lead to underutilized nodes. When pods can only run on specific nodes, you might have resources spread inefficiently across the cluster. Resource fragmentation becomes even more complex when managing CPU and memory requests alongside affinity rules. Tools like StormForge can help by automatically right-sizing resource requests based on actual utilization data while working within the constraints set by your affinity rules. This helps maintain efficient resource usage even with strict node and pod affinity requirements.
- Scale considerations: Complex affinity rules can impact scheduler performance in large clusters. Each pod creation requires evaluating multiple rules across many nodes. Consider these factors when designing affinity rules:
- Number of label selectors per rule
- Complexity of matching expressions
- Frequency of pod creation
- Cluster size and node count
- Performance impact: Complex affinity configurations can increase scheduling latency. The scheduler must evaluate more conditions and consider more factors when placing pods. This becomes more noticeable in clusters with many nodes, frequent pod creation/deletion, multiple affinity rules per pod, or complex topology requirements.
Tools and Automation for Managing Affinity
As you manage multiple affinity rules for various deployments and nodes, testing and maintaining these configurations becomes complex. While a few pods with simple affinity rules are manageable, production environments often involve dozens of rules spanning multiple applications, teams, and infrastructure layers.
The Kubernetes scheduler’s extensible architecture addresses this complexity through plugins and simulation tools. These tools help automate routine tasks and ensure that affinity rules work as intended across your cluster.
Kubernetes Scheduler Simulator
The scheduler simulator provides a safe environment to test affinity configurations before applying them to production clusters. It replicates the behavior of the Kubernetes scheduler without affecting real workloads. This simulation environment helps you understand how affinity rules interact with your cluster’s topology and resource constraints.
The simulator constructs a virtual cluster environment:
# Example simulation input
cluster:
  nodes:
    - name: node-1
      labels:
        zone: us-east-1
        instance: high-cpu
    - name: node-2
      labels:
        zone: us-east-1
        instance: high-memory
  pods:
    - name: test-pod
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: instance
                    operator: In
                    values:
                      - high-cpu
Running simulations helps identify scheduling issues that only surface under specific conditions. You can test scenarios like node failures, zone outages, or resource constraints to verify that your affinity rules maintain proper pod distribution.
Scheduler Plugins
The Kubernetes scheduler architecture supports plugins that modify or extend scheduling behavior. These plugins integrate directly with the scheduling pipeline, adding new capabilities while maintaining compatibility with existing affinity rules.
Popular scheduler plugins include the following:
- NodeResourceTopology: Makes scheduling decisions based on NUMA topology and device locations. This helps optimize performance for hardware-dependent workloads (see the configuration sketch after this list).
- Capacity Scheduling: Controls resource allocation across namespaces while respecting affinity rules. This prevents one team’s affinity rules from impacting others’ resource guarantees.
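These plugins are typically enabled through a scheduler configuration profile. A minimal sketch, assuming a scheduler binary built with the out-of-tree scheduler-plugins project; the plugin and profile names below are illustrative and depend on the build and version you run:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: topology-aware-scheduler # Illustrative profile name
    plugins:
      filter:
        enabled:
          - name: NodeResourceTopologyMatch # Plugin name from the scheduler-plugins project
      score:
        enabled:
          - name: NodeResourceTopologyMatch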
Conclusion
As you start working with Kubernetes affinity, you’ll first use simple node and pod affinity rules. Over time, you’ll combine these with anti-affinity patterns to build more sophisticated scheduling strategies. When combined, these rules create exact workload distribution patterns across your cluster.
Node affinity controls pod-to-node matching through labels. This enables scheduling pods on nodes with specific hardware like GPUs or in particular regions for compliance requirements.
Pod affinity manages service relationships. It reduces network latency by colocating communicating services, like placing caches near their applications. Pod anti-affinity improves reliability by distributing replicas across failure domains.
Topology domains define the scope of affinity rules, from node-level to region-level constraints. Scheduling weights influence pod placement when using preferred rules. These features work with taints and tolerations to implement complex scheduling policies. Understanding when to use each feature will help you build practical scheduling configurations.