Kubernetes Scalability thresholds


Since 1.6 release Kubernetes officially supports 5000-node clusters. However, the question is what that actually means. As of early Q3 2017 we are in the process of defining set of performance-related SLIs (Service Level Indicators) and SLOs (Service Level Objectives).

However, no matter what SLIs and SLOs we have, there will always be some users coming and saying that their cluster is not meeting the SLOs. And in most cases it appears that the reason behind is that we (as developers) have silently assumed something (e.g. there will be no more than 10000 services in the cluster) and users were not aware of that.

This document is trying to explicitly summarize limits for the number of objects in the system that we are aware of and state if we will try to relax them in the future or not.

Kubernetes thresholds

We start with explicit definition of quantities and thresholds we assume are satisfied in the cluster. This is followed by an explanation for some of those. Important notes about the numbers: 1. In most cases, exceeding these thresholds doesn’t mean that the cluster fails over - it just means that its overall performance degrades. 1. Some thresholds below (e.g. total number of all objects, or total number of pods or namespaces) are given for the largest possible cluster. For smaller clusters, the limits are proportionally lower. 1. The thresholds obviously differ between different Kubernetes releases (hopefully each of them is non-decreasing). The numbers we present are for the current release (Kubernetes 1.7 release). 1. There are a lot of factors that influence the thresholds, e.g. etcd version or storage data format. For each of those we assume the default from the release to avoid providing numbers for huge number of combinations of those. 1. The “Head threshold” is representing the status of Kubernetes head. This column should be snapshotted at every release to produce per-release thresholds (and dedicated column for each release should then be added).

Quantity Head threshold 1.8 release Long term goal
Total number of all objects 250000 1000000
Number of nodes 5000 5000
Number of pods 150000 500000
Number of pods per node1 110 500
Number of pods per core1 10 10
Number of namespaces (ns) 10000 100000
Number of pods per ns 3000 50000
Number of services 10000 100000
Number of services per ns 5000 5000
Number of all services backends TBD 500000
Number of backends per service 5000 5000
Number of deployments per ns 2000 10000
Number of pods per deployment TBD 10000
Number of jobs per ns TBD 1000
Number of daemon sets per ns TBD 100
Number of stateful sets per ns TBD 100
Number of secrets per ns TBD TBD
Number of secrets per pod TBD TBD
Number of config maps per ns TBD TBD
Number of config maps per pod TBD TBD
Number of storageclasses TBD TBD
Number of roles and rolebindings TBD TBD

There are also thresholds for other types, but for those the numbers depend also on the environment (bare metal or which cloud provider) the cluster is running in. These include:

Quantity Head threshold 1.8 release Long term goal
Number of ingresses TBD TBD
Number of PersistentVolumes TBD TBD
Number of PersistentVolumeClaims per ns TBD TBD
Number of PersistentVolumeClaims per node TBD TBD

The rationale for some of those numbers: 1. Total number of objects
There is a limitation on the total number of objects on the system, as this affects among others etcd and its resource consumption. 1. Number of nodes
We believe that having clusters with more than 5000 nodes is not the best option and users should consider splitting into multiple clusters. However, we may consider bumping the long term goal at some time in the future. 1. Number of services and endpoints
Each service port and each service backend has a corresponding entry in iptables. Number of backends of a given service impact the size of the Endpoints objects, which impacts size of data that is being sent all over the system. 1. Number of objects of a given type per namespace
This holds for different objects (pods, secrets, deployments, …). There are a number of control loops in the system that need to iterate over all objects in a given namespace as a reaction to some changes in state. Having large number of objects of a given type in a single namespace can make those loops expensive and slow down processing given state changes.

1 The limit for number of pods on a given node is in fact minimum from the “pod per node” and “pods per core times number of cores of a node”.