Cluster lifecycle includes deployment (infrastructure provisioning and bootstrapping Kubernetes), scaling, upgrades, and turndown.

Owner: @kubernetes/sig-cluster-lifecycle (kubernetes-sig-cluster-lifecycle mailing list, sig-cluster-lifecycle on Slack)

There is no one-size-fits-all solution for cluster deployment and management (e.g., upgrades). There’s a spectrum of possible solutions, each with different tradeoffs:
* opinionated solution (easier to use for a narrower solution space) vs. toolkit (easier to adapt and extend)
* understandability (easier to modify) vs. configurability (addresses a broader solution space without coding)

Some useful points in the spectrum are described below.

There are a number of tasks/features/changes that would be useful to multiple points in the spectrum. We should prioritize them, since they would enable multiple solutions.

See also:
* Feature-tracking issue
* Cluster deployment v2 issue
* Shared infrastructure issue
* Kube-up replacement that works for everyone
* Deployment DX Notes

We also need to improve support for, and behavior during, cluster updates and upgrades. TODO: make a list of open issues. Examples include feature gating, node upgrades, and node draining.

Single-node laptop/development cluster

Should be sufficient to kick the tires for most examples and for local development. Should be dead simple to use and highly opinionated rather than configurable.

Owner: @dlorenc


To do:
* Replace single-node getting-started guides
  * Docker single-node

Portable multi-node cluster understandable reference implementation

For people who want to get Kubernetes running painlessly on an arbitrary set of machines – any cloud provider (or bare metal), any OS distro, any networking infrastructure. Porting work should be minimized via separation of concerns (composition) and ease of modification rather than automated configuration transformation. Not intended to be highly optimized by default, but the cluster should be reliable.

Also a reference implementation for people who want to understand how to build Kubernetes clusters from scratch.

Ideally cluster scaling and upgrades would be supported by this implementation.

This implementation should also replace the Docker multi-node getting-started guide.

To facilitate this, we aim to provide an understandable, declarative, decoupled infrastructure provisioning implementation and a portable cluster bootstrapping implementation. Networking setup needs to be decoupled, so it can be swapped out with alternative implementations.

For portability, all components need to be containerized (though Kubelet may use an alternative to Docker, so long as it is portable and meets other requirements) and we need a default network overlay solution.
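The decoupling described above can be pictured as composition. Below is a minimal sketch that assumes the bootstrapper depends only on a small networking interface, so that swapping in another implementation means passing a different object. `NetworkProvider`, `SubnetPerNodeOverlay`, `StaticAssignments`, `bootstrap_node`, and the `10.244.0.0/16` default are all hypothetical illustrations, not an existing Kubernetes API.

```python
import ipaddress
import itertools
from typing import Dict, List, Protocol


class NetworkProvider(Protocol):
    """Hypothetical interface: the bootstrapper knows only this, not a concrete overlay."""

    name: str

    def pod_cidr_for_node(self, node_index: int) -> str: ...


class SubnetPerNodeOverlay:
    """Hypothetical default: carve one fixed-size pod subnet per node out of a cluster CIDR."""

    def __init__(self, cluster_cidr: str = "10.244.0.0/16", node_prefix: int = 24) -> None:
        self.name = "default-overlay"
        self._cluster = ipaddress.ip_network(cluster_cidr)
        self._node_prefix = node_prefix

    def pod_cidr_for_node(self, node_index: int) -> str:
        # subnets() yields the /node_prefix blocks in address order; skip to the n-th one.
        subnets = self._cluster.subnets(new_prefix=self._node_prefix)
        subnet = next(itertools.islice(subnets, node_index, None), None)
        if subnet is None:
            raise ValueError(f"{self._cluster} has no subnet #{node_index} of size /{self._node_prefix}")
        return str(subnet)


class StaticAssignments:
    """Hypothetical alternative: per-node CIDRs supplied by the operator (e.g., routed bare metal)."""

    def __init__(self, cidrs: List[str]) -> None:
        self.name = "static"
        self._cidrs = list(cidrs)

    def pod_cidr_for_node(self, node_index: int) -> str:
        return self._cidrs[node_index]


def bootstrap_node(node_index: int, network: NetworkProvider) -> Dict[str, object]:
    """Portable bootstrapping step; changing the network setup only changes the argument."""
    return {
        "node": node_index,
        "network": network.name,
        "podCIDR": network.pod_cidr_for_node(node_index),
    }


print(bootstrap_node(3, SubnetPerNodeOverlay()))
# → {'node': 3, 'network': 'default-overlay', 'podCIDR': '10.244.3.0/24'}
```

The bootstrapping code never branches on which network is in use. That is the point of composition over configuration transformation: an alternative network needs only a new class, not edits to the bootstrapper.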

Eventually, we’d like to entirely eliminate the need for Chef/Puppet/Ansible/Salt. We shouldn’t need to copy files around to host filesystems.

For simplicity, users shouldn’t need to install/launch more than one component or execute more than one command per node. This could be achieved in a variety of ways: monolithic binaries, monolithic containers, a launcher/controller container that spawns other containers, etc.

Once we have this, we should delete out-of-date, untested “getting-started guides” (example broken cluster debugging thread).

See also:
* Summary proposal
* kubernetes-anywhere umbrella issue
* Bootstrap API
* jbeda’s simple setup sketch

To do:
* Containerize all the components in order to achieve OS-distro independence
* Self-host as much as possible, such as using Deployment, DaemonSet, ConfigMap
  * Eliminate the need to read configurations from local disk
  * Dynamic Kubelet configuration
  * Kubelet checkpointing in order to reliably run control-plane components without static pods
  * Run Kubelet in a container
  * DaemonSet updates
  * DaemonSet for bootstrapping
  * Bootkube
* Adopt a default network overlay, but enable others to be swapped in via composition
  * Need to make it clear that this is for simplicity and portability, but isn’t the only option
* Link etcd and the master components into a monolithic monokube-like binary and/or finish hyperkube
* Replace multi-node Docker getting-started guide
* Make it easy to get the right version of Docker on major Linux distros (e.g., apt-get install…)
  * It’s easy to get the wrong version: docker, docker-engine, …
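Self-hosting a node component with a DaemonSet, with its configuration supplied by a ConfigMap rather than by files copied to the host, could look roughly like the sketch below. It only builds the manifest as a plain dict (serializable to JSON for `kubectl apply -f`). It uses the current `apps/v1` DaemonSet shape, which postdates some of this roadmap. The component name, image, mount path, and the `--config` flag are placeholders and are assumed, not taken from any specific component.

```python
import json
from typing import Dict


def component_daemonset(name: str, image: str, config_map: str,
                        namespace: str = "kube-system") -> Dict[str, object]:
    """Build a DaemonSet manifest that runs one copy of a node component per node.

    The component reads its configuration from a ConfigMap mounted as a volume,
    rather than from files copied onto the host filesystem.
    """
    labels = {"k8s-app": name}
    container = {
        "name": name,
        "image": image,
        # Hypothetical flag: assumes the component accepts a --config file path
        # and that the ConfigMap contains a key named config.yaml.
        "args": ["--config=/etc/" + name + "/config.yaml"],
        "volumeMounts": [{"name": "config", "mountPath": "/etc/" + name, "readOnly": True}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": dict(labels)},
        "spec": {
            # apps/v1 requires the selector to match the pod template's labels.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "hostNetwork": True,
                    "containers": [container],
                    "volumes": [{"name": "config", "configMap": {"name": config_map}}],
                },
            },
        },
    }


manifest = component_daemonset("node-agent", "registry.example.com/node-agent:v0", "node-agent-config")
print(json.dumps(manifest, indent=2))
```

With this approach, changing the component’s configuration means updating the ConfigMap, and upgrading it means updating the DaemonSet. No per-host file management is needed.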


Building a cluster from scratch


We should simplify this as much as possible, and clearly document it.

This is probably the only viable way to support people who want to do significant customization:
* cloud provider (including bare metal)
* OS distro
* cluster size
* master and worker node configurations
* networking solution and parameters (e.g., CIDR)
* container runtime (Docker or rkt) and its configuration
* monitoring solutions
* logging solutions
* ingress controller
* image registry
* IAM
* HA
* multi-zone
* K8s component configuration
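Many of these customization axes can be captured declaratively and checked before anything is provisioned. The sketch below assumes a hypothetical `ClusterConfig` dataclass covering a subset of the list. Its defaults and the validation rules (non-overlapping pod and service CIDRs, Docker or rkt as the runtime, at least three masters for HA) are illustrative policy, not requirements taken from Kubernetes itself.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterConfig:
    """Hypothetical declarative description of a few customization axes."""

    cloud_provider: str = "bare-metal"
    os_distro: str = "any"
    master_count: int = 1
    worker_count: int = 3
    container_runtime: str = "docker"  # "docker" or "rkt"
    network_plugin: str = "default-overlay"
    pod_cidr: str = "10.244.0.0/16"
    service_cidr: str = "10.96.0.0/12"
    ha: bool = False
    zones: List[str] = field(default_factory=lambda: ["zone-a"])

    def validate(self) -> List[str]:
        """Return human-readable problems instead of failing on the first one."""
        problems: List[str] = []
        try:
            pods = ipaddress.ip_network(self.pod_cidr)
            services = ipaddress.ip_network(self.service_cidr)
        except ValueError as exc:
            return [f"invalid CIDR: {exc}"]
        if pods.overlaps(services):
            problems.append(f"pod CIDR {pods} overlaps service CIDR {services}")
        if self.container_runtime not in ("docker", "rkt"):
            problems.append(f"unsupported container runtime {self.container_runtime!r}")
        # Example policy: an etcd quorum that tolerates one member failure needs three members.
        if self.ha and self.master_count < 3:
            problems.append("HA requires at least 3 masters in this example policy")
        if not self.zones:
            problems.append("at least one zone is required")
        return problems


print(ClusterConfig(service_cidr="10.244.128.0/17").validate())
```

Returning a list of problems rather than raising on the first one lets a tool report every misconfiguration in a single pass.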

To do:
* Simplify release packaging and installation
  * Finding and installing the right version of Docker itself can be hard (apt-get install docker/ isn’t the right thing)
  * Build RPMs and debs?
  * Verify that system requirements have been satisfied (docker version, kernel configuration, etc.)
    * And ideally degrade gracefully and warn if they are not
* Documentation
  * What is the latest release, how can I find it, how do I install it, what version of Docker/rkt/etc. is required?
  * An architectural diagram (like the one we use in our presentations) would help, too.
  * Explain the architecture
    * Link to instructions about how to manage etcd
    * Link to Chubby paper
  * Document system requirements (“Node Spec”)
    * OS distro versions
    * kernel configuration
    * resources
    * IP forwarding
  * Document how to set up a cluster
  * Adequately document how to configure our components
    * Improve/simplify/organize command help
    * Hide/remove/deemphasize test-only options
  * Document how to integrate IAM
  * Create guides to help with key decisions for production clusters
    * Selecting a networking model
    * Managing a CA
    * Managing user authentication and authorization
    * Initial deployment requirements (memory, CPU, networking, storage)
    * Upgrading best practices
* Code changes
  * Finish converting components to use configuration files rather than command-line flags
  * Facilitate managing that configuration using ConfigMap
  * Cluster config
  * Reduce external dependencies
  * APIs for reusable building blocks, such as TLS bootstrap, certificate signing for addons, teardown
  * Need key/cert rotation (master, service accounts)
  * Finish generalization of component registration
  * Improve addon management
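A requirements check that degrades gracefully might separate blocking errors from warnings, as in the sketch below. The Docker version string could come from something like `docker version --format '{{.Server.Version}}'`, and IP forwarding is read from `/proc/sys/net/ipv4/ip_forward` on Linux. The function names and the supported version range are hypothetical placeholders, not documented Kubernetes requirements.

```python
from pathlib import Path
from typing import List, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '1.12.6' or '17.03.2-ce' into a comparable tuple such as (17, 3, 2)."""
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def read_ip_forward(path: str = "/proc/sys/net/ipv4/ip_forward") -> Optional[str]:
    """Return '1' or '0' on Linux, or None if the value cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def preflight(docker_version: Optional[str], supported: Tuple[str, str],
              ip_forward: Optional[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings); only errors should block bootstrapping."""
    errors: List[str] = []
    warnings: List[str] = []
    if docker_version is None:
        errors.append("no Docker installation found")
    else:
        try:
            found = parse_version(docker_version)
        except ValueError:
            # Degrade gracefully: an unrecognized version string is a warning, not a failure.
            warnings.append(f"cannot parse Docker version {docker_version!r}; skipping check")
        else:
            low, high = parse_version(supported[0]), parse_version(supported[1])
            if not low <= found <= high:
                warnings.append(f"Docker {docker_version} is outside the tested range {supported[0]}-{supported[1]}")
    if ip_forward is None:
        warnings.append("cannot read net.ipv4.ip_forward; skipping check")
    elif ip_forward != "1":
        errors.append("IP forwarding is disabled (net.ipv4.ip_forward must be 1)")
    return errors, warnings


print(preflight("1.12.6", ("1.11.0", "1.13.1"), read_ip_forward()))
```

An untested Docker version only produces a warning, because it may still work. Disabled IP forwarding produces an error, because pod networking generally depends on it.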

Production-grade, easy-to-use cluster management tools/services

Easy to use and opinionated. Potentially highly optimized. Acceptable for production use. Not necessarily portable or easy to extend, adapt, or change.

Examples:
* Kube-AWS
* kops
* Kargo
* (is still needed?)
* kompose8
* Tectonic
* Kraken
* NavOps Launch
* Photon Cluster Manager
* Platform 9
* GKE
* Juju