Production-Grade Kubernetes on a Single Server: The Complete Guide
Why do this
One server. The full production stack. No excuses.
The idea is simple: build a Kubernetes cluster running the same stack you’d use on AKS or EKS — service mesh, automatic TLS, observability, layered security — but on your own hardware. This isn’t minikube. This isn’t a test lab. It’s an environment serving real workloads to real users.
Why a single node? Because for personal projects, side projects, and serious experimentation, a 15-30€/month server gives you more than enough. And the experience of operating this stack transfers directly to multi-node clusters in the cloud.
The end result: a cluster where deploying a new app means creating a namespace, labeling it for the service mesh, creating an HTTPRoute, and running kubectl apply. Automatic TLS, automatic DNS, metrics, logs, and security — it’s all already there.
The server and prerequisites
What you need:
- Server with Ubuntu 22.04+ — minimum 4 vCPUs and 8GB RAM. With the full stack running, base consumption is around 4-5GB RAM. If you’re running workloads on top, 16GB is more comfortable.
- cgroup v2 — Ubuntu 22.04+ has it enabled by default. Verify with stat -fc %T /sys/fs/cgroup (should return cgroup2fs).
- Domains on Cloudflare — needed for DNS-01 challenge (wildcard certs) and External DNS (automatic A record creation).
- Cloudflare API Token with Zone:DNS:Edit + Zone:Zone:Read permissions for all zones you’ll use.
- Tailscale installed — for secure API Server access.
The Kubernetes API Server (port 6443) must never be exposed to the internet. We use Tailscale as a VPN for external access, and nftables to filter by network interface. If someone reaches your 6443, they have the keys to the kingdom.
Base: kubeadm + containerd
Prepare the system
First, the kernel modules and network parameters that Kubernetes needs:
# Kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Network parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
Install containerd
sudo apt-get update
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Enable SystemdCgroup — critical for cgroup v2
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd
If you don’t enable SystemdCgroup = true, kubelet and containerd will use different cgroup drivers. The result: pods dying randomly, kubelet restarting, and hours of debugging. This is the most common mistake in kubeadm installations.
Install kubeadm, kubelet, kubectl
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | \
sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | \
sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
Initialize the cluster
sudo kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--service-cidr=10.96.0.0/16 \
--skip-phases=addon/kube-proxy
We skip kube-proxy with --skip-phases=addon/kube-proxy because Calico in eBPF mode replaces it entirely. eBPF handles service load balancing directly in the kernel, without the iptables chains that kube-proxy generates. Better performance, less complexity.
After init, configure access and remove the master node taint:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Allow the master node to run workloads
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
On a single-node cluster, this last step is mandatory. Without it, nothing gets scheduled on your only node.
CNI: Calico with eBPF
Why Calico eBPF
Calico with eBPF gives you three things that iptables mode doesn’t:
- Replaces kube-proxy — service load balancing in eBPF, no iptables
- Better performance — especially with many services, where iptables chains become linear
- Native network visibility — Calico can use eBPF data for policies and metrics
Install the Tigera Operator
helm repo add projectcalico https://docs.tigera.io/calico/charts
helm repo update
kubectl create namespace tigera-operator
helm install calico projectcalico/tigera-operator \
--version v3.29.2 \
--namespace tigera-operator
Here’s a real gotcha: you need to wait for the operator to register the CRDs before applying the Installation resource. If you apply the CR immediately, it fails silently or the resource gets stuck in an inconsistent state.
# Wait for CRDs to be registered
kubectl wait --for=condition=Established \
crd/installations.operator.tigera.io \
--timeout=120s
Apply the Installation CR with eBPF
# calico-installation.yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
name: default
spec:
calicoNetwork:
ipPools:
- name: default-ipv4-ippool
blockSize: 26
cidr: 10.244.0.0/16
encapsulation: VXLANCrossSubnet
natOutgoing: Enabled
nodeSelector: all()
linuxDataplane: BPF
hostPorts: Enabled
cni:
type: Calico
controlPlaneReplicas: 1
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
name: default
spec: {}
kubectl apply -f calico-installation.yaml
With linuxDataplane: BPF, Calico operates entirely in eBPF. If you had kube-proxy installed (which isn’t the case if you followed the steps above), you should patch its DaemonSet so it doesn’t get scheduled:
# Only if kube-proxy is installed
kubectl patch ds -n kube-system kube-proxy \
-p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'
Service Mesh: Istio Ambient Mode
Why ambient and not sidecar
Sidecar mode injects an Envoy proxy into every pod. It works, but has overhead: each pod consumes more memory, startup is slower, and there’s a whole category of bugs related to sidecar vs. application init order.
Ambient mode changes the model: a per-node ztunnel daemon handles mTLS and L4 automatically for all pods in enrolled namespaces. If you need L7 capabilities (retries, timeouts, circuit breaking, HTTP observability), you deploy a waypoint proxy shared per namespace. You only pay the L7 cost where you actually need it.
Install Istio
istioctl install --set profile=ambient -y
To enable ambient on an application namespace:
kubectl label namespace my-app istio.io/dataplane-mode=ambient
To add a shared waypoint proxy (L7):
istioctl waypoint apply --namespace my-app --enroll-namespace
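Once a waypoint is enrolled, L7 policy can attach to it. As a sketch (the namespace my-app, the waypoint name, and the gateway service account below are assumptions based on the examples in this guide, so verify the identities in your own cluster), an AuthorizationPolicy that only admits traffic arriving through the main gateway might look like:

```yaml
# Hypothetical example: restrict my-app's waypoint to traffic coming
# from the ingress gateway's identity. Names are assumptions.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-from-gateway
  namespace: my-app
spec:
  # Attach the policy to the namespace's waypoint proxy
  targetRefs:
    - kind: Gateway
      group: gateway.networking.k8s.io
      name: waypoint
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              # SPIFFE identity of the gateway deployment; check the
              # actual service account name with kubectl -n istio-system get sa
              - cluster.local/ns/istio-system/sa/main-gateway-istio
```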
Don’t apply ambient to infrastructure namespaces like kube-system, calico-system, tigera-operator, istio-system, or monitoring. The service mesh is for your application workloads. Infrastructure components have their own security mechanisms, and putting them in the mesh only introduces problems.
Ingress: Gateway API + cert-manager + MetalLB
This is the layer that makes your services accessible from the internet with automatic TLS. Four components working together.
MetalLB
In the cloud, a LoadBalancer type Service gets an external IP from the provider. On bare metal, you need MetalLB for that.
helm repo add metallb https://metallb.github.io/metallb
helm repo update
helm install metallb metallb/metallb \
--namespace metallb-system \
--create-namespace
Wait for pods to be ready, then configure the IP pool. On a single server, the pool is the server’s public IP:
# metallb-config.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: default-pool
namespace: metallb-system
spec:
addresses:
- 203.0.113.10/32 # Your public IP
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: default
namespace: metallb-system
spec:
ipAddressPools:
- default-pool
cert-manager + ClusterIssuer
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true
The ClusterIssuer uses DNS-01 with Cloudflare to issue wildcard certificates:
# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: your-email@example.com
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- dns01:
cloudflare:
apiTokenSecretRef:
name: cloudflare-api-token
key: api-token
Create the Secret with the Cloudflare token:
kubectl create secret generic cloudflare-api-token \
--namespace cert-manager \
--from-literal=api-token=YOUR_CLOUDFLARE_TOKEN
And the wildcard Certificate:
# wildcard-cert.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: wildcard-darkden-net
namespace: istio-system
spec:
secretName: wildcard-darkden-net-tls
issuerRef:
name: letsencrypt-prod
kind: ClusterIssuer
dnsNames:
- "darkden.net"
- "*.darkden.net"
Istio Gateway
# gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: main-gateway
namespace: istio-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
gatewayClassName: istio
listeners:
- name: http
protocol: HTTP
port: 80
allowedRoutes:
namespaces:
from: All
- name: https
protocol: HTTPS
port: 443
tls:
mode: Terminate
certificateRefs:
- name: wildcard-darkden-net-tls
allowedRoutes:
namespaces:
from: All
HTTPRoute example
# httproute-example.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: my-app
namespace: my-app
spec:
parentRefs:
- name: main-gateway
namespace: istio-system
sectionName: https
hostnames:
- "app.darkden.net"
rules:
- backendRefs:
- name: my-app-svc
port: 8080
External DNS
So DNS records get created automatically when you create an HTTPRoute:
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns
helm repo update
helm install external-dns external-dns/external-dns \
--namespace external-dns \
--create-namespace \
--set provider.name=cloudflare \
--set "env[0].name=CF_API_TOKEN" \
--set "env[0].valueFrom.secretKeyRef.name=cloudflare-api-token" \
--set "env[0].valueFrom.secretKeyRef.key=api-token" \
--set "sources={gateway-httproute}" \
--set policy=sync \
--set registry=txt \
--set txtOwnerId=k8s-cluster
With this, every time you create an HTTPRoute with a hostname, External DNS creates the A record in Cloudflare automatically, and cleans it up when you delete the route. Note that the cloudflare-api-token Secret must also exist in the external-dns namespace — the one created earlier lives in cert-manager.
Data: Database operators
On a single server you can have multiple databases managed by Kubernetes operators. I won’t go into the configuration of each one — every operator deserves its own post — but the point is that the pattern works:
- CloudNativePG — Kubernetes-native PostgreSQL. Automated backups, failover (on multi-node), WAL archiving. For new applications, it’s the default choice.
- MariaDB Operator — for applications that need MySQL/MariaDB.
- Redis Operator (Spotahome) — for caching, sessions, and queues.
- Strimzi — Kafka for event-driven architectures. Resource-heavy, only if you really need it.
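To illustrate the pattern, a minimal CloudNativePG cluster is a single custom resource — the operator handles the rest. The names, namespace, and sizes below are placeholders, not part of the original setup:

```yaml
# Sketch: minimal single-instance PostgreSQL cluster via CloudNativePG.
# Namespace, database name, and storage size are illustrative assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-app-pg
  namespace: databases
spec:
  instances: 1          # one instance is enough on a single node
  storage:
    size: 5Gi
  bootstrap:
    initdb:
      database: myapp   # created on first boot
      owner: myapp
```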
$ kubectl get pods -A | grep -E 'cnpg|mariadb|redis'
cnpg-system   cnpg-controller-manager-5d7f9b4c8-x2k9l   1/1   Running
databases     my-app-pg-1                               1/1   Running
databases     my-app-mariadb-0                          1/1   Running
databases     redis-node-0                              1/1   Running
Observability: Grafana + Loki + Prometheus
kube-prometheus-stack
A single Helm chart that gives you Prometheus, Grafana, Alertmanager, node-exporter, and a bunch of preconfigured dashboards:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword=YOUR_SECURE_PASSWORD \
--set prometheus.prometheusSpec.retention=30d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
Grafana is exposed via HTTPRoute with authentication. Configure Google SSO (or your preferred provider) in Grafana’s values so you don’t rely on username/password.
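As a sketch of that SSO configuration (the client ID, secret, and domains below are placeholders; consult the Grafana documentation for your provider), the relevant Helm values look roughly like:

```yaml
# Hypothetical kube-prometheus-stack values fragment enabling Google SSO
# in Grafana. All credentials and domains are placeholders.
grafana:
  grafana.ini:
    server:
      root_url: https://grafana.darkden.net
    auth.google:
      enabled: true
      client_id: YOUR_CLIENT_ID
      client_secret: YOUR_CLIENT_SECRET
      scopes: openid email profile
      auth_url: https://accounts.google.com/o/oauth2/v2/auth
      token_url: https://oauth2.googleapis.com/token
      allowed_domains: example.com   # restrict sign-in to your domain
      allow_sign_up: true
```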
Loki + Promtail
For centralized logs:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack \
--namespace monitoring \
--set promtail.enabled=true \
--set loki.persistence.enabled=true \
--set loki.persistence.size=20Gi
Recommended dashboards
- Node Exporter Full (ID 1860) — server metrics: CPU, RAM, disk, network
- Kubernetes / Pod Resources — consumption per pod and namespace
- Trivy Operator Vulnerabilities (ID 17813) — detected vulnerabilities in images
- Istio Mesh Dashboard — service mesh traffic, latencies, error rates
Security: Defense in depth
Security isn’t a component — it’s a stack of layers. Each one covers a different angle.
Trivy Operator
Automatic vulnerability scanning across all cluster images:
helm repo add aqua https://aquasecurity.github.io/helm-charts
helm repo update
helm install trivy-operator aqua/trivy-operator \
--namespace trivy-system \
--create-namespace \
--set trivy.ignoreUnfixed=true
Trivy creates VulnerabilityReport CRDs for each workload. You can query them with kubectl and visualize them in the Grafana dashboard.
OPA Gatekeeper
Policy enforcement to prevent dangerous configurations:
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update
helm install gatekeeper gatekeeper/gatekeeper \
--namespace gatekeeper-system \
--create-namespace
Constraint templates you should have from day one:
- Require resource limits — no pod without CPU and memory requests/limits
- Block latest tag — force immutable image tags
- Require labels — app.kubernetes.io/name and app.kubernetes.io/version mandatory
- Block privileged containers — nobody runs as root unless explicitly needed
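As one concrete example from that list, the required-labels rule can be expressed with Gatekeeper's canonical k8srequiredlabels template — this is a sketch adapted from the Gatekeeper documentation, scoped here to Deployments:

```yaml
# ConstraintTemplate + Constraint requiring standard labels on Deployments.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {l | input.review.object.metadata.labels[l]}
          required := {l | l := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-app-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels:
      - app.kubernetes.io/name
      - app.kubernetes.io/version
```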
Falco
Runtime threat detection. Falco monitors syscalls and generates alerts when it detects suspicious behavior (shell in container, secret reads, writes to sensitive directories):
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
--namespace falco \
--create-namespace \
--set falcosidekick.enabled=true \
--set falcosidekick.webui.enabled=true
The Falco Sidekick UI should never be exposed to the internet. Access it only through Tailscale or port-forward. It contains sensitive information about your cluster’s internal behavior.
Network Policies
Default-deny on each application namespace, with explicit allows:
# default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: my-app
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
---
# allow-dns.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-dns
namespace: my-app
spec:
podSelector: {}
policyTypes:
- Egress
egress:
- to: []
ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
---
# allow-istio.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-istio
namespace: my-app
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: istio-system
egress:
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: istio-system
nftables to protect the API Server
Don’t use iptables with source IP filtering if you have Calico eBPF. eBPF processing modifies source IPs via DNAT before they reach iptables rules, so your source IP filter rules will never match correctly. Filter by network interface with nftables.
# /etc/nftables.conf
table inet filter {
chain input {
type filter hook input priority filter; policy accept;
# API Server: only accessible from localhost and Tailscale
tcp dport 6443 iifname "lo" accept
tcp dport 6443 iifname "tailscale0" accept
tcp dport 6443 drop
}
}
sudo systemctl enable nftables
sudo systemctl start nftables
With this, the API Server only accepts connections from the server itself (localhost) and from Tailscale VPN. Everything else is dropped.
Management: KEDA, Reloader, ESO
Three tools that simplify day-to-day operations:
KEDA (Kubernetes Event-Driven Autoscaling) — scales pods based on custom metrics, queue length, request rate, or any data source. More flexible than native HPA.
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
--namespace keda \
--create-namespace
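A minimal example of what KEDA looks like in practice — the Prometheus service address and the metric query below are assumptions (the service name depends on your Helm release; check with kubectl get svc -n monitoring):

```yaml
# Hypothetical ScaledObject scaling my-app on request rate from Prometheus.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app
  namespace: my-app
spec:
  scaleTargetRef:
    name: my-app        # Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        # Placeholder service address; verify the name created by
        # your kube-prometheus-stack release
        serverAddress: http://kube-prometheus-kube-prome-prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{namespace="my-app"}[2m]))
        threshold: "100"
```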
Reloader — monitors ConfigMaps and Secrets. When they change, it automatically triggers rolling restarts of the Deployments that reference them. Without Reloader, changing a ConfigMap requires manually restarting pods.
helm repo add stakater https://stakater.github.io/stakater-charts
helm repo update
helm install reloader stakater/reloader \
--namespace reloader \
--create-namespace
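Opting a workload in is a single annotation on the Deployment (the Deployment name here is illustrative):

```yaml
# Reloader watches this Deployment's referenced ConfigMaps and Secrets
# and triggers a rolling restart when any of them change.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
  annotations:
    reloader.stakater.com/auto: "true"
```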
External Secrets Operator — syncs secrets from external backends (Vault, Infisical, AWS Secrets Manager) into Kubernetes Secrets. Ready for when you want to centralize secret management outside the cluster.
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets external-secrets/external-secrets \
--namespace external-secrets \
--create-namespace
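Once a backend is wired up through a SecretStore, consuming a secret looks roughly like this sketch — the store name, remote key, and property are placeholders you would define for your own backend:

```yaml
# Hypothetical ExternalSecret pulling one value from an external backend
# into a native Kubernetes Secret. Store and key names are placeholders.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-app-db-credentials
  namespace: my-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: my-secret-store       # a ClusterSecretStore you define separately
    kind: ClusterSecretStore
  target:
    name: my-app-db-credentials # resulting Kubernetes Secret
  data:
    - secretKey: password
      remoteRef:
        key: my-app/db          # path in the external backend
        property: password
```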
The final result
With everything installed, here’s what you have running:
| Layer | Components |
|---|---|
| Runtime | containerd, kubelet, kubeadm |
| Networking | Calico eBPF (CNI + kube-proxy replacement) |
| Service Mesh | Istio ambient (ztunnel + waypoints) |
| Ingress | Gateway API + MetalLB + External DNS |
| TLS | cert-manager + Let’s Encrypt + Cloudflare DNS-01 |
| Data | CloudNativePG, MariaDB Operator, Redis Operator |
| Observability | Prometheus, Grafana, Alertmanager, Loki, Promtail |
| Security | Trivy, Gatekeeper, Falco, Network Policies, nftables |
| Management | KEDA, Reloader, External Secrets Operator |
$ kubectl get pods -A --field-selector status.phase=Running -o json | jq '.items | length'
47
47 pods running on a single server. The infrastructure stack’s base consumption is around 4-5GB RAM and 1.5-2 vCPUs. The rest is available for your workloads.
Deploying a new app is:
# Create namespace and enable it for the service mesh
kubectl create namespace my-new-app
kubectl label namespace my-new-app istio.io/dataplane-mode=ambient
# Deploy waypoint if you need L7
istioctl waypoint apply --namespace my-new-app --enroll-namespace
# Apply manifests (Deployment, Service, HTTPRoute)
kubectl apply -f my-new-app/
DNS, TLS, metrics, logs, network policies, vulnerability scanning — it’s all already there waiting.
Conclusion
This setup isn’t a toy homelab. It’s the same architectural pattern you’d use in a production cloud environment, adapted to a single node. The cost difference is massive: 15-30€/month for a dedicated server vs. hundreds of euros for a managed cluster running the same stack.
The gotchas are in the details that don’t appear in the official documentation:
- CRDs that don’t register in time and cause silent failures in operators
- eBPF breaking source IP filtering in iptables
- Istio init containers interfering with database operators that need to initialize the filesystem before the network is available
- containerd without SystemdCgroup causing random pod restarts
Each of these can cost you hours of debugging if you don’t know about them beforehand. Now you do.