I have been really excited to use Kubernetes (aka k8s) in a semi-production way on my home server; however, there is only one such server and not the 3+ that seem to be typical of a minimal kubernetes deployment. At the same time, I know there’s minikube and similar, but those give a strong sense of development-time, experimental usage. I want a small-scale, but real, kubernetes experience.

For me, an additional requirement to make it a real deployment is that it should support stateful services. One of the things I like about kubernetes is that persistent volumes are a first-class construct supported in various real-world ways. Since I’m working with a tight budget and a single server, I narrowed my search to open source, software-defined storage solutions. Ceph worked out well for me since it was easy to set up, scaled down without compromises, and is full of features.

In this article I’ll show how to use a little libvirt+KVM+QEMU wrapper script to create three VMs, deploy kubernetes using kubeadm, and overlay a ceph cluster using ceph-deploy. The setup might appear tedious, but the benefits and ease of kubernetes usage afterwards are well worth it.

My environment

If you’re following along, it might help to know what I’m using so you can see how it aligns with your own setup. My one and only home server is an Intel NUC with:

  • i5 dual-core, hyperthreaded processor
  • 16 GB RAM
  • 240 GB SSD with about 100 GB dedicated to VM images
  • Ubuntu Xenial 16.04.4
  • Kernel 4.4.0-124

It has plenty of RAM for three VMs, but I’m going to be knowingly overcommitting the vCPU count of 6 (2 x 3 VMs) since there are only 4 logical processors on my system. I’m not going to be running stressful workloads, so hopefully that works out.

Setup virtual machines

Baremetal or VMs?

My original desire was to avoid virtual machines (VMs) entirely and run both kubernetes and ceph directly “on metal”. I found that was technically possible with options such as “untainting” the k8s master node; however, striving for zero downtime really requires distinct nodes in order to achieve seamless cluster scheduling.

As such, one of the wheels I re-invented was a small tool to help me create VMs with minimal requirements. All it needs is the following packages installed:

  • libvirt-bin
  • qemu-kvm
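On Ubuntu 16.04 those can be installed with:

sudo apt update
sudo apt install -y libvirt-bin qemu-kvm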

Networking setup for virtualization

Since I’ll eventually be port forwarding certain traffic from my ISP’s cable modem, I need the VMs to be directly routable on my LAN. For that I needed to configure Linux kernel bridging by following this.

In my case, I set up my /etc/network/interfaces with the following, since I am using my LAN’s DHCP server to assign a fixed IP address. The dhcp option eliminates the need to specify the address, gateway, and DNS.

auto br0
iface br0 inet dhcp
        bridge_ports eno1
        bridge_maxwait 0
        bridge_fd 0
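A reboot is the simplest way to apply the change; afterwards you can verify that the bridge picked up an address and enslaved the physical interface with something like:

ip addr show br0
brctl show br0   # brctl comes from the bridge-utils package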

To make sure the host’s iptables filtering doesn’t interfere with the guest VMs doing their own, we ease packet forwarding by creating /etc/sysctl.d/60-bridge-virt.conf with:

net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0

To apply those settings immediately, run:

sudo /lib/systemd/systemd-sysctl --prefix=/net/bridge

To ensure the settings are re-applied on reboot after networking has started, create the file /etc/udev/rules.d/99-bridge.rules with the content:

ACTION=="add", SUBSYSTEM=="net", KERNEL!="lo", RUN+="/lib/systemd/systemd-sysctl --prefix=/net/bridge"

Create VMs using libvirt+KVM+QEMU

Download my helper script from my libvirt-tools repo and set it to be executable:

wget https://raw.githubusercontent.com/itzg/libvirt-tools/master/create-vm.sh
chmod +x create-vm.sh

I am going to create three VMs. My network has static IP address space starting at 192.168.0.150, so I’m configuring my VMs starting from that address.

The volume that contains /var/lib/libvirt/images on my system only has 112G, so I am going to allocate an extra disk device of 20G for each of the three VMs. In total they will use 3 x (8G root + 20G extra) = 84G. The volumes are thin provisioned, but it’s still good to plan for maximum usage since I have limited host volume space (SSD == good for speed, but bad for capacity).
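If you want to double-check the space available for images on your own host:

df -h /var/lib/libvirt/images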

View full usage and default values of the helper script by running

./create-vm.sh --help

Make sure you have an SSH key set up, since the helper script will tell cloud-init to install your current user’s ssh key for access as the user ubuntu. To confirm, list your key’s fingerprint using:

ssh-keygen -l
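If that shows nothing, generating a key with the defaults is enough; for example:

ssh-keygen -t rsa -b 4096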

Create the first VM:

sudo ./create-vm.sh --ip-address 192.168.0.150 --extra-volume 20G nuc-vm1

You should now be able to ssh to the VM as the user ubuntu:

ssh ubuntu@192.168.0.150

However, if that doesn’t seem to be working, you can attach to the VM’s console using:

virsh console nuc-vm1

Use Control-] to detach from the console.

Repeat the same invocation for the other two VMs changing the IP address and name:

sudo ./create-vm.sh --ip-address 192.168.0.151 --extra-volume 20G nuc-vm2
sudo ./create-vm.sh --ip-address 192.168.0.152 --extra-volume 20G nuc-vm3
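At this point all three VMs should show as running:

sudo virsh list --all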

Simplify ssh access

Adding /etc/hosts entries for the VMs and their IPs will help ease the remainder of the setup tasks. In my case I added these:

192.168.0.150 nuc-vm1
192.168.0.151 nuc-vm2
192.168.0.152 nuc-vm3

To avoid having to specify the default ubuntu user when ssh’ing to each VM, add the following to ~/.ssh/config, replacing the host names with yours:

Host nuc-vm1 nuc-vm2 nuc-vm3
   User ubuntu

Confirm you can ssh into each, which also gives you a chance to confirm and accept the host fingerprint for later steps:

ssh nuc-vm1

While you’re in there you could also upgrade packages to get the VMs up to date before installing more stuff:

sudo apt update
sudo apt upgrade -y
# ...and sudo reboot if the kernel was included in the upgraded packages
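Since the same upgrade needs to happen on all three VMs, a small loop from the host saves some typing (this assumes the /etc/hosts and ~/.ssh/config entries above):

for h in nuc-vm1 nuc-vm2 nuc-vm3; do
  ssh $h "sudo apt update && sudo apt upgrade -y"
done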

Create kubernetes cluster

Pre-flight checks

ssh to the first VM and install kubeadm and its prerequisites.

The VMs were configured (via defaults) to generate unique MAC addresses for their bridged network device, but here’s an example of confirming:

$ ssh nuc-vm1 ip link | grep "link/ether"
    link/ether 52:54:00:91:76:e5 brd ff:ff:ff:ff:ff:ff
$ ssh nuc-vm2 ip link | grep "link/ether"
    link/ether 52:54:00:93:4c:83 brd ff:ff:ff:ff:ff:ff
$ ssh nuc-vm3 ip link | grep "link/ether"
    link/ether 52:54:00:b7:88:ad brd ff:ff:ff:ff:ff:ff

Likewise, the VMs were created with the default behavior of generating a UUID per domain/node. Here is confirmation of that:

$ ssh nuc-vm1 sudo cat /sys/class/dmi/id/product_uuid
781526BF-31E7-4339-8424-6B886A432968
$ ssh nuc-vm2 sudo cat /sys/class/dmi/id/product_uuid
AF93665F-6137-4125-9480-08B99B040BE8
$ ssh nuc-vm3 sudo cat /sys/class/dmi/id/product_uuid
C76F05E7-0289-4479-91CA-3D47A48096F4
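kubeadm’s pre-flight checks also expect swap to be off; the Ubuntu cloud image shouldn’t have any configured, but it’s quick to confirm (no output means no active swap):

ssh nuc-vm1 swapon --show
ssh nuc-vm2 swapon --show
ssh nuc-vm3 swapon --show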

Install Docker and Kubernetes packages

Install the recommended version of Docker by ssh’ing into the first node and starting an interactive sudo session with

sudo -i

Use the installation snippet provided in the Installing Docker section.

Still in the interactive sudo session, install kubeadm, kubelet, and kubectl.
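The linked documentation is the authoritative source, but for this era of kubernetes (1.10 on Xenial) the snippets looked roughly like the following; treat this as a sketch and prefer whatever the install pages currently say:

# Docker from the Ubuntu archive (one of the documented options at the time)
apt-get update && apt-get install -y docker.io

# kubernetes apt repo plus the kubeadm, kubelet, and kubectl packages
apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl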

Repeat the same on the other two nodes, starting with the Docker installation and continuing through the steps after it.

Create the cluster

Before running kubeadm init, skip ahead to the pod network section to see what parameters should be passed. I’m going to use kube-router, so I’ll pass --pod-network-cidr=10.244.0.0/16. You can find more information about using kube-router with kubeadm here.

The full kubeadm command to run is:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16

Use the commands it provides at the end to enable kubectl access from the regular ubuntu user on the VM:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Be sure to note the kubeadm join command it provided since it will be used when joining the other two VMs to the cluster.

Now you can install the networking add-on, kube-router in this case:

kubectl apply -f https://raw.githubusercontent.com/cloudnativelabs/kube-router/master/daemonset/kubeadm-kuberouter.yaml

After about 30 seconds you should see the kube-router pods running using:

kubectl get pods --all-namespaces -w

Join the other two VMs to the kubernetes cluster

ssh to the next VM and become root with sudo -i. Then use the kubeadm join command output from the init call earlier, such as

kubeadm join 192.168.0.150:6443 --token b2pp7j.... --discovery-token-ca-cert-hash sha256:...

If you ssh back to the first node, you should see the new node become ready after about 30 seconds:

$ kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
nuc-vm1   Ready     master    22h       v1.10.3
nuc-vm2   Ready     <none>    23s       v1.10.3

Repeat the kubeadm join on the third VM.

Create a distinct user config

On the master node’s VM I ran the following and saved its output to a file on my desktop:

sudo kubeadm alpha phase kubeconfig user --client-name itzg

By default that user can’t do much of anything, but you can quickly fix that by creating a cluster role binding to the built-in cluster-admin role. In my case I ran:

kubectl create clusterrolebinding itzg-cluster-admin --clusterrole=cluster-admin --user=itzg

I used a naming convention of <user>-<role>, but you can name the first argument however you like – it just needs to distinctly name the binding of user(s) to role(s).
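Back on my desktop, pointing kubectl at that saved output (the file name below is just where I happened to save it) confirms the new user works:

export KUBECONFIG=$HOME/itzg-kubeconfig.yaml
kubectl get nodes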

Create ceph cluster

Now we’ll switch gears and ignore the fact that the three VMs are participating in a kubernetes cluster. We’ll overlay a ceph cluster onto them. My goal is to create a storage pool dedicated to kubernetes that can be used to provision rbd volumes.

In the same spirit as kubeadm, ceph provides an extremely useful tool called ceph-deploy that will take care of setting up our VMs as a ceph cluster. It also comes with great instructions that start here.

At the time of this writing, the latest stable release is luminous, so that’s what I’ll be using in the next few steps.

I’ll be invoking most of the ceph-deploy steps from the main/baremetal host, but really they can be run from any machine that can ssh to the three VMs.

Setup ceph-deploy

First add the release key

wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -

Then, add the repository:

echo deb https://download.ceph.com/debian-luminous/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list

Finally, update the repo index and install ceph-deploy

sudo apt update
sudo apt install ceph-deploy

The tool is going to place some important config files in the current directory, so it’s a good idea to create a new directory and work from there:

mkdir ceph-cluster
cd ceph-cluster

Before proceeding, make sure you did the ~/.ssh/config setup above to configure ubuntu as the default ssh user for accessing the three VMs. Since that user already has password-less sudo access, it’s ready to use for ceph deployment.

Initialize cluster config

Initialize the ceph cluster configuration to point at what will be the initial monitor node. Think of the ceph monitor as roughly analogous to the kubernetes master/API node.

ceph-deploy new nuc-vm1

Install packages on ceph nodes

Now, install the ceph packages on all three VMs. I had to pass the --release argument to avoid it defaulting to the prior “jewel” release.

ceph-deploy install --release luminous nuc-vm1 nuc-vm2 nuc-vm3

Create a ceph monitor

After a few minutes of installing packages, the initial monitor can be set up:

ceph-deploy mon create-initial

To enable automatic use of ceph on the three nodes, distribute the cluster config using

ceph-deploy admin nuc-vm1 nuc-vm2 nuc-vm3

Create a ceph manager

As of the luminous release, a manager node is required. I’m just going to run that also on the first VM:

ceph-deploy mgr create nuc-vm1

Setup each node as a ceph OSD

Way back in the beginning of all this, you might remember that I specified an extra volume size of 20GB for each VM. That extra volume is /dev/vdc on each VM and will be used for the OSD on each:

ceph-deploy osd create --data /dev/vdc nuc-vm1
ceph-deploy osd create --data /dev/vdc nuc-vm2
ceph-deploy osd create --data /dev/vdc nuc-vm3

You can confirm the ceph cluster is healthy and contains the three OSDs by running:

ssh nuc-vm1 sudo ceph -s
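For a bit more detail on where the OSDs landed, the OSD tree can be listed as well:

ssh nuc-vm1 sudo ceph osd tree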

To enable running ceph commands from the host, copy the admin keyring and config file into the local /etc/ceph:

sudo cp ceph.client.admin.keyring ceph.conf /etc/ceph/

Joining ceph and kubernetes

Create a storage pool

Create the pool with 128 placement groups, as recommended here:

sudo ceph osd pool create kube 128
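On luminous you may also see a health warning about the pool not being associated with an application; if so, tagging the pool for rbd use should clear it:

sudo ceph osd pool application enable kube rbd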

In order to avoid the libceph error “missing required protocol features” when kubelet mounts the rbd volume, apply this adjustment:

sudo ceph osd crush tunables legacy

Create and store access keys

Export the ceph admin key and import it as a kubernetes secret:

sudo ceph auth get client.admin | grep "key = " | awk '{print $3}' | xargs echo -n > /tmp/secret
kubectl create secret generic ceph-admin-secret \
   --type="kubernetes.io/rbd" \
   --from-file=/tmp/secret

Create a new ceph client for accessing the kube pool specifically and import that as a kubernetes secret:

sudo ceph auth get-or-create client.kube mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' | grep "key = " | awk '{print $3}' | xargs echo -n > /tmp/secret
kubectl create secret generic ceph-secret \
   --type="kubernetes.io/rbd" \
   --from-file=/tmp/secret

I might be doing something wrong, but I found I had to also save the client.kube keyring on each of the kubelet nodes:

ssh nuc-vm1 sudo ceph auth get client.kube -o /etc/ceph/ceph.client.kube.keyring
ssh nuc-vm2 sudo ceph auth get client.kube -o /etc/ceph/ceph.client.kube.keyring
ssh nuc-vm3 sudo ceph auth get client.kube -o /etc/ceph/ceph.client.kube.keyring

Setup RBD provisioner

The out-of-tree RBD provisioner is pre-configured to manage dynamic allocation of RBD persistent volume claims, so download and extract the necessary files:

wget https://github.com/kubernetes-incubator/external-storage/archive/master.zip
unzip master.zip "external-storage-master/ceph/rbd/*"

Go into the deploy directory and apply the provisioner:

cd external-storage-master/ceph/rbd/deploy
kubectl apply -f rbac
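The rbac manifests include the provisioner deployment itself, so after a few moments a pod for it (named something like rbd-provisioner-...) should show up as running; for example:

kubectl get pods --all-namespaces | grep rbd-provisioner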

Define a storage class

Create a storage class definition file, such as storage-class.yaml containing:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 192.168.0.150:6789
  pool: kube
  adminId: admin
  adminSecretName: ceph-admin-secret
  userId: kube
  userSecretName: ceph-secret
  imageFormat: "2"

Apply the storage class:

kubectl apply -f storage-class.yaml
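You can confirm the class is registered with:

kubectl get storageclass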

Try it out

Test the storage class by applying a persistent volume claim:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: claim1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rbd
  resources:
    requests:
      storage: 1Gi
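Save that to a file (the name is arbitrary; I’ll assume pvc.yaml below) and apply it:

kubectl apply -f pvc.yaml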

Verify that the claim was provisioned and is bound by using

kubectl describe pvc claim1

Let’s test out the persistent volume backed by ceph rbd by running a little busybox container that just sleeps so that we can exec into it:

apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod1
spec:
  containers:
  - name: ceph-busybox
    image: busybox
    command: ["sleep", "60000"]
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: false
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: claim1
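As before, save the manifest (I’ll assume ceph-pod1.yaml) and apply it:

kubectl apply -f ceph-pod1.yaml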

Now exec into the pod using

kubectl exec -it ceph-pod1 sh

You can cd /data and touch/edit files in that directory, delete the pod, and re-create a new one to confirm the content from the persistent volume claim sticks around.
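For example, a quick round trip (again assuming the pod manifest was saved as ceph-pod1.yaml):

kubectl exec -it ceph-pod1 -- sh -c 'echo hello > /data/test.txt'
kubectl delete pod ceph-pod1
kubectl apply -f ceph-pod1.yaml
# wait for the new pod to be Running and have mounted the volume, then:
kubectl exec -it ceph-pod1 -- cat /data/test.txt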
