
Introduction

So this is my collection of thoughts, ideas and craziness documented for everyone to see. I have a lot of different interests that I cycle through irregularly. It’s hard to keep track of the bits and pieces so I figure I should write some of this down.

I don’t like blogging; it feels too much like trying to keep a journal or a diary (not that there is anything wrong with that). Instead I have specific, task-focused documents that I want to keep around so that when I revisit a topic, I can pick up where I left off.

If others find my documents and notes of interest, then that’s great. That is not the reason why I’m doing this though.

If folks have suggestions or corrections, then great! Create a fork, branch, and submit a PR. I don’t mind being corrected and giving credit where credit is due. I’m not right all the time and there are much more knowledgeable folks out there than me.

Kubernetes

Kubernetes is a big deal, and is something that I enjoy using.

I have decided to document my Kubernetes setup so that other folks can follow along, see what I have going on, and maybe learn a thing or two along the way. It is also documented because I tend to rebuild the mess every couple of months and I usually forget something along the way!

Order of Operations

Extras

Pi-Burnetes

Installing Kubernetes

This is for a single host installation. I’ll include instructions for adding an additional host down the line if you are so inclined.

I have consolidated a number of different documents, mostly for troubleshooting purposes.

Setup

This is for installation on an Ubuntu 24.04 LTS machine. I would recommend a machine with at least 8GiB of RAM, 16GiB of hard drive space, and at least two cores. The more resources you have, the better things will run and the more stuff you can cram into Kubernetes.

Make sure your server is up to date before we get started.

sudo apt update && sudo apt full-upgrade -y 

Step By Step

Step One: Disable Swap

Kubeadm will complain if swap is enabled, so let’s disable that.

sudo swapoff -a
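
Note that swapoff -a only lasts until the next reboot. To keep swap off permanently, comment out any swap entries in /etc/fstab as well; a minimal sketch, assuming a standard fstab layout:

sudo sed -i.bak '/[[:space:]]swap[[:space:]]/ s/^/#/' /etc/fstab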

Step Two: Kernel Parameters

There are some parameters that need to be tuned in the Linux kernel for Kubernetes to work properly.

sudo tee /etc/modules-load.d/containerd.conf << EOF
overlay
br_netfilter
EOF
sudo tee /etc/sysctl.d/kubernetes.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
EOF
sudo modprobe overlay && sudo modprobe br_netfilter && sudo sysctl --system

Step Three: Container Runtime Installation

Let’s install containerd.

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/docker.gpg && \
sudo add-apt-repository -y "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" && \
sudo apt update && \
sudo apt install -y containerd.io

Containerd needs a configuration file. Luckily, we can generate one and then switch it to the systemd cgroup driver.

containerd config default | sudo tee /etc/containerd/config.toml >/dev/null 2>&1 
sudo sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' /etc/containerd/config.toml

Then we’ll need to enable and restart containerd.

sudo systemctl enable containerd && \
sudo systemctl restart containerd

Step Four: Kubernetes Runtime Installation

This will install Kubernetes 1.34 on the machine. At the time of writing, it is the most current version.

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.34/deb/Release.key | sudo gpg  --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg && \
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.34/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list && \
sudo apt update && \
sudo apt install -y kubelet kubeadm kubectl
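
Optionally, pin the packages so a routine apt upgrade doesn’t move the cluster to a new version behind your back:

sudo apt-mark hold kubelet kubeadm kubectl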

Step Five: Kubeadm Execution

Now for the part that we’ve all been waiting for! The prerequisites are in place, so now it’s time to get Kubernetes up and running. Adjust --apiserver-advertise-address to match your server’s IP, and the CIDRs to fit your network.

sudo kubeadm init --apiserver-advertise-address=10.0.0.9 --pod-network-cidr=10.250.0.0/16 --service-cidr=172.16.0.0/24

This will take a little while to run.

Step Six: Tool Configuration

Once the installation is complete, you will need to configure kubectl. Execute the following:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

This copies the admin configuration into your home directory, which allows you to use kubectl from anywhere on the system.

Step Seven: Remove the Taints

This is a single-node system, which means that it’s also running as the control plane. Normally, Kubernetes doesn’t want additional pods on the control plane, which makes for an interesting catch-22. Thankfully, we can fix that.

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

Step Eight: Container Networking

In order for the node to become ready, it will need networking installed. For simple, one-node installs, I personally think Calico is perfectly reasonable.

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.3/manifests/calico.yaml

It will take a little bit for the networking jitters to settle.

Step Nine: Does it work?

Let’s pull up the cluster information

kubectl cluster-info

Do all nodes show READY?

kubectl get nodes -o wide

What do the pods look like?

kubectl get pods -o wide -A

Step Ten: Install Something!

You can also create a very basic nginx deployment just to see if things work.

kubectl apply -f https://gist.githubusercontent.com/bbrietzke/c59b6132c37ea36f9b84f1fee701a642/raw/952524cec7892e9db350fc62773c32ddfd9ab867/kubernetes-test.yaml

Then:

open http://kubernetes-host.local:30081/

And you should see a website in the browser of your choice.

Local Setup

There are some tools that I like to have on my local machine that make working with Kubernetes much easier. This document will go through the installation and configuration of them.

What to install?

Since I’m on a Mac, everything is installed through Homebrew (https://brew.sh/).

brew install k9s helm kubernetes-cli

If you want to install on the Linux/Kubernetes host, here are a few options:

Helm

sudo apt-get install curl gpg apt-transport-https --yes
curl -fsSL https://packages.buildkite.com/helm-linux/helm-debian/gpgkey | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://packages.buildkite.com/helm-linux/helm-debian/any/ any main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

K9S

Go to the K9s GitHub Releases page and download the build that is appropriate for your platform.

Configuration

Kubernetes Tools

The only thing that really needs configuration is kubectl itself. For that, you need to get a copy of the kube config file from the control plane.

mkdir -p ~/.kube
scp ubuntu@kubernetes-host.local:.kube/config ~/.kube/config

This configures both k9s and kubectl, so that bit is done. Both of the tools should work as you would expect.

Helm

You don’t have to configure helm, but it’s not a bad idea either. My personal chart repository is configured as follows.

helm repo add bbrietzke http://bbrietzke.github.io/charts

I have a few charts that I tend to use, in particular for setting up namespaces. I have three: prod, dev, and infra. There isn’t much else to customize, so this just works.

helm install namespaces bbrietzke/namespaces

It’s also the first helm chart I created.

Adding a Worker Node

On the worker node, all you need to do is steps one through four of the installation page. After installation, execute the following on the control plane node:

kubeadm token create --print-join-command

Just copy and paste that over to the new worker node and it will do the rest.
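
The output is a ready-to-run kubeadm join command. It will look roughly like this; the token and hash below are placeholders, and the address comes from the kubeadm init above:

sudo kubeadm join 10.0.0.9:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>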

Install MetalLB

Installation

Installation instructions can be found on their website.

First, enable strict ARP mode in kube-proxy. Preview the change, then apply it:

kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl diff -f - -n kube-system

kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl apply -f - -n kube-system

Then:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.15.2/config/manifests/metallb-native.yaml

Configuration

Address Pool

---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: primary-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.2.75-192.168.2.80

Advertisers

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2advert-primary
  namespace: metallb-system
spec:
  ipAddressPools:
  - primary-pool
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgpadvert
  namespace: metallb-system
spec:
  ipAddressPools:
  - primary-pool
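
With the pool and advertisements applied, any Service of type LoadBalancer will be handed an address from primary-pool. A minimal sketch (the name, namespace, and selector are only examples):

---
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80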

Traefik Setup

Install

helm repo add traefik https://traefik.github.io/charts && \
helm repo update

Get Values

helm show values traefik/traefik > values.yaml

Install

kubectl create namespace traefik
helm upgrade -i --namespace traefik -f values.yaml traefik traefik/traefik 

My Values File

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
tolerations:
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Equal"
    effect: "NoSchedule"
providers:
  kubernetesGateway:
    enabled: true
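
With MetalLB in place, the chart’s LoadBalancer Service should pick up an external address from the pool; a quick check (assuming the default service name, traefik):

kubectl get svc -n traefik traefik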

NFS Persistent Volumes

Do your pods need persistent volumes in your home Kubernetes cluster? Turns out, NFS is an option.

https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner

Helm Charts

You can add the helm chart with:

helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/

Then always pull the values, since you will need to customize the NFS server IP and path:

helm show values nfs-subdir-external-provisioner/nfs-subdir-external-provisioner > values.yaml

An example customized one looks like the following:

---
nfs:
  server: 192.168.2.70
  path: /srv/nfs/k8s
storageClass:
  create: true
  name: nfs-client
  defaultClass: true
tolerations:
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Equal"
    effect: "NoSchedule"
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
resources:
  limits:
   cpu: 200m
   memory: 256Mi
  requests:
   cpu: 100m
   memory: 128Mi

And of course, we have to provision:

kubectl create namespace nfs-subdir
helm upgrade -i -f values.yaml nfs nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --namespace nfs-subdir
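
Once the provisioner is running, a PersistentVolumeClaim against the nfs-client storage class gets a directory carved out on the NFS export automatically. A minimal example claim (the name and size are arbitrary):

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim
  namespace: default
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi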

Metrics Server

Metrics Server is not quite as easy to install as advertised, at least on systems installed with kubeadm. But the fix to make it work is pretty simple, assuming you do it the insecure way.

The correct way is more complicated, but also secure.

Installing

The code and instructions to install Metrics Server can be found here. I’m not repeating them here so they don’t go stale.

The instructions work perfectly: things get installed and then don’t work. At all.

The metrics-server pod just never becomes ready, and the logs complain about TLS certificates being invalid.

The Right Way to Install

You will need to go to each node, modify the kubelet config file, and add one configuration line: serverTLSBootstrap: true

sudo vi /var/lib/kubelet/config.yaml

You will then need to either restart the host machine or the kubelet service.

When the service comes back up, it will request a TLS certificate that the node can use to communicate with the rest of the infrastructure. The catch is that you will need to approve the certificate to make it official. You can approve all of the outstanding certificate requests with:

for kubeletcsr in `kubectl -n kube-system get csr | grep kubernetes.io/kubelet-serving | awk '{ print $1 }'`; do kubectl certificate approve $kubeletcsr; done

Then metrics server will be happy.

Cheating and Being Insecure

The way I get the metrics server installed is to cheat a little. I tell it to ignore TLS verification and magically things just work!

How to do it

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Then add --kubelet-insecure-tls to the container args in the metrics-server Deployment (spec.template.spec.containers[0].args).
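
For reference, the relevant part of the Deployment in components.yaml ends up looking roughly like this (existing args abbreviated; only the last line is added):

    spec:
      containers:
      - name: metrics-server
        args:
        # ... keep the args that ship in components.yaml ...
        - --kubelet-insecure-tls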

Install

kubectl apply -f components.yaml


Managing Certificates in Kubernetes

Dealing with TLS certificates is a pain in the butt!

This document is just a rehash/shortened view with my specific configuration. You can find the full documentation over at cert-manager.io.

Installation

Manifests

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.1/cert-manager.yaml

Go over the verification section of the official docs to make sure it’s working.

Configuration

You have to create issuers per namespace; they are what actually create and distribute the certificates. The Issuer resource type is one of the CRDs that was created when you installed cert-manager.

Self-Signed

I created self-signed certificates for my namespaces just because.

Here is an example Issuer resource:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: dev-selfsigned-issuer
  namespace: dev
spec:
  selfSigned: {}

I’m sure I’ll write up a Helm chart at some point with the issuers that I need.

Using the Certificate Manager

The certificates are mostly used by your ingress controllers to prove that the domain is valid and to encrypt the communications between the origin and the client. I’m sure they can be used elsewhere, but this is the scenario that I use them for.

You will need to modify the Ingress resource definition to be similar to:

...
kind: Ingress
metadata:
    namespace: dev
    annotations:
        cert-manager.io/issuer: dev-selfsigned-issuer
...

The issuer you reference must live in the same namespace as the Ingress, since Issuer resources are namespaced.
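
Putting it together, a complete example for the dev namespace might look like the following; the host, service name, and secret name are placeholders. cert-manager watches the annotation plus the tls section and writes the signed certificate into the named secret:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  namespace: dev
  annotations:
    cert-manager.io/issuer: dev-selfsigned-issuer
spec:
  tls:
  - hosts:
    - example-app.dev.local
    secretName: example-app-tls
  rules:
  - host: example-app.dev.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example-app
            port:
              number: 80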

Cloud Native PostgreSQL

It’s hard to build modern applications without a database. I prefer PostgreSQL, and this is the operator that will help set up a clustered PostgreSQL installation in Kubernetes.

Installation

kubectl apply -f  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.27/releases/cnpg-1.27.1.yaml

The Actual Cluster

You will want to use good, fast network-attached storage to keep database performance high. I’m using iSCSI here. Create the namespace first, then save the Cluster manifest below and apply it with kubectl apply -f.

kubectl create namespace db
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgresql
  namespace: db
spec:
  instances: 3
  postgresql:
    parameters:
      shared_buffers: "256MB"
  storage:
    size: 50G
    storageClass: iscsi
  resources:
    requests:
      memory: "1024Mi"
      cpu: 500m
    limits:
      memory: "4096Mi"
      cpu: 2000m
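
Assuming the manifest above was applied with kubectl apply -f, you can watch the cluster come up with:

kubectl get cluster,pods -n db
kubectl get secrets -n db   # CNPG generates credential secrets for the new cluster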

Redis Operator

The Redis operator is a pretty handy operator to have around, considering that Redis is a pretty popular piece of software.

The complete documentation can be found over at https://github.com/OT-CONTAINER-KIT/redis-operator.

Installation

Setup the helm chart.

helm repo add redis-ot https://ot-container-kit.github.io/helm-charts/

Install the operator.

kubectl create namespace redis-system
helm upgrade -i redis-operator redis-ot/redis-operator --namespace redis-system

Create a Redis instance

I’ve only created standalone instances for individual namespaces. I haven’t tried to work through a cluster installation or how that would work.

Here is an example of a standalone instance.

apiVersion: redis.redis.opstreelabs.in/v1beta2
kind: Redis
metadata:
  name: redis-example
  namespace: default
spec:
  kubernetesConfig:
    image: quay.io/opstree/redis:latest
    imagePullPolicy: IfNotPresent
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000

This will setup a redis instance and services to point to the instance.
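
To sanity-check the instance, you can port-forward to the Service the operator creates (I’m assuming here that it shares the resource name, redis-example) and ping it:

kubectl port-forward svc/redis-example 6379:6379 &
redis-cli -h 127.0.0.1 -p 6379 ping   # expect PONG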

Introduction

Build SLURM on Alma Linux 9

Slurm (Simple Linux Utility for Resource Management) is a powerful, open-source workload manager for Linux clusters, scheduling and allocating resources (CPUs, GPUs, memory) for parallel jobs, essential for High-Performance Computing (HPC) and AI workloads.

Overview

This runbook covers building SLURM RPM packages from source on Alma Linux 9. These RPMs can then be distributed to controller and compute nodes in your cluster.

Prerequisites

  • A machine with Alma Linux 9 installed
  • Root or sudo access
  • Approximately 2GB free disk space for build process
  • Internet connectivity for downloading source and dependencies

Steps

  1. Prepare RPM build environment
    # Create RPM build directory structure
    mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
    echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
  2. Download SLURM source code

    The latest version can be found here.

    sudo dnf install -y wget
    
    # Set version variable for easy updates
    SLURM_VERSION="25.11.0"
    wget https://download.schedmd.com/slurm/slurm-${SLURM_VERSION}.tar.bz2
  3. Install build dependencies
    # Enable EPEL and CodeReady Builder repositories
    sudo dnf install -y epel-release
    sudo dnf config-manager --set-enabled crb
    
    # Install core build tools and SLURM dependencies
    sudo dnf install -y rpm-build autoconf automake gcc make \
        mariadb-devel munge-devel munge-libs pam-devel \
        perl-Switch perl-ExtUtils-MakeMaker perl-devel \
        pmix pmix-devel readline-devel \
        hdf5 hdf5-devel lua lua-devel \
        rrdtool-devel openssl openssl-devel libssh2-devel \
        hwloc hwloc-devel ncurses-devel man2html \
        libibmad libibumad http-parser-devel json-c-devel
  4. Build RPM packages

    This process typically takes 10-15 minutes depending on your system.

    rpmbuild --with slurmrestd --with pmix --with hdf5 -ta slurm-${SLURM_VERSION}.tar.bz2
    
  5. Verify and locate built RPMs

    ls -lh ~/rpmbuild/RPMS/x86_64/
    

    You should see RPMs including:

    • slurm-*.rpm - Core SLURM binaries
    • slurm-slurmctld-*.rpm - Controller daemon
    • slurm-slurmd-*.rpm - Compute node daemon
    • slurm-slurmdbd-*.rpm - Database daemon
    • slurm-slurmrestd-*.rpm - REST API daemon
    • slurm-pam_slurm-*.rpm - PAM module
    • slurm-devel-*.rpm - Development headers

Troubleshooting

Missing Dependencies

If rpmbuild fails with missing dependency errors:

  1. Note the missing package name from the error message
  2. Install using: sudo dnf install -y <package-name>-devel
  3. Re-run the rpmbuild command

Build Failures

  • Check build logs at: ~/rpmbuild/BUILD/slurm-*/config.log
  • Ensure CRB repository is enabled for http-parser-devel and json-c-devel
  • Verify sufficient disk space: df -h ~

Common Issues

  • “rpmbuild: command not found”: Install rpm-build package
  • Permission denied errors: Ensure you’re building as regular user, not root
  • Network timeouts: Check firewall settings if wget fails

Completion and Verification

Success criteria:

  • ✓ No errors during rpmbuild process
  • ✓ Multiple RPM files present in ~/rpmbuild/RPMS/x86_64/
  • ✓ RPM integrity check passes: rpm -K ~/rpmbuild/RPMS/x86_64/slurm-*.rpm

File sizes (approximate):

  • Core slurm package: ~15-20 MB
  • Total RPMs: ~40-50 MB

Contacts

Appendix

Package Distribution Guide

  • Controller node: slurm, slurm-slurmctld, slurm-slurmdbd (if using accounting)
  • Compute nodes: slurm, slurm-slurmd
  • Submit hosts: slurm (client tools only)
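
As a rough sketch of putting that guide into practice, copy the RPMs to each machine and install the relevant subset with dnf (paths assume the default rpmbuild output directory):

# Controller node
sudo dnf install -y ~/rpmbuild/RPMS/x86_64/slurm-[0-9]*.rpm \
    ~/rpmbuild/RPMS/x86_64/slurm-slurmctld-*.rpm

# Compute nodes
sudo dnf install -y ~/rpmbuild/RPMS/x86_64/slurm-[0-9]*.rpm \
    ~/rpmbuild/RPMS/x86_64/slurm-slurmd-*.rpm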

References

Changelog

  • 2025/12/14 - Created initial version

Introduction

RAID Arrays

Let’s build a few different kinds of RAID arrays and then use them.

I’m not going to go into detail about what RAID is or the different levels, since there is plenty of documentation out there already. These are the commands to set up software RAID for Linux and the general workflow to follow.

Do we have any now?

Let’s double‑check the current RAID state.

cat /proc/mdstat

Which Devices?

Insert the new drives into the machine. You should know what they come up as, but if you don’t, try:

lsblk

That should give you a list of all the block devices on the system and where they are being used. Some of them may not make sense at first glance, but you should see the ones you just added. If needed, run the command and copy down the results. Then insert the drives and execute it again.

I will be using the following devices:

/dev/sda
/dev/sdb
/dev/sdc
/dev/sdd

Create the Arrays!

RAID 0

We’ll start with RAID 0, which allows us to use all the drives as one big (though not redundant) block device.

sudo mdadm --create --verbose /dev/md0 \
  --level=raid0 --raid-devices=4 \
  /dev/sda /dev/sdb /dev/sdc /dev/sdd

RAID 1

Simple mirroring. The easiest to use, a decent redundancy package, and not all that wasteful.

sudo mdadm --create --verbose /dev/md1 \
  --level=raid1 --raid-devices=2 \
  /dev/sda /dev/sdb

RAID 5

Probably the best all‑around choice. It gives the best use of capacity and good performance.

sudo mdadm --create --verbose /dev/md2 \
  --level=raid5 --raid-devices=4 \
  /dev/sda /dev/sdb /dev/sdc /dev/sdd

Create the FileSystem

sudo mkfs -t ext4 /dev/md0

Mounting

You should have these drives come up every time you want to use them, so add the entries to /etc/fstab.

First, obtain the UUID of the array.

sudo blkid /dev/md0

Take the UUID and add the following line to /etc/fstab (adjust the mount point and options as needed):

UUID=655d2d3e-ab31-49c7-9cc3-583ec81fd316 /srv ext4 defaults 0 0

Then run:

sudo mount -a

and the array should appear at the desired location.

Update Configuration

sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf

Then update the initramfs so that the RAID arrays are recognised at boot:

sudo update-initramfs -u

Destroying RAID Arrays

You create RAID drives, so you should know how to tear them down.

So we built a few RAID arrays and mounted them—now let’s tear them down.

Unmount everything

Make sure you have the arrays unmounted from their normal (or abnormal) paths. If you don’t, much of this will not work as intended.

Check mdstat

Let’s see which arrays we currently have configured.

cat /proc/mdstat

The output should list any arrays you have built and indicate their current state. In this case, we need to know the names of the arrays so that we can remove them.

Remove Arrays

Now let’s stop the arrays.

sudo mdadm --stop /dev/md127  # or whatever was returned above

Once the arrays are stopped, we can zero out the superblocks so the disks are clean.

sudo mdadm --zero-superblock /dev/sda /dev/sdb /dev/sdc /dev/sdd

Trust, but Verify

Rerun cat /proc/mdstat and make sure those arrays are gone.

cat /proc/mdstat

Custom Host Publishing

If you’re like me, you probably have Avahi running on all your servers just to make name resolution simpler. What you probably didn’t know is that you can do some neat tricks with Avahi.

Neat Trick Number One

There is a command called avahi-publish that will publish a hostname on your network. This is pretty cool, because it means you don’t have to remember to add the hostname to your hosts file. It also means you can use it to publish a hostname for a device that doesn’t have Avahi installed.

For example:

avahi-publish -a -R pandora.local 192.168.1.10 # or whatever IP address you want...

Now you can ping pandora.local and it will respond!

What good is this? Imagine giving your router an avahi registered name and you can log into it without having to remember the IP. If you’re on Xfinity, you can do:

avahi-publish -a -R router.local 10.0.0.1

You will be able to ping router.local and it will respond!

Neat Trick Number Two

So pinging a router by name is nice and all, but not really that exciting.

What you can do is combine the above with Kubernetes Ingress resource definitions to host multiple Ingresses on the same host without having to do anything magical to DNS.

Neat Trick Number Three

Again, neat-o and all, but now you have a terminal tied up hosting names, and that’s just a waste of energy. What if the terminal window closes or the machine resets? Then you have to manually execute the commands to get the names back online.

Systemd to the rescue!

[Unit]
Description=Avahi Pandora

[Service]
ExecStart=/usr/bin/avahi-publish -a -R pandora.local 10.0.0.238
Restart=always

[Install]
WantedBy=default.target

Save the file in /etc/systemd/system ( i.e. /etc/systemd/system/avahi-pandora.service ). Then treat it as any normal service.

sudo systemctl enable avahi-pandora
sudo systemctl start avahi-pandora

Systems and Application Monitoring with Prometheus

Figuring out what’s going on when something breaks can be difficult, so having the right tooling can make a big difference. Prometheus is one such tool. With proper tuning and configuration, it can even alert you before a failure occurs.

It also makes pretty graphs—everyone loves good visualisations!

What is Prometheus?

Prometheus is a free and open‑source monitoring solution that collects and records metrics from servers, containers, and applications. It provides a flexible query language (PromQL) and powerful visualisation tools, and includes an alerting mechanism that sends notifications when needed.

Prerequisites

  • A machine capable of running Ubuntu 22.04 (or any other LTS release).
  • Basic administrative knowledge and an account with sudo access on that machine.

Installation

Update the system

sudo apt update && sudo apt -y upgrade

Create the Prometheus user account

sudo groupadd --system prometheus
sudo useradd -s /sbin/nologin --system -g prometheus prometheus

Create directories

These directories will hold Prometheus’s configuration files and data store.

sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus

Install Prometheus

Note: The following steps install the latest LTS tarball of Prometheus (as of this writing). You can download a different release from the official site: https://prometheus.io/download/#prometheus.

wget https://github.com/prometheus/prometheus/releases/download/v3.5.0/prometheus-3.5.0.linux-amd64.tar.gz
tar zvxf prometheus*.tar.gz
cd prometheus*/
sudo mv prometheus /usr/local/bin
sudo mv promtool /usr/local/bin
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo mv prometheus.yml /etc/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus

cd ..
rm -rf prom*

Configuring

The full configuration guide is available here: https://prometheus.io/docs/prometheus/latest/configuration/configuration/. It walks through almost every option, though the wording can be confusing. Below is a simple, real‑world example of a prometheus.yml file that demonstrates typical usage.

# my global config
global:
  scrape_interval: 15s  # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
           - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
        labels:
          groups: 'monitors'
  - job_name: 'servers'
    static_configs:
      - targets:
          - 'atlas.faultycloud.lan:9182'
          - 'coeus.faultycloud.lan:9182'
          - 'gaia.faultycloud.lan:9182'
          - 'hyperion.faultycloud.lan:9182'
        labels:
          groups: 'win2022'
  - job_name: 'gitlab'
    static_configs:
      - targets:
          - '192.168.1.253:9090'
        labels:
          groups: 'development'

Run at Startup

Create a systemd service file at /etc/systemd/system/prometheus.service with the following content:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/

[Install]
WantedBy=multi-user.target

Reload systemd, enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
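
If everything came up cleanly, Prometheus is listening on port 9090. Two quick checks that don’t rely on the web UI: validate the config with promtool and hit the readiness endpoint.

promtool check config /etc/prometheus/prometheus.yml
curl -s http://localhost:9090/-/ready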

Systems and Application Notifications with Alert Manager

So you have Prometheus; the next step is Alertmanager, which will notify you when something goes awry.

AlertManager?

The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.

Prerequisites

A machine that can run Ubuntu 22.04 (or another LTS release).

You should also have basic administrative knowledge and an account that has sudo access on the above box.

Installation

Update the system

sudo apt update && sudo apt -y upgrade

Create the AlertManager User Account

sudo groupadd --system alertmanager
sudo useradd -s /sbin/nologin --system -g alertmanager alertmanager

Create Directories

These are for configuration files and libraries.

sudo mkdir /etc/alertmanager
sudo mkdir /var/lib/alertmanager

Install AlertManager

Now for the fun part!

# https://prometheus.io/download/#alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.28.1/alertmanager-0.28.1.linux-amd64.tar.gz
tar zvxf alertmanager*.tar.gz
cd alertmanager*/
sudo mv alertmanager /usr/local/bin
sudo mv amtool /usr/local/bin
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager
sudo chown alertmanager:alertmanager /usr/local/bin/amtool
sudo chown alertmanager:alertmanager /var/lib/alertmanager

sudo mv alertmanager.yml /etc/alertmanager

cd ..
rm -rf alert*

Run at Startup

sudo nano /etc/systemd/system/alertmanager.service

with

[Unit]
Description=AlertManager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file /etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager

[Install]
WantedBy=multi-user.target

Reload systemd, enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
sudo systemctl status alertmanager
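
As a quick check, amtool (installed alongside alertmanager above) can validate the configuration, and the readiness endpoint confirms the service is answering:

amtool check-config /etc/alertmanager/alertmanager.yml
curl -s http://localhost:9093/-/ready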

Systems Monitoring with Node Exporter

Trying to figure out what is going on when something is broken can be hard, so it’s nice to have tooling to help with that. Prometheus is one such tool. With proper tuning and effort, it can warn you before something goes wrong.

And it makes pretty graphs. Everybody loves pretty graphs!

Prometheus

Prometheus is a free and open‑source monitoring solution for collecting metrics, events, and alerts. It records data from servers, containers, and applications. In addition to a flexible query language (PromQL) and powerful visualization tools, it also provides an alerting mechanism that sends notifications when needed.

Prerequisites

  • A machine running Ubuntu 24.04 or another LTS release.
  • Basic administrative knowledge and an account with sudo access on that machine.

Installation

Update the system

sudo apt update && sudo apt -y upgrade

Create the Node Exporter user account

sudo groupadd --system nodeexporter
sudo useradd -s /sbin/nologin --system -g nodeexporter nodeexporter

Create directories

These directories store configuration files and libraries.

sudo mkdir /var/lib/node_exporter

Install Node Exporter

Now for the fun part! You can view the latest Node Exporter downloads and pick the one you need on the official page: Node Exporter download page.

wget https://github.com/prometheus/node_exporter/releases/download/v1.9.1/node_exporter-1.9.1.linux-amd64.tar.gz
tar zvxf node_exporter*.tar.gz
cd node_exporter*/
sudo mv node_exporter /usr/local/bin
sudo chown nodeexporter:nodeexporter /usr/local/bin/node_exporter
sudo chown -R nodeexporter:nodeexporter /var/lib/node_exporter

cd ..
rm -rf node*

Run at startup

Create a systemd unit file for Node Exporter:

sudo nano /etc/systemd/system/node_exporter.service

Paste the following configuration:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nodeexporter
Group=nodeexporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --web.listen-address=0.0.0.0:9182 \
    --collector.textfile.directory=/var/lib/node_exporter

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
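
Given the listen address in the unit file above, a quick manual scrape confirms the exporter is serving metrics:

curl -s http://localhost:9182/metrics | head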

Cassandra Installation

Install Apache Cassandra

# Install Java (Cassandra 4.x requires Java 11)
sudo apt install -y openjdk-11-jdk-headless

# Add the Cassandra repository and its key
echo "deb [signed-by=/etc/apt/keyrings/apache-cassandra.asc] https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
sudo curl -o /etc/apt/keyrings/apache-cassandra.asc https://downloads.apache.org/cassandra/KEYS

# Update package lists and install Cassandra
sudo apt update && sudo apt install -y cassandra

# Reload systemd, enable, and start the Cassandra service
sudo systemctl daemon-reload
sudo systemctl enable cassandra
sudo systemctl start cassandra

# Verify the service status
sudo systemctl status cassandra.service
nodetool status

Tip: After the first startup, Cassandra may take a minute or two to open all its ports. Use nodetool status to confirm that the node reports UN (Up/Normal).

Firewall Setup

# Port used by Cassandra for client communication (CQL)
sudo ufw allow 9042/tcp

# Inter-node communication port (storage port)
sudo ufw allow 7000/tcp

# SSL inter-node communication port
sudo ufw allow 7001/tcp

# JMX port (remote management)
sudo ufw allow 7199/tcp

Note: Port 7001 is only used when inter-node SSL encryption is enabled. If you configure a separate SSL port for clients (native_transport_port_ssl), open that port as well.


References

Network Attached Storage

There are plenty of network‑attached storage solutions out there, both free and open source. But why would we want to build one from scratch?

Understanding how these tools work gives you insight into their inner workings and opens the door to customisations that off‑the‑shelf solutions can’t provide.

Starting with Samba

I prefer to run on Ubuntu, so I’m using the current LTS (24.04.3) for this guide. The server is named vault.

First, update the system:

sudo apt update && sudo apt -y full-upgrade

Next, create the parent directories:

sudo mkdir -p /srv/{samba,nfs}

Install Samba

sudo apt install -y samba

Create shared directories

sudo mkdir -p /srv/samba/{public,ISOs}

Edit the configuration

Open the Samba configuration file:

sudo vi /etc/samba/smb.conf

Add the following sections (replace or append as appropriate):

[homes]
  comment = Home Directory
  browsable = no
  read only = no
  create mask = 0700
  directory mask = 0700
  valid users = %S

[ISOs]
  comment = ISO files
  path = /srv/samba/ISOs
  browseable = yes
  read only = no
  create mask = 0700
  directory mask = 0700
  guest ok = yes

[public]
  comment = Public Share
  path = /srv/samba/public
  browseable = yes
  read only = no
  create mask = 0766
  directory mask = 0766
  guest ok = yes

Tip: Granting guest ok = yes allows unauthenticated access. Use it only for truly public data.

Add user accounts

sudo smbpasswd -a $USER

Restart services

Restart the Samba daemons whenever you add a share or create a new user:

sudo systemctl restart smbd nmbd

Open the firewall

sudo ufw allow samba

NFS Installation

sudo apt install -y nfs-kernel-server

Create shared directories

sudo mkdir -p /srv/nfs/{common,k8s}

Setup permissions

sudo chown -R nobody:nogroup /srv/nfs/
sudo chmod -R 755 /srv/nfs/

Edit the NFS export list

Open /etc/exports:

sudo vi /etc/exports

Add the following lines (modify subnets as appropriate):

/srv/nfs/common 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/srv/nfs/k8s 10.0.0.0/26(rw,sync,no_subtree_check,no_root_squash)

Export the shares

sudo exportfs -a

Restart the NFS service

sudo systemctl restart nfs-kernel-server

Configure firewall rules

Adjust for your network.

sudo ufw allow from 192.168.1.0/24 to any port nfs
sudo ufw allow from 10.0.0.0/26 to any port nfs
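
From a client on one of the allowed subnets, you can sanity-check the exports and mount one. The hostname vault.local and the mount point are assumptions based on this guide; substitute your own.

sudo apt install -y nfs-common
showmount -e vault.local
sudo mount -t nfs vault.local:/srv/nfs/common /mnt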

For more detailed information, refer to the Samba documentation and the NFS manual pages.

Network Attached Storage – iSCSI

iSCSI is typically used when you need fast, block‑level storage for workloads such as databases. It isn’t usually something you’d attach to a consumer desktop.

Starting with iSCSI

I prefer to run on Ubuntu, so I’m using the current LTS (24.04.3) for this guide. The server is named vault.

First, update the system:

sudo apt update && sudo apt -y full-upgrade

Next, create the parent directories:

sudo mkdir -p /srv/iscsi

Install the iSCSI target

sudo apt install -y targetcli-fb

Create shared image files

Create some backing files that can be used as targets. Optionally, you can use disk partitions or LVM volumes.

sudo dd if=/dev/zero of=/srv/iscsi/disk00.img bs=1 count=0 seek=50G # Creates a sparse 50GB image file
sudo dd if=/dev/zero of=/srv/iscsi/disk01.img bs=1 count=0 seek=50G # Creates a 50GB image file
sudo dd if=/dev/zero of=/srv/iscsi/disk02.img bs=1 count=0 seek=50G # Creates a 50GB image file
sudo dd if=/dev/zero of=/srv/iscsi/disk03.img bs=1 count=0 seek=50G # Creates a 50GB image file
sudo dd if=/dev/zero of=/srv/iscsi/disk04.img bs=1 count=0 seek=50G # Creates a 50GB image file
sudo dd if=/dev/zero of=/srv/iscsi/disk05.img bs=1 count=0 seek=50G # Creates a 50GB image file

The image files start out small and grow as data is written to them. Be careful not to over‑provision the system, as it becomes a single point of failure for anything that uses iSCSI.

Mount on loopback and format new drives (optional)

sudo losetup -f /srv/iscsi/disk00.img
sudo losetup -a
sudo mkfs.ext4 /dev/loopX   # replace X with the loop device returned by losetup -a
sudo losetup -d /dev/loopX

Or format all of them in one shot:

for image in /srv/iscsi/*.img; do
  dev=$(sudo losetup -f --show "$image")   # attach and capture the loop device
  sudo mkfs.ext4 "$dev"
  sudo losetup -d "$dev"
done

Configure the iSCSI target using TargetCLI

sudo targetcli

Inside the shell, follow the steps below to create a backstore from the disk images you created.

cd /backstores/fileio
create disk00 /srv/iscsi/disk00.img
create disk01 /srv/iscsi/disk01.img
create disk02 /srv/iscsi/disk02.img
create disk03 /srv/iscsi/disk03.img
create disk04 /srv/iscsi/disk04.img
create disk05 /srv/iscsi/disk05.img

Create the actual LUNs:

cd /iscsi
create iqn.2025-11.com.example:storage
cd iqn.2025-11.com.example:storage/tpg1/luns
create /backstores/fileio/disk00
create /backstores/fileio/disk01
create /backstores/fileio/disk02
create /backstores/fileio/disk03
create /backstores/fileio/disk04
create /backstores/fileio/disk05

To enable authentication:

cd ../
set attribute generate_node_acls=1
set attribute demo_mode_write_protect=0
set attribute authentication=1
set auth userid=your_username
set auth password=your_secret_password

To restrict iSCSI to specific addresses:

cd iqn.2025-11.com.example:storage/tpg1/portals
delete 0.0.0.0:3260
create 10.0.0.5:3260

To save the configuration:

cd /
saveconfig
exit

Verification / validation

sudo targetcli ls

Setup firewall

sudo ufw allow 3260/tcp
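
On the client side, open-iscsi can discover and log into the target. A minimal sketch, assuming the portal address above and that any CHAP credentials have been set in /etc/iscsi/iscsid.conf:

sudo apt install -y open-iscsi
sudo iscsiadm -m discovery -t sendtargets -p 10.0.0.5
sudo iscsiadm -m node -T iqn.2025-11.com.example:storage -p 10.0.0.5 --login
lsblk   # the LUNs appear as new sd* block devices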

N8N the hard way

Runbooks

Runbooks are a way of documenting procedural Information Technology information that is repetitive in nature.

Most of the time, you see runbooks used for troubleshooting a problem, diagnosing an issue, or walking through a procedure.

This is a collection of the runbooks that I have decided to document for my area.

Table of Contents

Alerts

Create User for PostgreSQL

Overview

Prerequisites

  • Access to log into the database server via SSH.
  • sudo access on the database server

Steps

  1. Log into the database server via SSH.
  2. Sudo into the postgres user and execute the psql command-line tool.
    • sudo -u postgres psql
  3. Create the user with an encrypted password:
    • We recommend a long, complex password, since it only needs to be recorded here and can be entered into the target system at the same time.
    • The user name should reflect the database it will primarily access.
    • CREATE USER db01 WITH ENCRYPTED PASSWORD 'R3@llyL0ngP@ssw0rd123456790';
  4. Grant the new user CREATEDB privileges (a matching database can be created as sketched below):
    • ALTER USER db01 CREATEDB;
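
A minimal sketch of creating the matching database that the verification step below connects to (db01_prod is the name used there; adjust to suit):

CREATE DATABASE db01_prod OWNER db01;
GRANT ALL PRIVILEGES ON DATABASE db01_prod TO db01;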

Troubleshooting

  • If the target system cannot log into the database after the database/user creation process, you can simply re-run the above steps to make sure they are correct.
  • If the user is present, but the password has been forgotten, you may reset the password as follows:
    • ALTER USER db01_user WITH ENCRYPTED PASSWORD '3^On9D4p59^4';

Completion and Verification

You can log into the target database from callisto.lan to verify that everything is set up correctly.

psql -h 'callisto.lan' -U 'db01' -d 'db01_prod'

It will prompt you for the password created above. If the login succeeds, then the database, user, and permissions are okay.

Contacts

Not Applicable

Appendix

Changelog

* 2023/04/06 - Created