# Infrastructure


## Logging into Drone/Gitea

These are protected by basic HTTP auth, and logging in is a pain in the arse. To log in, temporarily disable the auth by commenting out the middlewares in gitea.yaml, drone.yaml, and minio.yaml, then:

```sh
kubectl apply -f apps
```

Now log in (make sure you click **Remember me**), then undo the YAML changes and re-apply.

PS: use Chrome for Drone! Also, Drone will not trigger on `git push` if HTTP auth is enabled for Gitea; disable auth and push again.
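
For reference, the auth change might look like this. This is only a sketch assuming the manifests use Traefik IngressRoutes with a basic-auth middleware; the names and host below are hypothetical:

```yaml
# gitea.yaml (sketch): comment out the middlewares block to disable auth
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: gitea
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`git.example.com`)
      kind: Rule
      # middlewares:
      #   - name: basic-auth
      services:
        - name: gitea
          port: 3000
```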

## Setting up the server

- install Docker
- install k3s
- `apt-get install tmate cifs-utils`
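
A sketch of those steps using the standard convenience installers (version pinning may differ from what's actually on the server):

```sh
curl -fsSL https://get.docker.com | sh    # Docker
curl -sfL https://get.k3s.io | sh -       # k3s (installs and starts the service)
apt-get install -y tmate cifs-utils
```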

## Backups

### Longhorn

```sh
apt-get -y install open-iscsi nfs-common jq
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/scripts/environment_check.sh | bash
```
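
The environment check only validates prerequisites. Assuming Longhorn is installed from the upstream manifest (the cluster may do this via Helm or another route instead), the install and a quick health check look roughly like:

```sh
# Sketch: adjust to however Longhorn is actually deployed here
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/longhorn.yaml
kubectl -n longhorn-system get pods   # all pods should reach Running
```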

### Velero

```sh
velero install \
    --use-node-agent \
    --privileged-node-agent \
    --uploader-type=restic \
    --features=EnableCSI \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.1 \
    --bucket velero \
    --secret-file ./secrets/credentials-velero \
    --use-volume-snapshots=true \
    --backup-location-config region=eu,s3ForcePathStyle="true",s3Url=https://eu2.contabostorage.com \
    --wait
```

If there's an issue with the credentials, recreate the secret:

```sh
kubectl create secret generic cloud-credentials \
    --namespace velero \
    --from-file=cloud=./secrets/credentials-velero \
    --dry-run=client -o yaml | kubectl apply -f -
```
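
Once Velero is up, a quick way to sanity-check it (standard Velero CLI; the backup names are arbitrary):

```sh
# Create an ad-hoc backup and inspect it
velero backup create test-backup --wait
velero backup describe test-backup

# Or schedule a daily backup at 02:00 (cron syntax)
velero schedule create daily-backup --schedule="0 2 * * *"
```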

## Connect to services

Postgres:

```sh
kubectl -n databases port-forward pod/postgres-0 5432:5432
```
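
Then, in another terminal, connect through the forwarded port (the user here is an assumption; use whatever the pod is actually configured with):

```sh
psql -h localhost -p 5432 -U postgres
```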

## Runbook

### Failing health checks

`KUBE_CONFIG` is a secret on Drone (https://drone.nocodelytics.com/nocodelytics/healthcheck/settings/org-secrets). Its value needs to come from /etc/rancher/k3s/k3s.yaml on the server.

The certificate expires once a year and needs to be renewed per https://docs.k3s.io/cli/certificate:

```sh
# ssh into the server first
systemctl stop k3s
k3s certificate rotate
systemctl start k3s
```

Then base64-encode it:

```sh
cat /etc/rancher/k3s/k3s.yaml | base64 -i -
```

The same kubeconfig, NOT encoded, goes into ~/.kube/config, but the `server` field needs to be edited to point at the server's IP.
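
A sketch of that edit (k3s writes https://127.0.0.1:6443 by default; `<SERVER_IP>` is a placeholder):

```yaml
clusters:
  - cluster:
      certificate-authority-data: <unchanged>
      server: https://<SERVER_IP>:6443   # was https://127.0.0.1:6443
    name: default
```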

### Disk space issues

Find the persistent volume claim that's full (e.g. in clickhouse.yaml), edit ONLY the resources.requests.storage field, then `kubectl apply -f ...`.
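
A sketch of the change, assuming a standard PersistentVolumeClaim (the name and size are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clickhouse-data          # hypothetical
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi             # bump ONLY this value
```

Note this only works if the storage class allows volume expansion (Longhorn's default storage class does).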