Miroslav Batchkarov c772633e72 add some lessons learned the hard way 2024-12-06 13:00:14 +01:00

Infrastructure

Setting up server

  • install docker
  • install k3s
  • apt-get install tmate cifs-utils

Backups

Longhorn

apt-get -y install open-iscsi nfs-common jq
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/scripts/environment_check.sh | bash

Velero

velero install \
    --use-node-agent \
    --privileged-node-agent \
    --uploader-type=restic \
    --features=EnableCSI \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.1 \
    --bucket velero \
    --secret-file ./secrets/credentials-velero \
    --use-volume-snapshots=true \
    --backup-location-config region=eu,s3ForcePathStyle="true",s3Url=https://eu2.contabostorage.com \
    --wait

If Velero reports a problem with the credentials, recreate the secret from the same file:

kubectl create secret generic cloud-credentials --namespace velero --from-file=cloud=./secrets/credentials-velero --dry-run=client -o yaml | kubectl apply -f -
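
With Velero installed, an on-demand backup of a single namespace might look like the sketch below. The function name `backup_namespace` and the `VELERO` override variable are hypothetical helpers, not something this repo ships; only `velero backup create` with `--include-namespaces` and `--wait` comes from the Velero CLI.

```shell
# Hypothetical helper: back up one namespace under a timestamped backup name.
# Set VELERO=echo to preview the command instead of running it.
backup_namespace() (
  ns=$1
  name="${ns}-$(date +%Y%m%d-%H%M%S)"
  "${VELERO:-velero}" backup create "$name" --include-namespaces "$ns" --wait
)
```

For example, `backup_namespace databases` would back up the namespace that holds Postgres.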

Connect to services

Postgres: kubectl -n databases port-forward pod/postgres-0 5432:5432

Runbook

Failing health checks

KUBE_CONFIG is a secret on Drone (https://drone.nocodelytics.com/nocodelytics/healthcheck/settings/org-secrets). Its value needs to come from /etc/rancher/k3s/k3s.yaml on the server.

The certificates in that file expire after one year and need to be renewed as described in https://docs.k3s.io/cli/certificate

# ssh into server
systemctl stop k3s
k3s certificate rotate
systemctl start k3s
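
To confirm the rotation worked, or to see how long the current certificate has left, the expiry date can be read from the kubeconfig itself. This is a minimal sketch: `kubeconfig_client_cert` is a hypothetical helper, and it assumes a single user entry with an inline `client-certificate-data` field and a `base64` that accepts `-d` (GNU coreutils).

```shell
# Hypothetical helper: print the client certificate embedded in a kubeconfig as PEM.
# Takes the first client-certificate-data value found in the file and decodes it.
kubeconfig_client_cert() {
  sed -n 's/^ *client-certificate-data: *//p' "$1" | head -n 1 | base64 -d
}
```

On the server, `kubeconfig_client_cert /etc/rancher/k3s/k3s.yaml | openssl x509 -noout -enddate` would then print the `notAfter` date of that certificate.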

Then base64-encode the kubeconfig:

cat /etc/rancher/k3s/k3s.yaml | base64 -i -
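
GNU `base64` wraps its output at 76 columns by default, which can be awkward when pasting the value into a secret field. If a single-line value is wanted (an assumption; this README does not say how the healthcheck pipeline decodes the secret), a sketch like the following strips the line breaks. `encode_one_line` is a hypothetical helper name.

```shell
# Hypothetical helper: base64-encode a file as one unwrapped line.
# Reading from stdin and stripping newlines works with both GNU and BSD base64.
encode_one_line() {
  base64 < "$1" | tr -d '\n'
}
```

Usage on the server would be `encode_one_line /etc/rancher/k3s/k3s.yaml`.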

The same kubeconfig, NOT encoded, goes to ~/.kube/config, but its server field needs to be edited to point to the server IP instead of the default 127.0.0.1
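
The server edit can be scripted rather than done by hand. The sketch below assumes the stock k3s entry `server: https://127.0.0.1:6443`; `kubeconfig_for_host` is a hypothetical helper, and the IP in the usage line is a documentation placeholder, not the real server address.

```shell
# Hypothetical helper: print a kubeconfig with the API server address replaced.
# $1 = path to a copy of k3s.yaml, $2 = IP or hostname of the server.
# Only the stock "server: https://127.0.0.1:6443" entry is rewritten.
kubeconfig_for_host() {
  sed "s#server: https://127\.0\.0\.1:6443#server: https://$2:6443#" "$1"
}
```

For example, `kubeconfig_for_host k3s.yaml 203.0.113.10 > ~/.kube/config`. Note that this overwrites any existing ~/.kube/config.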

Disk space issues

Find the persistent volume that's full, e.g. the one defined in clickhouse.yaml, edit ONLY its resources.requests.storage value, then run kubectl apply -f ...
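
One way to find the full volume is to run `df` inside the suspect pod and filter for mounts above a threshold. This is a sketch only: `volumes_over` is a hypothetical helper, and it assumes `df -P` (POSIX format) output with mount points that contain no spaces.

```shell
# Hypothetical helper: list mounts at or above a usage threshold.
# Reads `df -P` output on stdin; $1 is the threshold in percent.
# Prints "<capacity> <mount point>" for each matching line.
volumes_over() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)            # strip the trailing percent sign
    if (use + 0 >= limit) print $5, $6
  }'
}
```

For example, `kubectl -n databases exec postgres-0 -- df -P | volumes_over 90`. Keep in mind that Kubernetes can only grow a volume claim when its storage class allows volume expansion, and that a request cannot be shrunk afterwards.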