Miroslav Batchkarov 6ffa3193d3 increase pg volume size
the cache table can get big, we don't want an outage
2025-03-11 20:47:12 +01:00
.vscode checkpoint 2023-12-08 17:12:01 +00:00
apps remove http auth on gitea- breaks Drone and health checks 2025-01-28 13:22:36 +01:00
databases increase pg volume size 2025-03-11 20:47:12 +01:00
dependencies add http-auth to prometheus 2024-07-26 15:08:50 +01:00
kustomization set POSTGRES_HOST to pgBouncer 2024-07-26 15:18:00 +01:00
secrets@4ab80d3aa8 add http-auth to prometheus 2024-07-26 15:08:50 +01:00
sysadmin increase log retention- not much to store 2025-03-11 20:46:50 +01:00
.DS_Store more cleanup 2023-12-01 17:41:51 +00:00
.drone.yml drone network_mode: host 2024-03-29 12:29:31 +00:00
.gitignore add livenessProbe to events worker 2024-01-25 17:41:32 +00:00
README.md update readme 2025-01-28 10:56:42 +01:00

Infrastructure

Logging into drone/gitea

These are protected by basic HTTP auth, and logging in is a pain in the arse. To log in, temporarily disable it by commenting out the middlewares in gitea.yaml, drone.yaml and minio.yaml, then run:

kubectl apply -f apps

Now log in (make sure you click "Remember me"), then undo the YAML changes and re-apply.

PS: use Chrome for Drone! Also, Drone will not trigger on git push if HTTP auth is enabled for Gitea. If that happens, disable auth and push again.

Setting up server

  • install docker
  • install k3s
  • apt-get install tmate cifs-utils

Backups

Longhorn

apt-get -y install open-iscsi nfs-common jq
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/scripts/environment_check.sh | bash

Velero

velero install \
    --use-node-agent \
    --privileged-node-agent \
    --uploader-type=restic \
    --features=EnableCSI \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.2.1 \
    --bucket velero \
    --secret-file ./secrets/credentials-velero \
    --use-volume-snapshots=true \
    --backup-location-config region=eu,s3ForcePathStyle="true",s3Url=https://eu2.contabostorage.com \
    --wait

If there's an issue with the credentials:

kubectl create secret generic cloud-credentials --namespace velero --from-file=cloud=./secrets/credentials-velero --dry-run=client -o yaml | kubectl apply -f -

Connect to services

Postgres: kubectl -n databases port-forward pod/postgres-0 5432:5432
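
Once the port-forward is running, Postgres is reachable on localhost. The sketch below waits for the forwarded port before connecting. `wait_for_port` is a hypothetical helper, not something in this repo. It relies on bash's /dev/tcp, and the `postgres` user in the usage comment is an assumption.

```shell
# Wait until a TCP port accepts connections (bash only: uses /dev/tcp).
# Usage: wait_for_port HOST PORT [TRIES]   (one attempt per second)
wait_for_port() {
  local host=$1 port=$2 tries=${3:-10} i=0
  while [ "$i" -lt "$tries" ]; do
    # The subshell opens and immediately closes the connection.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (user name is an assumption):
#   kubectl -n databases port-forward pod/postgres-0 5432:5432 &
#   wait_for_port 127.0.0.1 5432 && psql -h 127.0.0.1 -p 5432 -U postgres
```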

Runbook

Failing health checks

KUBE_CONFIG is a secret on Drone (https://drone.nocodelytics.com/nocodelytics/healthcheck/settings/org-secrets). Its value needs to come from /etc/rancher/k3s/k3s.yaml on the server.

The certificate in that file expires after a year and needs to be renewed per https://docs.k3s.io/cli/certificate:

# ssh into server
systemctl stop k3s
k3s certificate rotate
systemctl start k3s

Then base64-encode the file: cat /etc/rancher/k3s/k3s.yaml | base64 -i -
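
To confirm the rotation took effect, you can read the expiry date of the client certificate embedded in k3s.yaml. This is a minimal sketch. The helper names are made up for illustration, and it assumes `openssl` is installed and the file has the usual `client-certificate-data:` entry.

```shell
# Print the PEM client certificate embedded in a kubeconfig file.
# Usage: client_cert_pem /etc/rancher/k3s/k3s.yaml
client_cert_pem() {
  # Take the first client-certificate-data value and base64-decode it.
  awk '/client-certificate-data:/ { print $2; exit }' "$1" | base64 -d
}

# Print the certificate's expiry line (notAfter=...) via openssl.
# Usage: client_cert_enddate /etc/rancher/k3s/k3s.yaml
client_cert_enddate() {
  client_cert_pem "$1" | openssl x509 -noout -enddate
}
```

After a successful rotation, the date printed by `client_cert_enddate` should be roughly a year in the future.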

The same kube config, NOT encoded, goes to ~/.kube/config, but its server field (https://127.0.0.1:6443 by default) needs to be edited to point to the server IP.
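
The server edit can be scripted with sed instead of done by hand. This is a sketch only. `set_kubeconfig_server` is a hypothetical helper, 203.0.113.10 is a placeholder address, and it assumes the file has the stock `server: https://HOST:PORT` line and an IPv4 target.

```shell
# Copy a kubeconfig and point its `server:` entry at a given address.
# Usage: set_kubeconfig_server SRC DEST SERVER_IP
set_kubeconfig_server() {
  local src=$1 dest=$2 ip=$3
  # Replace only the host part; scheme and port are kept as they are.
  sed "s#\(server: https://\)[^:/]*#\1$ip#" "$src" > "$dest"
}

# Example (placeholder IP, run after copying k3s.yaml from the server):
#   set_kubeconfig_server ./k3s.yaml ~/.kube/config 203.0.113.10
```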

Disk space issues

Find the persistent volume that's full, e.g. in clickhouse.yaml, edit ONLY the resources.requests.storage field, then kubectl apply -f ...
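
One way to spot the full volume is to filter `df` output on the server. This is a rough sketch. `full_mounts` is a hypothetical helper, and it assumes the volumes show up as mounts on the node (on a typical k3s node, CSI volumes appear under /var/lib/kubelet/pods/.../pvc-.../mount).

```shell
# Read `df -P` output on stdin and print "USE% MOUNTPOINT" for every
# filesystem at or above the given usage percentage.
# Usage: df -P | full_mounts 90
full_mounts() {
  awk -v t="$1" 'NR > 1 {
    use = $5              # Capacity column, e.g. "95%"
    sub(/%/, "", use)
    if (use + 0 >= t + 0) print $5, $6
  }'
}
```

The pvc-... name in the mount path can then be matched against `kubectl get pv` to find which claim it belongs to.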