Building CloudDB: A Self-Hosted Database Platform for Kubernetes

Running databases on Kubernetes has a reputation for being painful. The tooling is powerful but scattered, the learning curve is steep, and every team ends up writing the same glue code to automate the same routine tasks. I wanted to change that, at least for my own projects, so I built CloudDB: a Kubernetes-native, self-hosted Database as a Service (DBaaS) control plane with a clean React dashboard and a Go API that talks directly to the cluster.

This post walks through the architecture, the design decisions, and what it took to get this running in production.


Why run databases on Kubernetes?

Managed database services from cloud providers are convenient, but they come with real trade-offs: vendor lock-in, limited configuration access, surprise bills, and no way to run on your own hardware. On the other hand, running databases on raw VMs or bare metal requires deep operational knowledge. You are on the hook for provisioning, upgrades, backups, and recovery.

What was missing was a middle ground: a control plane that wraps the existing, battle-tested Kubernetes database operators (which do the heavy lifting) and exposes them through a single, clean UI. That is exactly what CloudDB is.


Designing the control plane

At its core, CloudDB lets you spin up and manage production-grade PostgreSQL clusters from a browser, without touching kubectl or writing a single line of YAML. Under the hood, every action translates to a change on a Kubernetes Custom Resource that the Zalando Postgres Operator reconciles.

Here is a look at what the platform handles today:

Cluster Provisioning

You fill out a form (name, PostgreSQL version, CPU, memory, storage size, replica count, and whether you want an external load balancer) and CloudDB takes care of the rest. The Go backend translates that request into a postgresqls.acid.zalan.do Custom Resource and applies it to the cluster via client-go. No YAML, no manual patching.
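
As a rough illustration of that translation step, here is a minimal sketch using only the standard library. The ClusterRequest type, its field names, and the ToPostgresql function are assumptions made for this example, not CloudDB's actual code. The TeamID field is also an assumption: the operator's default naming convention prefixes the cluster name with the team ID, which the form described above does not list. In a real backend the resulting map would be wrapped in an unstructured object and handed to the dynamic client rather than printed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterRequest mirrors the form fields described in the post.
// The type and field names are illustrative assumptions.
type ClusterRequest struct {
	Name         string
	TeamID       string // assumption: used as the cluster name prefix
	Namespace    string
	Version      string // PostgreSQL major version, e.g. "16"
	CPU          string // e.g. "500m"
	Memory       string // e.g. "1Gi"
	Storage      string // e.g. "10Gi"
	Replicas     int64
	LoadBalancer bool
}

// ToPostgresql builds the content of a postgresql custom resource
// (group acid.zalan.do, version v1) as a plain nested map.
func ToPostgresql(r ClusterRequest) map[string]interface{} {
	res := map[string]interface{}{"cpu": r.CPU, "memory": r.Memory}
	return map[string]interface{}{
		"apiVersion": "acid.zalan.do/v1",
		"kind":       "postgresql",
		"metadata": map[string]interface{}{
			"name":      r.TeamID + "-" + r.Name,
			"namespace": r.Namespace,
		},
		"spec": map[string]interface{}{
			"teamId":                   r.TeamID,
			"numberOfInstances":        r.Replicas,
			"postgresql":               map[string]interface{}{"version": r.Version},
			"volume":                   map[string]interface{}{"size": r.Storage},
			"resources":                map[string]interface{}{"requests": res, "limits": res},
			"enableMasterLoadBalancer": r.LoadBalancer,
		},
	}
}

func main() {
	cr := ToPostgresql(ClusterRequest{
		Name: "orders", TeamID: "acid", Namespace: "default",
		Version: "16", CPU: "500m", Memory: "1Gi", Storage: "10Gi",
		Replicas: 2, LoadBalancer: false,
	})
	out, err := json.MarshalIndent(cr, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Setting requests equal to limits is a simplification for the sketch; a real form may expose them separately.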

Day 2 Operations

A cluster needs management long after day one. CloudDB covers the ongoing operations that would otherwise require kubectl access:

  • Scale: adjust CPU, memory, and replica count at any time
  • Pause: scale a cluster to zero replicas to save resources during off-hours or in development environments
  • Resume: bring a paused cluster back to life with a single click
  • Delete: cleanly remove a cluster with a confirmation dialog to prevent accidental deletions
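
Scale, pause, and resume can all be expressed as small JSON merge patches against the cluster resource. The sketch below shows one way to build those patch bodies. The function names are hypothetical, and how CloudDB remembers the replica count of a paused cluster is not described in this post, so ResumePatch simply takes the previous count as an argument.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch wraps the given spec fields in a JSON merge patch body.
func mergePatch(spec map[string]interface{}) ([]byte, error) {
	return json.Marshal(map[string]interface{}{"spec": spec})
}

// ScalePatch sets CPU, memory, and replica count in one patch.
func ScalePatch(cpu, memory string, replicas int64) ([]byte, error) {
	if replicas < 1 {
		return nil, fmt.Errorf("replicas must be at least 1, got %d", replicas)
	}
	res := map[string]interface{}{"cpu": cpu, "memory": memory}
	return mergePatch(map[string]interface{}{
		"numberOfInstances": replicas,
		"resources":         map[string]interface{}{"requests": res, "limits": res},
	})
}

// PausePatch scales the cluster to zero replicas.
func PausePatch() ([]byte, error) {
	return mergePatch(map[string]interface{}{"numberOfInstances": 0})
}

// ResumePatch restores a previously recorded replica count.
// Where that count is stored is left to the caller (an assumption).
func ResumePatch(previous int64) ([]byte, error) {
	if previous < 1 {
		return nil, fmt.Errorf("previous replica count must be at least 1, got %d", previous)
	}
	return mergePatch(map[string]interface{}{"numberOfInstances": previous})
}

func main() {
	p, err := PausePatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p)) // {"spec":{"numberOfInstances":0}}
}
```

A merge patch only touches the fields it names, so the rest of the cluster spec stays as the operator last saw it.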

PostgreSQL Major Version Upgrades

This is the feature I am most proud of, and the one production teams care about most.

Upgrading a PostgreSQL major version (say, from 14 to 16) is notoriously risky. The Zalando operator supports it through pg_upgrade, but triggering it incorrectly can cause downtime or data integrity issues. CloudDB wraps the entire process in a 5-check preflight validation layer that runs before a single byte is changed:

  1. The operator must be in manual or full upgrade mode
  2. The target version must be strictly greater than the current one
  3. The version jump cannot exceed 2 major versions at a time
  4. The cluster must be in Running status
  5. The cluster must have 5 or fewer replicas (larger clusters should use the operator CLI directly)

If all checks pass, the backend patches the cluster resource and the operator handles the in-place upgrade. On the frontend, users go through a 3-step confirmation flow: choose the target version, acknowledge the irreversibility with a checkbox, then type the cluster name to confirm. It is intentionally full of friction: the kind of guardrail that saves you at 2 a.m.
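
The five preflight checks can be sketched as a single Go function that collects every failure rather than stopping at the first. The UpgradeRequest type and its field names are assumptions for this example; the thresholds come straight from the list above.

```go
package main

import (
	"errors"
	"fmt"
)

// UpgradeRequest carries the facts the preflight needs.
// Type and field names are illustrative assumptions.
type UpgradeRequest struct {
	OperatorMode   string // "off", "manual", or "full"
	CurrentVersion int    // current PostgreSQL major version
	TargetVersion  int    // requested PostgreSQL major version
	ClusterStatus  string // e.g. "Running"
	Replicas       int
}

const (
	maxVersionJump = 2 // check 3: at most 2 major versions at a time
	maxReplicas    = 5 // check 5: larger clusters use the operator CLI
)

// Preflight returns nil when all five checks pass, or a joined
// error listing every check that failed.
func Preflight(r UpgradeRequest) error {
	var errs []error
	if r.OperatorMode != "manual" && r.OperatorMode != "full" {
		errs = append(errs, fmt.Errorf("operator upgrade mode is %q, need manual or full", r.OperatorMode))
	}
	if r.TargetVersion <= r.CurrentVersion {
		errs = append(errs, fmt.Errorf("target version %d must be greater than current version %d", r.TargetVersion, r.CurrentVersion))
	}
	if r.TargetVersion-r.CurrentVersion > maxVersionJump {
		errs = append(errs, fmt.Errorf("version jump %d exceeds maximum of %d", r.TargetVersion-r.CurrentVersion, maxVersionJump))
	}
	if r.ClusterStatus != "Running" {
		errs = append(errs, fmt.Errorf("cluster status is %q, need Running", r.ClusterStatus))
	}
	if r.Replicas > maxReplicas {
		errs = append(errs, fmt.Errorf("cluster has %d replicas, maximum is %d", r.Replicas, maxReplicas))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	err := Preflight(UpgradeRequest{
		OperatorMode: "off", CurrentVersion: 14, TargetVersion: 17,
		ClusterStatus: "Running", Replicas: 3,
	})
	fmt.Println(err) // one line per failed check
}
```

Reporting all failures at once lets the UI show the user everything that needs fixing in a single round trip.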

Logical Backups

Each cluster can be configured with pg_dumpall-based scheduled backups that upload to S3. The schedule, S3 bucket, region, server-side encryption settings, and retention time are all configurable per cluster via the Edit page. Credentials pass through the system on write but are redacted in read responses: the API returns *** for sensitive fields to avoid accidental exposure in logs or UIs.
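
Redaction on read can be as simple as copying the config and masking the sensitive fields before serializing it. The BackupConfig type, its JSON field names, and the choice to mask both the access key ID and the secret are assumptions for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BackupConfig is a hypothetical shape for per-cluster backup settings.
type BackupConfig struct {
	Schedule        string `json:"schedule"`
	Bucket          string `json:"bucket"`
	Region          string `json:"region"`
	SSE             string `json:"sse"`
	RetentionTime   string `json:"retentionTime"`
	AccessKeyID     string `json:"accessKeyId"`
	SecretAccessKey string `json:"secretAccessKey"`
}

const redacted = "***"

// Redact returns a copy that is safe to send in a read response.
// Empty fields stay empty so the UI can tell "unset" from "hidden".
func Redact(c BackupConfig) BackupConfig {
	if c.AccessKeyID != "" {
		c.AccessKeyID = redacted
	}
	if c.SecretAccessKey != "" {
		c.SecretAccessKey = redacted
	}
	return c // c is a copy; the caller's value is untouched
}

func main() {
	stored := BackupConfig{
		Schedule: "30 00 * * *", Bucket: "example-backups", Region: "eu-central-1",
		SSE: "AES256", RetentionTime: "2 week",
		AccessKeyID: "EXAMPLEKEYID", SecretAccessKey: "example-secret",
	}
	out, err := json.Marshal(Redact(stored))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

One consequence of this design is that the write path has to treat an incoming *** as "keep the existing value", otherwise saving the Edit page unchanged would overwrite the real credential.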

Operator Settings

A dedicated Settings page lets you manage the underlying Postgres Operator itself:

  • Update the operator version by patching the Deployment image tag directly
  • Switch the major version upgrade mode (off, manual, or full)
  • Configure global backup settings including the cloud provider (S3, GCS, or Azure), credentials, scheduling, and retention

This is the kind of operational control that usually requires direct cluster access. Here, it is a simple form in the UI.
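
Two of these settings lend themselves to small, testable helpers: swapping the tag on the operator's image reference and validating the upgrade mode before writing it. The function names are hypothetical, and the image name and versions in main are example values only.

```go
package main

import (
	"fmt"
	"strings"
)

// WithTag returns image with its tag replaced by tag. It drops any
// digest and leaves a registry port (e.g. localhost:5000) alone.
func WithTag(image, tag string) string {
	if i := strings.Index(image, "@"); i >= 0 {
		image = image[:i] // strip "@sha256:..." digest
	}
	slash := strings.LastIndex(image, "/")
	// A colon after the last slash separates the tag; a colon
	// before it belongs to the registry host:port.
	if colon := strings.LastIndex(image, ":"); colon > slash {
		image = image[:colon]
	}
	return image + ":" + tag
}

// ValidUpgradeMode reports whether mode is one of the three
// major version upgrade modes named above.
func ValidUpgradeMode(mode string) bool {
	switch mode {
	case "off", "manual", "full":
		return true
	}
	return false
}

func main() {
	fmt.Println(WithTag("ghcr.io/zalando/postgres-operator:v1.13.0", "v1.14.0"))
	fmt.Println(ValidUpgradeMode("manual"))
}
```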


The tech stack

The architecture is intentionally simple: a single developer can run it locally with one command, and it is solid enough to use in a real environment.

Browser
  │  HTTP (Vite dev proxy → /api/v1)
  ▼
Frontend  (React 18 + Vite + Tailwind CSS)
  │  REST / JSON
  ▼
Backend   (Go 1.25 + Gin)
  │  client-go (dynamic + typed clients)
  ▼
Kubernetes Cluster (any distribution)
  └─ postgresqls.acid.zalan.do CRDs
  └─ operatorconfigurations CRDs

Backend (Go + Gin): A straightforward RESTful API with ten endpoints covering the full cluster lifecycle plus operator management. The key design choice here was to use the Kubernetes dynamic client for CRD operations. This avoids generating typed client code for the Zalando CRDs and makes it much easier to add support for other database operators in the future.

Frontend (React + TypeScript + Tailwind): A dark-mode dashboard with a typed Axios API client, toasts for non-blocking feedback, and custom confirmation dialogs that replace all native browser popups. The 3-step upgrade modal is a good example of how the UI enforces safety workflows that the backend alone cannot guarantee.

Local development: The full stack comes up with make dev using Docker Compose. The Go container auto-detects Mac and Windows environments and routes Kubernetes API calls through host.docker.internal, so there is no manual kubeconfig editing involved.
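
The routing step amounts to rewriting a loopback API server address so that it resolves from inside the container. Below is a minimal sketch of just that rewrite. RewriteServer is a hypothetical name, and how CloudDB detects the host platform is not shown here. TLS verification against the rewritten host name is also out of scope for the sketch.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// dockerHost is the DNS name Docker Desktop provides for the host machine.
const dockerHost = "host.docker.internal"

// RewriteServer replaces a loopback host in a kubeconfig server URL
// with host.docker.internal, keeping scheme, port, and path.
// Non-loopback addresses are returned unchanged.
func RewriteServer(server string) (string, error) {
	u, err := url.Parse(server)
	if err != nil {
		return "", fmt.Errorf("parse server URL: %w", err)
	}
	switch u.Hostname() {
	case "localhost", "127.0.0.1", "::1":
		host := dockerHost
		if port := u.Port(); port != "" {
			host = net.JoinHostPort(host, port)
		}
		u.Host = host
	}
	return u.String(), nil
}

func main() {
	out, err := RewriteServer("https://127.0.0.1:6443")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // https://host.docker.internal:6443
}
```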


Pragmatic engineering choices

A few principles guided every decision in this project:

Correctness over convenience. The upgrade preflight checks, the credential redaction, the confirmation dialogs: these all add friction. That friction is intentional. A tool that is easy to misuse is a liability in production.

Operator-agnostic by design. The backend translation layer (translator.go) cleanly separates user intent from the Kubernetes resource shape. Adding support for a new database engine (say, MySQL via the PlanetScale operator) means writing a new translator function and registering a GroupVersionResource. The rest of the stack does not change.
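
One way to picture that pattern is a small registry keyed by engine name. This is a sketch under assumptions, not the contents of translator.go: the Request type, Register, and Translate are hypothetical names, and the local GroupVersionResource struct stands in for schema.GroupVersionResource from k8s.io/apimachinery so the example needs no external modules.

```go
package main

import "fmt"

// GroupVersionResource is a local stand-in for the apimachinery type.
type GroupVersionResource struct {
	Group, Version, Resource string
}

// Request is a hypothetical engine-independent provisioning request.
type Request struct {
	Name     string
	Replicas int64
}

// Translator turns user intent into the content of a custom resource.
type Translator func(Request) map[string]interface{}

type engine struct {
	gvr       GroupVersionResource
	translate Translator
}

var registry = map[string]engine{}

// Register makes a database engine available under the given name.
func Register(name string, gvr GroupVersionResource, t Translator) {
	registry[name] = engine{gvr: gvr, translate: t}
}

// Translate looks up the engine and returns the resource to address
// plus the object to apply. Callers never see engine-specific shapes.
func Translate(name string, r Request) (GroupVersionResource, map[string]interface{}, error) {
	e, ok := registry[name]
	if !ok {
		return GroupVersionResource{}, nil, fmt.Errorf("unsupported engine %q", name)
	}
	return e.gvr, e.translate(r), nil
}

func main() {
	Register("postgres",
		GroupVersionResource{Group: "acid.zalan.do", Version: "v1", Resource: "postgresqls"},
		func(r Request) map[string]interface{} {
			return map[string]interface{}{
				"apiVersion": "acid.zalan.do/v1",
				"kind":       "postgresql",
				"metadata":   map[string]interface{}{"name": r.Name},
				"spec":       map[string]interface{}{"numberOfInstances": r.Replicas},
			}
		})

	gvr, obj, err := Translate("postgres", Request{Name: "acid-orders", Replicas: 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(gvr.Resource, obj["kind"])
}
```

With this shape, a second engine is one more Register call; the HTTP handlers and the dynamic client code that consume the returned pair stay the same.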

Minimal dependencies, maximal clarity. The backend has no ORM, no message queue, and no in-memory state. Every request reads directly from Kubernetes and writes directly to it. Kubernetes is the source of truth, which is exactly what it is designed to be.


Roadmap

The project is actively evolving. Here are the areas I am currently working on:

  • Multi-engine support: MySQL and Redis operator integrations are on the roadmap, following the same translation layer pattern
  • Monitoring integration: surfacing Prometheus metrics (connections, replication lag, query throughput) directly in the cluster detail view
  • RBAC: the platform currently runs with a single kubeconfig; fine-grained, role-based access control is the next major infrastructure piece
  • Helm chart: packaging the full platform as a Helm chart for one-command deployment into existing clusters

Looking ahead

CloudDB started as a personal project to scratch a real itch: I wanted a way to manage Kubernetes PostgreSQL clusters without constantly reaching for kubectl. It has grown into a full-featured control plane that covers provisioning, day-to-day operations, major version upgrades, backups, and operator configuration.

If you are running databases on Kubernetes and want to reduce your operational overhead, get in touch. I am available for consulting engagements around Kubernetes database infrastructure and custom control planes.