# Install Michelangelo with Helm
The michelangelo Helm chart installs the Michelangelo control plane on any Kubernetes cluster — production, staging, or a local development cluster. After installation you can manage the platform with standard helm install, helm upgrade, and helm uninstall commands.
This guide walks you through prerequisites, a minimal install, verification, customization, and upgrade or removal. The final sections explain the chart's design so you know what you are getting.
## Who this is for
This guide is for platform engineers and infrastructure operators who want to run Michelangelo on a Kubernetes cluster they manage. By the end, you will have:
- A running Michelangelo control plane on your cluster
- A clear understanding of what the chart owns and what you must provide
- The commands to upgrade, customize, and remove the release
If you are looking for a fully scripted local sandbox instead, see Dev Environment Setup.
## Prerequisites
Before you install, make sure you have:
| Requirement | Version | Notes |
|---|---|---|
| Kubernetes cluster | 1.27 or newer | Chart's kubeVersion constraint is >=1.27.0-0 |
| kubectl | Compatible with your cluster | Configured with cluster-admin or equivalent permissions |
| helm | 3.8 or newer | Required for OCI dependencies and the lookup template function |
You also need the following infrastructure reachable from the cluster before you install the chart. The chart expects you to point it at running services — it does not create them (unless you opt into a bundled subchart).
| Component | What the chart needs | Examples |
|---|---|---|
| Metadata storage | A reachable MySQL or PostgreSQL endpoint, plus credentials (root password OR an existing Kubernetes Secret) | RDS, Cloud SQL, an in-cluster pod |
| Object storage | An S3-compatible endpoint and access keys (or an existing Secret) | S3, GCS, MinIO |
| Workflow engine | A Cadence or Temporal frontend address — or enable a bundled subchart (see Self-contained install) | Managed Cadence/Temporal, in-cluster install |
## Quick start

A minimal install pointing at existing infrastructure:

```shell
helm install michelangelo ./helm/michelangelo \
  --set metadataStorage.host=mysql.example.com \
  --set metadataStorage.rootPassword=changeme \
  --set objectStorage.endpoint=s3.amazonaws.com \
  --set objectStorage.accessKeyId=AKID \
  --set objectStorage.secretAccessKey=SECRET \
  --set workflow.endpoint=cadence-frontend.example.com:7833
```
If any required value is missing, helm install exits before creating any resources and tells you which value to set. See the values reference for the full list.
## Verify the install

```shell
kubectl get deployments -l app.kubernetes.io/instance=michelangelo
```

You should see five Deployments, each with READY 1/1:

```
NAME                         READY   UP-TO-DATE   AVAILABLE
michelangelo-apiserver       1/1     1            1
michelangelo-controllermgr   1/1     1            1
michelangelo-envoy           1/1     1            1
michelangelo-ui              1/1     1            1
michelangelo-worker          1/1     1            1
```

Run the chart's built-in connectivity test:

```shell
helm test michelangelo
```
## Reach the UI

For a default ClusterIP install, port-forward to your laptop:

```shell
kubectl port-forward svc/michelangelo-ui 8080:80
```
Then open http://localhost:8080 in your browser. For production, see Expose the UI and API below.
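As a quick smoke test from the terminal, you can script the same check (a sketch; it assumes the default ui Service name and that port 8080 is free on your machine):

```shell
# Forward the UI Service in the background, probe it, then clean up
kubectl port-forward svc/michelangelo-ui 8080:80 &
PF_PID=$!
sleep 2
curl -sf -o /dev/null http://localhost:8080 && echo "UI reachable"
kill "$PF_PID"
```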
## Customization

### Disable services you do not need

```shell
helm install michelangelo ./helm/michelangelo \
  --set ui.enabled=false \
  --set envoy.enabled=false \
  ...
```
### Restrict the controller to specific namespaces

By default controllermgr watches all namespaces (it gets a ClusterRole). Set controllermgr.watchNamespace to a list to switch to namespaced Role + RoleBinding:

```yaml
controllermgr:
  watchNamespace:
    - team-ml
    - team-ranking
```
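To confirm the switch took effect, check that namespaced RBAC objects were created in one of the watched namespaces (the label selector is an assumption based on standard chart labels):

```shell
kubectl get role,rolebinding -n team-ml \
  -l app.kubernetes.io/instance=michelangelo
```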
### Use a values file

Copy helm/michelangelo/values.yaml, edit it, and pass with -f:

```shell
helm install michelangelo ./helm/michelangelo -f my-values.yaml
```
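A minimal my-values.yaml mirroring the quick-start flags might look like:

```yaml
metadataStorage:
  host: mysql.example.com
  rootPassword: changeme
objectStorage:
  endpoint: s3.amazonaws.com
  accessKeyId: AKID
  secretAccessKey: SECRET
workflow:
  endpoint: cadence-frontend.example.com:7833
```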
### Use existing Secrets (GitOps-friendly)

Instead of passing credentials through --set, point the chart at Secrets you manage out-of-band:

```yaml
metadataStorage:
  existingSecret: my-mysql-secret   # must contain key: rootPassword
objectStorage:
  existingSecret: my-s3-secret      # must contain keys: accessKeyId, secretAccessKey
```
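The referenced Secrets can be created out-of-band with kubectl (the values here are placeholders; the key names must match what the chart expects):

```shell
kubectl create secret generic my-mysql-secret \
  --from-literal=rootPassword='changeme'
kubectl create secret generic my-s3-secret \
  --from-literal=accessKeyId='AKID' \
  --from-literal=secretAccessKey='SECRET'
```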
### Expose the UI and API

The chart includes per-service Ingress templates (apiserver.ingress, envoy.ingress, ui.ingress). Enable and configure them to expose the UI and API outside the cluster:

```yaml
ui:
  apiBaseUrl: https://michelangelo.example.com/api   # match your Envoy/Ingress URL
  ingress:
    enabled: true
    className: nginx
    hosts:
      - host: michelangelo.example.com
        paths: ["/"]
    tls:
      - secretName: michelangelo-ui-tls
        hosts: [michelangelo.example.com]
envoy:
  ingress:
    enabled: true
    # ... mirror UI config for the gRPC-Web endpoint
```
### Enable TLS on the API server

The apiserver gRPC port supports TLS via apiserver.tls.*:

```yaml
apiserver:
  tls:
    enabled: true
    secretName: michelangelo-apiserver-tls
```
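If you already have a certificate and key on disk, the referenced Secret can be created with kubectl (file names are illustrative):

```shell
kubectl create secret tls michelangelo-apiserver-tls \
  --cert=apiserver.crt --key=apiserver.key
```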
### Local development on k3d

A ready-made overrides file for k3d clusters switches services to NodePort, enables the bundled Cadence subchart, and points at in-cluster infrastructure:

```shell
helm install michelangelo ./helm/michelangelo -f helm/michelangelo/values-k3d.yaml
```
See Dev Environment Setup for the full local workflow.
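If you do not have a k3d cluster yet, one way to create a small one first (the cluster name and agent count are illustrative):

```shell
k3d cluster create michelangelo --agents 1
```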
## Self-contained install with Cadence or Temporal
If you do not have a workflow engine available, the chart can install one for you. Both Cadence and Temporal are declared as optional subcharts and are disabled by default. Enable exactly one — templates/validations.yaml rejects installs that enable both.
### Cadence subchart

```shell
helm dependency update helm/michelangelo
helm install michelangelo ./helm/michelangelo \
  --set cadence.enabled=true \
  --set workflow.engine=cadence \
  --set workflow.endpoint=michelangelo-cadence-frontend:7833 \
  ...  # other required infrastructure values
```
Cadence subchart values are namespaced under cadence:. A common setup shares MySQL with the control plane but uses a separate cadence database:
```yaml
cadence:
  enabled: true
  persistence:
    defaultStore: mysql
    mysql:
      driver: mysql
      host: mysql.example.com
      port: 3306
      database: cadence
      user: root
      password: changeme
  web:
    enabled: true
```
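If the separate cadence database does not exist yet on the shared MySQL host, one way to create it (using the credentials from the example above; the subchart's schema tooling still has to populate it):

```shell
mysql -h mysql.example.com -u root -p \
  -e "CREATE DATABASE IF NOT EXISTS cadence;"
```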
### Temporal subchart

```shell
helm install michelangelo ./helm/michelangelo \
  --set temporal.enabled=true \
  --set workflow.engine=temporal \
  --set workflow.endpoint=michelangelo-temporal-frontend:7233 \
  ...
```
Pass Temporal subchart values under the temporal: key. See the official Temporal Helm chart for the full surface.
## Upgrade

```shell
helm upgrade michelangelo ./helm/michelangelo --reuse-values
```

To change a single value:

```shell
helm upgrade michelangelo ./helm/michelangelo --reuse-values \
  --set ui.enabled=true
```
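Before an upgrade with --reuse-values, it can help to review what the release currently holds, and to preview the change if you have the optional helm-diff plugin installed:

```shell
# Show the values currently applied to the release
helm get values michelangelo
# Preview the rendered changes (requires the helm-diff plugin)
helm diff upgrade michelangelo ./helm/michelangelo --reuse-values \
  --set ui.enabled=true
```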
Review the chart's CHANGELOG.md before upgrading across a minor version.
## Uninstall

```shell
helm uninstall michelangelo
```
This removes all Deployments, Services, ConfigMaps, RBAC, and CRDs created by the chart. Two Secrets are intentionally retained: metadata-storage-secret and object-storage-secret. They are annotated helm.sh/resource-policy: keep so an uninstall/install cycle does not destroy externally-injected credentials.
Delete them manually for a fully clean slate:

```shell
kubectl delete secret metadata-storage-secret object-storage-secret
```
## Troubleshooting
| Symptom | Likely cause | What to try |
|---|---|---|
| helm install fails with <value> is required | A required value is unset | Add it with --set or in your values file |
| helm install fails with a validation error | Both cadence.enabled=true and temporal.enabled=true set | Pick one workflow engine |
| apiserver Pod stuck in Init:0/2 | wait-for-metadata-storage cannot reach DB | kubectl run -it --rm mysql-test --image=mysql:8.0 -- mysqladmin ping -h <host> |
| apiserver Pod stuck in Init:1/2 | schema-init cannot apply SQL schema | kubectl logs <pod> -c schema-init — usually a credentials issue |
| worker Pod CrashLoopBackOff | Cannot reach workflow engine | Verify workflow.endpoint resolves and the port is open from the cluster |
| helm test fails | API server unreachable from inside cluster | kubectl logs <test-pod>; check apiserver Service exists |
| UI shows network errors in browser | ui.apiBaseUrl does not match how you exposed the API | Set ui.apiBaseUrl to the URL the browser uses to reach the API |
For deeper diagnostics see Troubleshooting.
## Values reference
Most commonly set values. See helm/michelangelo/values.yaml for the complete list.
| Key | Required | Default | Description |
|---|---|---|---|
| metadataStorage.host | yes | — | Hostname of MySQL or PostgreSQL |
| metadataStorage.port | no | 3306 | Database port |
| metadataStorage.rootPassword | yes¹ | — | Root password for schema initialization |
| metadataStorage.existingSecret | no | — | Name of a Secret containing rootPassword |
| objectStorage.endpoint | yes | — | S3-compatible endpoint |
| objectStorage.accessKeyId | yes¹ | — | S3 access key ID |
| objectStorage.secretAccessKey | yes¹ | — | S3 secret access key |
| objectStorage.existingSecret | no | — | Secret containing accessKeyId/secretAccessKey |
| workflow.engine | no | cadence | cadence or temporal |
| workflow.endpoint | yes | — | host:port of workflow frontend |
| apiserver.tls.enabled | no | false | Enable TLS on the apiserver gRPC port |
| ui.apiBaseUrl | no | /api | URL the UI uses to call the API |
| ui.enabled / envoy.enabled / etc. | no | true | Per-service install toggle |
| <service>.ingress.enabled | no | false | Per-service Ingress toggle |
| controllermgr.watchNamespace | no | [] (all) | Namespaces the controller manager watches |
| cadence.enabled | no | false | Install bundled Cadence subchart |
| temporal.enabled | no | false | Install bundled Temporal subchart |
¹ Required unless existingSecret is set.
## What the chart installs
The michelangelo chart owns the control plane only. The system splits into three tiers with clear ownership boundaries:
- Infrastructure tier — stateful, long-lived. You provide it (or opt into a subchart): MySQL/PostgreSQL, S3-compatible storage, and a Cadence or Temporal service.
- Control plane tier — stateless, frequently redeployed. The chart installs all five services as Deployments:
  - michelangelo-apiserver — gRPC API server (port 15566)
  - michelangelo-envoy — gRPC-Web proxy (port 8081)
  - michelangelo-ui — React frontend (port 80)
  - michelangelo-worker — Cadence/Temporal workflow client
  - michelangelo-controllermgr — Kubernetes controller manager
- Observability tier — optional. Bring your own Prometheus and Grafana, or see Monitoring & Observability.
## Chart layout

```
helm/michelangelo/
├── Chart.yaml                    # includes optional cadence + temporal dependencies
├── README.md
├── values.yaml                   # production defaults (ClusterIP, empty addresses, subcharts off)
├── values-k3d.yaml               # k3d overrides (NodePorts, in-cluster infra, cadence enabled)
├── files/schema/
│   └── mysql-init-schema.sql     # CRD schema applied at first install
├── crds/                         # placeholder — CRDs self-register at apiserver startup
└── templates/
    ├── _helpers.tpl
    ├── NOTES.txt                 # post-install instructions
    ├── validations.yaml          # chart-level guardrails (e.g. cadence/temporal exclusivity)
    ├── rbac/                     # ServiceAccount, ClusterRole, ClusterRoleBinding
    ├── tests/
    │   └── test-connection.yaml  # helm test hook
    └── core/                     # 20 templates for the 5 services
        ├── apiserver-{deployment,service,configmap,ingress,schema-init-configmap}.yaml
        ├── envoy-{deployment,service,configmap,ingress}.yaml
        ├── ui-{deployment,service,configmap,ingress}.yaml
        ├── worker-{deployment,configmap}.yaml
        ├── controllermgr-{deployment,service,configmap}.yaml
        ├── metadata-storage-secret.yaml   # resource-policy: keep
        └── object-storage-secret.yaml     # resource-policy: keep
```
## Design notes
### All control plane workloads are Deployments
Every service runs as a Deployment, not a bare Pod, so you get self-healing and rolling updates.
### Schema initialization runs as init containers
Two init containers on the apiserver Pod: wait-for-metadata-storage polls the DB until reachable, then schema-init applies the CRD schema idempotently. This removes the ordering race that exists when the schema is applied by a separate Job.
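In Pod-spec terms the pattern looks roughly like this (a sketch, not the chart's actual template; the image name, environment variables, and commands are illustrative):

```yaml
initContainers:
  - name: wait-for-metadata-storage
    image: mysql:8.0            # illustrative image
    command: ["sh", "-c",
      "until mysqladmin ping -h $DB_HOST --silent; do sleep 2; done"]
  - name: schema-init
    image: mysql:8.0
    command: ["sh", "-c",
      "mysql -h $DB_HOST -p$DB_PASSWORD < /schema/mysql-init-schema.sql"]
```

Because both init containers must succeed before the apiserver container starts, the schema is guaranteed to be in place with no external ordering dependency.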
### Credential Secrets are retained on uninstall
metadata-storage-secret and object-storage-secret carry helm.sh/resource-policy: keep, so externally-injected credentials survive an uninstall/reinstall.
### Required values fail fast
Required values use Helm's required template function — helm install fails before any resource is created.
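A template using required fails at render time with the given message when the value is empty; a sketch of the pattern (not the chart's exact template):

```yaml
# Rendering aborts with this message if the value is unset
host: {{ required "metadataStorage.host is required" .Values.metadataStorage.host }}
```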
### Chart-level validations
templates/validations.yaml enforces chart-wide invariants (notably, that cadence.enabled and temporal.enabled are not both true). Validation errors surface during helm install/upgrade.
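Such a guardrail is typically a template that calls Helm's fail function; a sketch of the cadence/temporal exclusivity check (not the chart's exact wording):

```yaml
{{- if and .Values.cadence.enabled .Values.temporal.enabled }}
{{- fail "cadence.enabled and temporal.enabled are mutually exclusive; enable only one" }}
{{- end }}
```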
### Least-privilege RBAC
The chart installs a scoped ClusterRole covering only what controllermgr and apiserver need: CRD lifecycle, Michelangelo CRs, KubeRay/Spark CRs, namespaces (create/update/patch/delete), pods/services, configmaps/secrets, and leader-election leases. There is no cluster-admin grant.
### Pod security defaults
All control plane Pods run with runAsNonRoot: true, runAsUser: 65534, and drop: [ALL] capabilities by default. Override per-service if a custom image needs different settings.
### Per-service enabled toggle
Each service has an enabled flag, and templates wrap the Deployment, Service, ConfigMap, and Ingress accordingly.
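The wrapping pattern is the standard Helm conditional; sketched here for the ui Deployment (illustrative, not the chart's exact template):

```yaml
{{- if .Values.ui.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "michelangelo.fullname" . }}-ui
# ... rest of the Deployment spec
{{- end }}
```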
### Envoy backend is release-scoped
The Envoy ConfigMap references {{ include "michelangelo.fullname" . }}-apiserver, so multiple releases in different namespaces do not collide.
### KubeRay log-collector sidecar
When enabled via controllermgr.jobs.k8sengine.mapper.logPersistence, controllermgr injects a kuberayCollector sidecar into Ray jobs to persist logs to object storage. See values.yaml comments for the available knobs.
## Next steps
- Platform Setup — configure the components installed by this chart through their ConfigMaps
- Network & Ingress — TLS and multi-cluster connectivity
- Authentication — connect an identity provider and configure RBAC
- Register a Compute Cluster — connect a cluster for Ray and Spark job dispatch
- Monitoring & Observability — scrape metrics and configure dashboards