
Install Michelangelo with Helm

The michelangelo Helm chart installs the Michelangelo control plane on any Kubernetes cluster — production, staging, or a local development cluster. After installation you can manage the platform with standard helm install, helm upgrade, and helm uninstall commands.

This guide walks you through prerequisites, a minimal install, verification, customization, and upgrade or removal. The final sections explain the chart's design so you know what you are getting.

Who this is for

This guide is for platform engineers and infrastructure operators who want to run Michelangelo on a Kubernetes cluster they manage. By the end, you will have:

  • A running Michelangelo control plane on your cluster
  • A clear understanding of what the chart owns and what you must provide
  • The commands to upgrade, customize, and remove the release

If you are looking for a fully scripted local sandbox instead, see Dev Environment Setup.

Prerequisites

Before you install, make sure you have:

| Requirement | Version | Notes |
| --- | --- | --- |
| Kubernetes cluster | 1.27 or newer | Chart's kubeVersion constraint is >=1.27.0-0 |
| kubectl | Compatible with your cluster | Configured with cluster-admin or equivalent permissions |
| helm | 3.8 or newer | Required for OCI dependencies and the lookup template function |

You also need the following infrastructure reachable from the cluster before you install the chart. The chart expects you to point it at running services — it does not create them (unless you opt into a bundled subchart).

| Component | What the chart needs | Examples |
| --- | --- | --- |
| Metadata storage | A reachable MySQL or PostgreSQL endpoint, plus credentials (root password OR an existing Kubernetes Secret) | RDS, Cloud SQL, an in-cluster pod |
| Object storage | An S3-compatible endpoint and access keys (or an existing Secret) | S3, GCS, MinIO |
| Workflow engine | A Cadence or Temporal frontend address — or enable a bundled subchart (see Self-contained install) | Managed Cadence/Temporal, in-cluster install |

Quick start

A minimal install pointing at existing infrastructure:

helm install michelangelo ./helm/michelangelo \
  --set metadataStorage.host=mysql.example.com \
  --set metadataStorage.rootPassword=changeme \
  --set objectStorage.endpoint=s3.amazonaws.com \
  --set objectStorage.accessKeyId=AKID \
  --set objectStorage.secretAccessKey=SECRET \
  --set workflow.endpoint=cadence-frontend.example.com:7833

If any required value is missing, helm install exits before creating any resources and tells you which value to set. See the values reference for the full list.
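The same quick-start settings can be kept in a values file instead of --set flags. This is a sketch with the keys mirroring the flags above; hostnames and credentials are placeholders:

```yaml
# my-values.yaml — quick-start settings as a values file (placeholder values)
metadataStorage:
  host: mysql.example.com
  rootPassword: changeme
objectStorage:
  endpoint: s3.amazonaws.com
  accessKeyId: AKID
  secretAccessKey: SECRET
workflow:
  endpoint: cadence-frontend.example.com:7833
```

Install with helm install michelangelo ./helm/michelangelo -f my-values.yaml.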

Verify the install

kubectl get deployments -l app.kubernetes.io/instance=michelangelo

You should see five Deployments, each with READY 1/1:

NAME                         READY   UP-TO-DATE   AVAILABLE
michelangelo-apiserver       1/1     1            1
michelangelo-controllermgr   1/1     1            1
michelangelo-envoy           1/1     1            1
michelangelo-ui              1/1     1            1
michelangelo-worker          1/1     1            1

Run the chart's built-in connectivity test:

helm test michelangelo

Reach the UI

For a default ClusterIP install, port-forward to your laptop:

kubectl port-forward svc/michelangelo-ui 8080:80

Then open http://localhost:8080 in your browser. For production, see Expose the UI and API below.

Customization

Disable services you do not need

helm install michelangelo ./helm/michelangelo \
  --set ui.enabled=false \
  --set envoy.enabled=false \
  ...

Restrict the controller to specific namespaces

By default controllermgr watches all namespaces (it gets a ClusterRole). Set controllermgr.watchNamespace to a list to switch to namespaced Role + RoleBinding:

controllermgr:
  watchNamespace:
    - team-ml
    - team-ranking

Use a values file

Copy helm/michelangelo/values.yaml, edit it, and pass with -f:

helm install michelangelo ./helm/michelangelo -f my-values.yaml

Use existing Secrets (GitOps-friendly)

Instead of passing credentials through --set, point the chart at Secrets you manage out-of-band:

metadataStorage:
  existingSecret: my-mysql-secret   # must contain key: rootPassword
objectStorage:
  existingSecret: my-s3-secret      # must contain keys: accessKeyId, secretAccessKey
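For reference, Secrets compatible with the snippet above might look like the following. The Secret names are the hypothetical ones from the example; the stringData keys are the ones the chart expects:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-mysql-secret
stringData:
  rootPassword: changeme
---
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-secret
stringData:
  accessKeyId: AKID
  secretAccessKey: SECRET
```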

Expose the UI and API

The chart includes per-service Ingress templates (apiserver.ingress, envoy.ingress, ui.ingress). Enable and configure them to expose the UI and API outside the cluster:

ui:
  ingress:
    enabled: true
    className: nginx
    hosts:
      - host: michelangelo.example.com
        paths: ["/"]
    tls:
      - secretName: michelangelo-ui-tls
        hosts: [michelangelo.example.com]

envoy:
  ingress:
    enabled: true
    # ... mirror UI config for the gRPC-Web endpoint

ui:
  apiBaseUrl: https://michelangelo.example.com/api   # match your Envoy/Ingress URL

Enable TLS on the API server

The apiserver gRPC port supports TLS via apiserver.tls.*:

apiserver:
  tls:
    enabled: true
    secretName: michelangelo-apiserver-tls
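The referenced Secret is presumably a standard Kubernetes TLS Secret — a sketch, assuming the chart consumes the conventional tls.crt/tls.key keys:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: michelangelo-apiserver-tls
type: kubernetes.io/tls
stringData:
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    ...
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    ...
```

With cert files on disk, kubectl create secret tls michelangelo-apiserver-tls --cert=tls.crt --key=tls.key produces the same shape.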

Local development on k3d

A ready-made overrides file for k3d clusters switches services to NodePort, enables the bundled Cadence subchart, and points at in-cluster infrastructure:

helm install michelangelo ./helm/michelangelo -f helm/michelangelo/values-k3d.yaml

See Dev Environment Setup for the full local workflow.

Self-contained install with Cadence or Temporal

If you do not have a workflow engine available, the chart can install one for you. Both Cadence and Temporal are declared as optional subcharts and are disabled by default. Enable exactly one — templates/validations.yaml rejects installs that enable both.

Cadence subchart

helm dependency update helm/michelangelo

helm install michelangelo ./helm/michelangelo \
  --set cadence.enabled=true \
  --set workflow.engine=cadence \
  --set workflow.endpoint=michelangelo-cadence-frontend:7833 \
  ... # other required infrastructure values

Cadence subchart values are namespaced under cadence:. A common setup shares MySQL with the control plane but uses a separate cadence database:

cadence:
  enabled: true
  persistence:
    defaultStore: mysql
  mysql:
    driver: mysql
    host: mysql.example.com
    port: 3306
    database: cadence
    user: root
    password: changeme
  web:
    enabled: true

Temporal subchart

helm install michelangelo ./helm/michelangelo \
  --set temporal.enabled=true \
  --set workflow.engine=temporal \
  --set workflow.endpoint=michelangelo-temporal-frontend:7233 \
  ...

Pass Temporal subchart values under the temporal: key. See the official Temporal Helm chart for the full surface.

Upgrade

helm upgrade michelangelo ./helm/michelangelo --reuse-values

To change a single value:

helm upgrade michelangelo ./helm/michelangelo --reuse-values \
  --set ui.enabled=true

Review the chart's CHANGELOG.md before upgrading across a minor version.

Uninstall

helm uninstall michelangelo

This removes all Deployments, Services, ConfigMaps, RBAC, and CRDs created by the chart. Two Secrets are intentionally retained: metadata-storage-secret and object-storage-secret. They are annotated helm.sh/resource-policy: keep so an uninstall/install cycle does not destroy externally-injected credentials.

Delete them manually for a fully clean slate:

kubectl delete secret metadata-storage-secret object-storage-secret

Troubleshooting

| Symptom | Likely cause | What to try |
| --- | --- | --- |
| helm install fails with \<value\> is required | A required value is unset | Add it with --set or in your values file |
| helm install fails with a validation error | Both cadence.enabled=true and temporal.enabled=true are set | Pick one workflow engine |
| apiserver Pod stuck in Init:0/2 | wait-for-metadata-storage cannot reach the DB | kubectl run -it --rm mysql-test --image=mysql:8.0 -- mysqladmin ping -h \<host\> |
| apiserver Pod stuck in Init:1/2 | schema-init cannot apply the SQL schema | kubectl logs \<pod\> -c schema-init — usually a credentials issue |
| worker Pod in CrashLoopBackOff | Cannot reach the workflow engine | Verify workflow.endpoint resolves and the port is open from the cluster |
| helm test fails | API server unreachable from inside the cluster | kubectl logs \<test-pod\>; check the apiserver Service exists |
| UI shows network errors in the browser | ui.apiBaseUrl does not match how you exposed the API | Set ui.apiBaseUrl to the URL the browser uses to reach the API |

For deeper diagnostics see Troubleshooting.

Values reference

Most commonly set values. See helm/michelangelo/values.yaml for the complete list.

| Key | Required | Default | Description |
| --- | --- | --- | --- |
| metadataStorage.host | yes | — | Hostname of MySQL or PostgreSQL |
| metadataStorage.port | no | 3306 | Database port |
| metadataStorage.rootPassword | yes¹ | — | Root password for schema initialization |
| metadataStorage.existingSecret | no | — | Name of a Secret containing rootPassword |
| objectStorage.endpoint | yes | — | S3-compatible endpoint |
| objectStorage.accessKeyId | yes¹ | — | S3 access key ID |
| objectStorage.secretAccessKey | yes¹ | — | S3 secret access key |
| objectStorage.existingSecret | no | — | Secret containing accessKeyId/secretAccessKey |
| workflow.engine | no | cadence | cadence or temporal |
| workflow.endpoint | yes | — | host:port of the workflow frontend |
| apiserver.tls.enabled | no | false | Enable TLS on the apiserver gRPC port |
| ui.apiBaseUrl | no | /api | URL the UI uses to call the API |
| ui.enabled / envoy.enabled / etc. | no | true | Per-service install toggle |
| \<service\>.ingress.enabled | no | false | Per-service Ingress toggle |
| controllermgr.watchNamespace | no | [] (all) | Namespaces the controller manager watches |
| cadence.enabled | no | false | Install the bundled Cadence subchart |
| temporal.enabled | no | false | Install the bundled Temporal subchart |

¹ Required unless existingSecret is set.

What the chart installs

The michelangelo chart owns the control plane only. Three tiers, with clear ownership boundaries:

  • Infrastructure tier — stateful, long-lived. You provide it (or opt into a subchart): MySQL/PostgreSQL, S3-compatible storage, and a Cadence or Temporal service.
  • Control plane tier — stateless, frequently redeployed. The chart installs all five services as Deployments:
    • michelangelo-apiserver — gRPC API server (port 15566)
    • michelangelo-envoy — gRPC-Web proxy (port 8081)
    • michelangelo-ui — React frontend (port 80)
    • michelangelo-worker — Cadence/Temporal workflow client
    • michelangelo-controllermgr — Kubernetes controller manager
  • Observability tier — optional. Bring your own Prometheus and Grafana, or see Monitoring & Observability.

Chart layout

helm/michelangelo/
├── Chart.yaml                    # includes optional cadence + temporal dependencies
├── README.md
├── values.yaml                   # production defaults (ClusterIP, empty addresses, subcharts off)
├── values-k3d.yaml               # k3d overrides (NodePorts, in-cluster infra, cadence enabled)
├── files/schema/
│   └── mysql-init-schema.sql     # CRD schema applied at first install
├── crds/                         # placeholder — CRDs self-register at apiserver startup
└── templates/
    ├── _helpers.tpl
    ├── NOTES.txt                 # post-install instructions
    ├── validations.yaml          # chart-level guardrails (e.g. cadence/temporal exclusivity)
    ├── rbac/                     # ServiceAccount, ClusterRole, ClusterRoleBinding
    ├── tests/
    │   └── test-connection.yaml  # helm test hook
    └── core/                     # 20 templates for the 5 services
        ├── apiserver-{deployment,service,configmap,ingress,schema-init-configmap}.yaml
        ├── envoy-{deployment,service,configmap,ingress}.yaml
        ├── ui-{deployment,service,configmap,ingress}.yaml
        ├── worker-{deployment,configmap}.yaml
        ├── controllermgr-{deployment,service,configmap}.yaml
        ├── metadata-storage-secret.yaml   # resource-policy: keep
        └── object-storage-secret.yaml     # resource-policy: keep

Design notes

All control plane workloads are Deployments

Every service runs as a Deployment, not a bare Pod, so you get self-healing and rolling updates.

Schema initialization runs as init containers

Two init containers on the apiserver Pod: wait-for-metadata-storage polls the DB until reachable, then schema-init applies the CRD schema idempotently. This removes the ordering race that exists when the schema is applied by a separate Job.
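A simplified sketch of how such a sequence is typically wired into a Pod spec — not the chart's exact template, and the probe and apply commands (mysqladmin ping, piping the schema into mysql) are illustrative assumptions:

```yaml
initContainers:
  - name: wait-for-metadata-storage
    image: mysql:8.0
    # Block until the database answers; init containers run to completion in order.
    command: ["sh", "-c", "until mysqladmin ping -h $DB_HOST --silent; do sleep 2; done"]
  - name: schema-init
    image: mysql:8.0
    # Apply the schema; runs only after the wait container has succeeded.
    command: ["sh", "-c", "mysql -h $DB_HOST -p$MYSQL_ROOT_PASSWORD < /schema/mysql-init-schema.sql"]
    volumeMounts:
      - name: schema
        mountPath: /schema
```

Because init containers run sequentially and must succeed before the main container starts, the apiserver never boots against an unreachable or uninitialized database.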

Credential Secrets are retained on uninstall

metadata-storage-secret and object-storage-secret carry helm.sh/resource-policy: keep, so externally-injected credentials survive an uninstall/reinstall.
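Abridged, the retained Secret manifests carry the annotation like this:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: metadata-storage-secret
  annotations:
    # Tells Helm to leave this resource in place on uninstall.
    helm.sh/resource-policy: keep
```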

Required values fail fast

Required values use Helm's required template function — helm install fails before any resource is created.
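In a template, the pattern looks roughly like this (an illustrative sketch, not the chart's exact source):

```yaml
# helm install/upgrade aborts with the given message if the value is unset.
host: {{ required "metadataStorage.host is required" .Values.metadataStorage.host }}
```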

Chart-level validations

templates/validations.yaml enforces chart-wide invariants (notably, that cadence.enabled and temporal.enabled are not both true). Validation errors surface during helm install/upgrade.
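A guardrail of this kind is usually a template that calls Helm's fail function; a sketch of the cadence/temporal check (illustrative, not the chart's exact source):

```yaml
{{- if and .Values.cadence.enabled .Values.temporal.enabled }}
{{- fail "Enable only one workflow engine: cadence.enabled or temporal.enabled, not both" }}
{{- end }}
```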

Least-privilege RBAC

The chart installs a scoped ClusterRole covering only what controllermgr and apiserver need: CRD lifecycle, Michelangelo CRs, KubeRay/Spark CRs, namespaces (create/update/patch/delete), pods/services, configmaps/secrets, and leader-election leases. There is no cluster-admin grant.

Pod security defaults

All control plane Pods run with runAsNonRoot: true, runAsUser: 65534, and drop: [ALL] capabilities by default. Override per-service if a custom image needs different settings.
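Rendered into a Pod spec, these defaults correspond to a securityContext along these lines (abridged sketch; the container name is a placeholder):

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 65534        # "nobody"
containers:
  - name: apiserver
    securityContext:
      capabilities:
        drop: ["ALL"]
```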

Per-service enabled toggle

Each service has an enabled flag, and templates wrap the Deployment, Service, ConfigMap, and Ingress accordingly.
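The wrapping is the usual Helm conditional around each rendered manifest (illustrative):

```yaml
{{- if .Values.ui.enabled }}
apiVersion: apps/v1
kind: Deployment
# ... ui Deployment spec ...
{{- end }}
```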

Envoy backend is release-scoped

The Envoy ConfigMap references {{ include "michelangelo.fullname" . }}-apiserver, so multiple releases in different namespaces do not collide.

KubeRay log-collector sidecar

When enabled via controllermgr.jobs.k8sengine.mapper.logPersistence, controllermgr injects a kuberayCollector sidecar into Ray jobs to persist logs to object storage. See values.yaml comments for the available knobs.

Next steps