
Sandbox Setup

This guide walks you through setting up a local Michelangelo environment on your laptop. The sandbox runs a fully functional cluster — API server, controller manager, workflow engine, object storage, and supporting services — entirely on your machine, so you can explore Michelangelo or develop against it without any cloud infrastructure.

Who this is for: ML engineers, platform engineers, and contributors who want to try Michelangelo locally or develop new features against it.

What you'll have at the end: a running sandbox cluster and a successful demo pipeline run, ready for you to build your own workflows on top of.

Time estimate: ~20 minutes (assuming prerequisites are installed; first-time pulls of container images can add 5–10 minutes).

Supported platforms: macOS (Apple Silicon and Intel) and Linux. Windows is not officially supported, but WSL2 with Docker Desktop should work for most steps.

Get the code

Clone the Michelangelo repository to your machine:

git clone https://github.com/michelangelo-ai/michelangelo.git
cd michelangelo

Throughout this guide, <repo-root> refers to the directory you just cloned (for example, ~/michelangelo).

Prerequisites

Before you begin, make sure you have the following installed. Install commands below show macOS (Homebrew) and Linux options where they differ; on Linux, follow the linked official guide if you don't see a direct command. Run each verification command to confirm:

| Tool | Install (macOS) | Install (Linux) | Verify |
|---|---|---|---|
| Docker | Docker Desktop or Colima | Docker Engine | docker --version |
| kubectl | brew install kubectl | official guide | kubectl version --client |
| k3d | brew install k3d | official guide | k3d --version |
| Helm | brew install helm | official guide | helm version |
| Python 3.9+ | python.org or brew install python@3.11 | distro package manager (e.g., apt install python3) | python3 --version |
| Poetry | curl -sSL https://install.python-poetry.org \| python3 - | same as macOS | poetry --version |
| temporal CLI (Temporal only) | brew install temporal | official guide | temporal --version |
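If you want to check every prerequisite in one pass, the Python sketch below looks up each command from the Verify column on your PATH with shutil.which and checks the interpreter version. It's a hypothetical convenience script, not part of the Michelangelo repository. The tool lists just mirror the table above, and temporal is treated as optional because it's only needed with the Temporal engine. The script only confirms that the commands exist, not which versions are installed.

```python
"""Hypothetical prerequisite check for the sandbox; not shipped with Michelangelo."""
import shutil
import sys

# Commands from the "Verify" column of the prerequisites table.
REQUIRED_TOOLS = ["docker", "kubectl", "k3d", "helm", "python3", "poetry"]
OPTIONAL_TOOLS = ["temporal"]  # only needed for --workflow temporal
MIN_PYTHON = (3, 9)


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """True if the interpreter version (default: the running one) is 3.9 or newer."""
    version = sys.version_info if version_info is None else version_info
    return tuple(version[:2]) >= minimum


def check_prerequisites():
    """Return human-readable problems; an empty list means everything was found."""
    problems = [f"missing required tool: {tool}" for tool in missing_tools(REQUIRED_TOOLS)]
    problems += [
        f"optional tool not found (Temporal only): {tool}"
        for tool in missing_tools(OPTIONAL_TOOLS)
    ]
    if not python_ok():
        problems.append("this Python interpreter is older than 3.9")
    return problems


if __name__ == "__main__":
    for line in check_prerequisites() or ["all prerequisites found"]:
        print(line)
```

Run it with the same python3 that Poetry will use, because the version check inspects the interpreter that runs the script.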

Colima resource requirements (macOS only)

If you are on macOS and using Colima as your Docker runtime, the default VM resources are too limited for the sandbox. Start Colima with at least:

colima start --cpu 4 --memory 8 --disk 60

| Resource | Minimum | Recommended |
|---|---|---|
| CPU cores | 4 | 6 |
| Memory (GB) | 8 | 12 |
| Disk (GB) | 60 | 100 |

Warning: Starting Colima with the default settings (2 CPU, 2 GB RAM) will cause pods to crash or fail to schedule. Always pass explicit resource flags.

If Colima is already running with insufficient resources, stop it and restart with the new settings:

colima stop
colima start --cpu 4 --memory 8 --disk 60
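To catch an undersized VM before any pods fail, you can encode the resource table above in a small helper. The sketch below is hypothetical and is not part of Colima or Michelangelo. It reports values below the minimum as errors and values between the minimum and the recommendation as warnings. It also builds the same colima start argument list used above.

```python
"""Hypothetical helper: validate Colima VM sizing against the sandbox table."""
# (minimum, recommended) values copied from the resource table.
REQUIREMENTS = {
    "cpu": (4, 6),      # cores
    "memory": (8, 12),  # GB
    "disk": (60, 100),  # GB
}


def check_colima_resources(cpu, memory, disk):
    """Return (errors, warnings) for the proposed Colima settings."""
    proposed = {"cpu": cpu, "memory": memory, "disk": disk}
    errors, warnings = [], []
    for name, value in proposed.items():
        minimum, recommended = REQUIREMENTS[name]
        if value < minimum:
            errors.append(f"{name}={value} is below the minimum of {minimum}")
        elif value < recommended:
            warnings.append(f"{name}={value} meets the minimum but not the recommended {recommended}")
    return errors, warnings


def colima_start_command(cpu, memory, disk):
    """Build the `colima start` invocation with explicit resource flags."""
    return ["colima", "start", "--cpu", str(cpu), "--memory", str(memory), "--disk", str(disk)]
```

If check_colima_resources returns no errors, you can pass the list to subprocess.run(colima_start_command(4, 8, 60), check=True). Running the same command by hand works just as well.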

Configure host.docker.internal

Docker containers need to communicate with services on your host machine. Verify this hostname resolves correctly:

  1. Open your hosts file: sudo nano /etc/hosts
  2. Look for this line:
    127.0.0.1 host.docker.internal
  3. If missing, add it to the end of the file and save.
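You can also script this check. The hypothetical snippet below reads /etc/hosts and ignores comments. It then looks for a line whose first field is 127.0.0.1 and whose host names include host.docker.internal, which is the entry described in step 2. The snippet only reads the file. Adding the line still requires sudo, as shown above.

```python
"""Hypothetical check that /etc/hosts maps host.docker.internal to 127.0.0.1."""
from pathlib import Path

HOSTS_PATH = Path("/etc/hosts")
WANTED_IP = "127.0.0.1"
WANTED_NAME = "host.docker.internal"


def has_host_entry(hosts_text, ip=WANTED_IP, name=WANTED_NAME):
    """True if a non-comment line maps `ip` to `name` (extra aliases are allowed)."""
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing or full-line comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == ip and name in fields[1:]:
            return True
    return False


if __name__ == "__main__":
    text = HOSTS_PATH.read_text() if HOSTS_PATH.exists() else ""
    if has_host_entry(text):
        print("entry present")
    else:
        print(f"add this line to {HOSTS_PATH}: {WANTED_IP} {WANTED_NAME}")
```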

Install Python dependencies

From the repository you cloned, install the Michelangelo Python packages:

cd <repo-root>/python
poetry install

Quick start

Once prerequisites and Python dependencies are installed, the sandbox is three commands away:

# 1. Activate the Poetry virtual environment (from <repo-root>/python)
source .venv/bin/activate

# 2. Create the sandbox (~10–15 min on first run)
ma sandbox create

# 3. Verify everything works by running the demo pipeline
ma sandbox demo pipeline

Tip: If you prefer not to activate the venv, you can prefix each command with poetry run (e.g., poetry run ma sandbox create).

Choosing a workflow engine

ma sandbox create defaults to Cadence, which is the recommended choice for most users — it's the most-tested path and matches the examples in this guide. Pass --workflow temporal only if you specifically want to develop or test against Temporal (for example, if your team is migrating to it). The two engines are interchangeable from a workflow-author perspective; the choice mainly affects which web UI and CLI you use.

Verifying success

When ma sandbox create completes, all Michelangelo services are deployed to your k3d cluster. Verify that their pods are running with:

kubectl get pods

You should see roughly 10–15 pods (the exact count depends on which engine you chose and any --exclude flags). All pods should reach Running status within 2–3 minutes; some pods may briefly show ContainerCreating or Init while images pull.

Then open the Michelangelo UI at http://localhost:8090 — if the dashboard loads, your sandbox is healthy. See Sandbox Ports and Endpoints for the full list of services and their URLs.
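If you'd rather check health from a script than watch kubectl get pods, the sketch below parses the output of kubectl get pods -o json and reports unhealthy pods. It's a hypothetical helper, not part of the ma CLI. A pod in CrashLoopBackOff usually still has status.phase set to Running, so the code also checks each container's ready flag. Treating Succeeded (completed one-off) pods as healthy is an assumption. The ui_is_up function tries an HTTP GET against the UI URL given above.

```python
"""Hypothetical sandbox health check; not part of the ma CLI."""
import json
import subprocess
import urllib.request

# Assumption: completed one-off pods (phase Succeeded) count as healthy.
HEALTHY_PHASES = {"Running", "Succeeded"}


def get_pods():
    """Return the parsed output of `kubectl get pods -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def unhealthy_pods(pods):
    """Return {pod name: reason} for pods that aren't up with all containers ready."""
    problems = {}
    for item in pods.get("items", []):
        name = item["metadata"]["name"]
        status = item.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            problems[name] = phase  # e.g. Pending while images pull
            continue
        for container in status.get("containerStatuses", []):
            # A crash-looping container leaves the pod phase at Running,
            # so the ready flag is the more reliable signal.
            if phase == "Running" and not container.get("ready", False):
                waiting = container.get("state", {}).get("waiting", {})
                problems[name] = waiting.get("reason", "NotReady")
                break
    return problems


def ui_is_up(url="http://localhost:8090", timeout=5):
    """True if the Michelangelo UI answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # connection refused, timeout, or HTTP error status
        return False
```

For example, print(unhealthy_pods(get_pods())) prints an empty dict once every pod is up, and ui_is_up() returns True when the dashboard responds.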


Sandbox commands

The ma sandbox command manages your local Kubernetes development environment.

For a complete command reference, see the CLI Reference - Sandbox Commands.

Lifecycle

The typical sandbox workflow:

create → (develop) → stop → start → (develop) → delete

Create

ma sandbox create [OPTIONS]

| Flag | Description | Default |
|---|---|---|
| --workflow cadence\|temporal | Choose workflow engine | cadence |
| --exclude [services] | Exclude services: apiserver, controllermgr, ui, worker, prometheus, grafana | none |
| --create-compute-cluster | Create an additional Ray compute cluster for distributed jobs | disabled |
| --compute-cluster-name <name> | Custom name for the compute cluster | auto-generated |
| --include-experimental [services] | Include experimental services | none |

Examples:

# Full sandbox with all services (default: Cadence workflow engine)
ma sandbox create

# Sandbox with Temporal workflow engine
ma sandbox create --workflow temporal

# Sandbox without UI, with a Ray compute cluster
ma sandbox create --exclude ui --create-compute-cluster

Stop / Start

Pause and resume your sandbox without losing state:

ma sandbox stop   # preserves state
ma sandbox start  # resume where you left off

Delete

Tear down the cluster and remove all resources:

ma sandbox delete

Demo

Create pre-configured demo resources for testing:

ma sandbox demo pipeline   # registers and runs a sample pipeline
ma sandbox demo inference  # sets up a demo inference server

Smoke test: run the BERT CoLA example

After ma sandbox demo pipeline succeeds, you've already proven the sandbox works end to end. If you'd like to run a real example workflow against it before moving on, the BERT CoLA text-classification example is a quick way to confirm local execution works:

cd <repo-root>/python
poetry install --extras example
PYTHONPATH=. poetry run python ./examples/bert_cola/bert_cola.py

You should see workflow logs in your terminal and, when it finishes, a trained model artifact written to local storage.

For the full story on local vs. remote execution, building Docker images, configuring storage, and using either workflow engine end to end, see:

Note: Local execution doesn't support caching, retries, or resource constraints. Use remote execution (covered in the ML Pipelines guides) for production-like behavior.


Troubleshooting

ModuleNotFoundError: No module named 'grpc_reflection'

This error occurs when Python dependencies aren't fully installed. Fix it by reinstalling from the python/ directory:

cd <repo-root>/python
poetry install

If the error persists, try removing the virtual environment and reinstalling:

rm -rf .venv
poetry install

Pods stuck in ImagePullBackOff or ErrImagePull

The cluster can't pull a Docker image. Check which image is failing:

kubectl describe pod <pod-name> | grep -A 5 "Events"

Common causes:

  • Network issues: Ensure Docker can reach ghcr.io (try docker pull ghcr.io/michelangelo-ai/worker:latest)
  • Image doesn't exist: Verify the image tag matches what's available in the registry
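When several pods are failing, listing the offending images directly can be faster than describing each pod. The hypothetical helper below reads the same kubectl get pods -o json output and checks every container, including init containers. For each one whose waiting reason is ImagePullBackOff or ErrImagePull, it returns the pod, image, waiting reason, and message. Each reported image is a candidate for the docker pull test in the first bullet.

```python
"""Hypothetical helper: list images the sandbox cannot pull; not part of the ma CLI."""
import json
import subprocess

PULL_ERRORS = {"ImagePullBackOff", "ErrImagePull"}


def get_pods():
    """Return the parsed output of `kubectl get pods -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def failing_images(pods):
    """Return sorted (pod, image, reason, message) tuples for containers stuck pulling."""
    failures = []
    for item in pods.get("items", []):
        status = item.get("status", {})
        # Init containers can fail to pull too, so check both status lists.
        containers = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        for container in containers:
            waiting = container.get("state", {}).get("waiting", {})
            if waiting.get("reason") in PULL_ERRORS:
                failures.append((
                    item["metadata"]["name"],
                    container.get("image", "?"),
                    waiting["reason"],
                    waiting.get("message", ""),
                ))
    return sorted(failures)
```

A typical use is for pod, image, reason, message in failing_images(get_pods()): print(pod, image, reason).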

Worker crashes with "Namespace default is not found" (Temporal only)

The Temporal default namespace must be registered after the sandbox starts. If the worker is in CrashLoopBackOff:

# Port-forward the Temporal frontend
kubectl port-forward svc/michelangelo-temporal-frontend 7233:7233 &

# Register the default namespace
temporal operator namespace create default

# Restart the worker to pick it up
kubectl rollout restart deployment/michelangelo-worker

Pods stuck in CrashLoopBackOff

A service is starting but immediately crashing. Check its logs:

kubectl logs <pod-name>

If a single service is wedged, the simplest recovery is to delete the pod and let Kubernetes recreate it:

kubectl delete pod <pod-name>

If that doesn't help, recreate the sandbox cleanly:

ma sandbox delete
ma sandbox create

Port already in use

If ma sandbox create fails because a port is already bound:

# Find what's using the port (e.g., port 9090)
lsof -i :9090

# Kill the process if it's safe to do so
kill <PID>

See Sandbox Ports and Endpoints for the full list of ports used.
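To find conflicts before running ma sandbox create, you can probe the ports yourself. The hypothetical sketch below tries a TCP connection to each port on 127.0.0.1. It reports the ports where something is already listening, roughly what lsof -i shows for listening sockets. The list includes only the two ports this guide mentions: 8090 for the UI and 9090 from the example above. Add the rest from Sandbox Ports and Endpoints, and run the check while the sandbox is not running.

```python
"""Hypothetical pre-flight port check; not part of the ma CLI."""
import socket

# Only the ports mentioned in this guide; extend from Sandbox Ports and Endpoints.
SANDBOX_PORTS = [8090, 9090]


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports, host="127.0.0.1"):
    """Return the subset of `ports` that already have a listener."""
    return [port for port in ports if port_in_use(port, host)]


if __name__ == "__main__":
    for port in busy_ports(SANDBOX_PORTS):
        print(f"port {port} is in use; inspect it with: lsof -i :{port}")
```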

Poetry install fails with build errors on macOS

If you see C++ compilation errors during poetry install:

export CC=clang
export CXX=clang++
poetry install

Add those exports to your ~/.zshrc to make them permanent.


What's next?