Sandbox Setup
This guide walks you through setting up a local Michelangelo environment on your laptop. The sandbox runs a fully functional cluster — API server, controller manager, workflow engine, object storage, and supporting services — entirely on your machine, so you can explore Michelangelo or develop against it without any cloud infrastructure.
Who this is for: ML engineers, platform engineers, and contributors who want to try Michelangelo locally or develop new features against it.
What you'll have at the end: a running sandbox cluster and a successful demo pipeline run, ready for you to build your own workflows on top of.
Time estimate: ~20 minutes (assuming prerequisites are installed; first-time pulls of container images can add 5–10 minutes).
Supported platforms: macOS (Apple Silicon and Intel) and Linux. Windows is not officially supported, but WSL2 with Docker Desktop should work for most steps.
Get the code
Clone the Michelangelo repository to your machine:
git clone https://github.com/michelangelo-ai/michelangelo.git
cd michelangelo
Throughout this guide, <repo-root> refers to the directory you just cloned (for example, ~/michelangelo).
Prerequisites
Before you begin, make sure you have the following installed. Install commands below show macOS (Homebrew) and Linux options where they differ; on Linux, follow the linked official guide if you don't see a direct command. Run each verification command to confirm:
| Tool | Install (macOS) | Install (Linux) | Verify |
|---|---|---|---|
| Docker | Docker Desktop or Colima | Docker Engine | docker --version |
| kubectl | brew install kubectl | official guide | kubectl version --client |
| k3d | brew install k3d | official guide | k3d --version |
| Helm | brew install helm | official guide | helm version |
| Python 3.9+ | python.org or brew install python@3.11 | distro package manager (e.g., apt install python3) | python3 --version |
| Poetry | curl -sSL https://install.python-poetry.org \| python3 - | same as macOS | poetry --version |
| temporal (Temporal only) | brew install temporal | official guide | temporal --version |
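Rather than running each verification command by hand, you can check that every tool is on your PATH in one pass. The following is a minimal sketch, not part of the Michelangelo tooling: the function name missing_tools is hypothetical, and it only confirms that a command exists, not that its version is recent enough.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example, using the tool names from the table above
# (append `temporal` if you plan to use the Temporal engine):
#   missing_tools docker kubectl k3d helm python3 poetry
```

An empty result means all listed tools were found; you should still run the individual version commands to confirm Python is 3.9 or newer.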
Colima resource requirements (macOS only)
If you are on macOS and using Colima as your Docker runtime, the default VM resources are too limited for the sandbox. Start Colima with at least:
colima start --cpu 4 --memory 8 --disk 60
| Resource | Minimum | Recommended |
|---|---|---|
| CPU cores | 4 | 6 |
| Memory (GB) | 8 | 12 |
| Disk (GB) | 60 | 100 |
Warning: Starting Colima with the default settings (2 CPU, 2 GB RAM) will cause pods to crash or fail to schedule. Always pass explicit resource flags.
If Colima is already running with insufficient resources, stop it and restart with the new settings:
colima stop
colima start --cpu 4 --memory 8 --disk 60
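To confirm that the restarted VM actually exposes the minimum resources to Docker, you can compare what the Docker daemon reports against the table above. This is a sketch under assumptions: the function name meets_minimum is hypothetical, and the 10% memory slack is a guess to allow for the VM reporting slightly less memory than was allocated.

```shell
#!/bin/sh
# Minimums taken from the resource table in this guide.
MIN_CPUS=4
MIN_MEM_GIB=8

# Succeeds when the given CPU count ($1) and memory in bytes ($2) meet the minimums.
meets_minimum() {
  cpus=$1
  mem_bytes=$2
  # Assumed tolerance: accept 90% of the nominal memory figure.
  min_bytes=$((MIN_MEM_GIB * 1024 * 1024 * 1024 * 9 / 10))
  [ "$cpus" -ge "$MIN_CPUS" ] && [ "$mem_bytes" -ge "$min_bytes" ]
}

# Example: read the values from the running Docker daemon.
#   set -- $(docker info --format '{{.NCPU}} {{.MemTotal}}')
#   meets_minimum "$1" "$2" || echo "Docker VM is below the sandbox minimum" >&2
```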
Configure host.docker.internal
Docker containers need to communicate with services on your host machine. Verify this hostname resolves correctly:
- Open your hosts file:
  sudo nano /etc/hosts
- Look for this line:
  127.0.0.1 host.docker.internal
- If it is missing, add it to the end of the file and save.
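If you would rather check for the entry without opening an editor, a small grep-based test can do it. This is an illustrative sketch; the function name has_docker_host_entry is hypothetical, and it only looks for a 127.0.0.1 mapping on a non-comment line.

```shell
#!/bin/sh
# Succeeds if the hosts file ($1, default /etc/hosts) maps
# host.docker.internal to 127.0.0.1 on a line that is not commented out.
has_docker_host_entry() {
  hosts_file=${1:-/etc/hosts}
  grep -Eq '^[[:space:]]*127\.0\.0\.1[[:space:]]+([^#]*[[:space:]])?host\.docker\.internal([[:space:]]|#|$)' "$hosts_file"
}

# Example: append the entry only when it is missing.
#   has_docker_host_entry || echo '127.0.0.1 host.docker.internal' | sudo tee -a /etc/hosts
```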
Install Python dependencies
From the repository you cloned, install the Michelangelo Python packages:
cd <repo-root>/python
poetry install
Quick start
Once prerequisites and Python dependencies are installed, the sandbox is three commands away:
# 1. Activate the Poetry virtual environment (from <repo-root>/python)
source .venv/bin/activate
# 2. Create the sandbox (~10–15 min on first run)
ma sandbox create
# 3. Verify everything works by running the demo pipeline
ma sandbox demo pipeline
Tip: If you prefer not to activate the venv, you can prefix each command with poetry run (e.g., poetry run ma sandbox create).
Choosing a workflow engine
ma sandbox create defaults to Cadence, which is the recommended choice for most users — it's the most-tested path and matches the examples in this guide. Pass --workflow temporal only if you specifically want to develop or test against Temporal (for example, if your team is migrating to it). The two engines are interchangeable from a workflow-author perspective; the choice mainly affects which web UI and CLI you use.
Verifying success
When ma sandbox create completes, all Michelangelo services start in your k3d cluster. Verify with:
kubectl get pods
You should see roughly 10–15 pods (the exact count depends on which engine you chose and any --exclude flags). All pods should reach Running status within 2–3 minutes; some pods may briefly show ContainerCreating or Init while images pull.
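To watch for stragglers without rereading the full pod list, you can filter the kubectl output down to pods that have not reached Running yet. The sketch below assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...); the function name pods_not_ready is hypothetical. It also treats Completed as healthy on the assumption that one-shot jobs may appear in the list.

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and print the name and
# status of every pod whose STATUS column is neither Running nor Completed.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Example: poll every 10 seconds, giving up after about 3 minutes.
#   i=0
#   while [ -n "$(kubectl get pods --no-headers | pods_not_ready)" ] && [ "$i" -lt 18 ]; do
#     sleep 10
#     i=$((i + 1))
#   done
```

Note that a Running status does not guarantee every container has passed its readiness check; the READY column in the unfiltered output shows that.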
Then open the Michelangelo UI at http://localhost:8090 — if the dashboard loads, your sandbox is healthy. See Sandbox Ports and Endpoints for the full list of services and their URLs.
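The UI can take a little while to start answering after its pod is Running, so a scripted check is more reliable with a few retries. Below is a generic retry helper as a sketch; the function name retry and the attempt counts are illustrative choices, not something the sandbox provides.

```shell
#!/bin/sh
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command succeeds; fails after the last attempt.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example: wait up to about 30 seconds for the UI port from this guide to answer.
# curl -f fails on HTTP errors, -sS hides progress but keeps error messages.
#   retry 10 3 curl -fsS -o /dev/null http://localhost:8090
```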
Sandbox commands
The ma sandbox command manages your local Kubernetes development environment.
For a complete command reference, see the CLI Reference - Sandbox Commands.
Lifecycle
The typical sandbox workflow:
create → (develop) → stop → start → (develop) → delete
Create
ma sandbox create [OPTIONS]
| Flag | Description | Default |
|---|---|---|
| --workflow cadence\|temporal | Choose workflow engine | cadence |
| --exclude [services] | Exclude services: apiserver, controllermgr, ui, worker, prometheus, grafana | none |
| --create-compute-cluster | Create an additional Ray compute cluster for distributed jobs | disabled |
| --compute-cluster-name <name> | Custom name for the compute cluster | auto-generated |
| --include-experimental [services] | Include experimental services | none |
Examples:
# Full sandbox with all services (default: Cadence workflow engine)
ma sandbox create
# Sandbox with Temporal workflow engine
ma sandbox create --workflow temporal
# Sandbox without UI, with a Ray compute cluster
ma sandbox create --exclude ui --create-compute-cluster
Stop / Start
Pause and resume your sandbox without losing state:
ma sandbox stop # preserves state
ma sandbox start # resume where you left off
Delete
Tear down the cluster and remove all resources:
ma sandbox delete
Demo
Create pre-configured demo resources for testing:
ma sandbox demo pipeline # registers and runs a sample pipeline
ma sandbox demo inference # sets up a demo inference server
Smoke test: run the BERT CoLA example
After ma sandbox demo pipeline succeeds, you've already proven the sandbox works end to end. If you'd like to run a real example workflow against it before moving on, the BERT CoLA text-classification example is a quick way to confirm local execution works:
cd <repo-root>/python
poetry install --extras example
PYTHONPATH=. poetry run python ./examples/bert_cola/bert_cola.py
You should see workflow logs in your terminal and, when it finishes, a trained model artifact written to local storage.
For the full story on local vs. remote execution, building Docker images, configuring storage, and using either workflow engine end to end, see:
- Pipeline Running Modes — the four execution modes Michelangelo supports
Note: Local execution doesn't support caching, retries, or resource constraints. Use remote execution (covered in the ML Pipelines guides) for production-like behavior.
Troubleshooting
ModuleNotFoundError: No module named 'grpc_reflection'
This error occurs when Python dependencies aren't fully installed. Fix it by reinstalling from the python/ directory:
cd <repo-root>/python
poetry install
If the error persists, try removing the virtual environment and reinstalling:
rm -rf .venv
poetry install
Pods stuck in ImagePullBackOff or ErrImagePull
The cluster can't pull a Docker image. Check which image is failing:
kubectl describe pod <pod-name> | grep -A 5 "Events"
Common causes:
- Network issues: Ensure Docker can reach ghcr.io (try docker pull ghcr.io/michelangelo-ai/worker:latest)
- Image doesn't exist: Verify the image tag matches what's available in the registry
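To pull just the failing image names out of the describe output, you can filter the events for the kubelet's pull-failure message. This sketch assumes the event text has the form Failed to pull image "<name>": ..., which is what recent Kubernetes versions emit but is not a stable interface; the function name failed_images is hypothetical.

```shell
#!/bin/sh
# Read `kubectl describe pod` output on stdin and print each distinct image
# named in a 'Failed to pull image "<name>"' event message.
failed_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}

# Example:
#   kubectl describe pod <pod-name> | failed_images
```

You can then try docker pull on each printed image to see whether the problem is the network or the tag.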
Worker crashes with Namespace default is not found (Temporal only)
The Temporal default namespace must be registered after the sandbox starts. If the worker is in CrashLoopBackOff with this error, register the namespace and then restart the worker:
# Port-forward the Temporal frontend
kubectl port-forward svc/michelangelo-temporal-frontend 7233:7233 &
# Register the default namespace
temporal operator namespace create default
# Restart the worker to pick it up
kubectl rollout restart deployment/michelangelo-worker
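The port-forward above is started in the background and keeps running after the namespace is registered. If you want it cleaned up automatically, one option is a small wrapper that stops the background process once the foreground command finishes. This is a sketch only: the function name with_background and the WARMUP_SECONDS variable are hypothetical, and the fixed warm-up sleep is a simplification.

```shell
#!/bin/sh
# Start a helper command ($1, a single command line) in the background, run the
# remaining arguments as a foreground command, then stop the helper.
# Returns the foreground command's exit status.
with_background() {
  bg_cmd=$1
  shift
  # exec replaces the intermediate shell, so $! is the helper's own PID.
  sh -c "exec $bg_cmd" &
  bg_pid=$!
  # Assumed warm-up time so the helper can start listening.
  sleep "${WARMUP_SECONDS:-2}"
  status=0
  "$@" || status=$?
  kill "$bg_pid" 2>/dev/null || true
  wait "$bg_pid" 2>/dev/null || true
  return "$status"
}

# Example, using the commands from this section:
#   with_background "kubectl port-forward svc/michelangelo-temporal-frontend 7233:7233" \
#     temporal operator namespace create default
```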
Pods stuck in CrashLoopBackOff
A service is starting but immediately crashing. Check its logs:
kubectl logs <pod-name>
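When a container has already restarted, kubectl logs shows the new attempt, which may not contain the crash yet. The --previous flag prints the logs of the prior container instance instead. The sketch below finds crash-looping pods and is meant to be paired with that flag; the function name crashlooping_pods is hypothetical, and it assumes the default kubectl get pods column layout.

```shell
#!/bin/sh
# Read `kubectl get pods --no-headers` output on stdin and print the name of
# every pod whose STATUS column ends in CrashLoopBackOff.
crashlooping_pods() {
  awk '$3 ~ /CrashLoopBackOff$/ { print $1 }'
}

# Example: show the last 50 log lines from the crashed container of each pod.
#   for pod in $(kubectl get pods --no-headers | crashlooping_pods); do
#     echo "== $pod =="
#     kubectl logs "$pod" --previous --tail=50
#   done
```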
If a single service is wedged, the simplest recovery is to delete the pod and let Kubernetes recreate it:
kubectl delete pod <pod-name>
If that doesn't help, recreate the sandbox cleanly:
ma sandbox delete
ma sandbox create
Port already in use
If ma sandbox create fails because a port is already bound:
# Find what's using the port (e.g., port 9090)
lsof -i :9090
# Kill the process if it's safe to do so
kill <PID>
See Sandbox Ports and Endpoints for the full list of ports used.
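To check several ports before running ma sandbox create, you can loop the lsof check over a list. This is a sketch: the function name busy_ports is hypothetical, and the ports in the example are only the ones mentioned in this guide, not the complete list.

```shell
#!/bin/sh
# Print each port from the argument list that already has a TCP listener.
busy_ports() {
  for port in "$@"; do
    # -nP skips DNS and service-name lookups; -sTCP:LISTEN limits the match to listeners.
    if lsof -nP -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
      echo "$port"
    fi
  done
}

# Example (add the remaining ports from Sandbox Ports and Endpoints):
#   busy_ports 8090 9090
```

No output means none of the listed ports is currently bound.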
Poetry install fails with build errors on macOS
If you see C++ compilation errors during poetry install:
export CC=clang
export CXX=clang++
poetry install
Add those exports to your ~/.zshrc to make them permanent.
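If you script that step, it helps to avoid appending the same line every time the script runs. A minimal sketch, with the hypothetical helper name append_once:

```shell
#!/bin/sh
# Append a line ($1) to a file ($2) only if the file does not already contain
# that exact line. Creates the file if it does not exist.
append_once() {
  line=$1
  file=$2
  # -x matches whole lines only; -F treats the text literally, not as a regex.
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example:
#   append_once 'export CC=clang' "$HOME/.zshrc"
#   append_once 'export CXX=clang++' "$HOME/.zshrc"
```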
What's next?
- Build your first pipeline -- Follow Getting Started with ML Pipelines to create a training workflow (~30 min)
- Explore example projects -- Try California Housing XGBoost, BERT Text Classification, or GPT Fine-tuning
- Learn the CLI -- See the CLI Reference for managing pipelines and projects