Model Registry Integration

Unlike experiment tracking — where operators connect an external server — Michelangelo includes a built-in model registry. This guide explains how operators verify the registry is healthy, configure storage and access, and integrate registered models with downstream serving and CI/CD systems.


How the Model Registry Works

The registry separates operator and user responsibilities cleanly:

┌──────────────────────────────────────────────────────────┐
│ Operator Responsibility │
│ ├─ Provision the object store bucket and IAM policy │
│ ├─ Verify the Model CRD is installed │
│ └─ Configure RBAC for namespace access │
└──────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────┐
│ User Responsibility (task code) │
│ ├─ Register models from inside @uniflow.task functions │
│ └─ Platform creates a Model CR and writes artifacts │
└──────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────┐
│ Downstream Consumers │
│ ├─ InferenceServer (Michelangelo serving layer) │
│ ├─ External serving infrastructure (reads from S3) │
│ └─ CI/CD pipelines (read artifact URIs from Model CRs) │
└──────────────────────────────────────────────────────────┘

The registry is backed by a single Kubernetes Custom Resource: Model (CRD name models.michelangelo.api).


Prerequisites

Before working through this guide, ensure you have completed:

  • Platform Setup — the Controller Manager's minio.* fields must point to a reachable S3-compatible object store.
  • Compute cluster registration — at least one compute cluster registered with the Michelangelo control plane, so Uniflow tasks have somewhere to run.
  • Sufficient cluster permissions to create Roles and RoleBindings, and to inspect Custom Resource Definitions.

Step 1: Verify the Model CRD Is Installed

Confirm the Model CRD is present in the cluster before expecting any registration to succeed:

kubectl get crd models.michelangelo.api

If the CRD is missing, re-run the Michelangelo CRD installation step described in Platform Setup.

You can also spot-check a namespace for any existing models:

kubectl get models -n <namespace>

Step 2: Configure S3 Permissions for Model Artifacts

The Controller Manager and task pods write model artifacts to your S3-compatible object store. The IAM role or service account bound to the Controller Manager needs the following permissions on the models bucket:

{
  "Effect": "Allow",
  "Action": [
    "s3:PutObject",
    "s3:GetObject",
    "s3:DeleteObject",
    "s3:ListBucket"
  ],
  "Resource": [
    "arn:aws:s3:::<your-models-bucket>",
    "arn:aws:s3:::<your-models-bucket>/*"
  ]
}

Task pods produce the raw model files during registration. If task pods run under a different IAM role or service account than the Controller Manager, apply equivalent write permissions to that identity as well.
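
One way to confirm an identity can actually write is a round-trip probe run from a pod under that identity. The sketch below is a minimal example, assuming the AWS CLI is available in the pod; `check_bucket_write` is a hypothetical helper name, not part of the platform:

```shell
# Hedged sketch: round-trip a tiny object to verify write access.
# check_bucket_write is a hypothetical helper; pass your models bucket name.
check_bucket_write() {
  bucket="$1"
  probe="permission-probe-$$"
  # Upload then delete a probe object; a failure on either step points at a
  # missing s3:PutObject or s3:DeleteObject permission on the bucket.
  echo ok | aws s3 cp - "s3://${bucket}/${probe}" && \
    aws s3 rm "s3://${bucket}/${probe}"
}

# Usage (from a pod running under the task-pod identity):
# check_bucket_write <your-models-bucket>
```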

Artifact URI discovery

The exact S3 layout for a model's artifacts is set by your platform configuration — Michelangelo does not prescribe a fixed directory structure. Rather than hardcoding paths, read the actual locations from each Model resource after registration:

# Raw training artifact URIs (weights, checkpoints)
kubectl get model <model-name> -n <namespace> \
  -o jsonpath='{.spec.model_artifact_uri}'

# Deployable artifact URIs (packaged for serving)
kubectl get model <model-name> -n <namespace> \
  -o jsonpath='{.spec.deployable_artifact_uri}'

These spec fields are the authoritative source for artifact location. Use them in any automation that needs to consume artifacts.
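
For automation, those two fields can be captured into shell variables; the sketch below assumes `kubectl` access to the namespace, and `get_model_uris` is a hypothetical helper name:

```shell
# Hedged sketch: read both artifact URI fields from a Model resource.
# get_model_uris is a hypothetical helper name.
get_model_uris() {
  model="$1"; ns="$2"
  # First entry of each repeated URI field.
  raw=$(kubectl get model "$model" -n "$ns" \
    -o jsonpath='{.spec.model_artifact_uri[0]}')
  pkg=$(kubectl get model "$model" -n "$ns" \
    -o jsonpath='{.spec.deployable_artifact_uri[0]}')
  printf 'raw=%s\ndeployable=%s\n' "$raw" "$pkg"
}

# Usage:
# get_model_uris <model-name> <namespace>
```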


Step 3: Configure RBAC

Grant teams read access to Model resources in their namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: model-registry-reader
  namespace: <namespace>
rules:
- apiGroups: ["michelangelo.api"]
  resources: ["models"]
  verbs: ["get", "list", "watch"]

For CI/CD service accounts that need to inspect or forward model records:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-model-registry-reader
  namespace: <namespace>
subjects:
- kind: ServiceAccount
  name: ci-service-account
  namespace: ci-namespace
roleRef:
  kind: Role
  name: model-registry-reader
  apiGroup: rbac.authorization.k8s.io
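
Once the Role and RoleBinding are applied, the grant can be spot-checked with `kubectl auth can-i` and service-account impersonation. A minimal sketch; `verify_registry_rbac` is a hypothetical helper name:

```shell
# Hedged sketch: confirm the binding grants read access via impersonation.
# verify_registry_rbac is a hypothetical helper name.
verify_registry_rbac() {
  ns="$1"; sa="$2"
  # Prints "yes" when the impersonated service account can read models.
  kubectl auth can-i get models.michelangelo.api \
    -n "$ns" --as "system:serviceaccount:${sa}"
}

# Usage:
# verify_registry_rbac <namespace> ci-namespace:ci-service-account
```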

For cluster-wide patterns and multi-tenant isolation, see Authentication and RBAC.


Verification

After completing the setup steps, run this smoke test to confirm the registry is operational.

1. Check the CRD is registered:

kubectl get crd models.michelangelo.api

2. List Model resources in a namespace (requires at least one registered model):

# Using kubectl
kubectl get models -n <namespace>

# Using the ma CLI
ma model get -n <namespace>

3. Inspect a specific model:

kubectl describe model <model-name> -n <namespace>

4. Verify object store reachability from a compute pod:

kubectl run storage-check \
  --image=amazon/aws-cli \
  --namespace=<compute-namespace> \
  --restart=Never \
  --rm -it -- \
  s3 ls s3://<your-models-bucket>/

If any of these fail, see the Troubleshooting section below.


The Model Custom Resource

Every registered model is a Model resource. Here is a representative example:

apiVersion: michelangelo.api/v2
kind: Model
metadata:
  name: fraud-detector
  namespace: ml-team
  labels:
    algorithm: xgboost
spec:
  owner:
    name: <owner-username>
  description: "Fraud detection model trained on transaction features"
  algorithm: xgboost
  trainingFramework: sklearn
  kind: MODEL_KIND_BINARY_CLASSIFICATION
  source: TRAINING
  package_type: DEPLOYABLE_MODEL_PACKAGE_TYPE_TRITON
  revision_id: 3
  model_artifact_uri:
  - s3://<your-bucket>/<path-to-raw-weights>
  deployable_artifact_uri:
  - s3://<your-bucket>/<path-to-serving-package>
  input_schema:
    schema_items:
    - name: transaction_features
      data_type: DATA_TYPE_FLOAT
  output_schema:
    schema_items:
    - name: fraud_score
      data_type: DATA_TYPE_DOUBLE

Key fields:

  • kind — Model; there is no ModelVersion resource
  • spec.kind — ML problem type (e.g. MODEL_KIND_REGRESSION, MODEL_KIND_BINARY_CLASSIFICATION)
  • spec.package_type — how serving systems should interpret the deployable artifact (e.g. DEPLOYABLE_MODEL_PACKAGE_TYPE_TRITON)
  • spec.revision_id — integer version counter; users set this when creating the resource
  • spec.model_artifact_uri[] — repeated string; URIs to raw training artifacts
  • spec.deployable_artifact_uri[] — repeated string; URIs to packaged artifacts ready for serving
  • spec.input_schema / spec.output_schema — DataSchema with schema_items[]; each item has name and data_type (not dtype)
  • status — empty; the Model resource carries no status conditions, phase, or timestamps

Heads-up: ModelStatus is intentionally empty. Do not poll kubectl wait --for=condition=Ready on a Model — no such condition exists. If you need a readiness signal, key off the existence of the resource (and a non-empty spec.deployable_artifact_uri[]) or wait on a downstream resource such as an InferenceServer.
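
If a script does need to block until a model is consumable, polling the spec field is one option consistent with the heads-up above. A minimal sketch; `wait_for_deployable` is a hypothetical helper name, and the retry count and sleep interval are arbitrary:

```shell
# Hedged sketch: poll for a non-empty deployable_artifact_uri as a readiness proxy.
# wait_for_deployable is a hypothetical helper; retries/interval are arbitrary.
wait_for_deployable() {
  model="$1"; ns="$2"; retries="${3:-30}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    uri=$(kubectl get model "$model" -n "$ns" \
      -o jsonpath='{.spec.deployable_artifact_uri[0]}' 2>/dev/null)
    if [ -n "$uri" ]; then
      # Non-empty URI: the packaging step has produced a deployable artifact.
      echo "$uri"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "timed out waiting for a deployable artifact" >&2
  return 1
}

# Usage:
# wait_for_deployable fraud-detector ml-team
```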


Querying the Registry

Use either kubectl or the ma CLI to inspect registered models.

kubectl

# List all models in a namespace
kubectl get models -n <namespace>

# Describe a specific model
kubectl describe model <model-name> -n <namespace>

# Read a single field for automation
kubectl get model <model-name> -n <namespace> \
  -o jsonpath='{.spec.deployable_artifact_uri[0]}'

ma CLI

The ma model subcommand supports get, apply, and delete. To list all models in a namespace, omit --name:

# List models in a namespace
ma model get -n <namespace>

# Get a specific model by name
ma model get -n <namespace> --name <model-name>

# Limit results when listing
ma model get -n <namespace> --limit 20

ma model does not have a list subcommand or --version / --output flags. When you need structured output for scripting, use kubectl with -o jsonpath or -o json.


Integrating with the Serving Layer

Michelangelo's InferenceServer resource does not reference a Model resource by name in its spec. The wiring from a registered model to a running server flows through Deployment and Revision resources managed by the Controller Manager, which update a modelconfig ConfigMap consumed by the inference backend.

A representative InferenceServer manifest:

apiVersion: michelangelo.api/v2
kind: InferenceServer
metadata:
  name: fraud-detector-server
  namespace: ml-team
spec:
  tenancyType: TENANCY_TYPE_DEDICATED
  backendType: BACKEND_TYPE_TRITON
  ownerSpec:
    tier: 1
  initSpec:
    resourceSpec:
      cpu: 2
      memory: "4Gi"
    servingSpec:
      version: "latest"
    numInstances: 1
  decomSpec:
    decommission: false
  owner:
    name: <owner-username>

Key fields:

  • backendType — enum: BACKEND_TYPE_TRITON, BACKEND_TYPE_LLM_D, BACKEND_TYPE_DYNAMO, BACKEND_TYPE_TORCHSERVE. Not the lowercase string "triton".
  • initSpec.numInstances — instance count; there is no replicas field
  • tenancyType — TENANCY_TYPE_DEDICATED (one project per server) or TENANCY_TYPE_MULTI_TENANT
  • spec.modelVersion — does not exist; do not attempt to reference a Model directly from an InferenceServer

The InferenceServer controller emits these conditions: Cleanup, HealthCheck, BackendProvision, ModelConfigProvision, Validation. There is no Ready condition; gate readiness on BackendProvision and ModelConfigProvision instead.
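
A readiness gate built on those two conditions might look like the sketch below; `wait_for_inference_server` is a hypothetical helper name and the 300s timeout is arbitrary:

```shell
# Hedged sketch: gate on the conditions the controller actually emits.
# wait_for_inference_server is a hypothetical helper; the timeout is arbitrary.
wait_for_inference_server() {
  server="$1"; ns="$2"
  kubectl wait "inferenceserver/${server}" -n "$ns" \
    --for=condition=BackendProvision --timeout=300s && \
  kubectl wait "inferenceserver/${server}" -n "$ns" \
    --for=condition=ModelConfigProvision --timeout=300s
}

# Usage:
# wait_for_inference_server fraud-detector-server ml-team
```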

For backend selection and configuration, see Integrate a Custom Backend.


CI/CD Pipeline Integration

Because Model resources have no status conditions, CI/CD pipelines should check for the presence of the resource and read artifact URIs directly from the spec — kubectl wait is not applicable.

Example: GitHub Actions step

- name: Check model is registered
  run: |
    kubectl get model "${{ env.MODEL_NAME }}" \
      --namespace "${{ env.NAMESPACE }}"

- name: Get artifact URI
  id: model
  run: |
    ARTIFACT_URI=$(kubectl get model "${{ env.MODEL_NAME }}" \
      -n "${{ env.NAMESPACE }}" \
      -o jsonpath='{.spec.deployable_artifact_uri[0]}')
    echo "deployable_uri=$ARTIFACT_URI" >> "$GITHUB_OUTPUT"

- name: Forward artifact to serving infrastructure
  run: |
    your-serving-tool deploy \
      --artifact "${{ steps.model.outputs.deployable_uri }}" \
      --target production

The variable is named ARTIFACT_URI, not PATH — assigning to PATH would overwrite the shell's executable search path and break every subsequent command in the step.

Portable date math for retention scripts

date -d is GNU coreutils only and fails on macOS / BSD. Use one of the following forms depending on where the script runs:

# GNU/Linux
CUTOFF=$(date -d '90 days ago' -u +%Y-%m-%dT%H:%M:%SZ)

# macOS / BSD
CUTOFF=$(date -u -v-90d +%Y-%m-%dT%H:%M:%SZ)

# Cross-platform (Python)
CUTOFF=$(python3 -c "from datetime import datetime, timedelta, timezone; print((datetime.now(timezone.utc) - timedelta(days=90)).strftime('%Y-%m-%dT%H:%M:%SZ'))")

Retention and Cleanup

Model artifacts in S3 are not automatically removed when a Model resource is deleted. Manage artifact lifecycle at the object store level using S3 lifecycle policies, or implement a periodic cleanup job that:

  1. Lists Model resources older than your retention window. Use metadata.creationTimestamp (a real Kubernetes field) as the time signal — there is no status.registeredAt:

    kubectl get models -A -o json \
      | jq --arg cutoff "$CUTOFF" \
        '.items[]
          | select(.metadata.creationTimestamp < $cutoff)
          | {namespace: .metadata.namespace,
             name: .metadata.name,
             model_uris: .spec.model_artifact_uri,
             deployable_uris: .spec.deployable_artifact_uri}'
  2. Reads spec.model_artifact_uri[] and spec.deployable_artifact_uri[] from each result to identify the S3 paths.

  3. Deletes the S3 objects first, then deletes the Model CR — that ordering avoids orphaned references if cleanup is interrupted.
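
The three steps above can be combined into a per-model helper. The sketch below assumes kubectl, jq, and the AWS CLI are available to the cleanup job, and that artifact URIs contain no whitespace; `cleanup_model` is a hypothetical name:

```shell
# Hedged sketch: delete S3 objects first, then the Model CR, so an interrupted
# run leaves the CR (and its URI list) in place for a retry.
# cleanup_model is a hypothetical helper name.
cleanup_model() {
  model="$1"; ns="$2"
  # Collect every URI from both repeated spec fields.
  uris=$(kubectl get model "$model" -n "$ns" -o json \
    | jq -r '(.spec.model_artifact_uri // [])[], (.spec.deployable_artifact_uri // [])[]')
  # Step 1: remove the objects (URIs are assumed whitespace-free).
  for uri in $uris; do
    aws s3 rm --recursive "$uri"
  done
  # Step 2: only now delete the CR.
  kubectl delete model "$model" -n "$ns"
}

# Usage:
# cleanup_model <model-name> <namespace>
```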


Troubleshooting

Symptom: error: the server doesn't have a resource type "models"
Likely cause: CRD not installed.
Resolution: Re-run the Michelangelo CRD installation step (see Platform Setup).

Symptom: kubectl get models returns No resources found but the CRD is present
Likely cause: No models registered yet, or wrong namespace.
Resolution: Confirm a registration task has run; check the namespace.

Symptom: spec.model_artifact_uri empty after registration
Likely cause: Controller Manager lacks S3 write permissions, or the registration task failed.
Resolution: Check Controller Manager logs; verify the IAM policy on the bucket.

Symptom: spec.deployable_artifact_uri empty
Likely cause: Packaging step did not run or failed.
Resolution: Inspect the pipeline run logs for the registration task.

Symptom: RBAC error reading models (User ... cannot get resource "models")
Likely cause: Role missing the michelangelo.api API group.
Resolution: Use apiGroups: ["michelangelo.api"] (not [""]) and apply the manifest from Step 3.

Symptom: kubectl wait --for=condition=Ready hangs on a Model
Likely cause: Model has no status conditions.
Resolution: Don't gate on Model conditions; use a non-empty spec.deployable_artifact_uri[0] as the readiness signal, or wait on a downstream InferenceServer.

Symptom: InferenceServer does not start serving
Likely cause: backendType set to the lowercase string "triton" instead of an enum value.
Resolution: Use backendType: BACKEND_TYPE_TRITON.

For deeper diagnostic trees, see the Troubleshooting Guide.


What's Next