@nerdalert
Created January 2, 2026 06:07
Deployment of opendatahub-io/models-as-a-service#333, tested on ROSA. It also requires opendatahub-io/models-as-a-service#329, which adds a simple wait for the ODH namespace to become ready.
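The readiness wait referenced above can be sketched as follows. This is a hypothetical illustration, not the actual code from PR #329: the function name `wait_for_namespace` and the polling interval are assumptions; only the target namespace (`opendatahub`) and the general approach (block until the operator creates the namespace, as seen in the log below) come from this gist.

```shell
# Hypothetical sketch of a namespace-readiness wait (names assumed,
# not taken from PR #329): poll until the operator has created the
# namespace, then block until it reports phase Active.
wait_for_namespace() {
  ns="$1"; timeout="${2:-300}"; elapsed=0
  # Poll for the namespace to exist; kubectl wait cannot watch an
  # object that has not been created yet on older kubectl versions.
  until kubectl get namespace "$ns" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for namespace $ns" >&2
      return 1
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  # Once it exists, wait for it to become Active.
  kubectl wait --for=jsonpath='{.status.phase}'=Active \
    "namespace/$ns" --timeout="${timeout}s"
}
```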

$ ./scripts/deploy-rhoai-stable.sh
## Installing prerequisites

* Installing cert-manager operator...
namespace/cert-manager-operator created
operatorgroup.operators.coreos.com/cert-manager-operator created
subscription.operators.coreos.com/openshift-cert-manager-operator created
  * Waiting for Subscription cert-manager-operator/openshift-cert-manager-operator to start setup...
subscription.operators.coreos.com/openshift-cert-manager-operator condition met
  * Waiting for Subscription setup to finish setup. CSV = cert-manager-operator.v1.18.0 ...
clusterserviceversion.operators.coreos.com/cert-manager-operator.v1.18.0 condition met

* Installing LWS operator...
namespace/openshift-lws-operator created
operatorgroup.operators.coreos.com/leader-worker-set created
subscription.operators.coreos.com/leader-worker-set created
  * Waiting for Subscription openshift-lws-operator/leader-worker-set to start setup...
subscription.operators.coreos.com/leader-worker-set condition met
  * Waiting for Subscription setup to finish setup. CSV = leader-worker-set.v1.0.0 ...
clusterserviceversion.operators.coreos.com/leader-worker-set.v1.0.0 condition met
* Setting up LWS instance and letting it deploy asynchronously.
leaderworkersetoperator.operator.openshift.io/cluster created

* Initializing Gateway API provider...
gatewayclass.gateway.networking.k8s.io/openshift-default created
  * Waiting for GatewayClass openshift-default to transition to Accepted status...
gatewayclass.gateway.networking.k8s.io/openshift-default condition met

* Installing RHCL operator...
namespace/kuadrant-system created
operatorgroup.operators.coreos.com/kuadrant-operator-group created
subscription.operators.coreos.com/kuadrant-operator created
  * Waiting for Subscription kuadrant-system/kuadrant-operator to start setup...
subscription.operators.coreos.com/kuadrant-operator condition met
  * Waiting for Subscription setup to finish setup. CSV = rhcl-operator.v1.2.1 ...
clusterserviceversion.operators.coreos.com/rhcl-operator.v1.2.1 condition met
* Setting up RHCL instance...
kuadrant.kuadrant.io/kuadrant created

* Installing RHOAI v3 operator...
gateway.gateway.networking.k8s.io/openshift-ai-inference created
namespace/redhat-ods-operator created
operatorgroup.operators.coreos.com/rhoai3-operatorgroup created
subscription.operators.coreos.com/rhoai3-operator created
  * Waiting for Subscription redhat-ods-operator/rhoai3-operator to start setup...
subscription.operators.coreos.com/rhoai3-operator condition met
  * Waiting for Subscription setup to finish setup. CSV = rhods-operator.3.0.0 ...
clusterserviceversion.operators.coreos.com/rhods-operator.3.0.0 condition met
* Setting up RHOAI instance and letting it deploy asynchronously.
datasciencecluster.datasciencecluster.opendatahub.io/default-dsc created

## Installing Models-as-a-Service
* Cluster domain: apps.rosa.ogmok-ot5xp-3up.c23q.p3.openshiftapps.com
* Cluster audience: https://rh-oidc.s3.us-east-1.amazonaws.com/27bd6cg0vs7nn08mue83fbof94dj4m9a
* TLS certificate: 2nigmd0bdfitkogk8mhv57kjcp4g0nee-primary-cert-bundle-secret
namespace/maas-api created
* Waiting for opendatahub namespace to be created by the operator...
namespace/opendatahub condition met
serviceaccount/maas-api serverside-applied
clusterrole.rbac.authorization.k8s.io/maas-api serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/maas-api serverside-applied
configmap/maas-parameters serverside-applied
configmap/tier-to-group-mapping serverside-applied
service/maas-api serverside-applied
deployment.apps/maas-api serverside-applied
telemetrypolicy.extensions.kuadrant.io/user-group serverside-applied
gateway.gateway.networking.k8s.io/maas-default-gateway serverside-applied
httproute.gateway.networking.k8s.io/maas-api-route serverside-applied
authpolicy.kuadrant.io/maas-api-auth-policy serverside-applied
authpolicy.kuadrant.io/gateway-auth-policy serverside-applied
ratelimitpolicy.kuadrant.io/gateway-rate-limits serverside-applied
tokenratelimitpolicy.kuadrant.io/gateway-token-rate-limits serverside-applied
servicemonitor.monitoring.coreos.com/limitador-metrics serverside-applied
* Deploying tier-to-group-mapping ConfigMap to maas-api namespace...
configmap/tier-to-group-mapping serverside-applied
* Configuring audience in MaaS AuthPolicy
authpolicy.kuadrant.io/maas-api-auth-policy patched
deployment.apps/maas-api image updated

=========================================
Deployment is complete.

Next Steps:
1. Deploy a sample model:
   kubectl create namespace llm
   kustomize build 'https://github.com/opendatahub-io/models-as-a-service.git/docs/samples/models/simulator?ref=main' | kubectl apply -f -

2. Get Gateway endpoint:
   CLUSTER_DOMAIN=$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
   HOST="maas.${CLUSTER_DOMAIN}"

3. Get authentication token:
   TOKEN_RESPONSE=$(curl -sSk --oauth2-bearer "$(oc whoami -t)" --json '{"expiration": "10m"}' "https://${HOST}/maas-api/v1/tokens")
   TOKEN=$(echo $TOKEN_RESPONSE | jq -r .token)

4. Test model endpoint:
   MODELS=$(curl -sSk ${HOST}/maas-api/v1/models -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" | jq -r .)
   MODEL_NAME=$(echo $MODELS | jq -r '.data[0].id')
   MODEL_URL="${HOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" # Note: This may be different for your model
   curl -sSk -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}"

5. Test authorization (a request without a token returns 401 Unauthorized):
   curl -sSk -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}" -v

6. Test rate limiting (200 OK followed by 429 Rate Limit Exceeded after about 4 requests):
   for i in {1..16}; do curl -sSk -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}"; done

7. Run validation script (Runs all the checks again):
   curl https://raw.githubusercontent.com/opendatahub-io/models-as-a-service/refs/heads/main/scripts/validate-deployment.sh | sh -v -

8. Check metrics generation:
   kubectl port-forward -n kuadrant-system svc/limitador-limitador 8080:8080 &
   curl http://localhost:8080/metrics | grep -E '(authorized_hits|authorized_calls|limited_calls)'

9. Access Prometheus to view metrics:
   kubectl port-forward -n openshift-monitoring svc/prometheus-k8s 9090:9091 &
   # Open http://localhost:9090 in browser and search for: authorized_hits, authorized_calls, limited_calls
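The token minted in step 3 is a JWT, so its `exp` claim can be sanity-checked locally before use. The helper below is a hypothetical convenience (its name and padding logic are mine, not part of maas-api); it assumes `base64` and `jq` are available and only inspects the unverified payload:

```shell
# Hypothetical helper: decode the payload segment of a JWT so its
# claims (e.g. the 10m expiry requested in step 3) can be inspected.
# Converts base64url to base64 and restores stripped '=' padding.
decode_jwt_payload() {
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done
  printf '%s' "$p" | base64 -d
}
# Usage against the token from step 3:
#   decode_jwt_payload "$TOKEN" | jq -r .exp
```

Note this does not verify the signature; it is only a quick way to confirm the expiry you asked for actually landed in the token.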

brent@ip-172-31-33-128:~/segrout/models-as-a-service$ kubectl create namespace llm
   kustomize build 'https://github.com/opendatahub-io/models-as-a-service.git/docs/samples/models/simulator?ref=main' | kubectl apply -f -
namespace/llm created
llminferenceservice.serving.kserve.io/facebook-opt-125m-simulated created
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ CLUSTER_DOMAIN=$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
   HOST="maas.${CLUSTER_DOMAIN}"
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ TOKEN_RESPONSE=$(curl -sSk --oauth2-bearer "$(oc whoami -t)" --json '{"expiration": "10m"}' "https://${HOST}/maas-api/v1/tokens")
   TOKEN=$(echo $TOKEN_RESPONSE | jq -r .token)
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ MODELS=$(curl -sSk ${HOST}/maas-api/v1/models -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" | jq -r .)
   MODEL_NAME=$(echo $MODELS | jq -r '.data[0].id')
   MODEL_URL="${HOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" # Note: This may be different for your model
   curl -sSk -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}"
{"id":"chatcmpl-44e7aa76-fa30-4500-bce2-d7781d56ad17","created":1767333758,"model":"facebook/opt-125m","usage":{"prompt_tokens":0,"completion_tokens":8,"total_tokens":8},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"The rest is silence.  To be or "}}]}brent@ip-172-31-33-128:~/segrout/models-as-a-service$
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ for i in {1..16}; do curl -sSk -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}"; done
200
200
200
200
429
429
429
429
429
429
429
429
429
429
429
429
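The pattern above (four 200s, then 429s for the rest of the window) is classic fixed-window limiting. A toy sketch of that behaviour, purely illustrative — the real limits live in the Kuadrant `RateLimitPolicy` applied earlier (`gateway-rate-limits`), and the limit of 4 is inferred from the output above, not read from the policy:

```shell
# Toy fixed-window limiter mirroring the observed behaviour: the
# first LIMIT requests in a window get 200, the rest get 429.
LIMIT=4
count=0
check_request() {
  count=$((count + 1))
  if [ "$count" -le "$LIMIT" ]; then
    echo 200
  else
    echo 429
  fi
}
```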