Tested the deployment from opendatahub-io/models-as-a-service#333 on ROSA. It also requires opendatahub-io/models-as-a-service#329, which adds a simple wait for the ODH namespace to become ready.
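For context, a wait like the one mentioned above can be sketched as a small polling loop. This is only an illustration, not the code from either PR: `wait_until` is a hypothetical helper name, and the timeout and interval values are assumptions. Polling for existence first is useful because the operator creates the namespace asynchronously, so it may not exist yet when the script reaches that step.

```shell
# wait_until TIMEOUT INTERVAL CMD [ARGS...]
# Hypothetical helper: re-runs CMD every INTERVAL seconds until it succeeds,
# or gives up once roughly TIMEOUT seconds of sleeping have accumulated.
wait_until() {
  wu_timeout="$1"
  wu_interval="$2"
  shift 2
  wu_elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$wu_elapsed" -ge "$wu_timeout" ]; then
      echo "timed out after ${wu_timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$wu_interval"
    wu_elapsed=$((wu_elapsed + wu_interval))
  done
}

# Example use against a cluster (assumes kubectl is already logged in):
#   wait_until 300 5 kubectl get namespace opendatahub
```

The elapsed time is counted from the sleep intervals only, so a slow command makes the real timeout somewhat longer than requested; that seemed acceptable for a deploy script.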
$ ./scripts/deploy-rhoai-stable.sh
## Installing prerequisites
* Installing cert-manager operator...
namespace/cert-manager-operator created
operatorgroup.operators.coreos.com/cert-manager-operator created
subscription.operators.coreos.com/openshift-cert-manager-operator created
* Waiting for Subscription cert-manager-operator/openshift-cert-manager-operator to start setup...
subscription.operators.coreos.com/openshift-cert-manager-operator condition met
* Waiting for Subscription setup to finish setup. CSV = cert-manager-operator.v1.18.0 ...
clusterserviceversion.operators.coreos.com/cert-manager-operator.v1.18.0 condition met
* Installing LWS operator...
namespace/openshift-lws-operator created
operatorgroup.operators.coreos.com/leader-worker-set created
subscription.operators.coreos.com/leader-worker-set created
* Waiting for Subscription openshift-lws-operator/leader-worker-set to start setup...
subscription.operators.coreos.com/leader-worker-set condition met
* Waiting for Subscription setup to finish setup. CSV = leader-worker-set.v1.0.0 ...
clusterserviceversion.operators.coreos.com/leader-worker-set.v1.0.0 condition met
* Setting up LWS instance and letting it deploy asynchronously.
leaderworkersetoperator.operator.openshift.io/cluster created
* Initializing Gateway API provider...
gatewayclass.gateway.networking.k8s.io/openshift-default created
* Waiting for GatewayClass openshift-default to transition to Accepted status...
gatewayclass.gateway.networking.k8s.io/openshift-default condition met
* Installing RHCL operator...
namespace/kuadrant-system created
operatorgroup.operators.coreos.com/kuadrant-operator-group created
subscription.operators.coreos.com/kuadrant-operator created
* Waiting for Subscription kuadrant-system/kuadrant-operator to start setup...
subscription.operators.coreos.com/kuadrant-operator condition met
* Waiting for Subscription setup to finish setup. CSV = rhcl-operator.v1.2.1 ...
clusterserviceversion.operators.coreos.com/rhcl-operator.v1.2.1 condition met
* Setting up RHCL instance...
kuadrant.kuadrant.io/kuadrant created
* Installing RHOAI v3 operator...
gateway.gateway.networking.k8s.io/openshift-ai-inference created
namespace/redhat-ods-operator created
operatorgroup.operators.coreos.com/rhoai3-operatorgroup created
subscription.operators.coreos.com/rhoai3-operator created
* Waiting for Subscription redhat-ods-operator/rhoai3-operator to start setup...
subscription.operators.coreos.com/rhoai3-operator condition met
* Waiting for Subscription setup to finish setup. CSV = rhods-operator.3.0.0 ...
clusterserviceversion.operators.coreos.com/rhods-operator.3.0.0 condition met
* Setting up RHOAI instance and letting it deploy asynchronously.
datasciencecluster.datasciencecluster.opendatahub.io/default-dsc created
## Installing Models-as-a-Service
* Cluster domain: apps.rosa.ogmok-ot5xp-3up.c23q.p3.openshiftapps.com
* Cluster audience: https://rh-oidc.s3.us-east-1.amazonaws.com/27bd6cg0vs7nn08mue83fbof94dj4m9a
* TLS certificate: 2nigmd0bdfitkogk8mhv57kjcp4g0nee-primary-cert-bundle-secret
namespace/maas-api created
* Waiting for opendatahub namespace to be created by the operator...
namespace/opendatahub condition met
serviceaccount/maas-api serverside-applied
clusterrole.rbac.authorization.k8s.io/maas-api serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/maas-api serverside-applied
configmap/maas-parameters serverside-applied
configmap/tier-to-group-mapping serverside-applied
service/maas-api serverside-applied
deployment.apps/maas-api serverside-applied
telemetrypolicy.extensions.kuadrant.io/user-group serverside-applied
gateway.gateway.networking.k8s.io/maas-default-gateway serverside-applied
httproute.gateway.networking.k8s.io/maas-api-route serverside-applied
authpolicy.kuadrant.io/maas-api-auth-policy serverside-applied
authpolicy.kuadrant.io/gateway-auth-policy serverside-applied
ratelimitpolicy.kuadrant.io/gateway-rate-limits serverside-applied
tokenratelimitpolicy.kuadrant.io/gateway-token-rate-limits serverside-applied
servicemonitor.monitoring.coreos.com/limitador-metrics serverside-applied
* Deploying tier-to-group-mapping ConfigMap to maas-api namespace...
configmap/tier-to-group-mapping serverside-applied
* Configuring audience in MaaS AuthPolicy
authpolicy.kuadrant.io/maas-api-auth-policy patched
deployment.apps/maas-api image updated
=========================================
Deployment is complete.
Next Steps:
1. Deploy a sample model:
kubectl create namespace llm
kustomize build 'https://github.com/opendatahub-io/models-as-a-service.git/docs/samples/models/simulator?ref=main' | kubectl apply -f -
2. Get Gateway endpoint:
CLUSTER_DOMAIN=$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
HOST="maas.${CLUSTER_DOMAIN}"
3. Get authentication token:
TOKEN_RESPONSE=$(curl -sSk --oauth2-bearer "$(oc whoami -t)" --json '{"expiration": "10m"}' "https://${HOST}/maas-api/v1/tokens")
TOKEN=$(echo $TOKEN_RESPONSE | jq -r .token)
4. Test model endpoint:
MODELS=$(curl -sSk ${HOST}/maas-api/v1/models -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" | jq -r .)
MODEL_NAME=$(echo $MODELS | jq -r '.data[0].id')
MODEL_URL="${HOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" # Note: This may be different for your model
curl -sSk -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}"
5. Test authorization limiting (no token 401 error):
curl -sSk -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}" -v
6. Test rate limiting (200 OK followed by 429 Rate Limit Exceeded after about 4 requests):
for i in {1..16}; do curl -sSk -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}"; done
7. Run validation script (Runs all the checks again):
curl https://raw.githubusercontent.com/opendatahub-io/models-as-a-service/refs/heads/main/scripts/validate-deployment.sh | sh -v -
8. Check metrics generation:
kubectl port-forward -n kuadrant-system svc/limitador-limitador 8080:8080 &
curl http://localhost:8080/metrics | grep -E '(authorized_hits|authorized_calls|limited_calls)'
9. Access Prometheus to view metrics:
kubectl port-forward -n openshift-monitoring svc/prometheus-k8s 9090:9091 &
# Open http://localhost:9090 in browser and search for: authorized_hits, authorized_calls, limited_calls
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ kubectl create namespace llm
kustomize build 'https://github.com/opendatahub-io/models-as-a-service.git/docs/samples/models/simulator?ref=main' | kubectl apply -f -
namespace/llm created
llminferenceservice.serving.kserve.io/facebook-opt-125m-simulated created
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ CLUSTER_DOMAIN=$(kubectl get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
HOST="maas.${CLUSTER_DOMAIN}"
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ TOKEN_RESPONSE=$(curl -sSk --oauth2-bearer "$(oc whoami -t)" --json '{"expiration": "10m"}' "https://${HOST}/maas-api/v1/tokens")
TOKEN=$(echo $TOKEN_RESPONSE | jq -r .token)
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ MODELS=$(curl -sSk ${HOST}/maas-api/v1/models -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" | jq -r .)
MODEL_NAME=$(echo $MODELS | jq -r '.data[0].id')
MODEL_URL="${HOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" # Note: This may be different for your model
curl -sSk -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}"
{"id":"chatcmpl-44e7aa76-fa30-4500-bce2-d7781d56ad17","created":1767333758,"model":"facebook/opt-125m","usage":{"prompt_tokens":0,"completion_tokens":8,"total_tokens":8},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"The rest is silence. To be or "}}]}
brent@ip-172-31-33-128:~/segrout/models-as-a-service$ for i in {1..16}; do curl -sSk -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "{\"model\": \"${MODEL_NAME}\", \"prompt\": \"Hello\", \"max_tokens\": 50}" "${MODEL_URL}"; done
200
200
200
200
429
429
429
429
429
429
429
429
429
429
429
429
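The result above (4 requests accepted, then 429 for the remaining 12) matches the "about 4 requests" hint in step 6. If it helps for repeat runs, the status codes can be tallied instead of read by eye. A minimal sketch, where `summarize_codes` is a hypothetical helper and not part of the repository scripts:

```shell
# summarize_codes reads one HTTP status code per line on stdin and prints
# the number of 200s, the number of 429s, and how many 200s arrived before
# the first 429 (the effective request limit observed in this run).
summarize_codes() {
  awk '
    $1 == "200" { ok++; if (!limited) before++ }
    $1 == "429" { limited++ }
    END { printf "ok=%d limited=%d ok_before_first_429=%d\n", ok, limited, before }
  '
}

# Example use: pipe the loop from step 6 into it, e.g.
#   for i in {1..16}; do curl ... -w "%{http_code}\n" ...; done | summarize_codes
```

For the 16 codes shown above this prints `ok=4 limited=12 ok_before_first_429=4`.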