Last active
November 14, 2025 02:15
databricks model registration
| Complete Summary: Deploying a Scikit-Learn Model on Databricks Model Serving | |
| What We Accomplished | |
| Successfully trained, registered, and deployed a scikit-learn Linear Regression model as a REST API endpoint on Databricks Model Serving. | |
| Step-by-Step Process | |
| 1. Initial Setup & Understanding | |
| Question: Can scikit-learn models trained on Databricks be served using Model Serving? | |
| Answer: Yes! Databricks supports custom models packaged in MLflow format, including scikit-learn. | |
| 2. Reviewed Existing Notebook (PocML) | |
| Cell 1: Basic model training | |
| from sklearn.linear_model import LinearRegression | |
| import numpy as np | |
| # Dummy data | |
| X = np.array([[1], [2], [3], [4], [5]]) | |
| y = np.array([2, 4, 6, 8, 10]) | |
| # Train linear regression | |
| model = LinearRegression() | |
| model.fit(X, y) | |
| Cell 2 (Original): Had basic MLflow logging without signature - this caused issues later | |
| 3. Fixed Model Registration with Proper MLflow Logging | |
| Updated Cell 2: | |
| import mlflow.sklearn | |
| input_example = X[:1] | |
| with mlflow.start_run(): | |
|     mlflow.sklearn.log_model( | |
|         model, | |
|         artifact_path="model", | |
|         input_example=input_example, | |
|         registered_model_name="linear_regression_model" | |
|     ) | |
|     mlflow.log_params({"fit_intercept": model.fit_intercept}) | |
|     mlflow.log_metric("score", model.score(X, y)) | |
| Key Changes: | |
| Added input_example parameter for automatic signature inference | |
| Used registered_model_name to register model directly | |
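| Dropped the explicit Unity Catalog 3-level name and passed a simple model name instead | |
| Result: Successfully registered model as workspace.default.linear_regression_model (Version 1). The name is still three-level (catalog.schema.model), which suggests the simple name resolved to the default catalog and schema | |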
| 4. Created Model Serving Endpoint | |
| Steps: | |
| Navigated to Serving in left sidebar | |
| Clicked Create serving endpoint | |
| Filled in configuration: | |
| Name: linear-regression-endpoint | |
| Source: My models | |
| Model: workspace.default.linear_regression_model | |
| Version: 1 | |
| Compute: CPU, Custom 0-4 concurrency | |
| Traffic: 100% | |
| Clicked Create | |
| Result: Endpoint created and deployed (takes 2-5 minutes to become "Ready") | |
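The UI steps above could probably also be scripted. The sketch below only builds the JSON body one might POST to the Serving Endpoints REST API (`/api/2.0/serving-endpoints`); it does not send anything. `build_endpoint_config` is a hypothetical helper, and the `Small` workload size and scale-to-zero flag are assumptions standing in for the "Custom 0-4 concurrency" choice made in the UI, so check the field names and values against the current Databricks documentation before relying on them.

```python
import json


def build_endpoint_config(endpoint_name, model_name, model_version):
    """Build a request body for creating a serving endpoint (hypothetical helper)."""
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": model_name,
                    "entity_version": str(model_version),
                    # Assumed values; the UI run used "Custom 0-4 concurrency".
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                }
            ]
        },
    }


body = build_endpoint_config(
    "linear-regression-endpoint",
    "workspace.default.linear_regression_model",
    1,
)
print(json.dumps(body, indent=2))
```

Keeping the body in a function makes it easy to review or version the endpoint settings alongside the notebook.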
| 5. Testing the Endpoint with curl | |
| Endpoint URL: | |
| https://dbc-7e0a1c95-5a72.cloud.databricks.com/serving-endpoints/linear-regression-endpoint/invocations | |
| Authentication Required: | |
| Need Databricks Personal Access Token (PAT) | |
| Get from: Settings → Developer → Generate new token | |
| Your token: keep it out of notes and code; export it as an environment variable (for example DATABRICKS_TOKEN) | |
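One way to keep the PAT out of notebooks and shell history is to read it from the environment when building the request headers. This is a minimal sketch: `build_headers` is a hypothetical helper, and the variable name `DATABRICKS_TOKEN` is an assumption about how the token was exported.

```python
import os


def build_headers(env_var="DATABRICKS_TOKEN"):
    """Return request headers, reading the PAT from an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early instead of sending a request that would return 401.
        raise RuntimeError(f"Set {env_var} before calling the endpoint")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
```

Calling `build_headers()` after `export DATABRICKS_TOKEN=...` yields the same two headers used in the curl commands, without the token ever appearing in the source.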
| Troubleshooting curl requests: | |
| ❌ Attempt 1: Missing authentication | |
| # Error: 401 - Credential not sent | |
| ❌ Attempt 2: Wrong input format (dataframe_split) | |
| # Error: Schema mismatch - model expects unnamed input | |
| ❌ Attempt 3: Plain array format | |
| # Error: Invalid JSON input | |
| ❌ Attempt 4: Correct JSON but wrong shape (1, 5) instead of (-1, 1) | |
| curl -d '{"inputs": [[1, 2, 3, 4, 5]]}' | |
| # Error: Shape of input (1, 5) does not match expected shape (-1, 1) | |
| ✅ Working Solution: | |
| curl -X POST "https://dbc-7e0a1c95-5a72.cloud.databricks.com/serving-endpoints/linear-regression-endpoint/invocations" \ | |
| -H "Content-Type: application/json" \ | |
| -H "Authorization: Bearer YOUR_TOKEN" \ | |
| -d '{"inputs": [[1], [2], [3], [4], [5]]}' | |
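The failing Attempt 4 and the working request differ only in how the five values are nested. As a small sketch (the helper name `to_inputs_payload` is made up for illustration), wrapping each value in its own row produces the (5, 1) layout the endpoint accepted:

```python
import json


def to_inputs_payload(values):
    """Wrap each scalar in its own row so the payload has shape (n, 1)."""
    return {"inputs": [[v] for v in values]}


payload = to_inputs_payload([1, 2, 3, 4, 5])
print(json.dumps(payload))  # {"inputs": [[1], [2], [3], [4], [5]]}
```

The printed string is the same body passed to `-d` in the working curl command.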
| Key Issues & Lessons Learned | |
| Issue 1: Model Signature Mismatch | |
| Problem: The auto-inferred signature expects shape (-1, 1) (any number of rows, one column), but the request sent shape (1, 5) (one row, five columns) | |
| Root Cause: The model was trained on a single feature and input_example X[:1] has shape (1, 1), so MLflow inferred single-column input. The five values are five samples of that one feature, so each belongs in its own row. | |
| Solution for Future: | |
| # Make input_example match the training feature layout | |
| input_example = X[:1] # shape (1, 1) here; it would be (1, 5) only for a model trained on 5 features | |
| # Or explicitly define signature | |
| from mlflow.models.signature import infer_signature | |
| signature = infer_signature(X, model.predict(X)) | |
| mlflow.sklearn.log_model( | |
|     model, | |
|     artifact_path="model", | |
|     signature=signature, # Explicit signature | |
|     input_example=X[:1], | |
|     registered_model_name="linear_regression_model" | |
| ) | |
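A shape mismatch like this can also be caught on the client before the request is sent. The following is a rough sketch using plain lists rather than NumPy; `nested_shape` and `matches_signature` are hypothetical helpers, and the expected shape `(-1, 1)` is taken from the error message above, with `-1` treated as "any size".

```python
def nested_shape(rows):
    """Return (n_rows, n_cols) for a rectangular list of lists."""
    n_cols = len(rows[0]) if rows else 0
    if any(len(r) != n_cols for r in rows):
        raise ValueError("rows have different lengths")
    return (len(rows), n_cols)


def matches_signature(rows, expected=(-1, 1)):
    """Compare a payload's shape to the expected one; -1 matches any size."""
    actual = nested_shape(rows)
    return all(e == -1 or e == a for e, a in zip(expected, actual))


print(matches_signature([[1, 2, 3, 4, 5]]))          # False: shape (1, 5)
print(matches_signature([[1], [2], [3], [4], [5]]))  # True: shape (5, 1)
```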
| Issue 2: Unity Catalog vs Workspace Registry | |
| Attempted: Register to Unity Catalog with 3-level namespace (default.default.linear_regression_model) | |
| Error: "Catalog 'default' does not exist" | |
| Solution: Used workspace registry with simple name: linear_regression_model | |
| Architecture Diagram | |
| Training (Notebook) | |
| ↓ | |
| MLflow Logging + Registration | |
| ↓ | |
| Workspace Model Registry | |
| ↓ | |
| Model Serving Endpoint (Serverless) | |
| ↓ | |
| REST API (with authentication) | |
| ↓ | |
| Client Application (curl/Python/API) | |
| Complete Working Code | |
| Notebook - Cell 1: Train Model | |
| from sklearn.linear_model import LinearRegression | |
| import numpy as np | |
| # Training data | |
| X = np.array([[1], [2], [3], [4], [5]]) | |
| y = np.array([2, 4, 6, 8, 10]) | |
| # Train model | |
| model = LinearRegression() | |
| model.fit(X, y) | |
| Notebook - Cell 2: Log & Register Model | |
| import mlflow.sklearn | |
| input_example = X[:1] | |
| with mlflow.start_run(): | |
|     mlflow.sklearn.log_model( | |
|         model, | |
|         artifact_path="model", | |
|         input_example=input_example, | |
|         registered_model_name="linear_regression_model" | |
|     ) | |
|     mlflow.log_params({"fit_intercept": model.fit_intercept}) | |
|     mlflow.log_metric("score", model.score(X, y)) | |
| Testing with curl | |
| curl -X POST "https://dbc-7e0a1c95-5a72.cloud.databricks.com/serving-endpoints/linear-regression-endpoint/invocations" \ | |
| -H "Content-Type: application/json" \ | |
| -H "Authorization: Bearer YOUR_TOKEN" \ | |
| -d '{"inputs": [[1], [2], [3], [4], [5]]}' | |
| Testing with Python | |
| import requests | |
| url = "https://dbc-7e0a1c95-5a72.cloud.databricks.com/serving-endpoints/linear-regression-endpoint/invocations" | |
| headers = { | |
|     "Content-Type": "application/json", | |
|     "Authorization": "Bearer YOUR_TOKEN" | |
| } | |
| data = {"inputs": [[1], [2], [3], [4], [5]]} | |
| response = requests.post(url, headers=headers, json=data) | |
| print(response.json()) | |
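Once the request succeeds, the response can be sanity-checked against the training data, where y is always 2x. The sketch below assumes the endpoint wraps its results in a `"predictions"` key; the helper names and the sample body are illustrative only, not a captured response from this endpoint.

```python
import math


def extract_predictions(body):
    """Pull the prediction list out of a response body (assumed "predictions" key)."""
    if "predictions" not in body:
        raise KeyError(f"unexpected response: {body}")
    return body["predictions"]


def close_to_expected(predictions, inputs, slope=2.0, tol=1e-6):
    """Check predictions against y = slope * x, the pattern in the training data."""
    return len(predictions) == len(inputs) and all(
        math.isclose(p, slope * row[0], abs_tol=tol)
        for p, row in zip(predictions, inputs)
    )


# Illustrative body only, not a captured response from the endpoint.
sample_body = {"predictions": [2.0, 4.0, 6.0, 8.0, 10.0]}
inputs = [[1], [2], [3], [4], [5]]
print(close_to_expected(extract_predictions(sample_body), inputs))
```

In the real client, `sample_body` would be replaced by `response.json()` from the request above.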
| Best Practices for Future | |
| Always provide input_example when logging models | |
| Use explicit signatures for complex input shapes | |
| Test model signature before deploying to serving | |
| Use Unity Catalog for production (requires proper catalog setup) | |
| Store tokens securely (use environment variables, never hardcode) | |
| Monitor endpoint metrics after deployment | |
| Set up rate limits for production endpoints | |
| Enable inference tables for logging predictions | |
| Resources | |
| Model Registry: workspace.default.linear_regression_model | |
| Serving Endpoint: linear-regression-endpoint | |
| Endpoint URL: https://dbc-7e0a1c95-5a72.cloud.databricks.com/serving-endpoints/linear-regression-endpoint/invocations | |
| Documentation: https://docs.databricks.com/machine-learning/model-serving/ | |
| Status: ✅ Successfully deployed and accessible via REST API |
| import mlflow.sklearn | |
| input_example = X[:1] | |
| with mlflow.start_run(): | |
|     mlflow.sklearn.log_model( | |
|         model, | |
|         artifact_path="model", | |
|         input_example=input_example, | |
|         registered_model_name="linear_regression_model" | |
|     ) | |
|     mlflow.log_params({"fit_intercept": model.fit_intercept}) | |
|     mlflow.log_metric("score", model.score(X, y)) | |
| # curl -X POST "https://dbc-7e0a1c95-5a72.cloud.databricks.com/serving-endpoints/linear-regression-endpoint/invocations" \ | |
| # -H "Content-Type: application/json" \ | |
| # -d '{ | |
| # "dataframe_split": { | |
| # "columns": ["x0", "x1", "x2", "x3", "x4"], | |
| # "data": [[1, 2, 3, 4, 5]] | |
| # } | |
| # }' | |