@thomasht86
Created January 5, 2026 07:44
ONNX export sentence-transformers
from sentence_transformers import (
    SentenceTransformer,
    export_optimized_onnx_model,
    export_dynamic_quantized_onnx_model,
)

# 1. Load the model to be optimized with the ONNX backend
model = SentenceTransformer(
    "IEITYuan/Yuan-embedding-2.0-en",
    backend="onnx",
)

# 2. Export the model with the O4 optimization level
#    (O4 adds fp16 mixed precision on top of O3 and targets GPU inference)
export_optimized_onnx_model(
    model,
    optimization_config="O4",
    model_name_or_path="thomasht86/Yuan-embedding-2.0-en-ONNX",
    push_to_hub=True,
)

# 3. Export dynamically quantized variants, one per target CPU instruction set
for quantization_config in ["arm64", "avx2", "avx512", "avx512_vnni"]:
    export_dynamic_quantized_onnx_model(
        model,
        quantization_config=quantization_config,
        model_name_or_path="thomasht86/Yuan-embedding-2.0-en-ONNX",
        push_to_hub=True,
    )
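
The script above pushes five ONNX files to the same repository, so whoever consumes them has to pick one by name. The following is a minimal sketch of that step, not part of the original gist. It assumes the exports keep the library's default file suffixes (`onnx/model_O4.onnx` for the optimized model and `onnx/model_qint8_<config>.onnx` for each quantized one); check the repository's file list before relying on these names. The helpers `onnx_file_name`, `exported_files`, and `load_variant` are hypothetical names introduced for illustration.

```python
# Sketch: map each export from the script above to the file it is assumed to produce.
QUANTIZATION_CONFIGS = ["arm64", "avx2", "avx512", "avx512_vnni"]


def onnx_file_name(suffix: str) -> str:
    """Return the assumed repo-relative path for an exported variant."""
    return f"onnx/model_{suffix}.onnx"


def exported_files(optimization_level: str = "O4") -> list[str]:
    """List the files the export script is assumed to have created."""
    files = [onnx_file_name(optimization_level)]
    # Assumption: dynamic quantization defaults to a "qint8_<config>" suffix.
    files += [onnx_file_name(f"qint8_{name}") for name in QUANTIZATION_CONFIGS]
    return files


def load_variant(repo_id: str, file_name: str):
    """Load one exported variant; needs sentence-transformers with ONNX support."""
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(
        repo_id,
        backend="onnx",
        model_kwargs={"file_name": file_name},
    )


if __name__ == "__main__":
    for path in exported_files():
        print(path)
```

For example, `load_variant("thomasht86/Yuan-embedding-2.0-en-ONNX", onnx_file_name("qint8_avx2"))` would load the AVX2 variant, provided the file exists under that name.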