Phuc Tran Truong phucdev

phucdev / convert_dataset.py
Last active November 18, 2025 19:23
Convert ClassLabel/Sequence[ClassLabel] to string labels for HuggingFace Datasets
from datasets import Dataset, ClassLabel, Value, load_dataset


def convert_class_labels_to_str(examples: Dataset):
    """
    Utility function that converts (shallow) ClassLabel indices into string labels.
    This is common for datasets hosted on the Hugging Face Hub with data loading scripts, where the label(s)
    are stored as ClassLabel or Sequence[ClassLabel] features.
    Often we are interested in the string labels rather than the indices.
    If any ClassLabel feature is embedded in a nested structure like a dict, this will not work