- https://huggingface.co/datasets/Jarbas/locallingua_pt - recordings from Portugal, scraped from https://localingual.com
- https://huggingface.co/datasets/Jarbas/pt_basics - phonetically diverse standalone words, letters, diphthongs, and basic greetings, scraped from https://www.learningportuguese.co.uk/guide/compare-accents
- https://huggingface.co/datasets/Jarbas/compare-accents-pt - small dataset of multiple Portuguese speakers from various dialects speaking the same sentence, scraped from https://www.learningportuguese.co.uk/guide/compare-accents
- https://huggingface.co/datasets/Jarbas/VocativesEuropeanPortuguese - mirror of the dataset at https://www.clul.ulisboa.pt/en/recurso/vocatives-european-portuguese
- https://huggingface.co/datasets/Jarbas/InstitutoCamoes - mirror of the dataset at https://www.instituto-camoes.pt
- https://huggingface.co/datasets/Jarbas/SpokenPortugueseGeographicalSocialVarieties - mirror of the dataset at https://www.clul.ulisboa.pt/en/recurso/spoken-portuguese-geographical-and-social-varieties
- https://huggingface.co/datasets/Jarbas/yes_no_answers - for classifying responses to yes/no questions
- https://huggingface.co/datasets/Jarbas/utterance_tags - categorizes sentences into commands, questions, and statements, with sub-labels for specific types within each category
- https://huggingface.co/datasets/Jarbas/ovos_intent_examples - tiny dataset of manually labelled sentence examples taken from OVOS skill README files and skill.json metadata
- https://huggingface.co/datasets/Jarbas/OVOSGitLocalize-Intents - multilingual GitLocalize export of OpenVoiceOS data
- https://huggingface.co/datasets/Jarbas/music_queries_templates - synthetic natural language dataset of music-related utterances, designed for media playback scenarios
- https://huggingface.co/datasets/Jarbas/music_queries_metal_bands
- https://huggingface.co/datasets/Jarbas/music_queries_jazz
- https://huggingface.co/datasets/Jarbas/music_queries_prog
- https://huggingface.co/datasets/Jarbas/music_queries_classical
- https://huggingface.co/datasets/Jarbas/music_queries_metal_tracks
- https://huggingface.co/datasets/Jarbas/music_queries_psytrance_tracks
- https://huggingface.co/datasets/Jarbas/WikidataMediaEntities - collection of media-related keywords from Wikidata, collected via SPARQL queries
- https://huggingface.co/datasets/Jarbas/metal-archives-bands - from Encyclopaedia Metallum
- https://huggingface.co/datasets/Jarbas/metal-archives-tracks - from Encyclopaedia Metallum
- https://huggingface.co/datasets/Jarbas/jazz-music-archives - from jazz-music-archives
- https://huggingface.co/datasets/Jarbas/prog-archives - from prog-archives
- https://huggingface.co/datasets/Jarbas/classic-composers
- https://huggingface.co/datasets/Jarbas/trance_tracks
- https://huggingface.co/datasets/Jarbas/movie_actors - from IMDb
- https://huggingface.co/datasets/Jarbas/movie_directors - from IMDb
- https://huggingface.co/datasets/Jarbas/movie_writers - from IMDb
- https://huggingface.co/datasets/Jarbas/movie_producers - from IMDb
- https://huggingface.co/datasets/Jarbas/movie_composers - from IMDb
- https://huggingface.co/datasets/Jarbas/synthetic-wakewords - hundreds of TTS samples for common wake words
- https://huggingface.co/datasets/Jarbas/wake_word_noise - mainly false activations from real-world OVOS usage
- https://huggingface.co/datasets/Jarbas/building_106_kitchen_3secs - 3-second crops of everyday household sounds
- https://huggingface.co/datasets/Jarbas/public_domain_sounds_3secs - 3-second crops of various sounds
- https://huggingface.co/datasets/Jarbas/FMA_3secs - 3-second crops of various music genres from the Free Music Archive
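
Any of the datasets listed above can probably be loaded with the Hugging Face `datasets` library. The following is a minimal sketch, not a tested recipe for each repository: the helper names `repo_id_from_url` and `load_listed_dataset` are hypothetical, and it assumes every repository loads with its default configuration. Some repositories may need a configuration name or a `split` argument, which can be passed through as keyword arguments.

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Turn a Hugging Face dataset URL into an 'owner/name' repo ID.

    Hypothetical helper: it only parses the URL, it does not check
    that the repository exists.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def load_listed_dataset(url: str, **kwargs):
    """Load a dataset from one of the URLs in the list above.

    Assumes the `datasets` package is installed (pip install datasets)
    and that the repository loads with default settings; extra keyword
    arguments (e.g. split="train") are forwarded to load_dataset.
    """
    from datasets import load_dataset  # imported lazily, needs network access

    return load_dataset(repo_id_from_url(url), **kwargs)


# Example usage (downloads data, so it is left commented out):
# ds = load_listed_dataset("https://huggingface.co/datasets/Jarbas/yes_no_answers")
# print(ds)
```

Printing the returned object shows the available splits and columns, which is the quickest way to see how a given dataset is structured before using it.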