Example use of a package for handling Wikipedia articles, here using the list of biblical names (Wikipedia: https://en.wikipedia.org/wiki/List_of_biblical_names) to prepare a dataset for Hugging Face (https://huggingface.co/datasets/Apokryf/biblical-names-by-en-wikipedia), in order to generate human-readable namespace domains for servers.
@Sarverott
Author

Key things that should have been mentioned here:

  • Some Wikipedia articles cannot be reached under their publicly displayed titles; sometimes an alternative title is required. For example, for "List of biblical names starting with B" the query has to be "Information for "List of biblical names starting with B"" for the article to be fetched properly. THIS WAS NOT MENTIONED IN ANY REFERENCE I FOUND: NOT ON REDDIT, NOT ON STACK OVERFLOW, NOT IN THE PACKAGE DOCS.
  • Formatting is not standardized: pages such as "List of biblical names starting with E" break the pattern used by the others for no apparent reason, and the rules for other language editions will probably be even less obvious in translation. This makes some standard uses of Wikipedia unreliable as long as everyone keeps pretending the problem does not exist; for now it depends on the user noticing it.
  • It is not yet easy to verify the results, i.e. to check whether the collection has holes.
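
The first and third points can be sketched roughly as below. This is only an illustration, not the code from the gist: `candidate_titles`, `fetch_first_available` and `missing_letters` are hypothetical helper names, and `fetch` stands in for whatever call the package in use provides for loading a page by title (assumed here to raise a `LookupError` when the title is not found). The coverage check is deliberately coarse: it only reports initial letters with no names at all, so an empty result is a hint, not proof that the collection is complete.

```python
import string
from typing import Callable, Iterable, Optional


def candidate_titles(letter: str) -> list[str]:
    """Titles to try for one letter, in order of preference."""
    base = f"List of biblical names starting with {letter}"
    # The second form is the alternative title reported above for letter B.
    return [base, f'Information for "{base}"']


def fetch_first_available(
    titles: Iterable[str],
    fetch: Callable[[str], str],  # hypothetical stand-in for the package's page loader
) -> Optional[tuple[str, str]]:
    """Return (title, text) for the first title that loads, or None."""
    for title in titles:
        try:
            text = fetch(title)
        except LookupError:  # assumption: the loader signals "not found" this way
            continue
        if text:
            return title, text
    return None


def missing_letters(names: Iterable[str]) -> list[str]:
    """Letters A-Z for which the collection contains no name at all."""
    seen = {name.strip()[0].upper() for name in names if name.strip()}
    return [letter for letter in string.ascii_uppercase if letter not in seen]
```

With a real loader, `fetch_first_available(candidate_titles("B"), fetch)` would return whichever title actually worked, which also makes it possible to log the letters that needed the alternative form.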

I don't know whether the developers need to be told that the issues above are persistent problems.
Any feedback is welcome.
