@johnrcui
Last active February 6, 2025 22:56
Split full name into parts

Name Splitter

This uses a list of known family name prefixes to identify the family name correctly, so that a person's full name can be split into individual parts. It does not attempt to handle every honorific or every set of post-nominal letters, but it does support the more common honorifics (e.g. Mr., Ms., Mrs.) and suffixes (e.g. Jr., Sr., I, II, III).

This is useful for decomposing names entered into a single field before storing the data or for converting existing data that uses full names into name parts.

Regular Expression

The following regex splits the name into 8 identifiable parts:

| Group | Description |
| --- | --- |
| honorific | Mr., Mrs., Ms., and other common honorifics |
| given_name | The full combination of first name and all other names except the family name |
| first_name | The first of the given names (can also be just an initial) |
| middle_name | One or more middle names (can also be just initials) |
| family_name | The last or family name, including common prefixes |
| suffix | The generational suffix |
| post_nominal | Any other information following a comma after the name |
| name | A fallback for when only a single name is provided |

^(?<name>[^.,?;:\s]+)$|^(?:(?<honorific>(?:mrs?|ms|dr|prof|rev|hon)\.?|miss|sir|dame|lord|lady)\s+)?(?:(?<given_name>(?<first_name>[^.,?;:\s]+\.?)(?:\s+(?<middle_name>(?:[^.,?;:\s]+\.?)(?:\s+[^.,?;:\s]+\.?)*?))??)\s+)??(?:(?<family_name>(?:(?:(?:(?:a|ab|af|ap|abu|aït|al|ālam|at|ath|aust|austre|bar|bath|bat|ben|bin|ibn|bet|bint|da|das|de la|degli|del|dele|della|der|di|dos|du|e|el|fetch|vetch|fitz|i|ka|kil|gil|la|le|lille|lu|m'|mac|mc|mck|mhic|mic|mala|mellom|myljom|na||ned|nedre|neder|ngā|nic|ní|nin|nord|norr|ny|o|ó|ua|uí|opp|upp|öfver|ost|öst|öster|øst|østre|över|øvste|øvre|øver|öz|pour|putra|putera|putri|puteri|setia|setya|stor|söder|sør|sønder|syd|søndre|syndre|søre|te|ter|ter|tre|van|van de|van den|van der|van het|van 't|väst|väster|verch|erch|vest|vestre|vesle|vetle|von|war|zu|von und zu)\s)?[^.,!?;:\s]+)-)?(?:(?:a|ab|af|ap|abu|aït|al|ālam|at|ath|aust|austre|bar|bath|bat|ben|bin|ibn|bet|bint|da|das|de la|degli|del|dele|della|der|di|dos|du|e|el|fetch|vetch|fitz|i|ka|kil|gil|la|le|lille|lu|m'|mac|mc|mck|mhic|mic|mala|mellom|myljom|na||ned|nedre|neder|ngā|nic|ní|nin|nord|norr|ny|o|ó|ua|uí|opp|upp|öfver|ost|öst|öster|øst|østre|över|øvste|øvre|øver|öz|pour|putra|putera|putri|puteri|setia|setya|stor|söder|sør|sønder|syd|søndre|syndre|søre|te|ter|ter|tre|van|van de|van den|van der|van het|van 't|väst|väster|verch|erch|vest|vestre|vesle|vetle|von|war|zu|von und zu)\s+)?[^.,?;:\s]+)??(?:,?\s+(?<suffix>Sr\.?|Snr|Jr\.?|Jnr|[IVX]+))?)?(?<post_nominal>,.*)?$

See: https://regex101.com/r/P092QP/1
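
As a rough illustration of how the named groups might be consumed from code, here is a minimal Python sketch. It is not the full pattern above: the prefix and honorific lists are cut down to a handful of entries, the hyphenated double-barrelled branch is omitted, Python's `(?P<name>...)` syntax replaces `(?<name>...)`, and case-insensitive matching is an assumption. The helper name `split_name` is hypothetical.

```python
import re

# Reduced building blocks (assumption: the real pattern uses a much longer prefix list).
WORD = r"[^.,?;:\s]+"
PREFIX = r"(?:de\s+la|van\s+der|van|von|del|di|la|le|mac|mc)"

NAME_RE = re.compile(
    rf"""
    ^(?P<name>{WORD})$
    |
    ^(?:(?P<honorific>(?:mrs?|ms|dr|prof)\.?|miss|sir)\s+)?
    (?:(?P<given_name>
          (?P<first_name>{WORD}\.?)
          (?:\s+(?P<middle_name>{WORD}\.?(?:\s+{WORD}\.?)*?))??
       )\s+)??
    (?P<family_name>(?:{PREFIX}\s+)?{WORD})
    (?:,?\s+(?P<suffix>Sr\.?|Jr\.?|[IVX]+))?
    (?P<post_nominal>,.*)?$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def split_name(full_name: str) -> dict:
    """Return the name parts that matched, or an empty dict if nothing matched."""
    match = NAME_RE.match(full_name.strip())
    if match is None:
        return {}
    # Drop groups that did not take part in the match.
    return {key: value for key, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    for sample in ("Madonna", "Mr. Johnson", "Ludwig van Beethoven", "J. K. Rowling"):
        print(sample, "->", split_name(sample))
```

The given-name group is lazy, so the family name (with an optional prefix) is tried first and given names are only consumed when the rest of the input cannot otherwise match. That is why "Mr. Johnson" ends up with a `family_name` and no `given_name`.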

Special Cases

  • When an honorific and a single name are provided, the name is assigned to family_name, following the more common usage such as "Mr. Johnson" or "Ms. Andrews"
  • Single-letter initials are accepted, such as "J. K. Rowling"
  • Certain punctuation marks within names are accepted, such as the apostrophe in "King T'Challa"
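
The punctuation rule follows from the character class the regex uses for most name tokens, `[^.,?;:\s]+`, which rejects only a few separators and whitespace. The small Python check below illustrates which tokens that class accepts on its own; the helper name `is_name_token` is hypothetical, and the trailing period of an initial is handled by a separate `\.?` in the full pattern, not by this class.

```python
import re

# Character class copied from the regex above: anything except . , ? ; : and whitespace.
NAME_TOKEN = re.compile(r"[^.,?;:\s]+")


def is_name_token(token: str) -> bool:
    """Return True if the whole token is accepted as a single name word."""
    return NAME_TOKEN.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ("T'Challa", "Smith-Jones", "J.", "Doe,"):
        print(repr(token), is_name_token(token))
```

Apostrophes and hyphens pass because they are not in the excluded set, while a token that still carries a period or comma does not.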

Family Name Prefix List

The following are the supported family name prefixes:

A A, Ab, Abu, Af, Aït, Al, Ālam, Ap, At, Ath, Aust, Austre
B Bar, bat, Bath, Ben, Bet, bin, Bint
D Da, Das, De, de la, Degli, Del, Dele, Della, Der, Di, Dos, Du
E E, El, Erch
F Fetch, Fitz
G Gil
I i, ibn
K ka, Kil
L La, Le, Lille, Lu
M M', Mac, Mala, Mc, Mck, Mellom, Mhic, Mic, Myljom
N Na, Ned, Neder, Nedre, Ngā, Ní, Nic, Nin, Nord, Norr, Ny
O O, Ó, Öfver, Opp, Ost, Öst, Øst, Öster, Østre, Över, Øver, Øvre, Øvste, Öz
P Pour, Putera, Puteri, Putra, Putri
S Setia, Setya, Söder, Sønder, Søndre, Sør, Søre, Stor, Syd, Syndre
T Te, Ter, Ter, Tre
U Ua, Uí, Upp
V Van, Van 't, Van De, Van Den, Van Der, Van Het, Väst, Väster, Verch, Vesle, Vest, Vestre, Vetch, Vetle, von, von und zu
W war
Z zu
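
Because the prefix list appears both here and inside the regex, one way to keep the two in sync is to generate the alternation from a plain list. The Python sketch below is an assumption about how that could be done, not how the regex above was produced: the helper name `build_prefix_pattern` is hypothetical, only a few prefixes from the list are included, and longest-first ordering is a design choice so that multi-word prefixes such as "van der" are tried before "van".

```python
import re

# A small sample of the prefixes listed above; extend as needed.
PREFIXES = ["van", "van der", "van de", "von", "von und zu", "de la", "del", "mac", "mc", "m'"]


def build_prefix_pattern(prefixes):
    """Build a non-capturing alternation of prefixes, longest first."""
    # Lowercase, drop blanks and duplicates, then sort by length (desc) and alphabetically.
    unique = sorted({p.strip().lower() for p in prefixes if p.strip()},
                    key=lambda p: (-len(p), p))
    # Escape each word and allow any run of whitespace inside multi-word prefixes.
    parts = [r"\s+".join(re.escape(word) for word in p.split()) for p in unique]
    return "(?:" + "|".join(parts) + ")"


if __name__ == "__main__":
    family = re.compile(build_prefix_pattern(PREFIXES) + r"\s+\S+", re.IGNORECASE)
    for sample in ("von und zu Guttenberg", "Van Der Berg", "Smith"):
        print(sample, "->", bool(family.fullmatch(sample)))
```

Deduplicating and skipping blank entries at build time also avoids repeated or empty alternatives creeping into the generated pattern.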