This uses a list of known family name prefixes to identify the correct family name when splitting a person's full name into individual parts. It does not attempt to handle every honorific or every set of post-nominal letters, but it does support the more common honorifics (e.g. Mr., Ms., Mrs.) and suffixes (e.g. Jr., Sr., I, II, III).
This is useful for decomposing names entered into a single field before storing the data, or for converting existing data that stores full names into name parts.
The following regex splits the name into 8 identifiable parts:
| Group | Description |
|---|---|
| honorific | Mr., Mrs., Ms. and other common honorifics |
| given_name | The full combination of the first name and all other names except the last |
| first_name | The first of the given names (can also be just an initial) |
| middle_name | One or more middle names (can also be just initials) |
| family_name | The last or family name, including common prefixes |
| suffix | The generational suffix |
| post_nominal | Any other information that follows the name after a comma |
| name | A fallback for when only a single name is provided |
^(?<name>[^.,?;:\s]+)$|^(?:(?<honorific>(?:mrs?|ms|dr|prof|rev|hon)\.?|miss|sir|dame|lord|lady)\s+)?(?:(?<given_name>(?<first_name>[^.,?;:\s]+\.?)(?:\s+(?<middle_name>(?:[^.,?;:\s]+\.?)(?:\s+[^.,?;:\s]+\.?)*?))??)\s+)??(?:(?<family_name>(?:(?:(?:(?:a|ab|af|ap|abu|aït|al|ālam|at|ath|aust|austre|bar|bath|bat|ben|bin|ibn|bet|bint|da|das|de la|degli|del|dele|della|der|di|dos|du|e|el|fetch|vetch|fitz|i|ka|kil|gil|la|le|lille|lu|m'|mac|mc|mck|mhic|mic|mala|mellom|myljom|na|ณ|ned|nedre|neder|ngā|nic|ní|nin|nord|norr|ny|o|ó|ua|uí|opp|upp|öfver|ost|öst|öster|øst|østre|över|øvste|øvre|øver|öz|pour|putra|putera|putri|puteri|setia|setya|stor|söder|sør|sønder|syd|søndre|syndre|søre|te|ter|ter|tre|van|van de|van den|van der|van het|van 't|väst|väster|verch|erch|vest|vestre|vesle|vetle|von|war|zu|von und zu)\s)?[^.,!?;:\s]+)-)?(?:(?:a|ab|af|ap|abu|aït|al|ālam|at|ath|aust|austre|bar|bath|bat|ben|bin|ibn|bet|bint|da|das|de la|degli|del|dele|della|der|di|dos|du|e|el|fetch|vetch|fitz|i|ka|kil|gil|la|le|lille|lu|m'|mac|mc|mck|mhic|mic|mala|mellom|myljom|na|ณ|ned|nedre|neder|ngā|nic|ní|nin|nord|norr|ny|o|ó|ua|uí|opp|upp|öfver|ost|öst|öster|øst|østre|över|øvste|øvre|øver|öz|pour|putra|putera|putri|puteri|setia|setya|stor|söder|sør|sønder|syd|søndre|syndre|søre|te|ter|ter|tre|van|van de|van den|van der|van het|van 't|väst|väster|verch|erch|vest|vestre|vesle|vetle|von|war|zu|von und zu)\s+)?[^.,?;:\s]+)??(?:,?\s+(?<suffix>Sr\.?|Snr|Jr\.?|Jnr|[IVX]+))?)?(?<post_nominal>,.*)?$

See: https://regex101.com/r/P092QP/1
- When an honorific and a single name are provided, the name is assigned to `family_name`, following the more common usage such as "Mr. Johnson" or "Ms. Andrews"
- Single-letter initials are accepted, such as "J. K. Rowling"
- Certain punctuation within names is accepted, such as the apostrophe in "King T'Challa"
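
To make the group structure and the notes above concrete, here is a minimal Python sketch. It is not the full pattern: it keeps the same group names but uses only a handful of honorifics, prefixes and suffixes, and it makes `family_name` mandatory in the multi-part branch, all of which are simplifying assumptions. Python's `re` module also requires `(?P<name>...)` rather than `(?<name>...)` for named groups. The names `NAME_RE` and `split_name` are hypothetical helpers introduced for illustration.

```python
import re

# Reduced, illustrative variant of the pattern: same group names, but only a
# few honorifics, prefixes and suffixes, and family_name is mandatory in the
# multi-part branch. Python's re module needs (?P<name>...) for named groups.
NAME_RE = re.compile(
    r"""
    ^(?P<name>[^.,?;:\s]+)$                     # a single bare name
    |
    ^(?:(?P<honorific>(?:mrs?|ms|dr|prof)\.?|miss|sir)\s+)?
    (?:(?P<given_name>
        (?P<first_name>[^.,?;:\s]+\.?)
        (?:\s+(?P<middle_name>[^.,?;:\s]+\.?(?:\s+[^.,?;:\s]+\.?)*?))??
    )\s+)??                                     # lazy, so a lone name becomes the family name
    (?P<family_name>(?:(?:van[ ]der|van|von|de[ ]la|del|di)\s+)?[^.,?;:\s]+)
    (?:,?\s+(?P<suffix>Sr\.?|Jr\.?|[IVX]+))?
    (?P<post_nominal>,.*)?$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def split_name(full_name):
    """Return the named groups that matched, or an empty dict on no match."""
    match = NAME_RE.match(full_name.strip())
    if match is None:
        return {}
    # Drop groups that did not take part in the match.
    return {key: value for key, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    for sample in ("Madonna", "Mr. Johnson", "J. K. Rowling", "Ludwig van Beethoven"):
        print(sample, "->", split_name(sample))
```

With this reduced pattern, "Madonna" fills only `name`, and "Mr. Johnson" fills only `honorific` and `family_name`, mirroring the first note above. Swapping in the full pattern should only require converting each `(?<` to `(?P<` and compiling without `re.VERBOSE`, since the full pattern contains literal spaces.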
The following family name prefixes are supported:

| Letter | Prefixes |
|---|---|
| A | A, Ab, Abu, Af, Aït, Al, Ālam, Ap, At, Ath, Aust, Austre |
| B | Bar, bat, Bath, Ben, Bet, bin, Bint |
| D | Da, Das, De, de la, Degli, Del, Dele, Della, Der, Di, Dos, Du |
| E | E, El, Erch |
| F | Fetch, Fitz |
| G | Gil |
| I | i, ibn |
| K | ka, Kil |
| L | La, Le, Lille, Lu |
| M | M', Mac, Mala, Mc, Mck, Mellom, Mhic, Mic, Myljom |
| N | Na, Ned, Neder, Nedre, Ngā, Ní, Nic, Nin, Nord, Norr, Ny |
| O | O, Ó, Öfver, Opp, Ost, Öst, Øst, Öster, Østre, Över, Øver, Øvre, Øvste, Öz |
| P | Pour, Putera, Puteri, Putra, Putri |
| S | Setia, Setya, Söder, Sønder, Søndre, Sør, Søre, Stor, Syd, Syndre |
| T | Te, Ter, Tre |
| U | Ua, Uí, Upp |
| V | Van, Van 't, Van De, Van Den, Van Der, Van Het, Väst, Väster, Verch, Vesle, Vest, Vestre, Vetch, Vetle, von, von und zu |
| W | war |
| Z | zu |
| Other | ณ (Thai) |
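
A prefix list like the one above can be turned into a regex alternation programmatically instead of being maintained by hand inside the pattern. The following is a minimal sketch under stated assumptions: `build_prefix_alternation` is a hypothetical helper, and sorting longest-first is a choice made here so that multi-word prefixes such as "van der" are tried before "van"; the original pattern does not order its prefixes this way.

```python
import re


def build_prefix_alternation(prefixes):
    """Return a non-capturing group that matches any of the given prefixes.

    Prefixes are lowercased and de-duplicated, then sorted longest first so
    that multi-word prefixes are tried before their shorter leading words.
    """
    unique = sorted({p.lower() for p in prefixes}, key=lambda p: (-len(p), p))
    # re.escape keeps characters such as spaces or apostrophes literal.
    return "(?:" + "|".join(re.escape(p) for p in unique) + ")"


if __name__ == "__main__":
    alternation = build_prefix_alternation(["Van", "van der", "von", "M'"])
    prefix_re = re.compile(alternation, re.IGNORECASE)
    print(prefix_re.match("van der Berg").group(0))
```

The resulting string can be interpolated wherever the long prefix alternation appears in a pattern, compiled with a case-insensitive flag.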