- Introduction
- Modes
- Literal characters
- Metacharacters
- Backreferences
- Special Characters
- Useful Expressions
Regular expressions are symbols representing a text pattern. They are used for matching, searching and replacing text.
The goal in regular expressions is to match both what you want and only what you want!
- Standard - /re/
- Global - /re/g
- Case-insensitive - /re/i
- Multiline anchors - /re/m
- Dot-matches-all - /re/s
Modes are specified after the closing / of the regular expression and can be combined.
For example, using both global and case insensitive modes: /re/gi
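The /re/flags notation above is the literal syntax used by languages such as JavaScript and Perl. As a rough sketch of how the same modes look in a language without that syntax, the snippet below uses Python's `re` module. The sample text is made up for illustration, and treating `re.findall` as the equivalent of the global mode is an assumption about the mapping, since Python has no "g" flag.

```python
import re

text = "Car park\ncar wash"  # illustrative sample, not from the tutorial

# /car/ -> standard mode: only the first (leftmost) match
first = re.search(r"car", text).group()

# /car/gi -> global + case-insensitive: findall plays the role of "g"
all_ci = re.findall(r"car", text, flags=re.IGNORECASE)

# /^car/mi -> multiline anchors: ^ also matches right after each newline
line_starts = re.findall(r"^car", text, flags=re.MULTILINE | re.IGNORECASE)

# /k.c/s -> dot-matches-all: . is allowed to match the newline
across_lines = re.search(r"k.c", text, flags=re.DOTALL)
without_dotall = re.search(r"k.c", text)

print(first, all_ci, line_starts, across_lines is not None, without_dotall)
```

Flags are combined with `|`, which corresponds to writing several mode letters after the closing /.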
/car/ matches "car"
/car/ matches the first three letters of "carnival"
- Case sensitive by default (best practice)
For example: /car/ doesn't match anything in "Carnival"
- Standard (non-global) matching - the earliest (leftmost) match is always preferred.
word: "pazzazz"
/zz/ - matches only the first "zz" in "pazzazz"
/zz/g - matches both occurrences of "zz" in "pazzazz"
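To make the difference between standard and global matching concrete, here is a minimal Python sketch. It assumes that `re.search` stands in for standard matching and `re.finditer` for global matching, and it reports character positions so that each match can be located in the word.

```python
import re

word = "pazzazz"

# Standard (non-global): only the earliest (leftmost) match is returned.
first_span = re.search(r"zz", word).span()

# Global: every non-overlapping match, scanning from left to right.
all_spans = [m.span() for m in re.finditer(r"zz", word)]

print(first_span)  # (2, 4)
print(all_spans)   # [(2, 4), (5, 7)]
```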
There are only a few metacharacters to learn:
\ . * + - { } [ ] ^ & $ | ? ( ) : ! =
- . - Any character except newline
Examples:
/h.t/ - matches "hot", "hat" and "hit", but not "heat"
/.a.a.a/ - matches "banana", "#aga!a" and " a asa"
A common mistake:
/9.00/ - matches "9.00", but also "9500" and "9-00"
- \ - Escapes the next metacharacter
Note that literal characters shouldn't be escaped, because a backslash can give a literal a special meaning (for example \d or \t)
Examples:
/9\.00/ - matches "9.00" but not "9500" or "9-00"
/\/home\/usr\/doc\.txt/ - matches "/home/usr/doc.txt"
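The effect of escaping the dot can be checked with a short Python sketch. The list of candidate strings is taken from the examples above; `re.fullmatch` is used so that the whole string has to match. Note that Python patterns have no / delimiters, so the slashes in the path need no escaping there.

```python
import re

prices = ["9.00", "9500", "9-00"]

# Unescaped dot: any character is accepted in the second position.
unescaped = [p for p in prices if re.fullmatch(r"9.00", p)]

# Escaped dot: only a literal "." is accepted.
escaped = [p for p in prices if re.fullmatch(r"9\.00", p)]

# Only the dot needs escaping here; "/" is an ordinary character in Python.
path_ok = re.fullmatch(r"/home/usr/doc\.txt", "/home/usr/doc.txt") is not None

print(unescaped)  # ['9.00', '9500', '9-00']
print(escaped)    # ['9.00']
print(path_ok)    # True
```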
- [ ] - Defines a character set (begin and end); the whole set matches only one character
Order of characters does not matter
Note: Metacharacters shouldn't be escaped inside a character set, where they are already treated as literals (except ], -, ^ and \)
Examples:
/gr[ea]y/ - matches both "grey" and "gray"
/gr[ea]t/ - doesn't match "great"
/h[abc.xyz]t/ - matches "hat" and "h.t"; the . is already escaped
/var[[(][0-9][)\]]/ - matches "var(3)" and "var(4)"
/file[0\-\\_]1/ - matches "file01", "file-1", "file\1" and "file_1"
Shorthand character sets:
\d - any digit (same as [0-9])
\w - word character (same as [a-zA-Z0-9_])
\s - whitespace (same as [ \t\r\n])
\D - not a digit (same as [^0-9])
\W - not a word character (same as [^a-zA-Z0-9_])
\S - not whitespace (same as [^ \t\r\n])
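A brief Python sketch of character sets and their shorthands follows. One caveat that is specific to Python and not stated in the text above: for ordinary strings, \d and \w are Unicode-aware by default, so the equivalences with [0-9] and [a-zA-Z0-9_] only hold exactly when the `re.ASCII` flag is set. The sample strings are illustrative.

```python
import re

# A set matches exactly one character out of those listed.
gray = [w for w in ["grey", "gray", "great"] if re.search(r"gr[ea]y", w)]

# Inside a set the dot is a literal dot, not "any character".
dot_set = [w for w in ["hat", "h.t", "hot"] if re.fullmatch(r"h[abc.xyz]t", w)]

# Shorthand sets.
digits = re.findall(r"\d", "a1 b22")
words = re.findall(r"\w+", "snake_case, two words")

# Python-specific caveat: \d also matches non-ASCII digits unless re.ASCII is set.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_three, flags=re.ASCII) is not None

print(gray, dot_set, digits, words, unicode_digit, ascii_digit)
```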
- - (hyphen) - Range of characters; represents all characters between two characters
Only inside a character set
Examples:
/[0-9]/ - matches any digit
/[A-Za-z]/ - matches any letter
/[a-ek-ou-y]/ - matches any letter in the specified ranges
Caution:
/[50-99]/ - is not all numbers from 50 to 99; it is the set 5, 0-9 and 9, which is the same as /[0-9]/
- ^ - Negates a character set when placed as its first character
Still represents only one character
Examples:
/see[^mn]/ - matches "seek" and "sees" but not "seem" or "seen"
Caution:
/see[^mn]/ - matches "see " but not "see", because the negated set must still match one character
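Both cautions about ranges and negated sets can be verified with a small Python sketch. The pattern `[5-9][0-9]` is offered here as one possible way to match the numbers 50 to 99; it is an illustration, not something the text above prescribes.

```python
import re

# [50-99] is a set of single characters: "5", the range "0"-"9", and "9".
tricky = re.compile(r"[50-99]")
single_chars = [s for s in ["3", "7", "50"] if tricky.fullmatch(s)]

# One way to really match 50..99: two sets, one per digit position.
fifty_to_99 = re.compile(r"[5-9][0-9]")
in_range = [n for n in range(0, 120) if fifty_to_99.fullmatch(str(n))]

# A negated set still has to consume one character.
negated = re.compile(r"see[^mn]")
results = {s: bool(negated.search(s))
           for s in ["seek", "sees", "seem", "seen", "see ", "see"]}

print(single_chars)                     # ['3', '7']
print(in_range == list(range(50, 100)))
print(results)
```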
- * - Preceding item zero or more times
Examples:
/apples*/ - matches "apple", "apples" and "applessss"
/\d\d\d\d*/ - matches numbers with three digits or more
- + - Preceding item one or more times
Examples:
/apples+/ - matches "apples" and "applessss", but not "apple"
/<[^>]+>/ - matches any HTML tag
- ? - Preceding item zero or one time
Examples:
/apples?/ - matches "apple" and "apples" but not "applessss"
/colou?r/ - matches "color" and "colour"
- { } - Start and end a quantified repetition of the preceding item
Takes {min,max}, both non-negative integers. Min must always be included (can be zero). Max is optional.
Examples:
/\d{4,8}/ - matches numbers with four to eight digits
/\d{4}/ - matches numbers with exactly four digits
/\d{4,}/ - matches numbers with four or more digits (max is infinite)
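The four quantifiers can be compared side by side in Python. The helper `full` below is a hypothetical convenience function introduced for this sketch; it keeps only the words that match a pattern in their entirety, which makes the upper bounds of ? and {min,max} visible.

```python
import re

def full(pattern, words):
    """Hypothetical helper: keep the words that match the pattern entirely."""
    return [w for w in words if re.fullmatch(pattern, w)]

fruit = ["apple", "apples", "applessss"]
star = full(r"apples*", fruit)      # zero or more "s"
plus = full(r"apples+", fruit)      # one or more "s"
optional = full(r"apples?", fruit)  # zero or one "s"

numbers = ["123", "1234", "12345678", "123456789"]
four_to_eight = full(r"\d{4,8}", numbers)
exactly_four = full(r"\d{4}", numbers)
four_or_more = full(r"\d{4,}", numbers)

print(star, plus, optional)
print(four_to_eight, exactly_four, four_or_more)
```

With a search instead of a full match, /apples?/ would still find "apples" inside "applessss", so whether a longer string "matches" depends on how the pattern is applied.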
- ( ) - Group metacharacters
Makes expressions easier to read. Cannot be used inside a character set.
Examples:
/(abc)+/ - matches "abc" and "abcabcabc"
/(in)?dependent/ - matches "independent" and "dependent"
/run(s)?/ - is the same as /runs?/
- | - Matches either the previous or the next expression
Examples:
/apple|orange/ - matches "apple" and "orange"
/w(ei|ie)rd/ - matches "weird" and "wierd"
/(AA|BB|CC){6}/ - matches "AABBAACCAABB" and more
/(\d\d|[A-Z][A-Z]){3}/ - matches "112233", "AA66ZZ", "11AA44" and more
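Grouping matters most when it is combined with a quantifier or with alternation. The Python sketch below reuses patterns from the examples above and adds one ungrouped variant, `wei|ierd`, purely as an illustration of how far an unparenthesized | reaches.

```python
import re

# A quantifier after a group repeats the whole group.
repeated = re.fullmatch(r"(abc)+", "abcabcabc") is not None
optional_prefix = [w for w in ["independent", "dependent"]
                   if re.fullmatch(r"(in)?dependent", w)]

# Parentheses limit the scope of the alternation to "ei" or "ie".
weird = [w for w in ["weird", "wierd", "wrd"] if re.fullmatch(r"w(ei|ie)rd", w)]

# Without parentheses, | splits the whole expression: "wei" OR "ierd".
ungrouped = re.fullmatch(r"wei|ierd", "weird")

# Alternation and repetition together.
pairs = re.fullmatch(r"(\d\d|[A-Z][A-Z]){3}", "11AA44") is not None

print(repeated, optional_prefix, weird, ungrouped, pairs)
```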
- Anchor metacharacters:
Anchors refer to a position, not an actual character. They are zero-width.
^ - Start of string / line (not the same as ^ at the start of a character set)
$ - End of string / line
Examples:
/^apple/ - matches "apple" only at the beginning of a string/line
/apple$/ - matches "apple" only at the end of a string/line
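Whether an anchor refers to the string or to each line depends on the multiline mode. A minimal Python sketch, using a two-line sample string invented for the purpose:

```python
import re

text = "apple pie\npine apple"  # illustrative two-line sample

# Default: ^ and $ refer to the start and end of the whole string.
string_start = re.findall(r"^\w+", text)
string_end = re.findall(r"\w+$", text)

# Multiline mode (/re/m): ^ and $ also match at every line boundary.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(string_start, string_end)  # ['apple'] ['apple']
print(line_starts, line_ends)    # ['apple', 'pine'] ['pie', 'apple']
```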
The portion of the text matched inside parentheses is stored.
/a(p{2}l)e/ matches "apple" and stores "ppl". This is done automatically by default.
Refer to first backreference with \1.
\1 through \9 - backreferences for positions 1 to 9.
Usage:
- Can be used in the same expression as the group.
- Can be accessed after the match is complete (programming language needed).
Examples:
/(apples) to \1/ - matches "apples to apples"
/(ab)(cd)(ef)\3\2\1/ - matches "abcdefefcdab"
/<(i|em)>.+?<\/\1>/ - matches "<i>Hello</i>" and "<em>Hello</em>"
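Both usages of backreferences, inside the expression and after the match, can be sketched in Python. The sample strings are illustrative; `group(1)` and the `\2 to \1` replacement template are Python's way of reaching stored groups once the match is complete.

```python
import re

# Inside the same expression: \1 must repeat what group 1 captured.
same = re.search(r"(apples) to \1", "apples to apples") is not None
different = re.search(r"(apples) to \1", "apples to oranges") is not None
mirror = re.fullmatch(r"(ab)(cd)(ef)\3\2\1", "abcdefefcdab") is not None

# The closing tag has to repeat the opening tag name.
tag = re.compile(r"<(i|em)>.+?</\1>")
matched_tag = tag.search("<em>Hello</em>") is not None
mismatched_tag = tag.search("<i>Hello</em>") is not None

# After the match: read the stored group, or reuse it in a replacement.
stored = re.search(r"a(p{2}l)e", "apple").group(1)
swapped = re.sub(r"(\w+) to (\w+)", r"\2 to \1", "apples to oranges")

print(same, different, mirror, matched_tag, mismatched_tag)
print(stored, swapped)
```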
- Spaces - a space is a regular character
- Tabs - matched by \t
- Line breaks - \r, \n, \r\n (depends on your file mode)
- Non-printable characters:
  - bell - \a
  - escape - \e
- Names:
/^\w+/ - Not a very good solution
/^[A-Z][a-z.']+ [A-Z][a-z.']+/ - Matches a first name and a last name
- Email Addresses:
/^[\w.\-]+@[\w.\-]+\.[A-Za-z]{2,3}$/ - Matches an email address
- URLs:
/^(http|https):\/\/[\w.\-]+(\.[\w\-]+)+[/#?]?.*$/
- IPs:
/^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/m
It is long, but it ensures that none of the four numbers is higher than 255.
- HTML tags:
/<([^>]+)>(.*?)<\/\1>/
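A few of the expressions in this list can be tried out with the Python sketch below. The constant names (`EMAIL_RE`, `OCTET`, `IP_RE`, `TAG_RE`) and all sample inputs are made up for illustration, and building the IP pattern from a repeated `OCTET` fragment is a convenience of this sketch rather than part of the original expression. The results also show two limits of these patterns: the email expression rejects top-level domains longer than three letters, and the HTML expression does not match tags that carry attributes.

```python
import re

# Patterns from the list, written without the /.../ delimiters.
EMAIL_RE = re.compile(r"^[\w.\-]+@[\w.\-]+\.[A-Za-z]{2,3}$")
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IP_RE = re.compile(rf"^{OCTET}\.{OCTET}\.{OCTET}\.{OCTET}$")
TAG_RE = re.compile(r"<([^>]+)>(.*?)</\1>")

email_ok = bool(EMAIL_RE.match("jane.doe@example.com"))
email_long_tld = bool(EMAIL_RE.match("jane@example.info"))  # {2,3} is too short

ip_ok = bool(IP_RE.match("192.168.0.1"))
ip_too_high = bool(IP_RE.match("256.1.1.1"))
ip_too_short = bool(IP_RE.match("1.2.3"))

# findall returns (tag name, inner text) for every matched pair of tags.
tags = TAG_RE.findall("<b>bold</b> and <i>italic</i>")
# With attributes, group 1 captures them too, so the closing tag never matches.
with_attribute = TAG_RE.search('<a href="x">link</a>')

print(email_ok, email_long_tld)
print(ip_ok, ip_too_high, ip_too_short)
print(tags, with_attribute)
```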