Description
Capture Groups
Ranges
Match Count
Literals
Final Expression
Demo
Future
Author
The following tutorial explains how we can capture IP addresses using regular expressions. The expression we'll
be focusing on is: ([0-9]{1,3}\.){3}[0-9]{1,3} . This is a very basic capture that does not yet check that each octet is valid, i.e. in the range 0-255.
Regular expressions include a concept called capture groups. Capture groups contain patterns that can be referenced throughout the regular expression, and a match count placed after a group applies to the group as a whole. To leverage this feature you simply enclose the desired pattern in parentheses, as shown in the regex within the description.
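As a minimal sketch of why the parentheses matter, the snippet below runs the first half of the expression with and without the group. The sample address and the variable names are assumptions made for illustration.

```shell
# Sample input chosen for illustration only.
sample="10.20.30.40"

# With the group, {3} repeats the whole "digits plus period" unit.
grouped=$(printf '%s\n' "$sample" | grep -Eo '([0-9]{1,3}\.){3}' || true)

# Without the group, {3} applies only to the escaped period, so this
# looks for digits followed by three literal periods and finds nothing here.
ungrouped=$(printf '%s\n' "$sample" | grep -Eo '[0-9]{1,3}\.{3}' || true)

echo "grouped:   '$grouped'"
echo "ungrouped: '$ungrouped'"
```

The grouped pattern should print 10.20.30. while the ungrouped one prints an empty string.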
Character ranges can be defined within the [] brackets. Here, with [0-9], we have expressed that we want a single digit between
zero and nine.
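A quick way to see the range on its own, assuming a grep that supports -o (the same flag the demo uses). The input string is made up.

```shell
# Each [0-9] match is exactly one digit, so every digit prints on its own line.
digits=$(printf '%s\n' "a1b22c" | grep -o '[0-9]')
echo "$digits"
```

The letters are skipped and the three digits come out one per line.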
Now that we have identified our numeric range, we need to restrict it to match only one to three digits, since an IP address octet must have at least one digit and no more than three. We achieve this by declaring our match count after our range: {1,3} .
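The effect of the match count can be sketched like this, again with a made-up input string.

```shell
# {1,3} takes at most three digits per match, so a five-digit run
# is split into a three-digit match followed by a two-digit match.
runs=$(printf '%s\n' "7 42 12345" | grep -Eo '[0-9]{1,3}')
echo "$runs"
```

This is also why the expression by itself cannot reject an overly long number: it simply matches part of it.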
In regular expressions you sometimes want to match a literal character. This can be achieved by simply placing the character within your pattern, but every so often you end up wanting to use a literal that is a reserved operator. In our instance we want to use a literal "." period; however, the period is reserved to match any character in regex. To work around this we can use an escape character to tell the engine we don't want the any-character match, just the character itself. We do this by using the "\" backslash followed by the reserved character in question. In our case, the pattern is "\." .
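To make the difference concrete, here is a small sketch that counts matching lines with and without the escape. The two input lines are invented for the comparison.

```shell
# Two candidate lines: one with a real period, one with a letter in its place.
# Unescaped: . matches any character, so both lines match.
unescaped=$(printf '%s\n' "1.2" "1x2" | grep -Ec '^[0-9].[0-9]$')

# Escaped: \. matches only a literal period, so only the first line matches.
escaped=$(printf '%s\n' "1.2" "1x2" | grep -Ec '^[0-9]\.[0-9]$')

echo "unescaped: $unescaped, escaped: $escaped"
```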
We now have enough information to solve the entire expression, so we're going to walk through it in summary here.
([0-9]{1,3}\.){3} - This reads: match one to three digits followed by a period, three times. Sample output might look like: 127.0.0.
[0-9]{1,3} - The group above only covers three octets, and we need four, since an IPv4 address is written in dotted-decimal notation as a series of four octets. We simply repeat the range and match count we defined first. This gives us our final octet.
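The walkthrough above can be tried out with a small wrapper. The function name extract_ip and the sample strings are hypothetical, introduced only for this sketch.

```shell
# extract_ip is a hypothetical helper name used only for this sketch.
# It prints the first argument's IP-shaped substrings, one per line.
extract_ip() {
  printf '%s\n' "$1" | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}'
}

extract_ip "gateway at 127.0.0.1 responded"
# Octet values are not checked, so an out-of-range address still matches.
extract_ip "out of range: 999.300.1.2"
# Only three octets: the group cannot repeat three times before a final octet.
extract_ip "too short: 10.20.30" || echo "no match"
```

The second call shows the limitation mentioned in the description: the shape is checked, not the 0-255 range.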
Using Bash v4.x, we'll create a random IP address and add some text before it, then try to capture just the IP portion using our regex:
ip=""; for i in {1..4}; do ip=$ip$((1 + RANDOM % 255))"."; done; echo -n " RANDOM CHARACTERS TO NOT CAPTURE: ${ip::-1} "| grep -Eio "([0-9]{1,3}\.){3}[0-9]{1,3}"
sample output: "75.139.44.215"
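Note: this doesn't work with older versions of Bash, because they don't support the negative length in the ${ip::-1} substring expansion (as far as I know, it needs v4.2 or later). This can be an issue on macOS, which ships v3.x natively. Use the expression below for v3; it leaves the trailing period on the string, but since the regex ends on a digit range the period is not captured.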
Bash v3:
ip=""; for i in {1..4}; do ip=$ip$((1 + RANDOM % 255))"."; done; echo -n " RANDOM CHARACTERS TO NOT CAPTURE: ${ip} "| grep -Eio "([0-9]{1,3}\.){3}[0-9]{1,3}"
validate 0-255 octets
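One possible way to approach this, offered as a sketch and not as the planned implementation: spell out each octet as an alternation of the allowed digit shapes, and anchor the whole string. The variable octet and the function is_valid_ipv4 are hypothetical names. This version also rejects leading zeros such as 01, which is an assumption about the desired behavior.

```shell
# One octet, 0-255: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# is_valid_ipv4 is a hypothetical helper; it succeeds only when the whole
# argument is four valid octets separated by periods.
is_valid_ipv4() {
  printf '%s\n' "$1" | grep -Eq "^(${octet}\.){3}${octet}\$"
}

for candidate in 192.168.1.10 255.255.255.255 256.1.1.1 999.300.1.2; do
  if is_valid_ipv4 "$candidate"; then
    echo "$candidate: valid"
  else
    echo "$candidate: invalid"
  fi
done
```

The anchors matter here: without ^ and $, a string like 999.300.1.2 would still contain a shorter substring that satisfies the octet pattern.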
Regular Expression tutorial