Regex in Python (part 3)
Find an expression containing numbers and symbols in a specific format
Suppose we have a piece of text containing an IPv4 address that we want to extract:
import re
text = "You've recently logged in from an IP address 111.222.211.122"
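IPv4 addresses range from 0.0.0.0 to 255.255.255.255, so each of the four parts has at most three digits. As a first attempt, we can search with the following regex pattern, which assumes every part has exactly three digits (the r prefix makes it a raw string, so Python passes the backslashes to the regex engine unchanged):
pattern = r"\d\d\d.\d\d\d.\d\d\d.\d\d\d"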
The result of this will be:
>>> print(re.findall(pattern, text))
['111.222.211.122']
However, if the text now has something extra like this:
text = "You've recently logged in from an IP address 111.222.211.122, and something weird like this 123123123123122"
Now our search result will be ['111.222.211.122', '123123123123122']. This is because:
- \d\d\d matches any three consecutive digits.
- An unescaped dot . matches any character except a newline, so the pattern \d\d\d. matches 1231, as well as 123! or 123@
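A quick way to confirm this behaviour is to test the first part of the pattern against a few short strings. This is only a small sketch; the sample strings and the names loose and results are made up for illustration:

```python
import re

# Unescaped dot: matches any single character except a newline.
loose = r"\d\d\d."

samples = ["123.", "1231", "123!", "123@"]

# fullmatch succeeds only if the whole string fits the pattern.
results = {s: bool(re.fullmatch(loose, s)) for s in samples}

for sample, matched in results.items():
    print(sample, matched)
```

Every sample matches, because the final character only has to be "something", not specifically a dot.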
If we want to match a literal dot specifically, add a backslash \ in front of the dot, like this: \d\d\d\.
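Applied to the whole pattern, a sketch with every dot escaped might look like the following. The name strict_pattern is only illustrative, and the pattern still assumes each part has exactly three digits:

```python
import re

text = ("You've recently logged in from an IP address 111.222.211.122, "
        "and something weird like this 123123123123122")

# Each \. matches only a literal dot, so a plain run of digits no longer qualifies.
strict_pattern = r"\d\d\d\.\d\d\d\.\d\d\d\.\d\d\d"

matches = re.findall(strict_pattern, text)
print(matches)  # ['111.222.211.122']
```

The long digit string is skipped because it contains no dots at all.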
Extra tips:
- The pattern \d+ matches a number of any length (one or more digits).
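As a rough illustration of this tip, replacing each \d\d\d with \d+ lets the pattern accept parts of different lengths. The sample text and the name flexible_pattern are assumptions made for this sketch, and note that \d+ does not check that each part is at most 255:

```python
import re

# Made-up sample text with addresses whose parts have one, two, or three digits.
sample = "Server 10.0.0.1 forwarded the request to 192.168.1.254"

# \d+ matches one or more digits; \. matches a literal dot.
flexible_pattern = r"\d+\.\d+\.\d+\.\d+"

found = re.findall(flexible_pattern, sample)
print(found)  # ['10.0.0.1', '192.168.1.254']
```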