Regex in Python (part 2)
Find words of a specific length starting with a specific letter
Assume we want to find all the 2-character words that start with i
and end with one of the letters s, t, o, n, l:
import re
text = "I live in Finland and the cold is killing me"
pattern = "i[stonl]"
matches = re.findall(pattern, text)
Printing matches gives ['in', 'in', 'is', 'il', 'in'], which come from:
I live [in] F[in]land and the cold [is] k[il]l[in]g me
This pattern matches not only whole words, but also any substring of a word that fits the pattern.
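To see where each match sits inside the text, one option is to wrap every match in square brackets with re.sub. This is only a small sketch that reuses the text and pattern from above; the variable name marked is made up for illustration.

```python
import re

text = "I live in Finland and the cold is killing me"
pattern = "i[stonl]"

# Replace every match with itself wrapped in brackets.
# \g<0> in the replacement string refers to the whole match.
marked = re.sub(pattern, r"[\g<0>]", text)
print(marked)
```

The brackets land inside Finland and killing as well, which shows that the pattern does not care about word boundaries.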
With ^
If we modify the pattern a bit by adding ^ at the front, it looks like this:
pattern = "^i[stonl]"
Now matches will be []. This is because ^ anchors the match to the beginning of the text: a lowercase i would have to be the very first character, but our text starts with an uppercase I.
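A quick way to check this behaviour is to run the anchored pattern against the original text and against a second text that does start with a lowercase i. The second sentence (sample) is a made-up example, not part of the original text.

```python
import re

pattern = "^i[stonl]"

# The original text starts with an uppercase "I", so nothing matches.
original = "I live in Finland and the cold is killing me"
no_match = re.findall(pattern, original)

# Hypothetical text that starts with a lowercase "i" followed by "s".
sample = "is it cold in Finland?"
one_match = re.findall(pattern, sample)

print(no_match)   # []
print(one_match)  # ['is']
```

Only the is at the very start of sample is returned; the later it and in are ignored because they are not at the beginning of the text.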
How about "^i[stonl][nm]"? This means that we're searching for a 3-character substring at the beginning of the text: it starts with i as the first character of the text, is followed by one of the characters s, t, o, n, l, and ends with either n or m.
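As a rough illustration, the three-character pattern can be tried on a few short strings. The sample strings below are invented for this sketch.

```python
import re

pattern = "^i[stonl][nm]"

# Hypothetical sample strings, chosen to hit and miss the pattern.
samples = ["inn by the lake", "isn't it cold", "island life", "an inn"]

results = {s: re.findall(pattern, s) for s in samples}
for s, found in results.items():
    print(repr(s), "->", found)
```

Here "inn by the lake" gives ['inn'] and "isn't it cold" gives ['isn']. "island life" gives [] because the third character l is not n or m, and "an inn" gives [] because the text does not start with i.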
Extra tips:
- We can use $ to search for the pattern at the end of the text. For example, r"me$" will find me when it is the last thing in the text (with the re.MULTILINE flag, at the end of every line).
- If we want to specifically search for an independent word, use \b. For example:
  >>> text = "This is example0 and example1"
  >>> pattern = r"\bexample[01]?\b"
  >>> print(re.findall(pattern, text))
  ['example0', 'example1']
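The two tips can be tried together in a short sketch. The two-line string two_lines and the string welcome are made up for this example; text is the sentence used earlier in the post.

```python
import re

# Hypothetical two-line text to show the effect of re.MULTILINE on $.
two_lines = "the cold is killing me\nplease help me"

end_of_text = re.findall(r"me$", two_lines)                 # end of the whole text only
end_of_lines = re.findall(r"me$", two_lines, re.MULTILINE)  # end of every line

# $ alone still matches inside a longer word; \b restricts it to a whole word.
welcome = "you are welcome"
inside_word = re.findall(r"me$", welcome)
whole_word = re.findall(r"\bme$", welcome)

# \b on both sides keeps only the independent 2-character words.
text = "I live in Finland and the cold is killing me"
words_only = re.findall(r"\bi[stonl]\b", text)

print(end_of_text)   # ['me']
print(end_of_lines)  # ['me', 'me']
print(inside_word)   # ['me']
print(whole_word)    # []
print(words_only)    # ['in', 'is']
```

With \b on both sides, the matches inside Finland and killing from the first example disappear, which is usually what "find 2-character words" is meant to do.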
Hope this is helpful 😊.