Regex in Python (part 2)
Find words of a specific length starting with a specific letter
Suppose we want to find all the 2-character words that start with i and end with one of s, t, o, n, or l.
import re
text = "I live in Finland and the cold is killing me"
pattern = "i[stonl]"
matches = re.findall(pattern, text)
When printing the result matches, you’ll get ['in', 'in', 'is', 'il', 'in'], which come from
I live [in] F[in]land and the cold [is] k[il]l[in]g me
This pattern matches not only whole words, but also any substring of a longer word that fits the pattern.
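To see exactly where each match comes from, here is a minimal sketch using re.finditer, which returns match objects carrying their positions. The variable name spans is only for illustration.

```python
import re

text = "I live in Finland and the cold is killing me"
pattern = r"i[stonl]"

# finditer yields match objects, so we can record where each match starts and ends
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]

for matched, start, end in spans:
    # text[start:end] is the matched substring itself
    print(f"{matched!r} found at positions {start}-{end}")
```

Only the first and third matches are standalone words; the rest sit inside Finland and killing.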
With ^
Now let's modify our pattern a bit by adding ^ at the start, so that it looks like this:
pattern = "^i[stonl]"
Now our matches result will be []. This is because ^ anchors the pattern to the beginning of the text, so in this case i must be the very first character of the text. Our text starts with an uppercase I, and matching is case-sensitive by default, so nothing is found.
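A small sketch of how ^ behaves on different inputs. The two extra sentences are made up for illustration, and re.IGNORECASE is just one possible way to let i match an uppercase I.

```python
import re

pattern = r"^i[stonl]"

# The original sentence starts with an uppercase "I", so nothing matches.
no_match = re.findall(pattern, "I live in Finland and the cold is killing me")

# A hypothetical sentence that starts with a lowercase "i".
lower_start = re.findall(pattern, "it is cold in Finland")

# With re.IGNORECASE, "i" in the pattern also matches an uppercase "I".
ignore_case = re.findall(pattern, "Is it cold in Finland?", flags=re.IGNORECASE)

print(no_match, lower_start, ignore_case)
```

In each case at most one match is possible, because the text has only one beginning.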
How about "^i[stonl][nm]"? This means that we're searching for three characters at the very beginning of the text: i as the first letter, followed by one of the characters s, t, o, n, l, and then either n or m. Note that the match does not have to be a whole word.
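Here is a quick sketch of that pattern against a few sample strings. The strings are invented for illustration only.

```python
import re

pattern = r"^i[stonl][nm]"

# Hypothetical sample inputs, chosen to hit both matching and non-matching cases
samples = ["inn by the lake", "isn't it cold", "item list", "an inn"]

results = {s: re.findall(pattern, s) for s in samples}

for sample, found in results.items():
    print(f"{sample!r} -> {found}")
```

The third sample fails because e is not in [nm], and the last one fails because the text does not start with i.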
Extra tips:
- We can use $ to search for the pattern at the end of a line. For example, r"me$" matches me only when it sits at the very end of the text (or at the end of every line, if the re.MULTILINE flag is set).
- If we want to specifically search for an independent word, use \b, which matches a word boundary. For example:
  >>> text = "This is example0 and example1"
  >>> pattern = r"\bexample[01]?\b"
  >>> print(re.findall(pattern, text))
  ['example0', 'example1']
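To tie both tips back to the first example, here is a minimal sketch. The two-line text is made up for illustration, and the last line reuses the original sentence to pick out only real 2-character words.

```python
import re

# A hypothetical two-line text
text = "the cold is killing me\nthis is home"

# Without flags, $ only matches at the very end of the string
# (or just before a trailing newline).
end_of_string = re.findall(r"me$", text)

# With re.MULTILINE, $ also matches at the end of every line.
end_of_lines = re.findall(r"me$", text, flags=re.MULTILINE)

# \b on both sides keeps the match from sitting inside a longer word.
whole_words = re.findall(r"\bi[stonl]\b", "I live in Finland and the cold is killing me")

print(end_of_string, end_of_lines, whole_words)
```

Note that r"me$" also matches the me inside home, since $ only constrains where the match ends. Adding \b in front, as in r"\bme$", would restrict it to the standalone word.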
Hope this is helpful 😊.