Find words of a specific length starting with a specific letter

Suppose we want to find all the 2-character words that start with i and end with one of the letters s, t, o, n, or l.

import re


text = "I live in Finland and the cold is killing me"
pattern = "i[stonl]"

matches = re.findall(pattern, text)

Printing matches gives ['in', 'in', 'is', 'il', 'in'], which come from the bracketed parts of the text:

I live [in] F[in]land and the cold [is] k[il]l[in]g me

This pattern finds not only whole words, but also any substring of a longer word that matches it.
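To see where each hit comes from, we can ask for match objects instead of plain strings. This is only a small sketch using re.finditer on the same text; the variable name positions is just for illustration.

```python
import re

text = "I live in Finland and the cold is killing me"
pattern = "i[stonl]"

# finditer yields match objects, so we can read the start index of every hit
positions = [(m.start(), m.group()) for m in re.finditer(pattern, text)]

for start, found in positions:
    print(start, found)
```

Hits that start in the middle of a word (like the in inside Finland) are the substring matches described above.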

With ^

Let's modify the pattern a bit by adding ^ at the front, so that it looks like this:

pattern = "^i[stonl]"

Now matches will be []. This is because ^ anchors the match to the beginning of the text, so in this case i must be the very first character of the text. Our text begins with an uppercase I followed by a space, so nothing matches.
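A quick way to convince ourselves is to try the same pattern on a text that does begin with a lowercase i. The second sentence below is made up for this sketch and is not part of the original example.

```python
import re

pattern = "^i[stonl]"

# The original sentence starts with "I" and a space, so ^ finds nothing
original = re.findall(pattern, "I live in Finland and the cold is killing me")
print(original)  # []

# A hypothetical sentence that begins with "it": only the very first two characters can match
anchored = re.findall(pattern, "it is cold in Finland")
print(anchored)  # ['it']
```

Note that the later is and in are ignored, because ^ only allows a match at position 0.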

How about "^i[stonl][nm]"? This matches a three-character substring at the very beginning of the text: i as the first character, followed by one of the characters s, t, o, n, l, and then either n or m.
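Here is a small sketch of that three-character pattern. The sample strings are invented for illustration only.

```python
import re

pattern = "^i[stonl][nm]"

# Hypothetical sample texts
samples = ["inn by the lake", "ion exchange", "island life", "is it in?"]

results = {s: re.findall(pattern, s) for s in samples}
for sample, found in results.items():
    print(repr(sample), found)
```

The first two samples match (inn and ion), while island fails on the third character and is it in? has a space where n or m is required.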

Extra tips:

  • We can use $ to match the pattern at the end of the text (or at the end of each line when the re.MULTILINE flag is set). For example, r"me$" matches me only when it sits at the very end, whether it is a whole word or the tail of a longer one.
  • If we want to match only a standalone word, use the word boundary \b. For example:
    >>> text = "This is example0 and example1"
    >>> pattern = r"\bexample[01]?\b"
    >>> print(re.findall(pattern, text))
    ['example0', 'example1']
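Putting the tips together, here is a rough sketch that applies \b to the original 2-character pattern and shows how $ behaves with and without re.MULTILINE. The multi-line string lines is a made-up example.

```python
import re

text = "I live in Finland and the cold is killing me"

# \b on both sides keeps only standalone 2-character words
two_letter_words = re.findall(r"\bi[stonl]\b", text)
print(two_letter_words)  # ['in', 'is']

# Hypothetical multi-line text to show how $ behaves
lines = "call me\nhome\nme too"

# Without a flag, $ only matches at the end of the whole text
end_of_text = re.findall(r"me$", lines)
print(end_of_text)  # []

# With re.MULTILINE, $ also matches at the end of every line
end_of_lines = re.findall(r"me$", lines, flags=re.MULTILINE)
print(end_of_lines)  # ['me', 'me']
```

The second me comes from the end of home, so if only the whole word is wanted, r"\bme$" would be the safer pattern.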
    

Hope this is helpful 😊.