Question

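I want to match email addresses in a text document, and I'm writing a regular expression for it. For starters, I came up with something like this: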

((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*(?:[a-zA-Z\.]+)

I want to make sure the email address ends in 'edu' or 'com'. How can I do that? I'm using Python.

Some example email addresses from my text document:

alice @ so.edu
alice at sm.so.edu
alice @ sm.com

Edit:

I want to modify only this regex. It already works for several other examples in my data.

Answer 1

((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*(?:[a-zA-Z\.]+)\.(com|edu)
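A quick way to check this pattern in Python is to run it with re.findall over the sample addresses from the question. This is a sketch that assumes one address per line; the bob @ example.org line is a made-up negative case that should not match.

```python
import re

# Answer 1's pattern: the question's regex with "\.(com|edu)" appended,
# so the domain part must end in ".com" or ".edu".
EMAIL_RE = re.compile(
    r"((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*(?:[a-zA-Z\.]+)\.(com|edu)"
)

# Sample lines from the question plus one hypothetical non-matching line.
text = """alice @ so.edu
alice at sm.so.edu
alice @ sm.com
bob @ example.org"""

# With capturing groups, findall returns one tuple per match:
# (user, separator, top-level domain).
matches = EMAIL_RE.findall(text)
for user, sep, tld in matches:
    print(user, sep, tld)
```

Note that the pattern has no trailing boundary, so a domain such as so.education still matches (on its so.edu prefix). Appending \b after (com|edu) would reject that case.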

Edit: to handle "dot" instead of ".":

((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*(?:[a-zA-Z\.]+) *(\.|dot) *(com|edu)
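A minimal sketch of the "dot" variant in Python, using made-up sample lines. The pattern now has four groups, so each findall tuple also records whether the final separator was "." or "dot".

```python
import re

# Answer 1's second pattern: the final "." before com/edu may also be
# written as the word "dot", optionally surrounded by spaces.
DOT_RE = re.compile(
    r"((?:[a-zA-Z]+[\w+\.\-]+[\-a-zA-Z]+))[ ]*((?:@|at))[ ]*"
    r"(?:[a-zA-Z\.]+) *(\.|dot) *(com|edu)"
)

# Hypothetical sample lines mixing "." and "dot".
text = """alice at so dot edu
alice @ sm.so.edu
alice at sm dot com"""

# Each tuple is (user, separator, dot separator, top-level domain).
matches = DOT_RE.findall(text)
for user, sep, dot, tld in matches:
    print(user, sep, dot, tld)
```

Only the last separator can be spelled out: an address like alice at sm dot so dot edu is not matched, because the character class [a-zA-Z\.] does not allow spaces inside the domain.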

Answer 2

First, see this answer for how to match all valid email addresses according to RFC 822.

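Personally, rather than modifying the regex, I would run email.utils.parseaddr() on the regex match and check whether the resulting address string .endswith("edu") or .endswith("com"). E.g.:

>>> import email.utils
>>> email.utils.parseaddr("kimvais@mailinator.com")[1].endswith(".com")
True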
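The same idea can be wrapped in a small helper. This is a sketch assuming Python 3, where the module is email.utils; the function name has_allowed_tld and its suffixes default are illustrative only, not part of the standard library. Checking ".edu" and ".com" with the leading dot avoids accepting a domain that merely ends in the letters "edu".

```python
from email.utils import parseaddr


def has_allowed_tld(candidate, suffixes=(".com", ".edu")):
    """Return True if the address parsed from `candidate` ends in one of `suffixes`."""
    # parseaddr returns a (realname, address) pair; index 1 is the address.
    address = parseaddr(candidate)[1]
    # Lowercase so "ALICE@SO.EDU" is accepted too; str.endswith accepts a
    # tuple, so every suffix is checked in a single call.
    return address.lower().endswith(suffixes)


print(has_allowed_tld("kimvais@mailinator.com"))
print(has_allowed_tld("Alice <alice@sm.so.edu>"))
print(has_allowed_tld("bob@example.org"))
```

parseaddr also strips a display name such as "Alice <...>", which is one advantage of this approach over extending the regex.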