Question

I'm a beginner in Python. Below is the code I have for finding instances of email addresses on a web page.

    page = urllib.request.urlopen("http://website/category")
    reg_ex = re.compile(r'[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+', re.IGNORECASE
    m = reg_ex.search_all(page)
    m.group()

When I run it, Python reports invalid syntax and points at this line:

    m = reg_ex.search_all(page)

Could someone tell me why it is invalid?

Answer 1

Consider an alternative:

    import re

    # Suppose we have a text with many email addresses
    text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

    # re.findall() returns a list of all the found email strings
    emails = re.findall(r'[\w.-]+@[\w.-]+', text)
    # ['alice@google.com', 'bob@abc.com']
    for email in emails:
        # do something with each found email string
        print(email)

Source: https://developers.google.com/edu/python/regular-expressions

Answer 2

This line was missing its closing ); it should read:

    reg_ex = re.compile(r'[a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+', re.IGNORECASE)
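This also explains why the error seemed to point at the next line: an unclosed parenthesis lets the expression continue onto the following line, so the parser only notices the problem there. Below is a minimal sketch that reproduces the situation with the built-in compile(). The `snippet` string and the `syntax_error_of` helper are made up for illustration, and the exact message and reported line number differ between Python versions.

```python
# Hypothetical two-line snippet with the same mistake: the first line
# never closes its opening parenthesis. compile() only parses the code,
# so nothing here needs to be imported or executed.
snippet = (
    "reg_ex = re.compile(r'[a-z]+', re.IGNORECASE\n"
    "m = reg_ex.findall(page)\n"
)

def syntax_error_of(source):
    """Return the SyntaxError raised when compiling source, or None."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        return exc
    return None

error = syntax_error_of(snippet)
print(type(error).__name__, "reported at line", error.lineno)

# Closing the parenthesis makes the same code compile cleanly.
fixed = snippet.replace("re.IGNORECASE\n", "re.IGNORECASE)\n")
print(syntax_error_of(fixed))
```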

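Also, your regex does not look right for matching email addresses; try this one:

    "[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

FYI, validating email addresses with a regular expression is not trivial; see the following thread: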

Answer 3

Also, reg_ex has no search_all method, and you should pass page.read() rather than page itself.
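One detail worth keeping in mind: in Python 3, page.read() returns bytes, so it usually has to be decoded before a str pattern can search it. Here is a minimal sketch under some assumptions: the helper name `extract_emails`, the simplified pattern, and the UTF-8 encoding are all illustrative choices, and a bytes literal stands in for the real response so the example runs without network access.

```python
import re

# Simplified illustrative pattern; it uses only non-capturing groups so
# that findall() returns whole addresses.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(raw_bytes, encoding="utf-8"):
    """Decode the bytes returned by page.read() and return all matches."""
    text = raw_bytes.decode(encoding, errors="replace")
    return EMAIL_RE.findall(text)

# Stand-in for page.read(); a real call would be something like
# urllib.request.urlopen(url).read(), which needs network access.
sample = b"<p>Contact alice@google.com or bob@abc.com.</p>"
print(extract_emails(sample))
# ['alice@google.com', 'bob@abc.com']
```

If the page declares a different charset, that encoding would need to be passed instead of the assumed default.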

Answer 4

The re module has no .search_all method.

What you are looking for is .findall.

You can try:

    re.findall(r"(\w(?:[-.+]?\w+)+\@(?:[a-zA-Z0-9](?:[-+]?\w+)*\.)+[a-zA-Z]{2,})", text)

Here text is the text to search; in your case that would be text = page.read()

Or, if you want to compile the regex first:

    r = re.compile(r"(\w(?:[-.+]?\w+)+\@(?:[a-z0-9](?:[-+]?\w+)*\.)+[a-z]{2,})", re.I)
    results = r.findall(text)

Note: .findall returns a list of matches.
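What that list contains depends on the capturing groups in the pattern. With the pattern from the question, which has two capturing groups, .findall returns tuples of group contents rather than whole addresses. The sketch below compares it with a non-capturing variant; the sample text and variable names are illustrative.

```python
import re

text = "purple alice@google.com, blah monkey bob@abc.com blah"

# Pattern from the question: two capturing groups, so findall() returns
# one tuple of group contents per match.
with_groups = re.compile(
    r"[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+", re.IGNORECASE
)
print(with_groups.findall(text))
# [('google', '.com'), ('abc', '.com')]

# Same pattern with non-capturing groups (?:...): findall() now returns
# the full matched strings.
no_groups = re.compile(
    r"[-a-z0-9._]+@(?:[-a-z0-9]+)(?:\.[-a-z0-9]+)+", re.IGNORECASE
)
print(no_groups.findall(text))
# ['alice@google.com', 'bob@abc.com']
```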

If you need to iterate over match objects instead, you can use .finditer

(continuing the previous example)

    r = re.compile(r"(\w(?:[-.+]?\w+)+\@(?:[a-z0-9](?:[-+]?\w+)*\.)+[a-z]{2,})", re.I)
    for email_match in r.finditer(text):
        email_addr = email_match.group()  # or anything else you need from the match object
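A match object carries more than the matched string, which is the main reason to prefer .finditer over .findall. As a small sketch, the following collects each address together with its position in the text. The simplified pattern and the sample text are assumptions made for illustration, not the regex from this answer.

```python
import re

# Simplified illustrative pattern, not the one from the answer above.
email_re = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
text = "mail alice@google.com or bob@abc.com"

# Each item yielded by finditer() is a match object; group() gives the
# matched string, start() and end() give its position in text.
found = [(m.group(), m.start(), m.end()) for m in email_re.finditer(text)]

for address, start, end in found:
    # Slicing the text with the reported span gives back the address.
    print(address, start, end, text[start:end] == address)
```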

Now the remaining question is which regex you should use :)

Answer 5

Change r'[-a-z0-9._]+@([-a-z0-9]+)(\.[-a-z0-9]+)+' to r'[aA-zZ0-9._]+@([aA-zZ0-9]+)(\.[aA-zZ0-9]+)+'. The - character before a-z is the cause.
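Before adopting either character class, it may help to check how Python's re module treats them. The sketch below relies on standard re semantics rather than on anything stated in this thread: a hyphen placed first inside [...] is a literal hyphen, and the range A-z spans every code point from 'A' to 'z', which includes some punctuation. The variable names and test strings are illustrative.

```python
import re

# A hyphen placed first inside [...] is a literal hyphen, not a range.
leading_hyphen = re.compile(r"[-a-z0-9._]+", re.IGNORECASE)
print(leading_hyphen.fullmatch("first-last.name") is not None)  # True

# [aA-zZ...] contains the range A-z (code points 65 to 122), which also
# covers the characters [ \ ] ^ _ ` that sit between 'Z' and 'a'.
wide_range = re.compile(r"[aA-zZ0-9._]+")
print(wide_range.fullmatch("a^b") is not None)  # True

# Spelling the two letter ranges out separately excludes that punctuation.
explicit = re.compile(r"[a-zA-Z0-9._]+")
print(explicit.fullmatch("a^b") is not None)  # False
```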

Finding email addresses in a web page using regular expressions
