Question

I'm just getting started with regular expressions in Python and ran into this problem: I need to extract the URLs from a string:

str = "<tag>http://example-1.com</tag><tag>http://example-2.com</tag>"

My code is:

import re

url = re.findall('<tag>(.*)</tag>', str)

print(url)

It returns:

['http://example-1.com</tag><tag>http://example-2.com']

If anyone could point me in the right direction on how to solve this, it would be much appreciated!

Thanks, everyone!

Answer 1

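You are using a regular expression, and matching HTML with such expressions gets too complicated, too fast.

You can use BeautifulSoup to parse the HTML instead.

For example: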

from bs4 import BeautifulSoup

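# Named `html` rather than `str` to avoid shadowing the built-in type
html = "<tag>http://example-1.com</tag><tag>http://example-2.com</tag>"
soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all('tag')
for tag in tags:
    print(tag.text)  # print() is a function in Python 3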
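
If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is only a minimal illustration: the class name TagTextCollector and its attributes are made up for this example, and it assumes the <tag> elements are not nested.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <tag> element."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while we are between <tag> and </tag>
        self.texts = []      # one entry per <tag> element seen

    def handle_starttag(self, tag, attrs):
        if tag == "tag":
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "tag":
            self.inside = False

    def handle_data(self, data):
        # Text may arrive in pieces, so append to the current entry
        if self.inside:
            self.texts[-1] += data


html = "<tag>http://example-1.com</tag><tag>http://example-2.com</tag>"
collector = TagTextCollector()
collector.feed(html)
collector.close()
urls = collector.texts
print(urls)
```

For anything beyond a simple case like this, BeautifulSoup is likely the more convenient choice.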

Answer 2

Using only the re package, make the quantifier non-greedy by writing .*? instead of .*:

import re
str = "<tag>http://example-1.com</tag><tag>http://example-2.com</tag>"
url = re.findall('<tag>(.*?)</tag>', str)
print(url)

It returns:

['http://example-1.com', 'http://example-2.com']
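
To make the difference visible, here is a small side-by-side sketch of the greedy pattern from the question, the non-greedy one above, and a negated character class, which is another common way to stop at the closing tag. The variable names are chosen for illustration only, and the last pattern assumes the text between the tags never contains a '<'.

```python
import re

text = "<tag>http://example-1.com</tag><tag>http://example-2.com</tag>"

# Greedy: .* runs up to the LAST </tag>, so everything lands in one match
greedy = re.findall(r'<tag>(.*)</tag>', text)

# Non-greedy: .*? stops at the FIRST </tag> after each <tag>
lazy = re.findall(r'<tag>(.*?)</tag>', text)

# Alternative: [^<]* matches anything except '<', so it cannot cross a tag
negated = re.findall(r'<tag>([^<]*)</tag>', text)

print(greedy)
print(lazy)
print(negated)
```

The first list has a single element containing both URLs and the tags between them; the other two each contain the two URLs separately.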

Hope this helps!

Extracting URLs from a string
