I want to extract some information from a string with a regular expression, but the result is always None. Here is the code:
import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
x = re.match(r'property=".+?"', line)
print(x)  # prints None
I want to extract the content and property values as a tuple. How can I fix this?
Answer 0 (score: 0)
I'd suggest a more appropriate tool: an HTML parser such as BeautifulSoup.
from bs4 import BeautifulSoup
line = '<meta content=\"Allrecipes\" property=\"og:site_name\"/>'
soup = BeautifulSoup(line, 'lxml')
print("Content: {}".format(soup.meta["content"]))
print("Property: {}".format(soup.meta["property"]))
Output:
Content: Allrecipes
Property: og:site_name
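If installing bs4 and lxml is not an option, the same idea can be sketched with the standard library's html.parser.HTMLParser. This swaps in a different parser than the answer uses, and the class name MetaCollector is hypothetical; it assumes every tag of interest carries both a content and a property attribute.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: collects (content, property) pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples with quotes already stripped;
        # self-closing tags like <meta .../> are routed here via handle_startendtag
        if tag == "meta":
            attributes = dict(attrs)
            if "content" in attributes and "property" in attributes:
                self.pairs.append((attributes["content"], attributes["property"]))


line = '<meta content="Allrecipes" property="og:site_name"/>'
parser = MetaCollector()
parser.feed(line)
parser.close()
print(parser.pairs)  # [('Allrecipes', 'og:site_name')]
```

Like the BeautifulSoup version, this does not care about attribute order or quoting style, which is the main reason a parser is preferable to a regex here.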
Answer 1 (score: 0)
@DirtyBit's answer is a better approach than using a regular expression. However, if you still want to use a regex, this may help (RegexDemo):
import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
# Raw string avoids the invalid "\/" escape; non-greedy .*? stops at the first closing quote
regex = re.search(r'content="(?P<content>.*?)".*?property="(?P<prop>.*?)"/>', line)
print(regex.groups())
Output:
('Allrecipes', 'og:site_name')
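The reason the question's code prints None is that re.match only tries to match at the very start of the string, which here begins with "<meta", while re.search scans forward until the pattern matches somewhere. The sketch below shows that difference and, assuming a hypothetical input with several <meta> tags whose attributes always appear in the order content then property, uses re.findall to return one tuple per tag.

```python
import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
pattern = r'property="(.+?)"'

# re.match is anchored at index 0, so it fails on "<meta ..."
print(re.match(pattern, line))            # None
# re.search scans the whole string for the first match
print(re.search(pattern, line).group(1))  # og:site_name

# Hypothetical input with two tags; with two groups, findall returns a list of tuples
html = ('<meta content="Allrecipes" property="og:site_name"/>'
        '<meta content="Recipe" property="og:type"/>')
pairs = re.findall(r'<meta content="([^"]*)" property="([^"]*)"\s*/>', html)
print(pairs)  # [('Allrecipes', 'og:site_name'), ('Recipe', 'og:type')]
```

Using [^"]* instead of .* keeps each group from running past its closing quote, but the pattern still breaks if the attributes are reordered, which is why the parser-based answer is more robust.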