Question

I need to access the string 'My site', but this simple regular expression fails to match the p tag:

import re
from bs4 import BeautifulSoup

data = """<p>Site: <a href="www.example.com" style="font-weight: 100;">My site</a></p>"""
soup = BeautifulSoup(data, 'lxml')
site = soup.find('p', text=re.compile('Site: '))
print(site)
>> None

If I try this instead:

data = """<p>Site: <a href="www.example.com" style="font-weight: 100;">My site</a></p>"""
soup = BeautifulSoup(data, 'lxml')
site = soup.findAll('p')
print(site)
>> [<p>Site: <a href="www.example.com" style="font-weight: 100;">My site</a></p>]

it works. Of course, this tag sits on a page that contains many other p tags, so I don't want to access it by index.

Answer 1

You can loop over all the p tags and check whether the text of each one matches 'Site':

for p in soup.findAll('p'):
    # re.match anchors at the start, so this matches text beginning with 'Site'
    if re.match('Site', p.text):
        print(p.text)  # Site: My site
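
As for why the original find call comes back empty: as far as I can tell, the text argument is compared against the tag's .string, and .string is None whenever a tag has more than one child (here, the text node 'Site: ' plus the a tag). If you would rather not loop by hand, one option is to pass a function to find. This is only a sketch: it assumes html.parser instead of lxml so that nothing extra needs installing, and is_site_paragraph is a name I made up for illustration.

```python
import re
from bs4 import BeautifulSoup

data = """<p>Site: <a href="www.example.com" style="font-weight: 100;">My site</a></p>"""
# Assumption: html.parser stands in for lxml here; either should work for this markup.
soup = BeautifulSoup(data, "html.parser")

# The p tag has two children, so .string is None and text=... has nothing to match.
p_string = soup.p.string


def is_site_paragraph(tag):
    """Hypothetical helper: True for a p tag whose full text starts with 'Site: '."""
    return tag.name == "p" and re.match(r"Site: ", tag.get_text()) is not None


# find() accepts a function and calls it on each tag in the document.
site = soup.find(is_site_paragraph)

# Pull the link text out of the matched paragraph, guarding against a miss.
link_text = site.a.get_text() if site is not None and site.a is not None else None
print(link_text)
```

get_text() joins the text of all descendants, which is why it sees 'Site: My site' where .string sees nothing.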

Python - BeautifulSoup cannot match a tag with specific text (using re.compile)
