Question

I have this code:

import requests
from bs4 import BeautifulSoup


url = "http://www.rockefeller.edu/research/areas/summary.php?id=1"
r = requests.get(url)
soup = BeautifulSoup(r.content)
a = 'Comments'
for x in (soup.find_all('p')):
    if a in x:
        print (x)
    else:
        print ('it is not there')

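Basically, I have a word and I want to know where it appears on the page. Let's say my word is 'Comments'. I want to find where that word occurs and print the tag that contains it (for example: <a href=#>Comments</a>).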

Updated code (does not work for me):

import requests
from bs4 import BeautifulSoup
import re


url = "http://www.rockefeller.edu/research/areas/summary.php?id=1"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
for x in (soup.find_all(string=re.compile('comment', flags=re.I))):
    print(x.parent)
    print(x.parent.name)

Answer 1

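Pass a compiled regular expression object as the string keyword argument; find_all will then return the string objects that contain the text. You can use the parent attribute of each string to access the tag that contains it.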

import re

...

for x in soup.find_all(string=re.compile('comment', flags=re.I)):
    print(x.parent)
    print(x.parent.name)
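
A minimal, self-contained sketch of the approach above, assuming a small hypothetical HTML snippet in place of the real page (so the result does not depend on the network or on the site's current markup). The names html, matches and parent_names are illustrative only:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div>
  <p>No match here</p>
  <a href="#">Comments</a>
  <p>Leave a comment below</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# With only string= given, find_all returns the matching text nodes
# (NavigableString objects), not the tags themselves.
matches = soup.find_all(string=re.compile("comment", flags=re.I))

# .parent is the tag that directly encloses each matching string.
parent_names = [s.parent.name for s in matches]

for s in matches:
    print(s.parent.name, "->", s.parent)
```

Because the search runs over text nodes, it is not limited to <p> tags, which is why the <a> element is found here as well.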

Answer 2

I found the answer; it is now:

for x in soup.find_all(True, text=re.compile(r'comment', re.I)):
    print(x)
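
A rough sketch of how this variant behaves, again assuming hypothetical inline HTML rather than the real page. Passing True as the name filter together with a string filter makes find_all return tags instead of strings. The sketch uses string=, which is the current name of the text= argument (Beautiful Soup 4.4.0 and later). A tag is compared through its .string attribute, which is None when the tag has more than one child, so a tag with mixed content is not returned even if it contains the word:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the <p> has mixed content (text plus a <b> child).
html = """
<div>
  <a href="#">Comments</a>
  <p>Leave a <b>comment</b> below</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# True matches any tag name; the regex is then tested against tag.string.
tags = soup.find_all(True, string=re.compile(r"comment", re.I))
tag_names = [t.name for t in tags]

for t in tags:
    print(t)
```

Here the <a> and <b> tags are returned, while the <p> and <div> are skipped because their .string is None. If the enclosing tag is needed regardless of how the text is nested, the string-only search with .parent may be the safer choice.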

Python 3 - Finding text in a web page using BeautifulSoup
