Python 3 - Find text in a web page using BeautifulSoup

Time: 2015-07-31 23:56:14

Tags: python python-3.x beautifulsoup python-requests

I have this code:

import requests
from bs4 import BeautifulSoup


url = "http://www.rockefeller.edu/research/areas/summary.php?id=1"
r = requests.get(url)
soup = BeautifulSoup(r.content)
a = 'Comments'
for x in (soup.find_all('p')):
    if a in x:
        print (x)
    else:
        print ('it is not there')

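Basically, I have a word and I want to know where it appears on the page. Let's say my word is 'Comments'. I want to find where that word occurs and print the tag that contains it (for example: <a href=#>Comments</a>).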

Updated code (does not work for me):

import requests
from bs4 import BeautifulSoup
import re


url = "http://www.rockefeller.edu/research/areas/summary.php?id=1"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
for x in (soup.find_all(string=re.compile('comment', flags=re.I))):
    print(x.parent)
    print(x.parent.name)

2 answers:

Answer 0: (score: 1)

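Pass a compiled regular expression object as the string keyword argument. find_all will then return the string objects that contain the text, and you can use each string's parent attribute to access the tag that contains it:
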
import re

...

for x in soup.find_all(string=re.compile('comment', flags=re.I)):
    print(x.parent)
    print(x.parent.name)
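
To try this without depending on the live page, here is a minimal, self-contained sketch. The HTML fragment is made up for illustration and is not the content of the URL in the question. It also assumes a Beautiful Soup release that accepts the string argument (as far as I know it was introduced in 4.4.0, and earlier releases call it text), which may be why the updated code in the question did not work.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the downloaded page
html = (
    '<p>No match here.</p>'
    '<p>Leave your <a href="#">Comments</a> below.</p>'
    '<span>comment policy</span>'
)

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("comment", flags=re.I)  # case-insensitive search

# find_all(string=...) returns the matching text nodes, not the tags
matches = soup.find_all(string=pattern)

# .parent is the tag that directly contains each matching text node
parent_names = [s.parent.name for s in matches]
for s in matches:
    print(s.parent.name, "->", s.parent)
```

Each result is a text node, so s.parent gives the innermost tag around the word, which is what the question asks to print.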

Answer 1: (score: 0)

I found the answer; here it is:

for x in soup.find_all(True, text=re.compile(r'comment', re.I)):
    print(x)
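
A small sketch of how this variant behaves, again on a made-up HTML fragment rather than the real page. Passing True as the first argument makes find_all return tags instead of text nodes, and, as I understand it, a tag only matches when its own .string matches the pattern. The sketch uses string=, the newer spelling of the text= argument shown above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, not the page from the question
html = (
    '<p>No match here.</p>'
    '<p>Leave your <a href="#">Comments</a> below.</p>'
    '<h2>Comment policy</h2>'
)

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"comment", re.I)

# True matches any tag name; the string filter is tested against tag.string
tags = soup.find_all(True, string=pattern)
tag_names = [t.name for t in tags]
for t in tags:
    print(t)

# The second <p> has several children, so its .string is None and it is skipped
second_p = soup.find_all("p")[1]
print(second_p.string)
```

So this form prints the innermost tags directly, without going through .parent, but it will not report an outer tag whose text is split across several children.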