Question

I am using Python 3.6 and BeautifulSoup 4.

I have a first node:

title = self.html.find(id=(lists[1][selectionindex]))

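It is not None, and this node is the one highlighted in red.

How can I get the list of nodes shown in blue (any div / p nodes), but stop when an h1 / h2 / h3 node is reached?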

Answer 1

You can use BeautifulSoup's find_all function to get all instances of a particular HTML tag.

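import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it with the lxml parser (the lxml package must be installed).
r = requests.get('https://www.google.co.uk')
soup = BeautifulSoup(r.content, "lxml")

# Collect every <p> tag in the document.
f = soup.find_all("p")

for p in f:
    print(p)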
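As a network-free variation on the snippet above, the sketch below passes a list of tag names to find_all so that both div and p tags are matched in one call. The sample markup and the variable name blocks are made up for illustration. Note that find_all on its own collects every match in the document and does not stop at the next heading.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<h1>Title</h1>
<div class="intro">first</div>
<p>second</p>
<h2>Subtitle</h2>
<p>third</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Passing a list matches any of the listed tag names, in document order.
blocks = soup.find_all(["div", "p"])

for tag in blocks:
    print(tag.name, tag.get_text())
```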

Answer 2

I'm assuming you can find the first tag using some id (as shown in your code). The following code gets all the <div> and <p> tags after the <h3> tag, and stops when it encounters an <h1>, <h2> or <h3> tag.

from bs4 import BeautifulSoup

html = '''
<p>unwanted</p>
<h3 id="special">some text</h3>
<div class="foo">wanted</div>
<p>wanted</p>
<p>wanted</p>
<p>wanted</p>
<h2>some text</h2>'''

soup = BeautifulSoup(html, 'html.parser')
list_of_wanted_tags = []
starting_tag = soup.find('h3', id='special')

for tag in starting_tag.find_all_next():
    if tag.name in ('div', 'p'):
        list_of_wanted_tags.append(tag)
    elif tag.name in ('h1', 'h2', 'h3'):
        break

print(list_of_wanted_tags)
# [<div class="foo">wanted</div>, <p>wanted</p>, <p>wanted</p>, <p>wanted</p>]

The find_all_next() method returns a list of all the tags that come after the current tag.

If you want all the tags between two such heading tags, and not only the <div> and <p> tags, use the following code:

list_of_wanted_tags = []  # start again from an empty list
for tag in starting_tag.find_all_next():
    if tag.name in ('h1', 'h2', 'h3'):
        break
    list_of_wanted_tags.append(tag)

Alternatively, itertools.takewhile can express the same stopping condition.
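The answer mentions itertools.takewhile without showing it, so here is a minimal sketch of how it might look, reusing the sample markup from the answer. The helper name is_not_heading and the variable tags_before_heading are hypothetical names chosen for illustration, not part of the original answer.

```python
from itertools import takewhile

from bs4 import BeautifulSoup

# Sample markup taken from the answer.
html = '''
<p>unwanted</p>
<h3 id="special">some text</h3>
<div class="foo">wanted</div>
<p>wanted</p>
<p>wanted</p>
<p>wanted</p>
<h2>some text</h2>'''

soup = BeautifulSoup(html, 'html.parser')
starting_tag = soup.find('h3', id='special')


def is_not_heading(tag):
    # Hypothetical helper: True until the first <h1>, <h2> or <h3> is reached.
    return tag.name not in ('h1', 'h2', 'h3')


# takewhile() yields tags until the predicate first returns False.
tags_before_heading = list(takewhile(is_not_heading, starting_tag.find_all_next()))

# Optional filter, to keep only the <div> and <p> tags.
list_of_wanted_tags = [tag for tag in tags_before_heading if tag.name in ('div', 'p')]

print(list_of_wanted_tags)
# [<div class="foo">wanted</div>, <p>wanted</p>, <p>wanted</p>, <p>wanted</p>]
```

Compared with the explicit loop and break, this keeps the stopping condition in a single predicate, which may be easier to reuse if the set of stopping tags changes.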

Beautiful Soup: return the 'p' nodes that follow a given node
