Question

I am trying to scrape a web page with Python's BeautifulSoup. This is part of the page source:

<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>

So what I want is the third line (the one containing the text "peach"). I tried this:

for fruits in soup.findAll('div', attrs={'class': 'half'}):
    if 'font-size: 0.8em;text-align: center;width: 28%;' in str(fruits):
        print(fruits.text)

Unfortunately, it doesn't print anything at all. I tried a few other things, but I couldn't find a working solution.

Thanks in advance!

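EDIT:

Sorry, I guess I wasn't precise enough. I am looping over a bunch of source snippets that all look roughly the same, and the text "peach" will not always be the same. It could be "peach", "strawberry", "banana", "tuna", or any other food. Only the class and the style are always the same.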

EDIT2:

Inspired by alexce's solution, I found a way to solve my problem:

div = soup.find('div', attrs={'style': 'display: flex'})
inner_divs = div.findAll('div', attrs={'class': 'half'})
fruits = inner_divs[1].text

It may not be the best solution, but it is good enough for my little program :)

BTW: Happy New Year, everyone!

Answer 1

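Like the previous answer, I assume you are using bs4.

From what I understand, you need to filter the divs by two attributes: class and style.

find_all() can filter on several attributes, as well as the tag type, in a single call. See the docs: you can match multiple attributes by passing a dictionary as the attrs keyword argument of find_all().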

from bs4 import BeautifulSoup
html = """<div style="display: flex">
            <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
            <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
            <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
        </div>"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all('div', attrs={'style': 'font-size: 0.8em;text-align: center;width: 28%;', 'class': 'half'})
for div in divs:
    print(div.text)

The output is as required:

peach
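Note that passing the full style string to attrs only matches when the spacing and the order of the declarations in the page are exactly the same as in the filter. A minimal sketch of a more tolerant filter follows, assuming that `text-align: center` is what distinguishes the wanted div. The helper names `parse_style` and `is_centered_half` are made up for this illustration and are not part of bs4; the only bs4 feature relied on is that find_all() accepts a function that takes a tag and returns True or False.

```python
from bs4 import BeautifulSoup

# Same structure as above, but with extra spaces in the second style
# attribute to show that the filter does not depend on exact spacing.
html = """<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em; text-align: center; width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>"""


def parse_style(style):
    """Hypothetical helper: turn 'a: b; c: d;' into {'a': 'b', 'c': 'd'}."""
    declarations = {}
    for part in (style or "").split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            declarations[name.strip().lower()] = value.strip()
    return declarations


def is_centered_half(tag):
    """Hypothetical filter: <div class="half"> whose style centers the text."""
    if tag.name != "div" or "half" not in (tag.get("class") or []):
        return False
    return parse_style(tag.get("style")).get("text-align") == "center"


soup = BeautifulSoup(html, "html.parser")
# get_text(strip=True) drops the padding spaces around the text.
centered = [div.get_text(strip=True) for div in soup.find_all(is_centered_half)]
print(centered)
```

If the full style really is identical on every page, the dictionary form above is simpler; the function form is only worth it when the markup varies slightly.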

Answer 2

First of all, I assume you are using beautifulsoup4:

from bs4 import BeautifulSoup

Then, if the desired div is always the second one, you can get it by index:

for div in soup.select('div[style$=flex]'):
    inner_divs = div.find_all("div", class_="half")

    print(inner_divs[1].get_text())

Or, go with nth-of-type:

for div in soup.select('div[style$=flex] div:nth-of-type(2)'):
    print(div.get_text())

FYI, these select() calls are CSS selector searches.
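For reference, here is a self-contained sketch that runs both approaches against the sample markup from the question. It assumes a reasonably recent beautifulsoup4, where select() is backed by the soupsieve package. The selector used here is slightly stricter than the one above (quoted attribute value, child combinator, and the half class), which is a choice made for this illustration.

```python
from bs4 import BeautifulSoup

html = """<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Approach 1: find each flex container, then take its second "half" div by index.
by_index = []
for div in soup.select('div[style$="flex"]'):
    inner_divs = div.find_all("div", class_="half")
    by_index.append(inner_divs[1].get_text(strip=True))

# Approach 2: let the CSS selector pick the second div child directly.
by_selector = [
    div.get_text(strip=True)
    for div in soup.select('div[style$="flex"] > div.half:nth-of-type(2)')
]

print(by_index)
print(by_selector)
```

Both lists contain only the text of the middle div, whatever that text happens to be on a given page, which fits the requirement that the fruit name can change.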
