soup.select CSS nth-of-type?

Asked: 2014-10-12 00:13:22

Tags: python beautifulsoup

I am trying to use Beautiful Soup to select the second column of the following HTML:

<div class="parent">
  <div class="column">
      <div class="inventory">1</div>
      <div class="inventory">2</div>
      <div class="inventory">3</div>
  </div>
  <div class="column">
      <div class="inventory">4</div>
      <div class="inventory">5</div>
      <div class="inventory">6</div>
  </div>
  <div class="column">
      <div class="inventory">7</div>
      <div class="inventory">8</div>
      <div class="inventory">9</div>
  </div>
</div>

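I am using the CSS idiom div.column + div to select the second column. However, the code below iterates over the rows in both the second and third columns. I don't think the div.column + div logic does what I expected.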

from bs4 import BeautifulSoup

soup = BeautifulSoup(htmlSource)  # htmlSource holds the HTML above
secondColumn = soup.select('div.column + div div.inventory')
for row in secondColumn:
    print(row)  # prints stuff about the row

Is there a way to iterate over only the rows of the second column?

2 answers:

Answer 0 (score: 3)

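For the given CSS, the result set is entirely correct: the third div also follows a div with the class column (the second div has that class too, after all).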
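To make the behavior of the adjacent sibling combinator concrete, here is a minimal sketch. It uses a trimmed copy of the question's HTML (one row per column) and assumes the built-in "html.parser" backend; the names `matched` and `first_cells` are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed version of the question's markup: one inventory row per column.
html = """
<div class="parent">
  <div class="column"><div class="inventory">1</div></div>
  <div class="column"><div class="inventory">4</div></div>
  <div class="column"><div class="inventory">7</div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.column + div" matches every div whose immediately preceding sibling
# element is a div.column, so the second AND the third column both qualify.
matched = soup.select("div.column + div")
first_cells = [col.select("div.inventory")[0].get_text() for col in matched]
print(first_cells)  # ['4', '7']
```

The first column is the only one left out, because it has no preceding sibling at all.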

You have to find all the column divs and then pick the second one from the result set:

soup.select("div > div.column")[1]

This only ever gives you one column, even if there are more such groups elsewhere in the document.

If you need the second column of each parent, add a loop:

for parent in soup.select('div.parent'):
    column = parent.select('div.column')[1]

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div class="parent">
...   <div class="column">
...       <div class="inventory">1</div>
...       <div class="inventory">2</div>
...       <div class="inventory">3</div>
...   </div>
...   <div class="column">
...       <div class="inventory">4</div>
...       <div class="inventory">5</div>
...       <div class="inventory">6</div>
...   </div>
...   <div class="column">
...       <div class="inventory">7</div>
...       <div class="inventory">8</div>
...       <div class="inventory">9</div>
...   </div>
... </div>
... ''')
>>> soup.select("div.parent > div.column")[1]
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>
>>> for parent in soup.select('div.parent'):
...     column = parent.select('div.column')[1]
...     print(column)
... 
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>
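The question asks for the rows rather than the column itself, so the following sketch shows two ways of getting there. The first indexes into the list of columns as above. The second uses the `:nth-of-type()` pseudo-class hinted at in the title, which assumes Beautiful Soup 4.7 or later, where `select()` is backed by the soupsieve package; older releases may not handle this selector the same way. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div class="parent">
  <div class="column">
    <div class="inventory">1</div><div class="inventory">2</div><div class="inventory">3</div>
  </div>
  <div class="column">
    <div class="inventory">4</div><div class="inventory">5</div><div class="inventory">6</div>
  </div>
  <div class="column">
    <div class="inventory">7</div><div class="inventory">8</div><div class="inventory">9</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: take the second column from the result list, then select its rows.
second_column = soup.select("div.parent > div.column")[1]
rows_by_index = [row.get_text() for row in second_column.select("div.inventory")]

# Option 2 (assumes bs4 >= 4.7): let the selector pick the column that is the
# second div among its siblings, then descend to its inventory rows.
rows_by_selector = [
    row.get_text()
    for row in soup.select("div.parent > div.column:nth-of-type(2) > div.inventory")
]

print(rows_by_index)     # ['4', '5', '6']
print(rows_by_selector)  # ['4', '5', '6']
```

Note that `:nth-of-type(2)` counts div siblings regardless of class, so the two options only agree when every div child of the parent is a column, as in this markup.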

Answer 1 (score: 0)

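BeautifulSoup supports CSS classes directly; the second argument to find_all (or to calling a tag, which is shorthand for find_all) is matched against the class attribute: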

for parent in soup.find_all('div', 'parent'):
    second_column = parent('div', 'column')[1]
    # handle the second column
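A runnable sketch of the same idea, assuming the question's HTML and the built-in "html.parser" backend. It spells the class filter as the `class_` keyword and adds `recursive=False` so that only direct children of the parent count as columns; that restriction and the name `rows_per_parent` are additions for illustration, not part of the answer above.

```python
from bs4 import BeautifulSoup

html = """
<div class="parent">
  <div class="column">
    <div class="inventory">1</div><div class="inventory">2</div><div class="inventory">3</div>
  </div>
  <div class="column">
    <div class="inventory">4</div><div class="inventory">5</div><div class="inventory">6</div>
  </div>
  <div class="column">
    <div class="inventory">7</div><div class="inventory">8</div><div class="inventory">9</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows_per_parent = []
for parent in soup.find_all("div", class_="parent"):
    # recursive=False limits the search to direct children of this parent.
    columns = parent.find_all("div", class_="column", recursive=False)
    second_column = columns[1]
    # Collect the text of each inventory row in the second column.
    rows = [row.get_text() for row in second_column.find_all("div", class_="inventory")]
    rows_per_parent.append(rows)

print(rows_per_parent)  # [['4', '5', '6']]
```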