Selecting the second child in Beautiful Soup

Time: 2016-07-06 21:11:54

Tags: python web-scraping beautifulsoup

Let's say we have:

<div>
    <p>this is some text</p>
    <p>...and this is some other text</p>
</div>

How can I retrieve the text of the second paragraph with BeautifulSoup?

4 answers:

Answer 0 (score: 13)

You can do this with a CSS selector:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""<div>
... <p>this is some text</p>
... <p>...and this is some other text</p>
... </div>""", "html.parser")
>>> soup.select('div > p')[1].get_text(strip=True)
'...and this is some other text'
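Indexing with [1] raises an IndexError when the div has fewer than two paragraphs. A minimal sketch of a guarded version follows; the helper name nth_paragraph_text is hypothetical (not part of Beautiful Soup), and it assumes you want the p elements that are direct children of any div in the document:

```python
from bs4 import BeautifulSoup


def nth_paragraph_text(html, n):
    """Return the stripped text of the n-th (0-based) <p> that is a direct
    child of a <div>, or None if there are not enough matches.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = soup.select("div > p")  # list of matching tags, in document order
    if 0 <= n < len(paragraphs):
        return paragraphs[n].get_text(strip=True)
    return None


html = """<div>
    <p>this is some text</p>
    <p>...and this is some other text</p>
</div>"""

second = nth_paragraph_text(html, 1)   # the second paragraph
missing = nth_paragraph_text(html, 5)  # out of range, so None
print(second)
```

Returning None instead of raising is just one possible choice; raising a clearer error may suit a scraper that expects the element to always be there.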

Answer 1 (score: 9)

You can use nth-of-type:

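from bs4 import BeautifulSoup

h = """<div>
    <p>this is some text</p>
    <p>...and this is some other text</p>
</div>"""

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(h, "html.parser")

# :nth-of-type(2) matches the second <p> among its siblings
print(soup.select_one("div p:nth-of-type(2)").text)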
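Note that nth-of-type counts only siblings with the same tag name, while nth-child counts every sibling element. The difference shows up as soon as the div contains something other than p tags. The sketch below uses made-up markup with an extra h2 and assumes a Beautiful Soup version whose select() is backed by Soup Sieve (4.7 or later), since older versions did not support nth-child:

```python
from bs4 import BeautifulSoup

# Made-up markup: an <h2> sits before the two paragraphs
html = """<div>
    <h2>heading</h2>
    <p>first paragraph</p>
    <p>second paragraph</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Second <p> among the <p> siblings
by_type = soup.select_one("div > p:nth-of-type(2)").text

# A <p> that is the second child element of the div (the <h2> is the first)
by_child = soup.select_one("div > p:nth-child(2)").text

print(by_type)
print(by_child)
```

So for "the second paragraph", nth-of-type is usually the one you want; nth-child answers "the second element, provided it is a paragraph".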

Answer 2 (score: 2)

secondp = soup.find('div').find_all('p')

In : secondp[1].text

Out : '...and this is some other text'

Or you can use findChildren directly:

div_ = soup.find('div').findChildren()
for i, child in enumerate(div_):
    if i == 1:
        print(child.text)
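One caveat with the approach above: by default the search is recursive, so nested tags are counted too and index 1 may not be the second paragraph. A small sketch with made-up markup (a b tag inside the first paragraph), using find_all with recursive=False to look at direct children only:

```python
from bs4 import BeautifulSoup

# Made-up markup: the first paragraph contains a nested <b>
html = """<div>
    <p>this is <b>some</b> text</p>
    <p>...and this is some other text</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# Recursive search over every tag: p, b, p
all_descendants = div.find_all(True)

# Only <p> tags that are direct children of the div
direct_paragraphs = div.find_all("p", recursive=False)

print(all_descendants[1].name)
print(direct_paragraphs[1].text)
```

Here the item at index 1 of the recursive result is the b tag, while the non-recursive search still gives the second paragraph.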

Answer 3 (score: 0)

You can use gazpacho to solve this:

from gazpacho import Soup

html = """\
<div>
    <p>this is some text</p>
    <p>...and this is some other text</p>
</div>
"""

soup = Soup(html)
soup.find('p')[1].text

Which will output:

'...and this is some other text'