Beautiful Soup: how to get <li> items from a div class and <ul>, when the ul has no class name and no ID

Asked: 2019-04-01 19:57:42

Tags: python html beautifulsoup

The input data looks like this. It contains multiple `ul` tags that I want to scrape with Python and Beautiful Soup.

    <div class="column one-second"><p></p>
      <ul>
        <li>Commercial automobile</li>
        <li>Excess liability</li>
        <li>General liability</li>
        <li>Inland marine (cargo)</li>
      </ul>
    <p></p></div>
    <div class="column one-second"><p></p>
      <ul>
        <li>Professional Liability</li>
        <li>Property</li>
        <li>Workers’ compensation</li>
      </ul>
    <p></p></div>

To get the list items from the `ul` tags using the Beautiful Soup library, I tried the following, but none of it worked:

    amusements_soup.find_all('li', attrs={'id': 'menu-item-16'})


    amusements_soup.find_all('div',{'class':'column one-second'})


    ul = amusements_soup.find("h2", text="Services & Solutions").find_next_sibling("ul")

Expected output:

> Commercial automobile
> 
> Excess liability
> 
> General liability
>
> Inland marine 
>
> Professional Liability
> 
> Workers’ compensation

2 Answers:

Answer 0 (score: 0)

Assuming `amusements_soup` contains the HTML you mentioned, this should work:

    from bs4 import BeautifulSoup

    page = '<div class="column one-second"><p></p> <ul> <li>Commercial automobile</li> <li>Excess liability</li> <li>General liability</li> <li>Inland marine (cargo)</li> </ul> <p></p></div> <div class="column one-second"><p></p> <ul> <li>Professional Liability</li> <li>Property</li> <li>Workers’ compensation</li> </ul> <p></p></div>'
    amusements_soup = BeautifulSoup(page, "html.parser")

    # Visit each container div, then every <li> nested anywhere inside it.
    for item in amusements_soup.find_all('div', {'class': 'column one-second'}):
        sub_items = item.find_all('li')
        for sub_item in sub_items:
            print(sub_item.text)

Output:

    Commercial automobile
    Excess liability
    General liability
    Inland marine (cargo)
    Professional Liability
    Property
    Workers’ compensation

If this does not work for you, check that `amusements_soup` really contains what you think it does.
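One way to run that check is to count the elements the loop depends on before iterating. The sketch below is only an illustration: `describe_soup` is a hypothetical helper, not part of Beautiful Soup, and `empty_page` is a made-up stand-in for a downloaded page that lacks the lists (for example, because the site fills them in with JavaScript, which is an assumption about the real page).

```python
from bs4 import BeautifulSoup


def describe_soup(soup):
    """Hypothetical helper: count the elements the scraping loop relies on."""
    divs = soup.find_all("div", class_="one-second")  # divs carrying the class
    items = soup.select("div.one-second li")          # <li> tags inside them
    return {"divs": len(divs), "items": len(items)}


# Made-up stand-in for a response that does not contain the lists at all.
empty_page = "<html><body><div id='app'></div></body></html>"
report = describe_soup(BeautifulSoup(empty_page, "html.parser"))
print(report)  # {'divs': 0, 'items': 0}
```

If both counts are zero for the real page, the problem is probably in how the HTML was fetched rather than in the selectors.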

Answer 1 (score: 0)

The same thing, using a class selector and a type selector joined by a descendant combinator, inside a list comprehension:

    results = [item.text for item in amusements_soup.select('.one-second li')]
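For a self-contained version of the selector approach, here is a minimal sketch. The HTML is shortened for illustration, and `div.column.one-second` is a stricter selector than the one in the answer (it requires both classes on a `div`), which is my assumption about what you want to match.

```python
from bs4 import BeautifulSoup

# Shortened sample markup, for illustration only.
page = (
    '<div class="column one-second"><ul>'
    '<li>Commercial automobile</li><li>Excess liability</li>'
    '</ul></div>'
    '<div class="column one-second"><ul>'
    '<li>Professional Liability</li><li>Property</li>'
    '</ul></div>'
)
amusements_soup = BeautifulSoup(page, "html.parser")

# "div.column.one-second li" selects every <li> that is a descendant of a
# <div> carrying both classes, regardless of the <ul> in between.
results = [
    item.get_text(strip=True)
    for item in amusements_soup.select("div.column.one-second li")
]
print(results)
# ['Commercial automobile', 'Excess liability', 'Professional Liability', 'Property']
```

Because the descendant combinator ignores intermediate tags, the `ul` does not need a class or an ID.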