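How do I find tags that do not have a specific attribute using BeautifulSoup?

Time: 2019-01-12 19:10:11

Tags: python-3.x beautifulsoup

I am trying to get the contents of 'p' tags that do not have a specific attribute.

I have some tags with 'class'='cost', and others with both 'class'='cost' and 'itemprop'='price'.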

all_cars = soup.find_all('div', attrs={'class': 'listdata'})
...
...
total_cost = car.findChildren('p', attrs={'class': 'cost'})
cost = car.findChildren('p', attrs={'class': 'cost', 'itemprop': 'price'})

I am trying to find the 'p' tags that have no 'itemprop' attribute, but I cannot find any solution.

2 answers:

Answer 0: (score: 2)

BeautifulSoup lets you define a function and pass it to its find_all() method:

def has_class_but_not_itemprop(tag):
    return tag.has_attr('class') and not tag.has_attr('itemprop')

# Pass this function into find_all() and you'll pick up all the <p>
# tags you're after:
soup.find_all(has_class_but_not_itemprop)
# [<p class="cost">...</p>,
#  <p class="cost">...</p>,
#  <p class="cost">...</p>]

For more information, see the BeautifulSoup documentation.
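Note that a filter function which only checks for 'class' and 'itemprop' will match any tag type, not just 'p'. The sketch below is one way to narrow it to 'p' tags whose class list contains 'cost'. The sample markup and the function name is_cost_without_itemprop are made up for illustration; they are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modelled on the question.
html = """
<div class="listdata">
  <p class="cost">100</p>
  <p class="cost" itemprop="price">90</p>
</div>
"""

def is_cost_without_itemprop(tag):
    # tag.get('class') returns a list of class names, or None if absent.
    return (
        tag.name == 'p'
        and 'cost' in (tag.get('class') or [])
        and not tag.has_attr('itemprop')
    )

soup = BeautifulSoup(html, 'html.parser')
matches = soup.find_all(is_cost_without_itemprop)
texts = [p.get_text() for p in matches]
print(texts)
```

Without the tag.name check, the enclosing div (which has a class and no itemprop) would also be returned.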

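Answer 1: (score: 2)

BeautifulSoup's built-in attribute filters are enough for this. Pass True as the value to check only that an attribute exists, or None to require that the attribute is absent. The value can also be any specific attribute value (e.g. 'cost').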

from bs4 import BeautifulSoup
html="""
<p class="cost">paragraph 1</p>
<p class="cost">paragraph 2</p>
<p class="cost">paragraph 3</p>
<p class="cost" itemprop="1">paragraph 4</p>
<p class="somethingelse">paragraph 5</p>
"""
soup=BeautifulSoup(html,'html.parser')
print("---without 'itemprop' attribute")
print(soup.find_all('p',itemprop=None))
print("---with class = 'cost' and without 'itemprop' attribute----")
print(soup.find_all('p',attrs={'itemprop':None,"class":'cost'}))
#below is an alternative way to specify this
#print(soup.find_all('p',itemprop=None,class_='cost'))

Output

---without 'itemprop' attribute
[<p class="cost">paragraph 1</p>, <p class="cost">paragraph 2</p>, <p class="cost">paragraph 3</p>, <p class="somethingelse">paragraph 5</p>]
---with class = 'cost' and without 'itemprop' attribute----
[<p class="cost">paragraph 1</p>, <p class="cost">paragraph 2</p>, <p class="cost">paragraph 3</p>]
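Applied to the structure in the question, the same filter can be used on each 'listdata' div separately. The following is a minimal sketch on made-up markup (the real page is not shown in the question). It also shows a CSS selector alternative, assuming a Beautiful Soup 4 release recent enough that select() supports :not() with an attribute selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the actual page layout is an assumption.
html = """
<div class="listdata">
  <p class="cost">total 1</p>
  <p class="cost" itemprop="price">price 1</p>
</div>
<div class="listdata">
  <p class="cost">total 2</p>
  <p class="cost" itemprop="price">price 2</p>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

results = []
for car in soup.find_all('div', class_='listdata'):
    # itemprop=None keeps only <p> tags without the attribute.
    total_cost = car.find('p', class_='cost', itemprop=None)
    # A string value keeps only <p> tags where itemprop equals it.
    price = car.find('p', class_='cost', itemprop='price')
    results.append((total_cost.get_text(), price.get_text()))

# The same "has class cost, lacks itemprop" filter as a CSS selector.
css_matches = [p.get_text() for p in soup.select('p.cost:not([itemprop])')]

print(results)
print(css_matches)
```

The keyword form is usually the shortest when only one or two attributes are involved. The selector form can be handier when the condition is part of a longer path such as 'div.listdata p.cost:not([itemprop])'.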