I am trying to get the content of the 'p' tags that do not have a particular attribute.
I have some tags with 'class'='cost', and others with both 'class'='cost' and 'itemprop'='price':
all_cars = soup.find_all('div', attrs={'class': 'listdata'})
...
...
total_cost = car.findChildren('p', attrs={'class': 'cost'})
cost = car.findChildren('p', attrs={'class': 'cost', 'itemprop': 'price'})
I am trying to find the 'p' tags that have no 'itemprop' attribute, but I could not find any solution.
Answer 0: (score: 2)
BeautifulSoup allows you to define a function and pass it to its find_all() method:
def has_class_but_not_itemprop(tag):
    return tag.has_attr('class') and not tag.has_attr('itemprop')
# Pass this function into find_all() and you'll pick up all the <p>
# tags you're after:
soup.find_all(has_class_but_not_itemprop)
# [<p class="cost">...</p>,
#  <p class="cost">...</p>,
#  <p class="cost">...</p>]
For more information, see the BeautifulSoup documentation.
Answer 1: (score: 2)
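Note that a filter testing only for the presence of 'class' matches any tag that has a class, not just 'p' tags. A minimal sketch of a stricter variant follows. The function name is_cost_without_itemprop and the sample markup are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup, invented for this sketch (loosely modeled on the question).
html = """
<div class="listdata">
  <p class="cost">100</p>
  <p class="cost" itemprop="price">90</p>
</div>
"""

def is_cost_without_itemprop(tag):
    # 'class' is a multi-valued attribute, so tag.get() returns a list
    # of class names (or the default [] when the attribute is missing).
    return (
        tag.name == 'p'
        and 'cost' in tag.get('class', [])
        and not tag.has_attr('itemprop')
    )

soup = BeautifulSoup(html, 'html.parser')
matches = soup.find_all(is_cost_without_itemprop)
print([p.get_text() for p in matches])
```

Here the surrounding div is skipped even though it has a class, because the function also checks the tag name.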
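BeautifulSoup's built-in attribute filters are sufficient for this. You can pass True as the value to simply check that an attribute is present, and None to specify that the attribute is absent. The value can also be any specific attribute value (for example, 'cost').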
from bs4 import BeautifulSoup
html="""
<p class="cost">paragraph 1</p>
<p class="cost">paragraph 2</p>
<p class="cost">paragraph 3</p>
<p class="cost" itemprop="1">paragraph 4</p>
<p class="somethingelse">paragraph 5</p>
"""
soup=BeautifulSoup(html,'html.parser')
print("---without 'itemprop' attribute")
print(soup.find_all('p',itemprop=None))
print("---with class = 'cost' and without 'itemprop' attribute----")
print(soup.find_all('p',attrs={'itemprop':None,"class":'cost'}))
#below is an alternative way to specify this
#print(soup.find_all('p',itemprop=None,class_='cost'))
Output
---without 'itemprop' attribute
[<p class="cost">paragraph 1</p>, <p class="cost">paragraph 2</p>, <p class="cost">paragraph 3</p>, <p class="somethingelse">paragraph 5</p>]
---with class = 'cost' and without 'itemprop' attribute----
[<p class="cost">paragraph 1</p>, <p class="cost">paragraph 2</p>, <p class="cost">paragraph 3</p>]
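A different technique that neither answer uses is a CSS selector passed to select(). The sketch below assumes a Beautiful Soup release whose select() understands :not() (recent releases do, through the soupsieve package). The sample markup is a shortened copy of the one above.

```python
from bs4 import BeautifulSoup

html = """
<p class="cost">paragraph 1</p>
<p class="cost" itemprop="1">paragraph 4</p>
<p class="somethingelse">paragraph 5</p>
"""

soup = BeautifulSoup(html, 'html.parser')

# p.cost            -> <p> tags whose class list contains "cost"
# :not([itemprop])  -> excluding tags that carry an itemprop attribute
no_itemprop = soup.select('p.cost:not([itemprop])')
print([p.get_text() for p in no_itemprop])
```

The selector keeps the whole condition in one string, which can be easier to read than a filter function when the rule is simple.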