Question

I'm trying to scrape a tag that has three attributes. I used the code below, but it finds nothing. I know this tag exists in the HTML source.

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.immobiliare.it/69866648-Vendita-Quadrilocale-via-Mario-Ridolfi-32-Roma.html')
soup = BeautifulSoup(r.text, 'html.parser')
result = soup.find('div', attrs={'class': 'col-xs-12 description-text text expanded', 'aria-expanded': 'true', 'role': 'contentinfo'})

Did I get the syntax wrong somewhere?
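A quick way to see what is going on is to inspect what the parser actually receives. The sketch below is self-contained and uses a hypothetical, simplified HTML string (RAW_HTML, with class names assumed from the ones quoted in Answer 3) instead of a live request; describe_candidates is an illustrative helper, not part of BeautifulSoup. It runs the question's find() against that markup and then lists the attributes of every div with role="contentinfo".

```python
from typing import Dict, List

from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the HTML the server sends before any
# JavaScript runs (class names and values are assumptions, not the real page).
RAW_HTML = """
<div class="col-xs-12 description-text text-compressed"
     aria-expanded="false" role="contentinfo">...</div>
"""


def describe_candidates(html: str) -> List[Dict]:
    """Return the attributes of every div with role="contentinfo", so they can
    be compared with what the browser's inspector shows."""
    soup = BeautifulSoup(html, "html.parser")
    return [dict(div.attrs) for div in soup.find_all("div", attrs={"role": "contentinfo"})]


# The lookup from the question, run against the served markup.
soup = BeautifulSoup(RAW_HTML, "html.parser")
original = soup.find("div", attrs={
    "class": "col-xs-12 description-text text expanded",
    "aria-expanded": "true",
    "role": "contentinfo",
})
print(original)  # None

# Show the attributes the parser actually sees (class comes back as a list).
for attrs in describe_candidates(RAW_HTML):
    print(attrs)
```

With a real page, passing r.text to describe_candidates would show the attributes as served, which may differ from what the browser inspector displays.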

Answer 1

Try this:

# create a function to look for attrs and attr values
def foo(tag):
    # True only if both attributes are present with the expected values
    return tag.get('aria-expanded') == 'true' and tag.get('role') == 'contentinfo'

# first do a css select on classes
divs = soup.select('div.col-xs-12.description-text.text.expanded')

# then take out any that don't have the attrs/vals we need
divs = [div for div in divs if foo(div)]

It isn't very elegant, but I've never come up with a better way.
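The class check and the attribute checks can also be written as a single CSS selector, since select() (backed by Soup Sieve) supports attribute selectors alongside class selectors. A minimal sketch on a hypothetical inline snippet; the decoy div is added only to show that the aria-expanded condition is enforced.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one div with the classes and attributes from the
# question, plus a decoy with the same classes but aria-expanded="false".
HTML = """
<div class="col-xs-12 description-text text expanded" aria-expanded="true" role="contentinfo">match</div>
<div class="col-xs-12 description-text text expanded" aria-expanded="false" role="contentinfo">decoy</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# .a.b requires every listed class (in any order);
# [name="value"] requires that exact attribute value.
SELECTOR = (
    'div.col-xs-12.description-text.text.expanded'
    '[aria-expanded="true"][role="contentinfo"]'
)
matches = soup.select(SELECTOR)
print([div.get_text() for div in matches])  # ['match']
```

This replaces both the select() call and the foo() filter with one query, at the cost of a longer selector string.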

Answer 2

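You are missing the "-" in "text expanded" (it should be "text-expanded"). Also, since you are using .find() to select only the first matching element anyway, you can simply match on role: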

result = soup.find('div', attrs={'role':'contentinfo'})
# or
result = soup.select_one('div[role="contentinfo"]')
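One reason a class lookup like the question's can fail is how BeautifulSoup matches the multi-valued class attribute: a string containing spaces must equal the element's full class value, while a single class name matches if it is any one of the element's classes. A small sketch on a hypothetical snippet, assuming the served class really is text-expanded:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the class value is an assumption for illustration.
HTML = '<div class="col-xs-12 description-text text-expanded" role="contentinfo">x</div>'
soup = BeautifulSoup(HTML, "html.parser")

# A class string containing spaces must match the whole class value exactly.
exact = soup.find("div", class_="col-xs-12 description-text text-expanded")
# The question's spelling (missing "-") is a different value, so no match.
typo = soup.find("div", class_="col-xs-12 description-text text expanded")
# A single class name matches if it is any one of the element's classes.
single = soup.find("div", class_="description-text")
# Matching on role alone, as in the answer above.
by_role = soup.select_one('div[role="contentinfo"]')

print(exact is not None, typo is None, single is not None, by_role is not None)
# True True True True
```

Matching on a single stable class or on role avoids depending on the exact order and spelling of every class in the attribute.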

Answer 3

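Your syntax is actually fine, but the HTML you inspected had already been modified by JavaScript. Remember that if you want to scrape something, you should look at the page with JavaScript disabled, because scripts can rewrite a tag's classes, data attributes, and so on. requests does not run JavaScript, so you need to match the tag as the server sends it: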

result = soup.find('div',
                   attrs={'class': 'col-xs-12 description-text text-compressed',
                          'aria-expanded': 'false',
                          'role': 'contentinfo'})
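Because scripts may toggle classes such as text-compressed/text-expanded and the aria-expanded value, one option is to match only on attributes that stay the same in both states. The sketch below assumes that role="contentinfo" and the description-text class are left unchanged by the page's JavaScript; both HTML strings are hypothetical, and find_description is an illustrative helper.

```python
from bs4 import BeautifulSoup

# Two hypothetical versions of the same element: as served (before JavaScript)
# and as seen in the browser inspector (after a script expanded it).
SERVED = ('<div class="col-xs-12 description-text text-compressed" '
          'aria-expanded="false" role="contentinfo">served</div>')
RENDERED = ('<div class="col-xs-12 description-text text-expanded" '
            'aria-expanded="true" role="contentinfo">rendered</div>')


def find_description(html):
    """Match only on attributes assumed not to be changed by scripts."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one('div.description-text[role="contentinfo"]')


for html in (SERVED, RENDERED):
    print(find_description(html).get_text())
# served
# rendered
```

The same selector then works whether the markup comes from requests or from a browser-rendered snapshot.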

Scraping a tag with multiple attributes
