How can I extract only the "title" attribute from this code?

Time: 2019-05-25 07:23:59

Tags: python python-3.x web-scraping beautifulsoup

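I'm fairly new to Python, and I'm trying to understand how to extract the 'title=' attribute from the code below. I've been trying to use BeautifulSoup, but honestly anything that works would help.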

<a class="image-link" href="/new-jersey/communities/holiday-city-at-berkeley" title="Holiday City at Berkeley"><div class="lazyload pulse out exited" style="height:auto"><div class="placeholder"><svg class="svg-placeholder-component" height="100%" viewbox="0 0 400 225" width="100%"><use xlink:href="#lazyload-placeholder"></use></svg></div></div></a>

I tried all[0].find_all('a', "title") and all[0].find_all("title"), but both returned '[]'.


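2 answers:

Answer 0: (score: 1)

You can use a CSS selector to pick out the elements you need. The selector a[title] matches every <a> tag that has a title attribute: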

from bs4 import BeautifulSoup

html = '<a class="image-link" href="/new-jersey/communities/holiday-city-at-berkeley" title="Holiday City at Berkeley"><div class="lazyload pulse out exited" style="height:auto"><div class="placeholder"><svg class="svg-placeholder-component" height="100%" viewbox="0 0 400 225" width="100%"><use xlink:href="#lazyload-placeholder"></use></svg></div></div></a>'
soup = BeautifulSoup(html, 'lxml')

for a in soup.select('a[title]'):
    print(a['title'])

Prints:

Holiday City at Berkeley
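As a rough sketch of why the two attempts in the question came back empty, and of a find_all equivalent to the selector above: in BeautifulSoup, a string passed as the second positional argument of find_all is matched against the CSS class, and a single string is treated as a tag name. The markup below is a shortened copy of the question's HTML, and the built-in "html.parser" is used here only so the sketch runs without lxml installed. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened version of the markup from the question.
html = (
    '<a class="image-link" '
    'href="/new-jersey/communities/holiday-city-at-berkeley" '
    'title="Holiday City at Berkeley"><div class="placeholder"></div></a>'
)
soup = BeautifulSoup(html, "html.parser")

# A string as the second positional argument is matched against the CSS class,
# so this looks for <a class="title"> and finds nothing.
by_class = soup.find_all("a", "title")

# A single string is a tag name, so this looks for <title> elements.
by_tag_name = soup.find_all("title")

# title=True matches any <a> that has a title attribute, whatever its value.
titles = [a["title"] for a in soup.find_all("a", title=True)]

print(by_class, by_tag_name, titles)
```

With this markup, the first two lists are empty and the third holds the single title string.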

Answer 1: (score: 0)

You can try extracting the title attribute as follows:

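# Assumes `soup` is a BeautifulSoup object built from the page's HTML.
# find_all is the current name for the older findAll alias.
links = soup.find_all(attrs={"class": "image-link"})

for link in links:
    # Raises KeyError if a matched tag has no title attribute.
    print(link["title"])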
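Since the question says anything that works would help, here is a minimal sketch that reads the same attribute using only the standard library, with no BeautifulSoup. The class name TitleCollector is made up for this illustration, and the markup is a shortened copy of the question's HTML.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Hypothetical helper: collects the title attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            title = dict(attrs).get("title")
            if title is not None:
                self.titles.append(title)


# Shortened version of the markup from the question.
html = (
    '<a class="image-link" '
    'href="/new-jersey/communities/holiday-city-at-berkeley" '
    'title="Holiday City at Berkeley"><div class="placeholder"></div></a>'
)

collector = TitleCollector()
collector.feed(html)
print(collector.titles)
```

This avoids a third-party dependency, but BeautifulSoup is usually more convenient once the page structure gets more complicated.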