How do I scrape aria-label text in Python?

Time: 2020-10-31 08:31:21

Tags: python web-scraping

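I want to scrape the list of players from a website, but the names are stored in aria-label attributes, and I don't know how to scrape the text of an attribute. The link is https://athletics.baruch.cuny.edu/sports/mens-swimming-and-diving/roster. For example, the HTML contains the fragment below. How can I scrape the text from the aria-label?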

<div class="sidearm-roster-player-image column">
  <a data-bind="click: function() { return true; }, clickBubble: false" href="/sports/mens-swimming-and-diving/roster/gregory-becker/3555" aria-label="Gregory Becker - View Full Bio" title="View Full Bio">
    <img class="lazyload" data-src="/images/2018/10/19/GREGORY_BECKER.jpg?width=80" alt="GREGORY BECKER">
  </a>
</div>

2 answers:

Answer 0: (score: 0)

You can use the .get() method in BeautifulSoup. First select the element with a CSS selector or with find/find_all, and store it in elem or any other variable. Then try:

print(elem.get('aria-label'))
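As a minimal sketch of this approach, the fragment below parses the HTML from the question and reads the attribute. It assumes the markup is already available as a string; the variable names html and label and the CSS selector are illustrative choices, not part of the original answer.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question; a real page would be
# downloaded or read from a file instead.
html = """
<div class="sidearm-roster-player-image column">
  <a href="/sports/mens-swimming-and-diving/roster/gregory-becker/3555" aria-label="Gregory Becker - View Full Bio" title="View Full Bio">
    <img class="lazyload" data-src="/images/2018/10/19/GREGORY_BECKER.jpg?width=80" alt="GREGORY BECKER">
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select the first <a> inside the player-image div with a CSS selector.
elem = soup.select_one("div.sidearm-roster-player-image a")

# .get() returns the attribute value, or None if the attribute is absent.
label = elem.get("aria-label")
print(label)  # Gregory Becker - View Full Bio
```

Using .get() rather than elem['aria-label'] avoids a KeyError on elements that lack the attribute.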

Answer 1: (score: 0)

The code below will help you extract the names from the aria-label attributes:

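from bs4 import BeautifulSoup

with open("<path-to-html-file>") as fp:
    soup = BeautifulSoup(fp, 'html.parser')  # parse the HTML

# get only the <a> tags that actually carry an aria-label attribute,
# so links without one are not printed as None
tags = soup.find_all('a', attrs={'aria-label': True})
for tag in tags:
    print(tag.get('aria-label'))  # print the attribute text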
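In the HTML from the question, the aria-label value is "Gregory Becker - View Full Bio", so the printed text includes a suffix after the player's name. A possible follow-up step, sketched here under the assumption that every roster link uses the same " - View Full Bio" suffix, is to strip it. The helper name_from_label and the sample list are hypothetical and not part of the original answer.

```python
def name_from_label(label, suffix=" - View Full Bio"):
    """Return the player name from an aria-label value.

    Hypothetical helper: assumes the label ends with the given suffix,
    as in the HTML shown in the question. Labels without the suffix
    are returned unchanged (apart from surrounding whitespace).
    """
    if label.endswith(suffix):
        label = label[: -len(suffix)]
    return label.strip()


# Sample values as tag.get('aria-label') might return them;
# None stands for an <a> tag without the attribute.
labels = ["Gregory Becker - View Full Bio", None, "Skip To Main Content"]

# Skip missing labels, then strip the suffix from the rest.
names = [name_from_label(label) for label in labels if label is not None]
print(names)  # ['Gregory Becker', 'Skip To Main Content']
```

Labels that do not end with the suffix pass through as they are, so it may be worth keeping only those that do end with it if the page contains other labelled links.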