Question

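I want to scrape the player roster from a website, but the names are stored in the aria-label attribute of a tag rather than in the tag's text. I don't know how to scrape text from an attribute like that. The link is here: https://athletics.baruch.cuny.edu/sports/mens-swimming-and-diving/roster For example, given the HTML below, how do I get the text out of the aria-label?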

<div class="sidearm-roster-player-image column">                                                                    
  <a data-bind="click: function() { return true; }, clickBubble: false" href="/sports/mens-swimming-and-diving/roster/gregory-becker/3555" aria-label="Gregory Becker - View Full Bio" title="View Full Bio">
    <img class="lazyload" data-src="/images/2018/10/19/GREGORY_BECKER.jpg?width=80" alt="GREGORY BECKER">
  </a>                                                              
</div>

Answer 1

You can use the .get() method in BeautifulSoup. First select the element with find/find_all or any other selector and store it in a variable such as elem. Then try:

print(elem.get('aria-label'))
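To make the one-liner above concrete, here is a minimal sketch that parses the snippet from the question and reads the attribute. The `html` string is a trimmed copy of the question's markup, and the way `elem` is selected (by the `sidearm-roster-player-image` class) is only one possible choice, not the only correct selector.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, trimmed to the attributes that matter here.
html = """
<div class="sidearm-roster-player-image column">
  <a href="/sports/mens-swimming-and-diving/roster/gregory-becker/3555" aria-label="Gregory Becker - View Full Bio" title="View Full Bio">
    <img class="lazyload" data-src="/images/2018/10/19/GREGORY_BECKER.jpg?width=80" alt="GREGORY BECKER">
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Pick the wrapper div by one of its classes, then the link inside it.
elem = soup.find("div", class_="sidearm-roster-player-image").find("a")

# .get() returns the attribute value, or None if the attribute is absent.
label = elem.get("aria-label")
print(label)  # Gregory Becker - View Full Bio
```

Because .get() returns None for a missing attribute instead of raising, it is safe to call on tags that may not have an aria-label at all.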

Answer 2

The code below will help you extract the names from the aria-label attribute:

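from bs4 import BeautifulSoup

with open("<path-to-html-file>") as fp:
    soup = BeautifulSoup(fp, 'html.parser')  # parse the HTML

# Select only the <a> tags that actually carry an aria-label attribute,
# so links without one do not print None
tags = soup.find_all('a', attrs={'aria-label': True})
for tag in tags:
    print(tag.get('aria-label'))  # e.g. "Gregory Becker - View Full Bio"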
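As a variation on the loop above, the sketch below returns just the player names. It assumes every roster label ends with " - View Full Bio", which is only what the single sample in the question shows; `extract_names` and `SUFFIX` are hypothetical names introduced for illustration, and the second link in `sample` is made up to show that non-roster links are skipped.

```python
from bs4 import BeautifulSoup

# Assumption: roster labels look like "<Name> - View Full Bio", as in the sample.
SUFFIX = " - View Full Bio"


def extract_names(html):
    """Return player names found in aria-label attributes of <a> tags."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # The CSS selector a[aria-label] matches only links that have the attribute.
    for tag in soup.select("a[aria-label]"):
        label = tag["aria-label"]
        if label.endswith(SUFFIX):
            names.append(label[: -len(SUFFIX)])
    return names


sample = """
<div class="sidearm-roster-player-image column">
  <a href="/sports/mens-swimming-and-diving/roster/gregory-becker/3555" aria-label="Gregory Becker - View Full Bio" title="View Full Bio"></a>
</div>
<a href="/" aria-label="Home">Home</a>
<a href="/schedule">Schedule</a>
"""

print(extract_names(sample))  # ['Gregory Becker']
```

If the labels on the live page follow a different pattern, only the suffix check needs to change.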

How do I scrape aria-label text in Python?
