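I want to scrape the player roster from a website, but the names are stored in tag attributes rather than in the text between tags, and I don't know how to extract text from an attribute. The link is here: https://athletics.baruch.cuny.edu/sports/mens-swimming-and-diving/roster For example, the HTML contains the snippet below. How do I scrape the text from the tag?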
<div class="sidearm-roster-player-image column">
<a data-bind="click: function() { return true; }, clickBubble: false" href="/sports/mens-swimming-and-diving/roster/gregory-becker/3555" aria-label="Gregory Becker - View Full Bio" title="View Full Bio">
<img class="lazyload" data-src="/images/2018/10/19/GREGORY_BECKER.jpg?width=80" alt="GREGORY BECKER">
</a>
</div>
Answer 0: (score: 0)
You can use the .get() method in BeautifulSoup. First select the element with find/find_all or any other selector and store it in elem (or any other variable). Then try:
print(elem.get('aria-label'))
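As a minimal sketch of this, assuming the snippet from the question is held in a string (the variable name html and the choice of find on the div class are only for illustration), the element can be selected first and the attribute read with .get():

```python
from bs4 import BeautifulSoup

# The snippet from the question, inlined here for illustration.
html = """
<div class="sidearm-roster-player-image column">
<a data-bind="click: function() { return true; }, clickBubble: false" href="/sports/mens-swimming-and-diving/roster/gregory-becker/3555" aria-label="Gregory Becker - View Full Bio" title="View Full Bio">
<img class="lazyload" data-src="/images/2018/10/19/GREGORY_BECKER.jpg?width=80" alt="GREGORY BECKER">
</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select the <a> inside the player-image div, then read its attribute.
div = soup.find("div", class_="sidearm-roster-player-image")
elem = div.find("a")
label = elem.get("aria-label")
print(label)

# .get() returns None instead of raising when the attribute is absent.
missing = elem.get("no-such-attribute")
print(missing)
```

Because .get() returns None for a missing attribute, it is safer than elem['aria-label'] when some of the selected tags may not carry the attribute.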
Answer 1: (score: 0)
The code below will help you extract the names from the tags:
from bs4 import BeautifulSoup

with open("<path-to-html-file>") as fp:
    soup = BeautifulSoup(fp, 'html.parser')  # parse the HTML

tags = soup.find_all('a')  # get all the <a> tags
for tag in tags:
    print(tag.get('aria-label'))  # read the attribute that holds the name (None if absent)
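In the question's snippet the aria-label value is "Gregory Becker - View Full Bio", so the printed text includes a trailing suffix, and links without an aria-label print None. A possible follow-up, assuming every roster link uses that same suffix (the question shows only one example, so this is an assumption), is to keep only the tags that have the attribute and strip the suffix. The names extract_names and SUFFIX are hypothetical:

```python
from bs4 import BeautifulSoup

SUFFIX = " - View Full Bio"  # assumed from the single example in the question


def extract_names(html):
    """Return player names from <a> tags whose aria-label ends with SUFFIX."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # attrs={"aria-label": True} keeps only tags that have the attribute.
    for tag in soup.find_all("a", attrs={"aria-label": True}):
        label = tag.get("aria-label")
        if label.endswith(SUFFIX):
            name = label[: -len(SUFFIX)]
            if name not in names:  # skip duplicates, in case a player is linked twice
                names.append(name)
    return names


# Small sample: one roster link and one unrelated link without aria-label.
sample = """
<a href="/sports/mens-swimming-and-diving/roster/gregory-becker/3555" aria-label="Gregory Becker - View Full Bio" title="View Full Bio"></a>
<a href="/sports">Sports</a>
"""
names = extract_names(sample)
print(names)
```

The same function could be passed the contents of the saved HTML file used above; whether the live page serves the roster in static HTML is not something the question confirms.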