Clarifying the <a> tag with the :: symbol for web scraping

Time: 2018-05-10 18:50:10

Tags: html web-scraping

I am a beginner with bs4 and web scraping. I am trying to scrape the stats.nba.com site. The problem is that I can't seem to scrape the players' names and their points for each category. The data shows up when I inspect the element on the web page, but when I scrape it using bs4, this is what I get for the tag I want:

<a ng-href="/player/{{::player.PLAYER_ID}}/" title="View Stats Profile" aria-labelledby="leaders_daily_players__{{ ::category.name }}">{{::player.PLAYER_NAME}}</a>

Every player's name is replaced by "::player.PLAYER_NAME". I've tried looking up what the :: symbol means, but I could not figure it out. Can someone explain what "::" does and how I would be able to scrape that info from the site?

1 answer:

Answer 0 (score: 0)

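The site in question uses Angular to generate its content dynamically. Angular is run by the JavaScript engine in the browser and replaces placeholders such as {{::player.PLAYER_ID}} with the actual data, which is loaded asynchronously. The :: prefix is Angular's one-time binding syntax: once the expression has a value, Angular stops re-evaluating it.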

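By default, BeautifulSoup only parses the HTML it is given; it does not emulate a full-featured browser. It does not execute the JavaScript, so the placeholders are never replaced with the actual data.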
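To make this concrete, here is a minimal sketch of what a static parser sees. It uses the standard library's html.parser instead of bs4 so that it runs without extra installs; BeautifulSoup would return the same text for this tag. The class AnchorTextCollector and the helper has_unrendered_placeholder are hypothetical names made up for this illustration, and RAW_HTML is the tag from the question.

```python
import re
from html.parser import HTMLParser

# The anchor tag from the question, as it arrives before any JavaScript runs.
RAW_HTML = (
    '<a ng-href="/player/{{::player.PLAYER_ID}}/" title="View Stats Profile" '
    'aria-labelledby="leaders_daily_players__{{ ::category.name }}">'
    '{{::player.PLAYER_NAME}}</a>'
)


class AnchorTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside <a> tags (no JavaScript is run)."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self.texts.append(data)


def has_unrendered_placeholder(text):
    """Hypothetical check: True if the text still contains a {{ ... }} template expression."""
    return re.search(r"\{\{.*?\}\}", text) is not None


collector = AnchorTextCollector()
collector.feed(RAW_HTML)
collector.close()

anchor_text = "".join(collector.texts)
print(anchor_text)                              # the raw template expression, not a name
print(has_unrendered_placeholder(anchor_text))  # a quick way to detect this situation
```

A check like this can be useful as an early warning in a scraper: if the extracted text still contains {{ ... }}, the page needs a JavaScript-capable tool or a different data source.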

You could try a different approach that does execute the JavaScript, as documented at https://pythonprogramming.net/javascript-dynamic-scraping-parsing-beautiful-soup-tutorial/

Another approach might be to make use of an existing API project: https://github.com/seemethere/nba_py/wiki/stats.nba.com-Endpoint-Documentation
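If you go the API route, the data arrives as JSON rather than HTML, so no HTML parsing is needed. The sketch below assumes a response layout with a "resultSets" list whose entries carry "headers" and "rowSet"; check the endpoint documentation above for the real shape. The field names PLAYER_ID and PLAYER_NAME are taken from the template in the question, while PTS, the set name, the sample values, and the function rows_as_dicts are made up for illustration. No request is sent here.

```python
import json

# Made-up sample payload; the layout (resultSets / headers / rowSet) is an assumption.
SAMPLE_RESPONSE = """
{
  "resultSets": [
    {
      "name": "ExampleLeaders",
      "headers": ["PLAYER_ID", "PLAYER_NAME", "PTS"],
      "rowSet": [
        [1, "Example Player A", 30],
        [2, "Example Player B", 27]
      ]
    }
  ]
}
"""


def rows_as_dicts(payload):
    """Hypothetical helper: pair each row of the first result set with its column headers."""
    result_set = payload["resultSets"][0]
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


players = rows_as_dicts(json.loads(SAMPLE_RESPONSE))
for player in players:
    print(player["PLAYER_NAME"], player["PTS"])
```

Turning each row into a dict keyed by header keeps the code readable and avoids relying on column positions.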