Question

I am trying to scrape a website to learn Python and web scraping. Specifically, I am trying to scrape football data from the following page: https://www.whoscored.com/Regions/108/Tournaments/5/Seasons/7468/Stages/16548/PlayerStatistics/Italy-Serie-A-2018-2019

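My main question is how to scrape every page of the main data table, not just the first one. I tried using Selenium to work this out, and I also tried analyzing the requests the browser sends when the "next" button is clicked, but I ran into some trouble. Thanks for your attention.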

Answer 1

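If you watch the browser's Network tab while clicking the "next" button, you can inspect the actual XHR (AJAX) request that is sent to the server on each click. The request goes to the following URL: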

https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics?category=summary&subcategory=all&statsAccumulationType=0&isCurrent=true&playerId=&teamIds=&matchId=&stageId=16548&tournamentOptions=5&sortBy=Rating&sortAscending=&age=&ageComparisonType=&appearances=&appearancesComparisonType=&field=Overall&nationality=&positionOptions=&timeOfTheGameEnd=&timeOfTheGameStart=&isMinApp=true&page=2&includeZeroValues=&numberOfPlayersToPick=10

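Note the "page" query string parameter: it increases by one with each request. The response to each request is JSON, which is very easy to parse, so most of the work is already done for you.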

Using Python:
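A minimal sketch of how this could look, using only the standard library. The parameter values are copied from the URL above; the helper names (`build_page_url`, `fetch_page`, `iter_pages`) are made up for illustration, and the two request headers are an assumption about what the server may expect, not something confirmed here.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics"

# Query parameters copied from the request URL above; only "page" changes.
BASE_PARAMS = {
    "category": "summary",
    "subcategory": "all",
    "statsAccumulationType": "0",
    "isCurrent": "true",
    "playerId": "",
    "teamIds": "",
    "matchId": "",
    "stageId": "16548",
    "tournamentOptions": "5",
    "sortBy": "Rating",
    "sortAscending": "",
    "age": "",
    "ageComparisonType": "",
    "appearances": "",
    "appearancesComparisonType": "",
    "field": "Overall",
    "nationality": "",
    "positionOptions": "",
    "timeOfTheGameEnd": "",
    "timeOfTheGameStart": "",
    "isMinApp": "true",
    "page": "1",
    "includeZeroValues": "",
    "numberOfPlayersToPick": "10",
}


def build_page_url(page):
    """Return the feed URL for the given page number (hypothetical helper)."""
    params = dict(BASE_PARAMS, page=str(page))
    return BASE_URL + "?" + urlencode(params)


def fetch_page(page, opener=urllib.request.urlopen):
    """Request one page and return the parsed JSON.

    The headers below are an assumption: some sites reject requests that
    do not look like they come from a browser.
    """
    request = urllib.request.Request(
        build_page_url(page),
        headers={
            "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_pages(first, last, fetch=fetch_page):
    """Yield the parsed JSON for pages first..last, inclusive."""
    for page in range(first, last + 1):
        yield fetch(page)
```

Calling `list(iter_pages(1, 3))` would request the first three pages. Print one parsed response first to see how the player rows are laid out, since the structure of the JSON is not shown above.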
