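I'm trying to use pagination and move on to the next page after I've scraped the current one. This is my first time scraping an API, so I'm a bit lost, and I haven't found anything on the internet yet.
Question: what do I need to do to get to the next page?
Code (what I have so far):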
import pandas as pd
import requests, re
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import json

url = 'https://games.crossfit.com/competitions/api/v1/competitions/open/2018/leaderboards?division=1&region=0&scaled=0&sort=0&occupation=0&page=1'

nameList = []
genderList = []
regionList = []
gymList = []
ageList = []
heightList = []
weightList = []
ordList = []
overallList = []
overallScoreList = []

response = requests.get(url)
data = response.text
parsed = json.loads(data)

year = parsed['competition']['year']
comp = parsed['competition']['competitionType']
board = parsed['leaderboardRows']

# One entry in 'leaderboardRows' per competitor on this page
for row in board:
    name = row['entrant']['competitorName']
    gender = row['entrant']['gender']
    region = row['entrant']['regionName']
    gym = row['entrant']['affiliateName']
    age = row['entrant']['age']
    overall = row['overallRank']
    overallS = row['overallScore']
    height = row['entrant']['height']
    weight = row['entrant']['weight']
    nameList.append(name)
    genderList.append(gender)
    regionList.append(region)
    gymList.append(gym)
    ageList.append(age)
    heightList.append(height)
    weightList.append(weight)
    overallList.append(overall)
    overallScoreList.append(overallS)
Answer 0 (score: 2)
The CrossFit API gives you all the information you need in the pagination section. It looks something like this:
"pagination":
{
"currentPage":1,
"totalPages":3440,
"totalCompetitors":171977
},
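Instead of hard-coding how many pages to request, you can read totalPages from the first response and stop once you reach it. The sketch below assumes every page's JSON has the leaderboardRows and pagination keys shown above. fetch_page is a hypothetical callable you supply that returns the parsed JSON for a given page number; passing it in also keeps the loop testable without network access.

```python
from typing import Any, Callable, Dict, Iterator


def iter_leaderboard_rows(
    fetch_page: Callable[[int], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield every entry of 'leaderboardRows' across all pages.

    fetch_page(n) is a hypothetical, caller-supplied function that returns
    the parsed JSON for page n (e.g. via requests.get(...).json()).
    """
    page = 1
    while True:
        data = fetch_page(page)
        # Hand back this page's competitors one at a time
        yield from data["leaderboardRows"]
        # 'pagination' reports how many pages exist in total
        total_pages = data["pagination"]["totalPages"]
        if page >= total_pages:
            break
        page += 1
```

With a URL template ending in page={}, fetch_page could be something like lambda n: requests.get(url.format(n)).json(). With totalPages at 3440 in the sample above, that is 3440 requests, so a short pause between them may be worth adding.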
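To get any page other than page 1, change the GET parameter in the URL: instead of &page=1, write &page=2. It's best to build the URL with a function you can pass the relevant parameters to, so that, for example,
url_for_page(20) would return
https://games.crossfit.com/competitions/api/v1/competitions/open/2018/leaderboards?division=2&region=0&scaled=0&sort=0&occupation=0&page=20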
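One possible url_for_page is sketched below using the standard library's urllib.parse.urlencode. The function name comes from the answer, but its body, the keyword overrides, and the default parameter values (copied from the question's URL) are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL and defaults, taken from the question's URL
BASE_URL = (
    "https://games.crossfit.com/competitions/api/v1/"
    "competitions/open/2018/leaderboards"
)
DEFAULT_PARAMS = {"division": 1, "region": 0, "scaled": 0, "sort": 0, "occupation": 0}


def url_for_page(page: int, **overrides: int) -> str:
    """Hypothetical helper: build the leaderboard URL for one page.

    Keyword arguments override the defaults (e.g. division=2) while keeping
    the original parameter order; 'page' is always appended last.
    """
    params = {**DEFAULT_PARAMS, **overrides, "page": page}
    return f"{BASE_URL}?{urlencode(params)}"
```

url_for_page(20, division=2) reproduces the URL above. Alternatively, requests.get accepts a params dict and encodes the query string for you.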
Hope this helps.
Answer 1 (score: 1)
A quick and easy way to do it looks like this:
import requests

url = 'https://games.crossfit.com/competitions/api/v1/competitions/open/2018/leaderboards?division=1&region=0&scaled=0&sort=0&occupation=0&page={}'

# Fill the page number into the {} placeholder for pages 1-4 (range's end is exclusive)
for link in [url.format(page) for page in range(1, 5)]:
    response = requests.get(link)
    for item in response.json()['leaderboardRows']:
        name = item['entrant']['competitorName']
        print(name)
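The loop above only prints names. If you want the same fields the question collects, one option is to build one dict per competitor instead of nine parallel lists. This sketch assumes the key names used in the question's code (competitorName, regionName, affiliateName, overallRank, overallScore, and so on); row_to_record and rows_to_records are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, List


def row_to_record(row: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one 'leaderboardRows' entry into the fields the question collects."""
    entrant = row["entrant"]
    return {
        "name": entrant["competitorName"],
        "gender": entrant["gender"],
        "region": entrant["regionName"],
        "gym": entrant["affiliateName"],
        "age": entrant["age"],
        "height": entrant["height"],
        "weight": entrant["weight"],
        "overall_rank": row["overallRank"],
        "overall_score": row["overallScore"],
    }


def rows_to_records(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert a page's worth of rows into a list of flat dicts."""
    return [row_to_record(row) for row in rows]
```

Inside the loop above you could call records.extend(rows_to_records(response.json()['leaderboardRows'])) instead of printing; after the loop, pd.DataFrame(records) gives a table with one column per key.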