I'm new to Python and I'm trying to scrape some data from Google Scholar as a project. The code with the problem looks like this:
import re

import bs4
import requests

yearList = []

def getYear():
    # Collect the year from each result's byline (div.gs_a) on the current page
    yearRegex = re.compile(r".*(\d\d\d\d).*")
    for div in soup.find_all("div", class_='gs_a'):
        yo = yearRegex.findall(div.text)
        yearList.append(yo)
    print(yearList)

# numPages and search are set earlier in the script (not shown here)
page = 0
i = 0
while i < numPages:
    link = 'https://scholar.google.de/scholar?start=' + str(page) + '&q=' + search + '&hl=de&as_sdt=0,5'
    res = requests.get(link)
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    getYear()  # this is the function that extracts the data
    page += 20  # to get to the next page of the results
    i += 1
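The page variable and the link really do change by 20 on each iteration. However, for some reason the program only scrapes the first page of the search results, as if the link variable had never changed. What am I missing?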
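To rule out the URL construction as the cause, the paging logic can be checked on its own, without any network request. The following is only a minimal sketch under assumptions: `build_url` and `page_urls` are hypothetical helper names that are not part of my script, the search term is a placeholder, and the query parameters are simply the ones from the link above.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.de/scholar"


def build_url(search, start):
    """Return the results URL for a query and a result offset (hypothetical helper)."""
    params = {"start": start, "q": search, "hl": "de", "as_sdt": "0,5"}
    # urlencode escapes spaces and commas, which plain string concatenation does not
    return BASE_URL + "?" + urlencode(params)


def page_urls(search, num_pages, step=20):
    """Return one URL per page, advancing the offset by `step` each time."""
    return [build_url(search, i * step) for i in range(num_pages)]


if __name__ == "__main__":
    # Placeholder search term, used only to inspect the generated links
    for url in page_urls("machine learning", 3):
        print(url)
```

If every generated URL is different but the scraped results are still identical, the problem is probably not in the loop itself. Printing `res.status_code` and `res.url` after each `requests.get` call might then show whether the server answered with something other than the requested page.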