Question

I'm new to Python, and I'm trying to scrape some data from Google Scholar for a project. The code with the problem is shown below:

import re

import bs4
import requests

yearList = []
def getYear():
    for div in soup.find_all("div", class_='gs_a'):
        yearRegex = re.compile(r".*(\d\d\d\d).*")
        yo = yearRegex.findall(div.text)
        yearList.append(yo)
    print(yearList)



page = 0
i = 0 
while i < numPages:

    link = 'https://scholar.google.de/scholar?start=' + str(page) + '&q=' + search + '&hl=de&as_sdt=0,5'
    res = requests.get(link)
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    getYear()    #this is the function that extracts the data
    page += 20      #to get to the next page of the results
    i += 1

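The page variable and the link do change by 20 on each iteration. However, for some reason the program only scrapes the first page of the search results, as if the link variable had never changed. What am I missing?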
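One way to narrow this down, offered as a minimal sketch rather than a confirmed fix: build each URL with `urllib.parse.urlencode` so the query string is properly encoded, and collect the years per page instead of appending to the shared `yearList`. Because `print(yearList)` prints the cumulative list, every printout begins with the first page's entries, which can look like the same page being scraped again. The helper names `build_url` and `extract_years` and the stricter year pattern are assumptions for illustration and are not part of the original code. The helpers take plain strings, so the block runs without a network call.

```python
import re
from urllib.parse import urlencode

# Assumed pattern: a standalone four-digit year starting with 19 or 20.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def build_url(search, start):
    """Return the results URL for a given offset (hypothetical helper)."""
    params = {"start": start, "q": search, "hl": "de", "as_sdt": "0,5"}
    return "https://scholar.google.de/scholar?" + urlencode(params)


def extract_years(texts):
    """Return one list of year strings per input text (hypothetical helper)."""
    return [YEAR_RE.findall(text) for text in texts]


if __name__ == "__main__":
    # Print the URL for two offsets to confirm that they really differ.
    for start in (0, 20):
        print(build_url("machine learning", start))
    print(extract_years(["A Author - Journal of X, 2015 - example.com"]))
```

Inside the loop, `[div.text for div in soup.find_all("div", class_="gs_a")]` could be passed to `extract_years`. Printing `res.url` and `res.status_code` for each request would show whether the server actually received a different `start` value and whether it returned a normal results page.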

Python shows the same results for every page. Beautiful Soup

0 answers: