Python shows the same results for every page (Beautiful Soup)

Date: 2017-01-23 17:00:14

Tags: python while-loop web-scraping beautifulsoup

I am new to Python and am trying to scrape some data from Google Scholar as a project. The code that has the problem is shown below:

    import re

    import bs4
    import requests

    # `search` and `numPages` are assumed to be defined earlier (not shown here).
    yearRegex = re.compile(r".*(\d\d\d\d).*")  # compiled once, reused for every div
    yearList = []

    def getYear():
        # Reads the global `soup` that the loop below reassigns on each pass.
        for div in soup.find_all("div", class_='gs_a'):
            yo = yearRegex.findall(div.text)
            yearList.append(yo)
        print(yearList)

    page = 0
    i = 0
    while i < numPages:
        link = 'https://scholar.google.de/scholar?start=' + str(page) + '&q=' + search + '&hl=de&as_sdt=0,5'
        res = requests.get(link)
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        getYear()    # this is the function that extracts the data
        page += 20   # to get to the next page of the results
        i += 1
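
One thing visible in the code above is that `getYear()` appends to a single shared list and prints the whole list on every call, so each printout begins with the years from the first page. Whether that accounts for the behaviour described is not established here. The sketch below returns a fresh list per page so that pages can be compared side by side. The name `extract_years`, the sample strings, and the year pattern (which differs from the one in the question and assumes years start with 19 or 20) are all illustrative assumptions.

```python
import re

# Assumption: years of interest start with 19 or 20. This is not the
# pattern used in the question; adjust it as needed.
YEAR_RE = re.compile(r"\b(?:19|20)\d\d\b")


def extract_years(texts):
    """Return the years found in one page's worth of strings, as a new list."""
    years = []
    for text in texts:
        years.extend(YEAR_RE.findall(text))
    return years


# Hypothetical sample strings standing in for the text of two result pages.
page_1 = ["A Author - Some Journal, 2015 - example.org"]
page_2 = ["B Author - Other Journal, 1998 - example.org"]

for number, texts in enumerate([page_1, page_2], start=1):
    # Each page gets its own list, so earlier pages are not repeated.
    print(number, extract_years(texts))
```

With Beautiful Soup, the input could be built as `[div.get_text() for div in soup.find_all("div", class_="gs_a")]`, keeping the parsing and the year extraction separate.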

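The `page` variable and the link really do change by 20 on each iteration. However, for some reason the program only scrapes the first page of the search results, as if the link variable had never changed. What am I missing?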
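
One way to narrow this down is to record, for each request, the URL that was built, the HTTP status code, and a short hash of the response body. If every page comes back with the same hash or a non-200 status, the server may be returning the same document each time (for example a block or consent page), though that is only something to check, not a diagnosis. This is a minimal sketch: `page_url`, `fingerprint`, `check_pages`, and `fetch_with_requests` are hypothetical helper names, and the step of 20 is taken from the question.

```python
import hashlib
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.de/scholar"  # same host as in the question


def page_url(search, start):
    """Build the results URL for one offset, URL-encoding the query."""
    params = {"start": start, "q": search, "hl": "de", "as_sdt": "0,5"}
    return BASE_URL + "?" + urlencode(params)


def fingerprint(html):
    """Short hash of a response body, so two pages can be compared at a glance."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()[:12]


def check_pages(search, num_pages, fetch, step=20):
    """Call fetch(url) -> (status, html) per page; return (url, status, hash) tuples."""
    report = []
    for i in range(num_pages):
        url = page_url(search, i * step)
        status, html = fetch(url)
        report.append((url, status, fingerprint(html)))
    return report


def fetch_with_requests(url):
    """Real fetcher; requests is imported here so the helpers above need only the stdlib."""
    import requests
    res = requests.get(url)
    return res.status_code, res.text
```

Calling `check_pages(search, numPages, fetch_with_requests)` and printing each tuple shows at a glance whether the URLs differ while the bodies do not.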

0 answers:

No answers yet