Using requests to scrape a paginated website

Time: 2015-09-19 07:10:37

Tags: python web-scraping python-requests

I'm trying to scrape several numbered pages (years) from Wikipedia:

for year in range(1991, 2000, 1):
    url = "https://en.wikipedia.org/wiki/" + str(year)
    source = requests.get(url)

x = BeautifulSoup(source.text, "html.parser")

x

However, when I inspect x, I see that I've only downloaded the 1999 page. How do I scrape all the pages I need, from 1991 to 2000?

And how do I put them into a dictionary with the year as the key and the text as the value?

2 answers:

Answer 0 (score: 1)

That's because your x is outside the for loop, so it only parses the last response. Change your code to this -

import requests
from bs4 import BeautifulSoup

res_dict = {}
for year in range(1991, 1994, 1):  # use range(1991, 2000) for all the years in the question
    url = "https://en.wikipedia.org/wiki/" + str(year)
    source = requests.get(url)

    soup = BeautifulSoup(source.content, "html.parser")
    res_dict[year] = soup.text
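
After the loop finishes, res_dict holds one entry per year. The sketch below shows the same accumulation pattern with the network call stubbed out, so you can see the shape of the result without hitting Wikipedia; fetch_text is a hypothetical stand-in for requests.get plus BeautifulSoup:

```python
# Hypothetical stand-in for fetching and parsing a page.
def fetch_text(year):
    return f"Article text for {year}"

res_dict = {}
for year in range(1991, 1994):
    # The key line: store each iteration's result instead of overwriting one variable.
    res_dict[year] = fetch_text(year)

print(sorted(res_dict))  # [1991, 1992, 1993]
print(res_dict[1991])    # Article text for 1991
```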

Answer 1 (score: 0)

That's because the for loop runs its body repeatedly, reassigning the loop variables each time. Let's look at an example:

for year in range(1991, 2000, 1):
    url = "https://en.wikipedia.org/wiki/" + str(year)
    source = requests.get(url) 

Now, the first time through the loop, url is https://en.wikipedia.org/wiki/1991. The second time, url is https://en.wikipedia.org/wiki/1992.

The last time, url is https://en.wikipedia.org/wiki/1999. So after the loop, source is the result of requests.get("https://en.wikipedia.org/wiki/1999").
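
One way to see all nine URLs the loop builds (a sketch, not part of the original answer) is to construct them up front in a list:

```python
# Build every URL the loop above would visit, without making any requests.
urls = ["https://en.wikipedia.org/wiki/" + str(year) for year in range(1991, 2000)]

print(urls[0])    # https://en.wikipedia.org/wiki/1991
print(urls[-1])   # https://en.wikipedia.org/wiki/1999
print(len(urls))  # 9 - range(1991, 2000) stops before 2000
```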

If that isn't clear, try the following code:

for i in range(1, 10):
    a = i
    print(a)   # prints 1 through 9, once per iteration

print(a)       # prints 9: only the value from the last iteration survives
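
To keep every value instead of only the last one, collect them into a container as the loop runs. A minimal sketch:

```python
values = []
for i in range(1, 10):
    values.append(i)  # store each value instead of overwriting a single variable

print(values)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```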

So x = BeautifulSoup(source.text, "html.parser") must be inside the for loop, like this:

for year in range(1991, 2000, 1):
    url = "https://en.wikipedia.org/wiki/" + str(year)
    source = requests.get(url)

    x = BeautifulSoup(source.text, "html.parser")
    # x is reassigned on every iteration, so store each parsed page
    # (e.g. in a dict keyed by year) if you need all of them afterwards