I am trying to scrape several numbered pages (one per year) from Wikipedia:
    for year in range(1991, 2000, 1):
        url = "https://en.wikipedia.org/wiki/" + str(year)
        source = requests.get(url)
    x = BeautifulSoup(source.text, "html.parser")
    x
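However, when I inspect 'x', I see that only the 1999 page was downloaded. How can I scrape all the pages I need, from 1991 to 2000, and put them into a dictionary with the year as the key and the page text as the value?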
Answer 0 (score: 1)
Because your x is outside the for loop, it is built only once, from the last response. Change your code to this:
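    import requests
    from bs4 import BeautifulSoup

    res_dict = {}
    # range(1991, 1994) covers 1991-1993; use range(1991, 2001) to include 2000.
    for year in range(1991, 1994, 1):
        url = "https://en.wikipedia.org/wiki/" + str(year)
        source = requests.get(url)
        # Parse inside the loop so every page is processed, not just the last one.
        soup = BeautifulSoup(source.content, "html.parser")
        res_dict[year] = soup.text  # year -> text of the page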
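A slightly more defensive variant of the same idea, as a rough sketch only. The names scrape_years, page_text, and the fetch parameter are made up for illustration, and the User-Agent string is a placeholder you would replace with your own. It assumes you want the end year included, a timeout on each request, and an exception on HTTP errors instead of silently storing an error page.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://en.wikipedia.org/wiki/"


def page_text(html):
    """Return the text content of an HTML document."""
    return BeautifulSoup(html, "html.parser").get_text()


def scrape_years(first, last, fetch=None):
    """Map each year in first..last (inclusive) to the text of its page.

    fetch is a hypothetical hook: a function taking a URL and returning HTML.
    When omitted, pages are downloaded with requests.
    """
    if fetch is None:
        session = requests.Session()
        # Placeholder User-Agent (assumption); put your own contact details here.
        session.headers["User-Agent"] = "year-scraper-example/0.1"

        def fetch(url):
            response = session.get(url, timeout=10)
            response.raise_for_status()  # fail loudly on 4xx/5xx responses
            return response.text

    # last + 1 because range() excludes its stop value.
    return {year: page_text(fetch(BASE_URL + str(year)))
            for year in range(first, last + 1)}
```

Calling scrape_years(1991, 2000) would make ten requests and return a dictionary keyed by the integers 1991 through 2000. Passing your own fetch function lets you try the parsing step on saved HTML without touching the network.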
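Answer 1 (score: 0)
Because a for loop runs its body once per value, and each pass overwrites the variables assigned in it. Let's look at an example: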
    for year in range(1991, 2000, 1):
        url = "https://en.wikipedia.org/wiki/" + str(year)
        source = requests.get(url)
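On the first pass through the loop, url is https://en.wikipedia.org/wiki/1991. On the second pass, url is https://en.wikipedia.org/wiki/1992. On the last pass, url is https://en.wikipedia.org/wiki/1999, so once the loop ends, source is the result of requests.get("https://en.wikipedia.org/wiki/1999").
If that is not clear, try the following code: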
    for i in range(1, 10):
        a = i
        print(a)  # inside the loop: prints 1 through 9
    print(a)      # after the loop: prints 9, the last value assigned
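The same effect can be shown with the year range from the question and no network access. This is only a sketch: fake_download, last_only, and collected are hypothetical names, and fake_download stands in for the real request. It contrasts a variable that is rebound on every pass with a dictionary that gains one entry per pass.

```python
# Hypothetical stand-in for the download step: returns a label, not a real page.
def fake_download(year):
    return "page for " + str(year)


# Overwriting: the name is rebound on every pass, so only the last value survives.
for year in range(1991, 2000):
    last_only = fake_download(year)

# Accumulating: each pass adds a new key, so every value is kept.
collected = {}
for year in range(1991, 2000):
    collected[year] = fake_download(year)

print(last_only)       # page for 1999
print(len(collected))  # 9
```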
So x = BeautifulSoup(source.text, "html.parser") must be inside the for loop, like this:
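    import requests
    from bs4 import BeautifulSoup

    for year in range(1991, 2000, 1):
        url = "https://en.wikipedia.org/wiki/" + str(year)
        source = requests.get(url)
        # Parse each response before the next pass overwrites source.
        x = BeautifulSoup(source.text, "html.parser")
        print(x.title)  # a bare x has no effect inside a loop, so print something from it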