I have written some code to download the free PDFs from a page so that I don't have to do it manually, but only for one specific subset/category (for example https:// www ... ....... / MagPi01). Now I want to extend the code so it follows every link after the main page, matching not just "MagPi{}".format(1, 2, 3, ...) but anything, using a regular expression like (.*). I am trying it this way (with a regex) for educational purposes.
import urllib.request
import os

path = "C:/Users/kosmas/Desktop/MagPi"

# Create the download directory (fails if it already exists).
try:
    os.mkdir(path)
except OSError:
    print("Creation of the directory %s failed" % path)
else:
    print("Successfully created the directory %s " % path)

# Issues 1-9 are zero-padded in the file name (MagPi01.pdf ... MagPi09.pdf).
try:
    i = 1
    while i < 10:
        url = "https://www.raspberrypi.org/magpi-issues/MagPi0{}.pdf".format(i)
        opener = urllib.request.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        urllib.request.urlretrieve(url, "C:/Users/kosmas/Desktop/MagPi/MagPi0{}.pdf".format(i))
        print("MagPi0{}.pdf created successfully".format(i))
        i = i + 1
except:
    print('Something went wrong')

# Issues 10-81 have no zero padding.
try:
    i = 10
    while i < 82:
        url = "https://www.raspberrypi.org/magpi-issues/MagPi{}.pdf".format(i)
        opener = urllib.request.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        urllib.request.urlretrieve(url, "C:/Users/kosmas/Desktop/MagPi/MagPi{}.pdf".format(i))
        print("MagPi{}.pdf created successfully".format(i))
        i = i + 1
except:
    print('Something went wrong')
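Something along these lines is what I am aiming for. This is only a rough sketch: it assumes the directory listing at https://www.raspberrypi.org/magpi-issues/ (the same base URL the individual PDFs live under) exposes the file names as href attributes in its HTML. The INDEX_URL, the href pattern, and the error handling are my own guesses, not a verified solution.

import os
import re
import urllib.error
import urllib.request

# Assumed index page: the directory that already serves the individual PDFs.
INDEX_URL = "https://www.raspberrypi.org/magpi-issues/"
DOWNLOAD_DIR = "C:/Users/kosmas/Desktop/MagPi"

os.makedirs(DOWNLOAD_DIR, exist_ok=True)

# Same User-agent trick as above so the server does not reject the requests.
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

# Fetch the index page and decode it to a string for the regex.
html = urllib.request.urlopen(INDEX_URL).read().decode('utf-8', errors='ignore')

# Capture every href that ends in .pdf; (.*?) is the non-greedy form of (.*),
# so it stops at the first closing quote instead of swallowing the rest of the page.
pdf_names = re.findall(r'href="(.*?\.pdf)"', html)

for name in pdf_names:
    try:
        urllib.request.urlretrieve(INDEX_URL + name, os.path.join(DOWNLOAD_DIR, name))
        print("{} downloaded successfully".format(name))
    except urllib.error.URLError as exc:
        print("Failed to download {}: {}".format(name, exc))

For anything more complicated than a flat directory listing, an HTML parser such as BeautifulSoup would be more robust than a regex, but the regex keeps it in the spirit of the exercise.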