Parse all sub-pages of a link and download the specific files they contain

Asked: 2019-05-28 19:39:19

Tags: python regex urllib

I have written some code to download the free PDFs from a page so that I don't have to do it manually, as I did for one specific category/subset (e.g. https:// www ... ....... / MagPi01). Now I want to extend the code to follow every link reachable from the main page, not only the ones matching "MagPi{}".format(1, 2, 3, ...) but anything matching a regular expression such as (.*). I am trying it this way (with regex) for educational purposes. A rough sketch of what I have in mind for the regex part is shown after my current code below.

import urllib.request
import os

# Create the target directory for the downloaded PDFs.
path = "C:/Users/kosmas/Desktop/MagPi"
try:
    os.mkdir(path)
except OSError:
    print("Creation of the directory %s failed" % path)
else:
    print("Successfully created the directory %s" % path)

# Issues 1-9 are zero-padded (MagPi01.pdf ... MagPi09.pdf).
try:
    i = 1
    while i < 10:
        url = "https://www.raspberrypi.org/magpi-issues/MagPi0{}.pdf".format(i)
        opener = urllib.request.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        urllib.request.urlretrieve(url, "C:/Users/kosmas/Desktop/MagPi/MagPi0{}.pdf".format(i))
        print("MagPi0{}.pdf created successfully".format(i))
        i = i + 1
except:
    print('Something went wrong')

# Issues 10-81 are not zero-padded (MagPi10.pdf ... MagPi81.pdf).
try:
    i = 10
    while i < 82:
        url = "https://www.raspberrypi.org/magpi-issues/MagPi{}.pdf".format(i)
        opener = urllib.request.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        urllib.request.urlretrieve(url, "C:/Users/kosmas/Desktop/MagPi/MagPi{}.pdf".format(i))
        print("MagPi{}.pdf created successfully".format(i))
        i = i + 1
except:
    print('Something went wrong')
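This is roughly the regex-based direction I am considering: fetch the index page once, collect every href that ends in .pdf with a regular expression, and download each match instead of constructing the file names by hand. It is only an untested sketch; the index URL and the exact link pattern in the HTML are assumptions on my part.

import re
import urllib.request
from urllib.parse import urljoin

# Assumed index page that lists the issue PDFs as plain href links.
index_url = "https://www.raspberrypi.org/magpi-issues/"

opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

# Fetch the index page HTML.
html = urllib.request.urlopen(index_url).read().decode('utf-8', errors='ignore')

# Match any href ending in .pdf, not just MagPi{}.pdf.
pdf_names = re.findall(r'href="([^"]*\.pdf)"', html)

for name in pdf_names:
    pdf_url = urljoin(index_url, name)  # handle relative links
    print(pdf_url)  # later: urllib.request.urlretrieve(pdf_url, local_path)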

0 Answers:

No answers yet.