Question

I think I have translated the instructions well so far, but now I am lost. I don't have much programming knowledge or skill.

import requests
from bs4 import BeautifulSoup



def make_soup(url):
    thepage = requests.get(url)
    soupdata = BeautifulSoup(thepage.text, "html.parser")
    return soupdata


i = 1
soup = make_soup("https://uwaterloo.ca")

for img in soup.findAll('img'):
    temp = img.get('src')
    if temp[:1]=="/":
        image = "https://uwaterloo.ca" + temp
    else:
        image = temp

    nametemp = img.get('alt')
    if len(nametemp) == 0:
        filename = str(i)
        i = i + 1
    else:
        filename = nametemp

This is where I get lost:

imagefile = open(filename + ".jpeg", 'wb')
imagefile.write(urllib.request.urlopen(image).read())
imagefile.close()

Answer 1

Just replace the urllib logic with requests.get and write the content to the file:

with open(filename + ".jpeg", 'wb') as f:         
    f.write(requests.get(image).content)

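f.write(requests.get(image).content) is equivalent to what the urllib code was doing. Opening the file with a with statement (a context manager) means the file is closed automatically.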
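To illustrate the point about the file being closed automatically, here is a minimal sketch that writes placeholder bytes instead of a real download, so it runs without network access. The helper name save_bytes, the placeholder data, and the temporary path are assumptions made for the example, not part of the original code.

```python
import os
import tempfile


def save_bytes(path, data):
    """Write raw bytes to path and report whether the file was closed afterwards."""
    with open(path, "wb") as f:
        f.write(data)
    # Leaving the with block has already closed the file, even though
    # close() was never called explicitly.
    return f.closed


# Placeholder bytes standing in for requests.get(image).content (assumption).
fake_image = b"placeholder image bytes"
target = os.path.join(tempfile.mkdtemp(), "example.jpeg")

closed_after = save_bytes(target, fake_image)
print(closed_after)
```

In the real loop, data would be requests.get(image).content and path would be filename + ".jpeg".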

We can also use a CSS selector and str.format to improve the code:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin  # Python 3; on Python 2 this was "from urlparse import urljoin"


def make_soup(url):
    thepage = requests.get(url)
    soupdata = BeautifulSoup(thepage.text, "html.parser")
    return soupdata


soup = make_soup("https://uwaterloo.ca")

i = 1
# 'img[src]' only matches <img> tags that actually have a src attribute
for img in soup.select('img[src]'):
    temp = img["src"]
    # .get returns None instead of raising KeyError when alt is missing
    alt = img.get("alt")
    if not alt:
        alt = i
        i += 1
    if temp.startswith("/"):
        temp = urljoin("https://uwaterloo.ca", temp)
    with open("{}{}.jpeg".format(alt, i), 'wb') as f:
        f.write(requests.get(temp).content)
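As a rough sketch of what urljoin does with the different kinds of src values an img tag might carry, the snippet below resolves a few made-up paths against the site root. The example paths and the helper name absolute_url are assumptions for illustration only, and the import is the Python 3 one.

```python
from urllib.parse import urljoin

BASE = "https://uwaterloo.ca"


def absolute_url(src, base=BASE):
    """Resolve an img src value against the base URL."""
    return urljoin(base, src)


# Hypothetical src values, not taken from the real page.
examples = [
    "/images/logo.png",             # root-relative path
    "images/logo.png",              # path relative to the base
    "https://example.com/a.png",    # already absolute, returned unchanged
    "//cdn.example.com/b.png",      # protocol-relative, inherits https
]

resolved = [absolute_url(src) for src in examples]
for src, url in zip(examples, resolved):
    print(src, "->", url)
```

Because urljoin leaves absolute URLs untouched, it could be applied to every src value without the startswith("/") check, which would also cover protocol-relative values such as the last example.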

I am following a Python tutorial on web scraping. It uses urllib, but I am using requests, and I need help translating the instructions.
