Question

I have a simple page that contains an image, and I want to save that image locally. To do so, I am using BeautifulSoup to scrape the src. Here is my code:

    import requests
    from bs4 import BeautifulSoup

    def getImage(url):

        page = requests.get(url).text
        #print(page)
        soup = BeautifulSoup(page, 'lxml')
        #print(soup)

        img = soup.find(name='img')

        if img is not None:
            #img = img.get('src')
            print(img.attrs)

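If I print page, I get the result. I also checked whether the page is HTML, and the check says it is not, but I do not know what other type it could be in this case. I also tried other parsers, such as lxml and html5lib.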

This is the HTML of the page, copied directly:

<html><head><meta name="viewport" content="width=device-width, minimum-scale=0.1">
<title>SOMETHING TITLE</title>
</head>
<body style="margin: 0px; background: #0e0e0e;">
<img style="-webkit-user-select: none;margin: auto;cursor: zoom-in;" src="http:<WHATEVER>" width="500" height="279">
</body></html>

Is the page encrypted? This should be a simple scrape :(

Answer 1

Your HTML has no attribute named "img". You could add a name, for example:

    <img name='myImage' style='-webkit-user-select: none;margin: auto;cursor: zoom-in;' src='http:<WHATEVER>'>

Then you can look it up with:

    # Match on the name attribute; find(name=...) would match the tag name instead
    img = soup.find(attrs={'name': 'myImage'})
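As a point of reference, in BeautifulSoup the `name` argument of `find` matches the tag name, while attribute values are matched through `attrs`. A minimal sketch of the difference, assuming sample markup modelled on the snippet above (the `example.com` URL is a placeholder) and the built-in `html.parser` so that no extra parser is required:

```python
from bs4 import BeautifulSoup

# Sample markup (an assumption for illustration, not the real page)
html = """<html><body>
<img name="myImage" src="http://example.com/a.png" width="500" height="279">
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# name= selects by tag name, so this finds the <img> element
by_tag = soup.find(name="img")

# attrs= selects by attribute value, here the HTML name attribute
by_attr = soup.find(attrs={"name": "myImage"})

# There is no <myImage> tag in the markup, so this lookup finds nothing
no_such_tag = soup.find(name="myImage")

print(by_tag.attrs)
print(by_attr["src"])
print(no_such_tag)
```

Both of the first two lookups reach the same element, so `soup.find(name='img')` from the question is already a valid way to select the image when the response really is HTML.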

But if you cannot change the HTML, you can do the following:

    images = soup.find_all('img')
    for image in images:
        print(image.get('src'))  # do whatever you need with each image
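Since the goal in the question is to save the image locally, collecting the `src` values and downloading them might look roughly like this. It is only a sketch: `image_urls` and `save_image` are hypothetical helper names, the `example.com` address is a placeholder, and `html.parser` is used so that no extra parser needs to be installed.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def image_urls(html, base_url):
    """Return the absolute URL of every <img> that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True skips <img> tags without a src; urljoin resolves relative paths
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


def save_image(url, path):
    """Download url and write the raw bytes to path."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Images are binary, so use .content (bytes) rather than .text
    with open(path, "wb") as f:
        f.write(response.content)


# Placeholder markup for illustration only
sample = '<html><body><img src="/pics/cat.png" width="500"></body></html>'
print(image_urls(sample, "http://example.com/page.html"))
```

Calling `save_image(url, 'cat.png')` for each returned URL would then write the file to disk. If the URL passed to `getImage` turns out to serve the image itself rather than an HTML page, which the failed HTML check in the question may hint at, `save_image` could be applied to that URL directly.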
