Question

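I'm trying to automate downloading files from imgur, and I'm using BeautifulSoup to get the link. Honestly, I'm at a loss as to why this doesn't work, because according to my research it should: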

    soup = BeautifulSoup("http://imgur.com/ha0WYYQ")
    imageUrl = soup.select('.image a')[0]['href']

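The code above just returns an empty list, so the index lookup raises an error. I've tried modifying it, but to no avail. Any and all input is appreciated.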

Answer 1

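There are a few problems with your approach:

BeautifulSoup does not expect a URL, so you first need to use a library to fetch the HTML stream; and
from what I can see, the selector should be .post-image a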

import urllib.request
from bs4 import BeautifulSoup

r = urllib.request.urlopen('http://imgur.com/ha0WYYQ').read()
soup = BeautifulSoup(r, 'lxml')
soup.select('.post-image a')[0]['href']

Or, more elegantly:

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('http://imgur.com/ha0WYYQ') as f:
    r = f.read()
soup = BeautifulSoup(r, 'lxml')
result = soup.select('.post-image a')[0]['href']
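
To see why the original attempt returned an empty list, it may help to check what BeautifulSoup does with a URL string. The following is only a minimal sketch, assuming Python 3 with bs4 installed and using the built-in html.parser instead of lxml. The URL is treated as the document text itself and nothing is downloaded, so there are no tags for the selector to find.

```python
import warnings
from bs4 import BeautifulSoup

# Recent bs4 versions warn when the input looks like a URL; silence that for the demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # The string is parsed as markup. No HTTP request is made.
    soup = BeautifulSoup("http://imgur.com/ha0WYYQ", "html.parser")

matches = soup.select(".image a")  # no <a> tags exist in this "document"
text = soup.get_text()             # the only content is the URL string itself

print(len(matches), repr(text))
```

Indexing matches[0] on that empty result is what raises the error described in the question.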

Answer 2

<div class="post-image">
    <a href="//i.imgur.com/ha0WYYQ.jpg" class="zoom">
        <img src="//i.imgur.com/ha0WYYQ.jpg" alt="Frank in his bb8 costume" itemprop="contentURL">
    </a>
</div>

This is the markup for the image. "post-image" is a single class name and cannot be split, so .image does not match it.

imageUrl = soup.select('.post-image a')[0]['href']

A shortcut for selecting a single tag:

imageUrl = soup.select_one('.post-image a')['href']
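
Two details are worth guarding against here: select_one returns None when nothing matches, and the href in the markup above is protocol-relative (it starts with //). A possible sketch follows, assuming Python 3 and the built-in html.parser. The helper name extract_image_url, the PAGE_URL constant, and the sample string are hypothetical and only for illustration.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

PAGE_URL = "http://imgur.com/ha0WYYQ"  # page the HTML is assumed to come from

def extract_image_url(html, page_url=PAGE_URL):
    """Return the absolute image URL, or None if the selector matches nothing."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one(".post-image a")
    if link is None or not link.has_attr("href"):
        return None
    # "//i.imgur.com/..." has no scheme; urljoin borrows it from page_url.
    return urljoin(page_url, link["href"])

# Trimmed copy of the markup shown above, used instead of a live request.
sample = (
    '<div class="post-image">'
    '<a href="//i.imgur.com/ha0WYYQ.jpg" class="zoom"></a>'
    '</div>'
)
print(extract_image_url(sample))
```

Returning None rather than raising lets the caller decide how to handle a page whose layout has changed.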

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open file handle:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("index.html"))

soup = BeautifulSoup("<html>data</html>")
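
As a rough illustration of the two input types, the sketch below parses the same markup once from a string and once from a file handle. It assumes Python 3 and names html.parser explicitly; the temporary file stands in for index.html and is purely an assumption for the demo.

```python
import os
import tempfile
from bs4 import BeautifulSoup

markup = "<html><body><p>data</p></body></html>"

# Parse directly from a string.
soup_from_string = BeautifulSoup(markup, "html.parser")

# Write the same markup to a temporary file to stand in for index.html.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as tmp:
    tmp.write(markup)
    path = tmp.name

try:
    # Parse from an open file handle; the with-block closes it afterwards.
    with open(path, encoding="utf-8") as fh:
        soup_from_file = BeautifulSoup(fh, "html.parser")
finally:
    os.remove(path)

print(soup_from_file.p.get_text())
```

Opening the file in a with-block avoids leaving the handle open, which the one-line open("index.html") form does not guarantee.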

Getting the URL of an image on imgur
