Question

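I have been playing around with Python for a while and am starting to get the hang of it. I have come up with a project, but there are a few things I can't work out.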

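The goal is to look for a given tag, for example the img tag or some other tag. If that tag is found, the script also needs to check its id attribute, whose name is always the same.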

If the img looks like <img src="/overflow.png" id="true">, I want it to be stored.
If the img looks like <img src="/overflow.png" id="false">, I do not want it to be stored.

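Hopefully this is easy to implement, but I have not found a solution yet. I have looked into what HTMLParser can do, but I find it rather cryptic. I hope someone knows how to do this and can help me out. Thanks a lot!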

Cheers,
ninjaboi21.

Answer 1

People usually use BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) for this kind of thing.

Once it is installed:

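# BeautifulSoup 4 on Python 3; version 3 used "from BeautifulSoup import BeautifulSoup"
from bs4 import BeautifulSoup

# If the file is on your computer, use this:
with open('/path/to/the/file') as file:
    html = file.read()
# If the file is on the internet, use this instead:
# from urllib.request import urlopen
# html = urlopen('http://www.the.com/path/to/the/file').read()

soup = BeautifulSoup(html, 'html.parser')
# Keep only the <img> tags whose id is "true".
# .get() avoids a KeyError for images that have no id attribute.
trueimages = [image for image in soup.find_all('img')
              if image.get('id', '').lower() == 'true']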

Edit: added how to read the file into a string.
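
For anyone who would rather stay in the standard library, the HTMLParser class mentioned in the question can probably do the same filtering. What follows is only a minimal sketch, assuming Python 3 (where the module is html.parser); the class name TrueImageCollector, the urls attribute, and the sample markup are made up for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser


class TrueImageCollector(HTMLParser):
    """Collect the src of every <img> whose id attribute is "true" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag != "img":
            return
        attributes = dict(attrs)
        has_true_id = (attributes.get("id") or "").lower() == "true"
        if has_true_id and attributes.get("src"):
            self.urls.append(attributes["src"])


# Sample markup based on the two cases in the question, plus an image without an id.
sample = (
    '<img src="/overflow.png" id="true">'
    '<img src="/other.png" id="false">'
    '<img src="/no-id.png">'
)

collector = TrueImageCollector()
collector.feed(sample)
collector.close()
print(collector.urls)  # prints ['/overflow.png']
```

Unlike the list comprehension above, this stores the src strings themselves rather than the tag objects, which may be closer to "store the URL in a variable". BeautifulSoup is likely still the more forgiving choice for messy real-world HTML.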

Find img tags and their id attribute and, if both match, store the URL in a variable
