I want to get the img URIs from text like this:
hello bla
<br> <img src="/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif" class="someclass" />
some blablabla
<br> <img src="/media/photos/344/tgrfgregfwe_540.jpg" class="otherclass" />
</br>
more blabla
So the result should be:
['/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif', '/media/photos/344/tgrfgregfwe_540.jpg']
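The text is HTML rather than well-formed XML: the `<br>` tags are never closed and there is a stray `</br>`. A parser that tolerates that kind of markup is therefore the safer choice. Below is a minimal sketch that uses only the standard library's `html.parser.HTMLParser`. Neither answer below takes this approach, and the class name `ImgSrcCollector` is hypothetical:

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:  # skip <img> tags without a src
                self.srcs.append(src)


text = """hello bla
<br> <img src="/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif" class="someclass" />
some blablabla
<br> <img src="/media/photos/344/tgrfgregfwe_540.jpg" class="otherclass" />
</br>
more blabla"""

parser = ImgSrcCollector()
parser.feed(text)
parser.close()
print(parser.srcs)
```

You don't need a separate override for self-closing tags such as `<img ... />`. `HTMLParser`'s default `handle_startendtag` already passes them to `handle_starttag`.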
Answer 0 (score: 2)
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html, "html.parser")  # html is the text from the question
>>> for i in soup.find_all('img'):
...     print(i.get('src'))
...
/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif
/media/photos/344/tgrfgregfwe_540.jpg
>>> [i.get('src') for i in soup.find_all('img')]
['/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif', '/media/photos/344/tgrfgregfwe_540.jpg']
Answer 1 (score: 0)
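Python's standard library has an XML parser, xml.dom.minidom, that makes this simple as long as the input is well-formed XML: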
from xml.dom import minidom

image = "<img src='/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif' class='someclass' />"
xml_object = minidom.parseString(image)
# find every <img> element in the parsed document
image_tags = xml_object.getElementsByTagName('img')
list_of_srcs = []
for image_tag in image_tags:
    # read the value of each tag's src attribute
    list_of_srcs.append(image_tag.getAttributeNode('src').value)
print(list_of_srcs)
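minidom cannot parse the question's text directly. It raises `ExpatError` because `<br>` is never closed, `</br>` has no opening tag, and there is no single root element. For a well-formed variant of the text, the sketch below collects every `src` by wrapping the fragment in a synthetic root element. Both the `<root>` wrapper and the cleaned-up sample (with `<br/>` self-closed and the stray `</br>` dropped) are assumptions made for illustration:

```python
from xml.dom import minidom

# Assumed well-formed variant of the question's text: <br/> is self-closed
# and the stray </br> is dropped; the original text would raise ExpatError.
fragment = """hello bla
<br/> <img src="/media/photos/1084/PBWHFH7J1rzhr63o1_400.gif" class="someclass" />
some blablabla
<br/> <img src="/media/photos/344/tgrfgregfwe_540.jpg" class="otherclass" />
more blabla"""

# XML requires exactly one root element, so wrap the fragment in a synthetic one.
document = minidom.parseString("<root>" + fragment + "</root>")

# getElementsByTagName searches all descendants, so every <img> is found.
list_of_srcs = [img.getAttribute("src") for img in document.getElementsByTagName("img")]
print(list_of_srcs)
```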