<div id="thumbnailsImagePreview">
<img src="getImage.do?imageSize=Small&imageId=730645&r=150521020" imageindex="0" hspace="0" vspace="0" loaded="false" class="selected">
<img src="getImage.do?imageSize=Small&imageId=7589956&r=150521020" imageindex="1" hspace="0" vspace="0" loaded="false">
<img src="getImage.do?imageSize=Small&imageId=7590018&r=150521020" imageindex="2" hspace="0" vspace="0" loaded="false">
<img src="getImage.do?imageSize=Small&imageId=2803850&r=150521020" imageindex="3" hspace="0" vspace="0" loaded="false">
<img src="getImage.do?imageSize=Small&imageId=2973197&r=150521020" imageindex="4" hspace="0" vspace="0" loaded="false">
<img src="getImage.do?imageSize=Small&imageId=7589888&r=150521020" imageindex="5" hspace="0" vspace="0" loaded="false">
<img src="getImage.do?imageSize=Small&imageId=7877267&r=150521020" imageindex="6" hspace="0" vspace="0" loaded="false">
<img src="getImage.do?imageSize=Small&imageId=7877375&r=150521020" imageindex="7" hspace="0" vspace="0" loaded="false">
<img src="getImage.do?imageSize=Small&imageId=6812892&r=150521020" imageindex="8" hspace="0" vspace="0" loaded="false">
</div>
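I am trying to extract the img src links from this HTML (for the img elements that have an associated imageindex). Because they all sit inside the div with id "thumbnailsImagePreview", the following line gives me one big block of text, so I can't parse out each individual img src link.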
images = soup.find_all('div', attrs = {'id' : 'thumbnailsImagePreview'})
How can I get a list of the links?
When I print images, this is what I get:
[<div id="thumbnailsImagePreview">\n<img class="selected" hspace="0"
imageindex="0" loaded="false" src="getImage.do?
imageSize=Small&imageId=730645&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="1" loaded="false" src="getImage.do?
imageSize=Small&imageId=7589956&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="2" loaded="false" src="getImage.do?
imageSize=Small&imageId=7590018&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="3" loaded="false" src="getImage.do?
imageSize=Small&imageId=2803850&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="4" loaded="false" src="getImage.do?
imageSize=Small&imageId=2973197&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="5" loaded="false" src="getImage.do?
imageSize=Small&imageId=7589888&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="6" loaded="false" src="getImage.do?
imageSize=Small&imageId=7877267&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="7" loaded="false" src="getImage.do?
imageSize=Small&imageId=7877375&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="8" loaded="false" src="getImage.do?
imageSize=Small&imageId=6812892&r=150521020" vspace="0"/>\n<img
hspace="0" imageindex="9" loaded="false"
</div>]
Answer 0 (score: 1)
You need to find the inner img elements and read each one's src attribute value by treating the element like a dictionary:
image_srcs = [img['src'] for img in soup.select('#thumbnailsImagePreview img[src]')]
Here, #thumbnailsImagePreview img[src] is a CSS selector that matches every img element that has a src attribute and sits inside the element with id="thumbnailsImagePreview".
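For reference, a minimal self-contained sketch of the approach above, run against a shortened copy of the HTML from the question. The built-in "html.parser" backend and the extra [imageindex] filter are assumptions added here for illustration (the question only asks for images that carry an imageindex); the answer's original selector works the same way without that filter.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (first three thumbnails only).
html = """
<div id="thumbnailsImagePreview">
<img src="getImage.do?imageSize=Small&imageId=730645&r=150521020" imageindex="0" class="selected">
<img src="getImage.do?imageSize=Small&imageId=7589956&r=150521020" imageindex="1">
<img src="getImage.do?imageSize=Small&imageId=7590018&r=150521020" imageindex="2">
</div>
"""

# Assumption: the standard-library parser is used; any installed parser should do.
soup = BeautifulSoup(html, "html.parser")

# Select img elements inside the div that have both a src and an imageindex
# attribute, then read src from each tag with dictionary-style access.
image_srcs = [
    img["src"]
    for img in soup.select("#thumbnailsImagePreview img[src][imageindex]")
]

for src in image_srcs:
    print(src)
```

The result is a plain Python list of strings, one per img element, rather than the single div returned by find_all('div', ...). The src values are relative, so they would still need to be joined with the page's URL before being requested.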