How to extract an image/href URL from a div class using Scrapy

Time: 2018-08-04 05:18:26

Tags: python scrapy

I am having trouble extracting the href URL from the following markup on a website:

<div class="expando expando-uninitialized" style="display: none" data-cachedhtml=" <div class=&quot;media-preview&quot; id=&quot;media-preview-66hch1&quot; style=&quot;max-width: 534px&quot;> <div class=&quot;media-preview-content&quot;> <a href=&quot;https://i.redd.it/nctvpvsnbpsy.jpg&quot; class=&quot;may-blank&quot;> <img class=&quot;preview&quot; src=&quot;https://i.redditmedia.com/UELqh-mbh5mwnXr67PoBbi23nwZuNl2v3flNbkmewQE.jpg?w=534&amp;amp;s=1426be7f811e5d5043760f8882674070&quot; width=&quot;534&quot; height=&quot;768&quot;> </a> </div> </div> " data-pin-condition="function() {return this.style.display != 'none';}"><span class="error">loading...</span></div>

1 answer:

Answer 0: (score: 0)

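You could probably use a regular expression for this. The link is not a real element in the page; it sits inside the data-cachedhtml attribute with its quotes encoded as &quot;, so the pattern has to match that encoded form. Here is an example: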

s = """<div class="expando expando-uninitialized" style="display: none" data-cachedhtml=" <div class=&quot;media-preview&quot; id=&quot;media-preview-66hch1&quot; style=&quot;max-width: 534px&quot;> <div class=&quot;media-preview-content&quot;> <a href=&quot;https://i.redd.it/nctvpvsnbpsy.jpg&quot; class=&quot;may-blank&quot;> <img class=&quot;preview&quot; src=&quot;https://i.redditmedia.com/UELqh-mbh5mwnXr67PoBbi23nwZuNl2v3flNbkmewQE.jpg?w=534&amp;amp;s=1426be7f811e5d5043760f8882674070&quot; width=&quot;534&quot; height=&quot;768&quot;> </a> </div> </div> " data-pin-condition="function() {return this.style.display != 'none';}"><span class="error">loading...</span></div>"""
import re

# Non-greedy match: capture stops at the first ".jpg" that is followed by &quot;
re.search(r'href=&quot;(.*?\.jpg)&quot;', s).group(1)
# 'https://i.redd.it/nctvpvsnbpsy.jpg'