How to get the src link with XPath

Time: 2017-10-11 12:39:26

Tags: python xpath

This is the HTML:

<div class="c" id="M_Fp01sdJgm">
    <div>
        <a class="nk" href="https://weibo.cn/thebs">figre</a>
            <img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5338.gif" alt="V"/>
            <img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>
      <span class="ctt">
                    ":"resampling
                    <span class="kt">resampling</span>
                    ":Cleantech entrepreneurs are splicing genes in the search for greener fuels
                ​</span>&nbsp;
                [<a href="https://weibo.cn/mblog/picAll/Fp01sdJgm?rl=2">2 pieces of the package</a>
                </div>
    <div>
        <a href="https://weibo.cn/mblog/pic/Fp01sdJgm?rl=1">
          <img src="http://wx1.sinaimg.cn/wap180/3ed2e6e8gy1fk7hohl2i5j219s0ps4qp.jpg" alt="images" class="ib" />
        </a>&nbsp;
        <a href="https://weibo.cn/mblog/oripic?id=Fp01sdJgm&amp;u=3ed2e6e8gy1fk7hohl2i5j219s0ps4qp">image</a>&nbsp;
        <a href="https://weibo.cn/attitude/Fp01sdJgm/add?uid=5757914684&amp;rl=1&amp;st=7b15a6">praise[28094]</a>&nbsp;
        <a href="https://weibo.cn/repost/Fp01sdJgm?uid=1054009064&amp;rl=1">transmit[1164]</a>&nbsp;
        <a href="https://weibo.cn/comment/Fp01sdJgm?uid=1054009064&amp;rl=1#cmtfrm" class="cc">comment[4097]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/Fp01sdJgm?rl=1&amp;st=7b15a6">save</a>
        "<!---->&nbsp;"
        <span class="ct">10月05日 20:08&nbsp;from iPhone 7 Plus

I tried the code below and can already extract the other fields, but 'img' comes back empty:

def get_user_data(self,start_url):
    html = requests.get(url=start_url,headers=self.headers,cookies=self.cookies).content
    selector = etree.fromstring(html,etree.HTMLParser(encoding='utf-8'))
    all_user = selector.xpath('//div[contains(@class,"c") and contains(@id,"M")]')
    for i in all_user:
        user_id = i.xpath('./div[1]/a[@class="nk"]/@href')
        content = i.xpath('./div[1]/span[1]')[0]
        contents = content.xpath('string(.)')
        if i.xpath('./div[2]'):
            img = selector.xpath('./div[2]/a/img/@src')     # img comes back empty
            praise_num = i.xpath('./div[2]/a[3]/text()')
            transmit_num = i.xpath('./div[2]/a[4]/text()')
        else:
            img = ''
            praise_num = i.xpath('./div[2]/a[3]/text()')
            transmit_num = i.xpath('./div[2]/a[4]/text()')

How should I write 'img'? And can I then combine the fields with zip()? I need to save them to MySQL.

1 answer:

Answer 0 (score: 2)

Try this (your image is under div[1]):

img = i.xpath('./div[1]/a/img/@src')
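
The important change in that line is that the relative path is evaluated against the loop element `i` rather than the document root `selector`. A path starting with `./` is resolved from whatever node it is called on, so calling it on the root looks for divs directly under `<html>` and finds nothing. Below is a minimal sketch of the idea, assuming lxml is installed. The sample markup is a trimmed copy of the HTML in the question, and `extract_posts` is a hypothetical helper name. In the HTML as posted, the picture sits in the second inner div, so the sketch uses `.//a/img/@src`, which searches all descendants and does not depend on the div index.

```python
from lxml import etree

# Trimmed copy of the markup from the question (bytes, like requests' .content).
SAMPLE = b"""
<div class="c" id="M_Fp01sdJgm">
  <div>
    <a class="nk" href="https://weibo.cn/thebs">figre</a>
    <span class="ctt">resampling</span>
  </div>
  <div>
    <a href="https://weibo.cn/mblog/pic/Fp01sdJgm?rl=1">
      <img src="http://wx1.sinaimg.cn/wap180/3ed2e6e8gy1fk7hohl2i5j219s0ps4qp.jpg" alt="images" class="ib" />
    </a>
    <a href="#">image</a>
    <a href="#">praise[28094]</a>
    <a href="#">transmit[1164]</a>
  </div>
</div>
"""


def extract_posts(html):
    """Return one (user_url, img_src, praise_text) tuple per post block."""
    root = etree.fromstring(html, etree.HTMLParser(encoding="utf-8"))
    rows = []
    for post in root.xpath('//div[contains(@class, "c") and contains(@id, "M")]'):
        user = post.xpath('./div[1]/a[@class="nk"]/@href')
        # Relative path evaluated on `post`, not on `root`.
        imgs = post.xpath('.//a/img/@src')
        praise = post.xpath('./div[2]/a[3]/text()')
        # xpath() returns lists; fall back to '' when a field is missing.
        rows.append((
            user[0] if user else "",
            imgs[0] if imgs else "",
            praise[0] if praise else "",
        ))
    return rows


rows = extract_posts(SAMPLE)
for row in rows:
    print(row)
```

Because each tuple is built inside the loop, the fields of one post stay together and zip() is not needed. A list of tuples like `rows` is also the shape that a DB-API cursor's `executemany` usually accepts; the table and column names would be your own.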