Question

I want to use Scrapy in Python to extract all the product URLs from the page "http://presskr.com/category/Mobiles--Tablets/35". Here is the function I am using to do this:


The XPath for each product div is: //div[@id="pagination_contents"]/div[2]/div[' + str(i) + ']/a/@href
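Indexing the innermost div with str(i) selects one product per query. Dropping that index lets a single XPath match every product div at once. The sketch below illustrates this with the standard library's xml.etree.ElementTree (a stand-in for Scrapy's selectors, which support full XPath) on a hypothetical, simplified page; the real presskr.com markup and the hrefs shown are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup that mimics the assumed page structure.
PAGE = """
<html><body>
  <div id="pagination_contents">
    <div>header</div>
    <div>
      <div><a href="/product/1">Phone A</a></div>
      <div><a href="/product/2">Phone B</a></div>
      <div><a href="/product/3">Tablet C</a></div>
    </div>
  </div>
</body></html>
"""


def product_hrefs(markup):
    root = ET.fromstring(markup.strip())
    # No positional index on the innermost div, so every product div matches.
    links = root.findall(".//div[@id='pagination_contents']/div[2]/div/a")
    return [a.get("href") for a in links]


print(product_hrefs(PAGE))  # ['/product/1', '/product/2', '/product/3']
```

In Scrapy the equivalent would be a single response.xpath('//div[@id="pagination_contents"]/div[2]/div/a/@href') call, iterated over in the callback.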

But I only get one link instead of the URLs of all the products.
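The original function is not shown, so the cause can only be guessed. One common reason a loop yields a single result is using return inside it, which exits the callback on the first iteration; yield produces one value per iteration. The plain-Python sketch below (function names and links are hypothetical) contrasts the two.

```python
LINKS = ["/product/1", "/product/2", "/product/3"]  # hypothetical hrefs


def first_only(hrefs):
    for href in hrefs:
        return href  # leaves the function on the first iteration


def all_links(hrefs):
    for href in hrefs:
        yield href  # hands back one value per iteration and keeps looping


print(first_only(LINKS))       # /product/1
print(list(all_links(LINKS)))  # ['/product/1', '/product/2', '/product/3']
```

A Scrapy parse callback is typically written as a generator like all_links, yielding one item per link.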

Answer 1

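Try the approach below. I suggest following the Scrapy tutorial and structuring your spider the same way; you won't need much manual work. Your case is very similar to the example at http://doc.scrapy.org/en/latest/intro/tutorial.html#extracting-the-data, so read further there.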

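import scrapy

class DmozItem(scrapy.Item):
    link = scrapy.Field()

def parse(self, response):  # method of your scrapy.Spider subclass
    # select the href of every product link, not just the first one
    for href in response.xpath('//span[@class="itemlistinginfo"]/a/@href'):
        item = DmozItem()
        # resolve relative links against the URL of the current page
        item['link'] = response.urljoin(href.extract())
        yield item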
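The response.urljoin call resolves each extracted href against the URL of the page it came from, so relative links become full URLs. The standard library's urllib.parse.urljoin does the same resolution; the sketch below uses it with the page URL from the question and a few hypothetical hrefs.

```python
from urllib.parse import urljoin

PAGE_URL = "http://presskr.com/category/Mobiles--Tablets/35"


def absolutize(page_url, hrefs):
    # Resolve each (possibly relative) href against the page it was found on.
    return [urljoin(page_url, h) for h in hrefs]


# Hypothetical hrefs: root-relative, path-relative, and already absolute.
print(absolutize(PAGE_URL, ["/product/1", "product/2", "http://example.com/x"]))
# ['http://presskr.com/product/1',
#  'http://presskr.com/category/Mobiles--Tablets/product/2',
#  'http://example.com/x']
```

Absolute hrefs pass through unchanged, so it is safe to apply the join to every link.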

How to crawl URLs in Python using Scrapy
