How to scrape all tools from a site that uses dynamic routing
http://growthtools.io/social-media-automation-tools
When I run
scrapy shell 'http://growthtools.io/social-media-automation-tools'
I get the following result:
2017-01-07 22:43:06 [root] DEBUG: Using default logger
2017-01-07 22:43:06 [root] DEBUG: Using default logger
In [1]: view(response)
and the response object does not contain the tools elements.
In [3]: response.css('.toolsList')
Out[3]: []
In [5]: 'toolsList' in response.body
Out[5]: False
Can anyone explain how I can parse http://growthtools.io/social-media-automation-tools, and why the response object does not contain all of the page content?
Answer 0 (score: 0)
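Loading the page involves JavaScript that is executed by a browser, and Scrapy is not a browser, so the tool list never appears in the HTML that Scrapy downloads. You can use scrapy-splash to solve this; it provides middleware for use in your Scrapy project. The middleware talks to Splash, a JS rendering service that you can run via docker.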
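For reference, here is a minimal sketch of the settings.py entries that wire scrapy-splash into a project. It assumes Splash is already running locally on port 8050 (for example via `docker run -p 8050:8050 scrapinghub/splash`); the middleware names and priorities are the ones given in the scrapy-splash README, so check them against the version you install.

```python
# settings.py (sketch; assumes a Splash instance listening on localhost:8050)

# Address of the Splash rendering service (assumed local docker container).
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make the duplicate filter and HTTP cache aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these settings in place, a spider would yield `scrapy_splash.SplashRequest(url, callback, args={'wait': 0.5})` instead of a plain `scrapy.Request`; the `wait` value here is only an illustrative guess at how long the page needs to finish rendering.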
As for testing it in the Scrapy shell, you can follow this example to run it from the shell.
Works for me:
$ scrapy shell 'http://localhost:8050/render.html?url=http://growthtools.io/social-media-automation-tools'
In [1]: response.css('.toolsList')
Out[1]:
[<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>]
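The shell command above passes the target address unescaped inside the `url` query parameter, which works here but can break once the target URL carries its own query string. As a rough sketch (Python 3, standard library only), the render.html address could be built with proper escaping. The helper name `build_render_url` and the default `wait` of 0.5 seconds are assumptions for illustration, not part of Splash or scrapy-splash.

```python
from urllib.parse import urlencode


def build_render_url(target_url, splash_base="http://localhost:8050", wait=0.5):
    """Return a Splash render.html URL for target_url (hypothetical helper).

    splash_base is assumed to be a locally running Splash instance;
    wait is the number of seconds Splash waits before returning the HTML.
    """
    # urlencode percent-escapes the target so its own '?', '&' and '/'
    # characters cannot be confused with Splash's query parameters.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


if __name__ == "__main__":
    print(build_render_url("http://growthtools.io/social-media-automation-tools"))
```

The printed address can then be passed to `scrapy shell` in single quotes, in the same way as the command shown above.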