Why doesn't my start request render the JavaScript?

Time: 2019-08-14 16:23:52

Tags: python visual-studio-code scrapy scrapy-splash

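I am working through a scrapy-splash tutorial exercise in VS Code, and I cannot get the JavaScript on http://quotes.toscrape.com/js/ rendered and parsed. Splash is running on localhost port 8050. I pulled it from Docker and started it with the following command: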

    docker run -p 8050:8050 scrapinghub/splash --disable-private-mode
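
One way to separate a Splash problem from a Scrapy problem is to ask Splash for the rendered page directly, without the spider. The sketch below uses only the standard library and assumes the Splash HTTP API is reachable at the address above and that its `render.html` endpoint accepts `url` and `wait` query arguments. The helper names `build_render_url` and `fetch_rendered_html` are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_render_url(splash_url, target_url, wait=0.5):
    """Build the render.html URL that asks Splash to render target_url."""
    # urlencode percent-escapes the target URL so it survives as one argument.
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_url.rstrip('/')}/render.html?{query}"


def fetch_rendered_html(splash_url, target_url, wait=0.5, timeout=30):
    """Fetch the rendered HTML from Splash (requires a running Splash instance)."""
    render_url = build_render_url(splash_url, target_url, wait)
    with urlopen(render_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call, to be run while the Docker container is up:
# html = fetch_rendered_html("http://localhost:8050", "http://quotes.toscrape.com/js/")
```

If the HTML returned this way contains the quotes, Splash itself is probably fine and the problem is more likely in how the spider sends its requests.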

scrapy-splash is installed in the root directory of the Scrapy project.

settings.py contains:

    SPLASH_URL = 'http://localhost:8050'

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }

    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }

    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

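I have stopped and restarted the Docker image both with and without private browsing enabled. I added a wait to the start request, and still nothing. I have built the spider from scratch, and I have also copied it from the example.

The code works fine on the plain HTML page, but when I use the JS version through SplashRequest I get nothing back. This is the "hello world" of scraping, and I would really like to know where and what I am doing wrong. I suspect it is something silly and obvious, but I cannot see it. I am using VS Code, so perhaps something in my setup is causing this, but I am working inside a venv.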

    import scrapy
    from scrape_douglas.scrape_douglas.items import QuoteItem  # imported but not used below
    from scrapy.selector import Selector  # imported but not used below
    from scrapy_splash import SplashRequest


    class DougrSpider(scrapy.Spider):
        name = 'dougr'
        allowed_domains = ['toscrape.com']
        start_urls = ["http://quotes.toscrape.com/js/"]

        def start_requests(self):
            # Send every start URL through Splash so the JavaScript gets rendered.
            for url in self.start_urls:
                yield SplashRequest(url=url, callback=self.parse, endpoint='render.html')

        def parse(self, response):
            # Yield one item per quote block found in the rendered page.
            for quote in response.css("div.quote"):
                item = {
                    'author': quote.css("small.author::text").extract_first(),
                    'text': quote.css("span.text::text").extract_first(),
                    'tag': quote.css("a.tag::text").extract(),
                }
                yield item
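
To tell whether a response was actually rendered, it can help to count the quote blocks in the HTML that came back. The following is a minimal standard-library sketch, not part of the spider above. It assumes that the unrendered JS page carries its quotes only inside a `<script>` element, so a real HTML parse (rather than a substring search) is needed to avoid counting markup that exists only as JavaScript text. `QuoteCounter` and `count_quote_divs` are hypothetical names.

```python
from html.parser import HTMLParser


class QuoteCounter(HTMLParser):
    """Count <div> start tags whose class attribute includes 'quote'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "quote" in classes:
            self.count += 1


def count_quote_divs(html):
    """Return how many div.quote elements appear in the parsed markup."""
    parser = QuoteCounter()
    parser.feed(html)
    return parser.count
```

Logging `count_quote_divs(response.text)` at the top of `parse` would show whether the selector has anything to match. A count of zero suggests the page reached the spider without its JavaScript having run.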

Here is the Splash log:

    2019-08-14 15:08:28+0000 [-] Log opened.

    2019-08-14 15:08:28.371179 [-] Splash version: 3.3.1

    2019-08-14 15:08:28.371458 [-] Qt 5.9.1, PyQt 5.9.2, 
    WebKit 602.1,sip 4.19.4, Twisted 18.9.0, Lua 5.2

    2019-08-14 15:08:28.371539 [-] Python 3.5.2 
    (default, Nov 12 2018,13:43:14) [GCC 5.4.0 20160609]

    2019-08-14 15:08:28.371611 [-] Open files limit: 1048576

    2019-08-14 15:08:28.371658 [-] Can't bump open files limit

    2019-08-14 15:08:28.474871 [-] Xvfb is started: 
    ['Xvfb', ':920282986', '-screen', '0', 
    '1024x768x24', '-nolisten', 'tcp']
    QStandardPaths: XDG_RUNTIME_DIR not set, 
    defaulting to '/tmp/runtime-root'

    2019-08-14 15:08:28.538650 [-] proxy profiles support is enabled, 
    proxy profiles path: /etc/splash/proxy-profiles

    2019-08-14 15:08:28.538921 [-] memory cache: enabled, 
    private mode: disabled, js cross-domain access: disabled

    2019-08-14 15:08:28.632635 [-] verbosity=1, slots=20, 
    argument_cache_max_entries=500, max-timeout=90.0

    2019-08-14 15:08:28.633557 [-] Web UI: enabled, 
    Lua: enabled (sandbox: enabled)

    2019-08-14 15:08:28.633904 [-] Site starting on 8050

    2019-08-14 15:08:28.633998 [-] Starting factory 
    <twisted.web.server.Site object at 0x7f75cf214cc0>

    2019-08-14 15:08:28.634273 [-] Server listening on 
    http://0.0.0.0:8050

    2019-08-14 15:52:58.379703 [-] "xxxx.xx.x.x" - - 
    [14/Aug/2019:15:52:57 +0000] 
    "GET / HTTP/1.1" 200 7679 "-" 
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"

    2019-08-14 15:52:58.415976 [-] "xxx.xx.x.x" - - 
    [14/Aug/2019:15:52:57 +0000] 
    "GET /_ui/style.css HTTP/1.1" 200 2591 "http://localhost:8050/" 
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"

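The only requests recorded in the log above are for `/` and `/_ui/style.css`, both sent by Firefox. Assuming Splash writes an access-log line for every HTTP request it serves, as it does for those two, a request from the spider should show up as a line mentioning `/render.html`. The sketch below scans log text for such a line; `requested_paths` and `has_render_request` are hypothetical helper names, and the regular expression is written only for the access-log format shown here.

```python
import re

# Matches the quoted request line of an access-log entry, e.g. "GET / HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def requested_paths(log_text):
    """Return every requested path found in the log, without query strings."""
    return [match.group(1).split("?", 1)[0] for match in REQUEST_RE.finditer(log_text)]


def has_render_request(log_text):
    """Return True if any logged request targeted the render.html endpoint."""
    return "/render.html" in requested_paths(log_text)
```

If `has_render_request` returns False for the full log after a crawl, the spider's requests probably never reached Splash, which points at the spider or the Scrapy settings rather than at the Docker container.
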
0 answers:

No answers yet