Python - Scrapy Splash cannot render this page

Time: 2018-08-17 22:04:28

Tags: python web-scraping scrapy scrapy-splash

https://www.miamidade.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=08/16/2018

This is the page I want to scrape. When I open it with SplashRequest, I get a different page with the same source. These are my settings for Splash:

ROBOTSTXT_OBEY = False
SPLASH_URL = 'http://192.168.99.100:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

My spider code:

import scrapy
from scrapy_splash import SplashRequest

class RealForeclosure(scrapy.Spider):
    name = 'realForeclosure'
    start_urls = [
        'https://www.miamidade.realforeclose.com/index.cfm?zaction=user&zmethod=calendar'
    ]

    def parse(self, response):
        # Build the auction preview URL from a day id found on the calendar page.
        link = 'https://www.miamidade.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE='
        date = response.xpath('//div[@tabindex="0"]/@dayid').extract()[10]
        yield SplashRequest(link + date, callback=self.auction)

    def auction(self, response):
        # Yield the raw HTML of each auction item on the rendered page.
        for i in response.css('.AUCTION_ITEM').extract():
            yield {'item': i}

1 answer:

Answer 0: (score: 0)

You need some kind of delay to give Splash time to render the result.

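One way to add such a delay, as a minimal sketch: pass a `wait` value to Splash's render.html endpoint through the `args` parameter of SplashRequest. The 2-second default and the helper names `render_args` and `auction_request` are assumptions made for illustration, and this page may need a longer wait or may still not render correctly.

```python
def render_args(wait_seconds=2.0):
    """Arguments passed to Splash's render.html endpoint.

    'wait' asks Splash to pause this many seconds after the page loads,
    giving JavaScript time to run. The 2.0 default is an assumed value.
    """
    return {'wait': wait_seconds}


def auction_request(url, callback, wait_seconds=2.0):
    """Hypothetical helper: build a SplashRequest that waits before rendering."""
    # Imported lazily so render_args() can be used without scrapy-splash installed.
    from scrapy_splash import SplashRequest
    return SplashRequest(url, callback=callback, args=render_args(wait_seconds))
```

In the spider's parse method, `yield auction_request(link + date, self.auction)` would take the place of the plain SplashRequest call. If the content is still missing, raising `wait_seconds` is the first thing to try.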