This is the page I want to scrape. When I open it with SplashRequest, I get a different page from the same origin. These are my settings for Splash:
ROBOTSTXT_OBEY = False
SPLASH_URL = 'http://192.168.99.100:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
My spider code:
import scrapy
from scrapy_splash import SplashRequest
class RealForeclosure(scrapy.Spider):
    name = 'realForeclosure'
    start_urls = [
        'https://www.miamidade.realforeclose.com/index.cfm?zaction=user&zmethod=calendar'
    ]

    def parse(self, response):
        link = 'https://www.miamidade.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE='
        date = response.xpath('//div[@tabindex="0"]/@dayid').extract()[10]
        yield SplashRequest(link + date, callback=self.auction)

    def auction(self, response):
        for i in response.css('.AUCTION_ITEM').extract():
            yield {'item': i}
Answer 0 (score: 0)
You need some kind of delay to give Splash time to render the result, for example by passing a wait value in the request's args.
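A minimal sketch of what adding such a delay might look like, assuming the default render.html endpoint and its `wait` argument. The 2-second value and the helper names `build_preview_url`, `splash_args`, and `make_request` are illustrative and not from the original post; the right delay for this site would need to be found by experiment. scrapy_splash is imported inside `make_request` so the two pure helpers can be tried without it installed.

```python
# Base of the auction preview URL taken from the question's spider.
PREVIEW_URL = (
    "https://www.miamidade.realforeclose.com/index.cfm"
    "?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE="
)


def build_preview_url(auction_date):
    """Append the date (the dayid attribute value) to the preview URL."""
    return PREVIEW_URL + auction_date


def splash_args(wait_seconds=2.0):
    """Arguments passed through to Splash; 'wait' is the delay in seconds
    that Splash pauses after the page loads before returning the HTML.
    The default of 2.0 is an assumption, not a recommended value."""
    return {"wait": wait_seconds}


def make_request(auction_date, callback, wait_seconds=2.0):
    """Build a SplashRequest that waits before rendering (hypothetical helper)."""
    # Imported here so the helpers above work without scrapy_splash installed.
    from scrapy_splash import SplashRequest

    return SplashRequest(
        build_preview_url(auction_date),
        callback=callback,
        args=splash_args(wait_seconds),
    )
```

Inside the spider's parse method, the existing yield could then become something like `yield make_request(date, self.auction)`. If the auction items still do not appear, a longer wait may be worth trying.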