我正在尝试使用scrapy从网页上抓取产品信息。 这是我的webpage
我看到了以下帖子:
selenium with scrapy for dynamic page
Scraping dynamic content using python-Scrapy
还有很多其他人,然后写了下面的代码:
import scrapy
from scrapy_splash import SplashRequest
class filmnet_Spider(scrapy.Spider):
name = 'filmnet'
start_urls = {'http://filmnet.ir/'}
DOWNLOADER_MIDDLEWARES = {
'filmnet_Spider.SplashCookiesMiddleware': 723,
'filmnet_Spider.SplashMiddleware': 725,
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
'filmnet_Spider.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'filmnet_Spider.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'filmnet_Spider.SplashAwareFSCacheStorage'
def start_requests(self):
for url in self.start_urls:
yield scrapy.Request(url, self.parse, meta={
'splash': {
'endpoint': 'render.html',
'args': {'wait': 0.5}
}
})
def parse(self, response):
for filmnetscrap in self.start_urls:
poster = filmnetscrap.xpath('//div[@class="verticalImage organizer"]//img/@src').extract()
print poster
我还写了包含以下内容的设置文件:
SPLASH_URL ='http://localhost:8050/'
但这没用