Question

我正在尝试使用scrapy从网页上抓取产品信息。这是我的webpage

我看到了以下帖子：

Scraping dynamic content using python-Scrapy

还有很多其他人，然后写了下面的代码：

import scrapy
from scrapy_splash import SplashRequest

class filmnet_Spider(scrapy.Spider):
    name = 'filmnet'
    start_urls = {'http://filmnet.ir/'}

    DOWNLOADER_MIDDLEWARES = {
    'filmnet_Spider.SplashCookiesMiddleware': 723,
    'filmnet_Spider.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

    SPIDER_MIDDLEWARES = {
    'filmnet_Spider.SplashDeduplicateArgsMiddleware': 100,
}
    DUPEFILTER_CLASS = 'filmnet_Spider.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'filmnet_Spider.SplashAwareFSCacheStorage'

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                'endpoint': 'render.html',
                'args': {'wait': 0.5}
                }
            })

    def parse(self, response):

        for filmnetscrap in self.start_urls:

            poster = filmnetscrap.xpath('//div[@class="verticalImage organizer"]//img/@src').extract()
            print poster

我还写了包含以下内容的设置文件：

SPLASH_URL ='http://localhost:8050/'

但这没用

如何使用python scrapy爬行AngularJS网站？

0 个答案: