How do I crawl an AngularJS website with Python Scrapy?

Time: 2018-07-03 06:15:55

Tags: python dynamic web-scraping scrapy

I am trying to scrape product information from a web page using Scrapy. This is my webpage.

I have read the following posts:

selenium with scrapy for dynamic page

Scraping dynamic content using python-Scrapy

and many others, and then wrote the code below:

import scrapy
from scrapy_splash import SplashRequest

class filmnet_Spider(scrapy.Spider):
    name = 'filmnet'
    start_urls = {'http://filmnet.ir/'}

    DOWNLOADER_MIDDLEWARES = {
        'filmnet_Spider.SplashCookiesMiddleware': 723,
        'filmnet_Spider.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }

    SPIDER_MIDDLEWARES = {
        'filmnet_Spider.SplashDeduplicateArgsMiddleware': 100,
    }
    DUPEFILTER_CLASS = 'filmnet_Spider.SplashAwareDupeFilter'
    HTTPCACHE_STORAGE = 'filmnet_Spider.SplashAwareFSCacheStorage'

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                'endpoint': 'render.html',
                'args': {'wait': 0.5}
                }
            })

    def parse(self, response):

        for filmnetscrap in self.start_urls:

            poster = filmnetscrap.xpath('//div[@class="verticalImage organizer"]//img/@src').extract()
            print poster

I also wrote a settings file containing the following:

SPLASH_URL = 'http://localhost:8050/'
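
`SPLASH_URL` on its own is normally not enough. The scrapy-splash README also asks for its middlewares, dupe filter and cache storage to be registered in the project's settings.py, and these are referenced under the `scrapy_splash` package rather than under the spider's name. A sketch of such a settings file follows, assuming the scrapy-splash package is installed and Splash is listening on the same local address:

```python
# settings.py (sketch): scrapy-splash configuration as given in its README.
# Assumes Splash is running locally on port 8050.
SPLASH_URL = 'http://localhost:8050/'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make request fingerprinting and the HTTP cache aware of Splash arguments.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```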

But this does not work.
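
One way to narrow this down is to check whether Splash itself renders the page, independently of Scrapy. Splash exposes `render.html` over plain HTTP and takes `url` and `wait` as query parameters. The helper below only builds that URL from the values already used in this question. `splash_render_url` is a hypothetical name, not part of any library, and the sketch assumes Python 3.

```python
from urllib.parse import urlencode


def splash_render_url(splash_url, page_url, wait=0.5):
    """Build a Splash render.html URL for page_url (hypothetical helper)."""
    # Splash reads the target page and the wait time from the query string.
    query = urlencode({'url': page_url, 'wait': wait})
    return splash_url.rstrip('/') + '/render.html?' + query


if __name__ == '__main__':
    # Values taken from the spider and settings file in this question.
    print(splash_render_url('http://localhost:8050/', 'http://filmnet.ir/'))
```

Opening the printed URL in a browser or with curl should return the rendered HTML if Splash is running. If it does, the problem is more likely in the Scrapy configuration or in the XPath than in Splash.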

0 answers:

No answers