I am scraping a local page_source file. Scrapy completely skips the parse_nextfile() function, but it does run the parse() function. I don't understand why this happens.
from scrapy import Spider
from scrapy.loader import ItemLoader
from linkedin.items import LinkedinItem
import glob, os

class ProfilesSpider(Spider):
    name = 'profiles'
    allowed_domains = ["file://127.0.0.1"]
    start_urls = ["file://127.0.0.1/path/to/file/text.txt"]

    def parse_nextfile(self, response):
        # retrieve local files directory
        request(url, callback=self.parse)

    def parse(self, response):
        # scraping the page_source file
Answer 0 (score: 0)
parse is the default callback for any Scrapy Request. If you want a different method to handle a response, you have to name that method as the callback of the Request you yield. Since nothing in your spider ever schedules a Request with parse_nextfile as its callback, Scrapy never calls it.