Python, Scrapy, Pipeline: the function "process_item" is not being called

Date: 2015-07-10 02:18:34

Tags: python scrapy pipeline

I have a very simple piece of code, shown below. The scraping itself works fine and I can see all of the print statements producing the correct data. In the pipeline, initialization works fine. However, the process_item function is never called, because the print statement at the start of that function never executes.

Spider: comosham.py

import scrapy
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from activityadvisor.items import ComoShamLocation
from activityadvisor.items import ComoShamActivity
from activityadvisor.items import ComoShamRates
import re


class ComoSham(Spider):
    name = "comosham"
    allowed_domains = ["www.comoshambhala.com"]
    start_urls = [
        "http://www.comoshambhala.com/singapore/classes/schedules",
        "http://www.comoshambhala.com/singapore/about/location-contact",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes",
        "http://www.comoshambhala.com/singapore/rates-and-offers/rates-classes/rates-private-classes"
    ]

    def parse(self, response):  
        category = (response.url)[39:44]
        print 'in parse'
        if category == 'class':
            pass
            """self.gen_req_class(response)"""
        elif category == 'about':
            print 'about to call parse_location'
            self.parse_location(response)
        elif category == 'rates':
            pass
            """self.parse_rates(response)"""
        else:
            print 'Cant find appropriate category! check check check!! Am raising Level 5 ALARM - You are a MORON :D'


    def parse_location(self, response):
        print 'in parse_location'       
        item = ComoShamLocation()
        item['category'] = 'location'
        loc = Selector(response).xpath('((//div[@id = "node-2266"]/div/div/div)[1]/div/div/p//text())').extract()
        item['address'] = loc[2]+loc[3]+loc[4]+(loc[5])[1:11]
        item['pin'] = (loc[5])[11:18]
        item['phone'] = (loc[9])[6:20]
        item['fax'] = (loc[10])[6:20]
        item['email'] = loc[12]
        print item['address'],item['pin'],item['phone'],item['fax'],item['email']
        return item

Item file:

import scrapy
from scrapy.item import Item, Field

class ComoShamLocation(Item):
    address = Field()
    pin = Field()
    phone = Field()
    fax = Field()
    email = Field()
    category = Field()

4 answers:

Answer 0 (score: 10):

Your problem is that you never actually yield the item. parse_location returns the item to parse, but parse never yields it.

The solution is to replace:

self.parse_location(response)

with:

yield self.parse_location(response)

More specifically, process_item is never called if no item is yielded.
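
To make that concrete, here is a minimal sketch of how the fix fits into the question's parse method (only the relevant branch is shown; everything else stays as the poster wrote it):

    def parse(self, response):
        category = response.url[39:44]
        if category == 'about':
            # parse_location() builds and returns the item; yielding it hands it
            # to the Scrapy engine, which then routes it through the enabled item
            # pipelines, so process_item() finally runs.
            yield self.parse_location(response)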

Answer 1 (score: 1):

Make sure ITEM_PIPELINES is set in settings.py:

ITEM_PIPELINES = ['project_name.pipelines.pipeline_class']

Answer 2 (score: 0):

Adding to the answers above:

1. Remember to add the following line to settings.py: ITEM_PIPELINES = {'[YOUR_PROJECT_NAME].pipelines.[YOUR_PIPELINE_CLASS]': 300}
2. Yield the item once your spider is done building it: yield my_item
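
Since the poster's pipeline code is not shown above, a minimal pipeline that such a setting could point at might look like the sketch below. The class name ComoShamPipeline is a placeholder, not taken from the question; only the activityadvisor project name comes from the question's imports, and the print follows the question's Python 2 style:

    # pipelines.py -- minimal sketch; ComoShamPipeline is a hypothetical name
    class ComoShamPipeline(object):

        def process_item(self, item, spider):
            # Called once for every item a spider callback yields; it never
            # runs if the spider yields no items.
            print 'in process_item:', dict(item)
            return item

    # settings.py -- enables the sketch above
    ITEM_PIPELINES = {'activityadvisor.pipelines.ComoShamPipeline': 300}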

Answer 3 (score: 0):

This solved my problem: all the items were being dropped before this pipeline was reached, so process_item() was never called, although open_spider and close_spider were. My solution was simply to change the order so that this pipeline runs before the other pipeline that drops the items.
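
That ordering is controlled by the numbers in ITEM_PIPELINES: lower numbers run first. A sketch of the reordering, with hypothetical pipeline names (only the activityadvisor project name comes from the question):

    # settings.py -- hypothetical pipeline names; lower numbers run earlier
    ITEM_PIPELINES = {
        'activityadvisor.pipelines.ComoShamPipeline': 300,       # sees every item first
        'activityadvisor.pipelines.DropUnwantedPipeline': 800,   # may raise DropItem afterwards
    }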

See the Scrapy Item Pipeline documentation.

Remember, Scrapy only calls Pipeline.process_item() when there is an item to process!