Question

I'm having a problem with Scrapy Item objects. When I scrape certain fields, I save them like this:

item['tag'] = response.xpath("//div[contains(@class, 'video-info-row showLess')]"
                                     "//a[contains(@href, '/video/search?search')]/text()").extract()

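Each pass scrapes multiple tags and saves them to item['tag']. I then upload the tags to my SQL server and get a MySQL syntax error. The cause is fairly obvious, because it tries to insert something like: 'tag1', u'tag2', u'tag3', u'tag4', u'tag5', u'tag6'. Is there a way to get rid of the quotes? I tried .replace("'", "") but it didn't work.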

Answer 1

You need to set a Join() output processor for that particular field:

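import scrapy
# Older Scrapy releases exposed this as scrapy.contrib.loader.processor.Join
from itemloaders.processors import Join

class MyItem(scrapy.Item):
    # Join the extracted list into one comma-separated string
    tag = scrapy.Field(output_processor=Join(separator=','))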

Answer 2

To build on alecxe's answer: processors only take effect when you use Item Loaders (http://doc.scrapy.org/en/latest/topics/loaders.html):

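from scrapy.loader import ItemLoader

def parse(self, response):
    # Pass response by keyword; positionally it would land in the selector parameter
    l = ItemLoader(item=MyItem(), response=response)
    # contains() also matches hrefs that carry a search value, as in the question
    l.add_xpath('tag', '//a[contains(@href, "/video/search?search")]/text()')
    return l.load_item()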

Another solution is to simply use the str.join method on the extracted list:

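def parse(self, response):
    item = MyItem()
    # contains() also matches hrefs that carry a search value, as in the question
    tags = response.xpath('//a[contains(@href, "/video/search?search")]/text()').extract()
    item['tag'] = ','.join(tags)
    return item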
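
To see why the .replace attempt in the question had no effect, here is a minimal sketch in plain Python that needs no Scrapy. The list `tags` and its sample values are made up for illustration; they stand in for what .extract() returns. Under Python 2 the repr would also show the u prefixes from the question.

```python
# Stand-in for the result of .extract(): a list of strings, not one string.
tags = ['tag1', 'tag2', 'tag3']

# Formatting the list itself into a query embeds its repr,
# brackets and quotes included.
as_repr = str(tags)

# A list has no .replace method, so the attempted fix never
# touches the quotes, which only exist in the repr.
has_replace = hasattr(tags, 'replace')

# Joining produces a single plain string for one column value.
joined = ','.join(tags)

print(as_repr)      # ['tag1', 'tag2', 'tag3']
print(has_replace)  # False
print(joined)       # tag1,tag2,tag3
```

The quotes are part of how Python displays the list, not part of the tag strings, which is why joining the elements removes them.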
