Question

This is my spider:

from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from ..items import TutorialItem

class Tutorial1(BaseSpider):
    name = "Tut"
    allowed_domains = ['nytimes.com']
    start_urls = ["http://nytimes.com",]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[@class="span-ab-layout layout"]')
        items = []

        for site in sites:
            item = TutorialItem()
            item['title'] = map(unicode.strip, site.select('//h2[@class="story-heading"]/a/text()').extract())
            item['time'] = map(unicode.strip, site.select('//time[@class="timestamp"]/text()').extract())
            yield item

This is my output:

author time   By PETER BAKER, By JONATHAN M. KATZ and RICHARD PÉREZ-PEÑA, By NEIL MacFARQUHAR, By RON NIXON, By RICHARD GOLDSTEIN, By LOUISE STORY and ALEJANDRA XANIC von BERTRAB, By DAVID CARR, By A.O. SCOTT, JERÉ LONGMAN, THE EDITORIAL BOARD, JON BECKMANN, CJ HUGHES, By JOANNE KAUFMAN 10:26 AM ET, 1:08 PM ET, 11:57 AM ET, 8:33 AM ET, 10:01 ET, 12:35 PM ET, 1:47 PM ET, 10:36 AM ET, 10:26 AM ET, 9:49 AM ET, 12:05 PM ET, 9:21 AM ET, 12:22 PM ET, 11:52 AM ET, 8:59 AM ET


By PETER BAKER, By JONATHAN M. KATZ and RICHARD PÉREZ-PEÑA, By NEIL MacFARQUHAR, By RON NIXON, By RICHARD GOLDSTEIN, By LOUISE STORY and ALEJANDRA XANIC von BERTRAB, By DAVID CARR, By A.O. SCOTT, JERÉ LONGMAN, THE EDITORIAL BOARD, JON BECKMANN, CJ HUGHES, By JOANNE KAUFMAN 10:26 AM ET, 1:08 PM ET, 11:57 AM ET, 8:33 AM ET, 10:01 ET, 12:35 PM ET, 1:47 PM ET, 10:36 AM ET, 10:26 AM ET, 9:49 AM ET, 12:05 PM ET, 9:21 AM ET, 12:22 PM ET, 11:52 AM ET, 8:59 AM ET

I indented it so it is clear where the output repeats.

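My problem is that when I write out my CSV, everything always ends up in one giant row. For some reason it also generates duplicate columns. Can anyone help me out of this predicament?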

Answer 1

I was able to figure it out by experimenting:

hxs = HtmlXPathSelector(response)

Apparently there is a big difference between Selector and HtmlXPathSelector.
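
A separate point, not stated in the answer, may also explain the single giant row: each item in the spider stores a whole list in `title` and another in `time`, and Scrapy's CSV exporter joins a list-valued field into one cell. Below is a minimal sketch of the alternative, pairing the values so that each story becomes its own row. It uses only the standard library; the sample strings and the helper names `rows_from_columns` and `to_csv` are made up for illustration, and in the spider this would roughly correspond to yielding one item per pair.

```python
import csv
import io


def rows_from_columns(titles, times):
    """Pair the i-th title with the i-th time, stripping whitespace."""
    # zip stops at the shorter list, so unmatched trailing values are dropped.
    return [
        {"title": title.strip(), "time": time.strip()}
        for title, time in zip(titles, times)
    ]


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "time"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample values standing in for the two .extract() results.
titles = ["  First story ", "Second story\n"]
times = [" 10:26 AM ET", "1:08 PM ET "]

rows = rows_from_columns(titles, times)
csv_text = to_csv(rows)
print(csv_text)
```

As for the repeated values, an XPath expression that begins with `//` searches the whole document even when it is called on `site`, so every pass through the loop returns the same full list. Writing `.//` instead keeps the query relative to `site`.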

Scrapy CSV output on multiple lines