Scrapy CSV output does not put each item on its own row

Time: 2013-07-10 17:52:20

Tags: python csv web-scraping scrapy

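I am trying to scrape an events website, and the code attached below extracts each event's name and location. I write the output to a CSV file, but the file ends up with all the event names joined together in a single row.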

For example, suppose there are two events, Bruno Mars and Maroon 5, located in San Jose and Santa Clara respectively. The current output is:

event_name event_location

Bruno Mars,Maroon 5 San Jose,Santa Clara

But I would like to see:

event_name event_location

Bruno Mars San Jose

Maroon 5 Santa Clara

Can someone tell me why the output is formatted this way? My code is attached below. I run it with scrapy crawl event_spider -o output.csv -t csv.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from event_test.items import EventItem


class EventSpider(BaseSpider):
    name = "event_spider"
    allowed_domains = ["eventful.com"]
    start_urls = [
         "http://eventful.com/sanjose/events"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        events = hxs.select("/html/body[@id='events']/div[@id='outer-container']/div[@id='mid-container']/div[@id='inner-container']/div[@id='content']/div[@class='cols-2-1']/div[@class='alpha']/div[@id='top-events']/div[@class='section top-events cage-dbl-border cage-bdr-mdgrey']/div[@id='events-scroll']/div[@id='events-scroll-items']/ul[@id='events-scroll-items-list']/li[@class='top-events-item  ']")
        items = []
        for event in events:
            item = EventItem()
            item['event_name'] = event.select("//h2/a/span/text()").extract()
            item['event_locality'] = event.select("//span[@class='locality']/text()").extract()
            items.append(item)
        return items

1 Answer:

Answer 0 (score: 0)

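I simplified the spider's code and XPath expressions. The key change is the leading dot in the inner XPaths: an expression starting with // searches the whole document even when it is called on a sub-selector, so every item received every event's name, whereas .// searches only inside the current li element: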

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from event_test.items import EventItem


class EventSpider(BaseSpider):
    name = "event_spider"
    allowed_domains = ["eventful.com"]
    start_urls = ["http://eventful.com/sanjose/events"]

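    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # One selector per event; contains() tolerates the trailing spaces in the class attribute.
        events = hxs.select("//li[contains(@class, 'top-events-item')]")
        for event in events:
            item = EventItem()
            # ".//" keeps the search inside this <li>; extract() returns a list,
            # so [0] takes the first match (IndexError if nothing matches).
            item['event_name'] = event.select(".//h2/a/span/text()").extract()[0]
            item['event_locality'] = event.select(".//span[@class='locality']/text()").extract()[0]
            yield item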

Here is what you will get in the CSV file:

event_name,event_locality
Under the Influence of Music Tour,Mountain View
Bruno Mars,San Jose
John Mayer: Born & Raised Tour 2013,Mountain View
New Kids on the Block with 98 Degrees and ...,San Jose
Amy Grant,San Jose
Styx,Saratoga
Bob Dylan with Wilco,Mountain View
Kenny Chesney with Eli Young Band,Mountain View
Smash Mouth \/ Sugar Ray \/ Gin Blossoms \...,Saratoga
Creedence Clearwater Revisited \/ 38 Special,Saratoga
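
To see why the leading dot matters, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of Scrapy, so it runs without the site. The markup is a hypothetical, heavily simplified stand-in for the real page, and searching from root inside the loop only imitates what an XPath starting with // does on a Scrapy sub-selector:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real event list.
HTML = """
<ul>
  <li class="top-events-item"><h2><a><span>Bruno Mars</span></a></h2><span class="locality">San Jose</span></li>
  <li class="top-events-item"><h2><a><span>Maroon 5</span></a></h2><span class="locality">Santa Clara</span></li>
</ul>
"""

root = ET.fromstring(HTML)
events = root.findall(".//li")

# Searching from the document root on every iteration (the effect of "//..."
# in the question) yields every name on the page for every event.
from_root = [[s.text for s in root.findall(".//h2/a/span")] for _ in events]

# Searching relative to the current <li> (the effect of ".//...") yields
# only that event's own name.
from_item = [[s.text for s in event.findall(".//h2/a/span")] for event in events]

print(from_root)
print(from_item)
```

The first list repeats both names for each event, while the second holds exactly one name per event, which is the shape needed for one CSV row per event.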

Hope that helps.
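
As a footnote on the CSV side: a field that holds a list ends up in a single cell, which is why taking [0] (a plain string) changes the file's shape. The sketch below imitates that with the standard csv module only. The to_csv helper and the sample items are made up for illustration, and joining list values with a comma is an assumption modelled on the default behaviour of Scrapy's CSV exporter, not a copy of its code:

```python
import csv
import io

FIELDS = ("event_name", "event_locality")

def to_csv(items):
    """Write items to CSV text, joining list values with a comma (illustrative helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    for item in items:
        row = []
        for key in FIELDS:
            value = item[key]
            if isinstance(value, list):
                # A multi-valued field collapses into one cell.
                value = ",".join(value)
            row.append(value)
        writer.writerow(row)
    return buf.getvalue()

# One item whose fields are lists: everything lands in one row.
list_items = [{"event_name": ["Bruno Mars", "Maroon 5"],
               "event_locality": ["San Jose", "Santa Clara"]}]

# One item per event with plain strings: one row per event.
scalar_items = [
    {"event_name": "Bruno Mars", "event_locality": "San Jose"},
    {"event_name": "Maroon 5", "event_locality": "Santa Clara"},
]

print(to_csv(list_items))
print(to_csv(scalar_items))
```

With list values the names share one quoted cell; with one string-valued item per event, each event gets its own line.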