Question

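I'm very new to Scrapy, so I'm having trouble figuring out what I did wrong when no results end up in the CSV file. I can see the results in the console, though. Here is what I tried: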

The main folder is named "realyp". The spider file is named "yp.py" and contains this code:

from scrapy.selector import Selector
from scrapy.spider import BaseSpider
from realyp.items import RealypItem

class MySpider(BaseSpider):
     name="YellowPage"
     allowed_domains=["yellowpages.com"]
     start_urls=["https://www.yellowpages.com/search?search_terms=Coffee%20Shops&geo_location_terms=Los%20Angeles%2C%20CA&page=2"]

     def parse(self, response):
        title = Selector(response)
        page=title.xpath('//div[@class="info"]')
        items = []
        for titles in page:
            item = RealypItem()
            item["name"] = titles.xpath('.//span[@itemprop="name"]/text()').extract()
            item["address"] = titles.xpath('.//span[@itemprop="streetAddress" and @class="street-address"]/text()').extract()
            item["phone"] = titles.xpath('.//div[@itemprop="telephone" and @class="phones phone primary"]/text()').extract()
            items.append(item)
        return items

The "items.py" file contains:

from scrapy.item import Item, Field
class RealypItem(Item):
    name= Field()
    address = Field()
    phone= Field()

To get the CSV output, my command line is:

cd desktop
cd realyp
scrapy crawl YellowPage -o items.csv -t csv

Any help is greatly appreciated.

Answer 1

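As @Granitosauros said, you should use yield instead of return, and the yield should be inside the for loop. Also, inside the for loop, a path that starts with // selects every matching element in the whole document, not only those under the current node (see here).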
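
The difference between a document-wide search and a search relative to the current node can be sketched outside Scrapy. The snippet below is only an illustration: it uses the standard library's ElementTree as a stand-in for Scrapy selectors, and the markup, names, and addresses are made up. Searching from the root plays the role of a path starting with //, while searching from each listing node plays the role of a relative path.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the listing blocks in the question.
PAGE = """
<html><body>
  <div class="info">
    <span itemprop="name">Cafe A</span>
    <span itemprop="streetAddress">1 Main St</span>
  </div>
  <div class="info">
    <span itemprop="name">Cafe B</span>
    <span itemprop="streetAddress">2 Oak Ave</span>
  </div>
</body></html>
""".strip()


def parse(root):
    """Yield one dict per listing, searching relative to each listing node."""
    for listing in root.findall('.//div[@class="info"]'):
        # The yield sits inside the loop, so every listing produces an item.
        yield {
            "name": listing.findtext('.//span[@itemprop="name"]'),
            "address": listing.findtext('.//span[@itemprop="streetAddress"]'),
        }


root = ET.fromstring(PAGE)
items = list(parse(root))

# Searching from the document root returns every name on the page at once,
# which is what a //-prefixed path would do on each loop iteration.
all_names = [span.text for span in root.findall('.//span[@itemprop="name"]')]

print(items)
print(all_names)
```

With a relative search each item carries only its own name and address; with a document-wide search every item would carry the names of all listings.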

Here is a (rough) version of the code that works for me:

# -*- coding: utf-8 -*-
import scrapy
from realyp.items import RealypItem

# scrapy.Spider is the current name of the base class; older releases
# exposed it as scrapy.spider.BaseSpider.
class MySpider(scrapy.Spider):
    name = "YellowPage"
    allowed_domains = ["yellowpages.com"]
    start_urls = ["https://www.yellowpages.com/search?search_terms=Coffee%20Shops&geo_location_terms=Los%20Angeles%2C%20CA&page=2"]

    def parse(self, response):
        # Paths inside the loop are relative to each result node (no leading //).
        for titles in response.xpath('//div[@class="result"]/div'):
            item = RealypItem()
            item["name"] = titles.xpath('div[2]/div[2]/h2/a/span[@itemprop="name"]/text()').extract()
            item["address"] = titles.xpath('string(div[2]/div[2]/div/p[@itemprop="address"])').extract()
            item["phone"] = titles.xpath('div[2]/div[2]/div/div[@itemprop="telephone" and @class="phones phone primary"]/text()').extract()
            # Yield each item as soon as it is built.
            yield item
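
To see why the selectors matter for the CSV file, it may help to look at how yielded items turn into rows. This is a minimal sketch using only the standard library, not Scrapy's own exporter: the helper names parse_rows and export_csv and the sample data are hypothetical, and joining list values with a comma is an assumption made for the illustration. Since .extract() returns a list, a selector that matches nothing gives an empty list, which ends up as an empty cell.

```python
import csv
import io


def parse_rows(rows):
    """Yield one item per row; each field is a list, like .extract() output."""
    for name, address, phone in rows:
        yield {"name": name, "address": address, "phone": phone}


def export_csv(items, fields):
    """Write items to CSV text, joining each list-valued field with commas."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow({key: ",".join(item.get(key, [])) for key in fields})
    return buf.getvalue()


# Hypothetical data: the phone selector matched nothing, so its list is empty.
sample = [(["Cafe A"], ["1 Main St"], [])]
output = export_csv(parse_rows(sample), ["name", "address", "phone"])
print(output)
```

If every selector in the loop matches nothing, every cell comes out empty even though items are still being yielded, so checking the XPath expressions against the live page is a sensible first step.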

Scrapy shows results in the console, but the CSV output is still blank
