How to get Scrapy results in the right order?

Posted: 2015-05-12 05:45:34

Tags: python web-scraping scrapy scrapy-spider

Please help me with this. My spider produces output, but the names and addresses are not grouped the way I expect.

I also tried nesting another for loop, but that doesn't give the right result either. Can anyone spot what's wrong? Please let me know.

Code:

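import scrapy


class YelpItem(scrapy.Item):
    # Assumed item definition; the original post does not show it
    name = scrapy.Field()
    address = scrapy.Field()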

class YelpScrapy(scrapy.Spider):
    name = 'yelp'
    start_urls = ["http://www.yelp.com/search?find_desc=Pet+Grooming+Services&find_loc=Starnberg%2C+Bayern",]

    def print_link(self, link):
        return link

    def parse(self, response):
        website = scrapy.Selector(response)
        items = []

        for obj in website.xpath("//div[@class='main-attributes']"):
            item = YelpItem()

            # Getting name
            item['name'] = obj.xpath("//div[@class='media-story']//h3//a/text()").extract()

            # Getting address
            item['address'] = obj.xpath("//div[@class='secondary-attributes']//address").extract()

            items.append(item)

        return items

The scraped item comes out like this, with all names in one list and all addresses in another:

{'address': [u'<address>\n            Franziskusweg 34<br>82319 Starnberg<br>Germany\n        </address>',
                 u'<address>\n            St.-Michael-Str. 19<br>82319 Starnberg<br>Germany\n        </address>',
                 u'<address>\n            J\xe4gersbrunner Str. 1<br>82319 Starnberg<br>Germany\n        </address>',
                 u'<address>\n            Baierbrunner Str. 1<br>81379 Munich<br>Germany\n        </address>',
                 u'<address>\n            Geigenbergerstr. 51<br>81477 Solln<br>Germany\n        </address>',
                 u'<address>\n            Donnersbergerstr. 30<br>80634 Munich<br>Germany\n        </address>',
                 u'<address>\n            Els\xe4sser Stra\xdfe 24<br>81667 Munich<br>Germany\n        </address>',
                 u'<address>\n            Schluderstr. 40<br>80634 Munich<br>Germany\n        </address>',
                 u'<address>\n            Fliederstr.  23<br>82131 Gauting<br>Germany\n        </address>'],
 'name': [u'Tierschutzverein Starnberg u. Umgebung',
              u'M\xfcmmelpension',
              u'Hundesportverein Starnberg e. V.',
              u'Bellness Hundesalon',
              u'California Dog Spa',
              u'Gassi Germering',
              u'Hundesalon Tanaka Beauty & Spa',
              u'Hundesalon Popp',
              u'Neuhauser Hundeladen',
              u'TheraFelis Katja R\xfcssel']}

Why doesn't it come out as one item per business, like {name, address}, {name, address}, ...?

1 answer

Answer (score: 1):

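That's because your XPath expressions are not context-specific. An expression that starts with // is evaluated from the root of the document even when you call it on obj, so every item receives every name and every address on the page. Make the expressions relative by starting them with a dot (.//), and loop over one element per search result: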

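def parse(self, response):
    # One <li> per search result, so each iteration sees exactly one business
    for obj in response.css("ul.search-results li"):
        item = YelpItem()

        # The leading "." makes each XPath relative to obj instead of the whole page
        item['name'] = obj.xpath(".//div[@class='media-story']//h3//a/text()").extract_first()
        # address/text() yields the fragments between the <br> tags; trim and join them
        parts = obj.xpath(".//div[@class='secondary-attributes']//address/text()").extract()
        item['address'] = ', '.join(part.strip() for part in parts if part.strip())

        # Yield each item as it is built instead of collecting a list
        yield item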