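Please help me fix this. My code produces output, but it is not printed in the right structure.
I also tried using a second for loop, but that does not give the correct result either. If you spot anything wrong here, please let me know.
Code: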
import scrapy

class YelpScrapy(scrapy.Spider):
    name = 'yelp'
    start_urls = ["http://www.yelp.com/search?find_desc=Pet+Grooming+Services&find_loc=Starnberg%2C+Bayern",]

    def print_link(self, link):
        return link

    def parse(self, response):
        website = scrapy.Selector(response)
        items = []
        for obj in website.xpath("//div[@class='main-attributes']"):
            item = YelpItem()
            # Getting name
            item['name'] = obj.xpath("//div[@class='media-story']//h3//a/text()").extract()
            # Getting address
            item['address'] = obj.xpath("//div[@class='secondary-attributes']//address").extract()
            items.append(item)
        return items
The resulting output looks like this:
'name': [u'Tierschutzverein Starnberg u. Umgebung',
u'M\xfcmmelpension',
u'Hundesportverein Starnberg e. V.',
u'Bellness Hundesalon',
u'California Dog Spa',
u'Gassi Germering',
u'Hundesalon Tanaka Beauty & Spa',
u'Hundesalon Popp',
u'Neuhauser Hundeladen',
u'TheraFelis Katja R\xfcssel'],
{'address': [u'<address>\n Franziskusweg 34<br>82319 Starnberg<br>Germany\n </address>',
u'<address>\n St.-Michael-Str. 19<br>82319 Starnberg<br>Germany\n </address>',
u'<address>\n J\xe4gersbrunner Str. 1<br>82319 Starnberg<br>Germany\n </address>',
u'<address>\n Baierbrunner Str. 1<br>81379 Munich<br>Germany\n </address>',
u'<address>\n Geigenbergerstr. 51<br>81477 Solln<br>Germany\n </address>',
u'<address>\n Donnersbergerstr. 30<br>80634 Munich<br>Germany\n </address>',
u'<address>\n Els\xe4sser Stra\xdfe 24<br>81667 Munich<br>Germany\n </address>',
u'<address>\n Schluderstr. 40<br>80634 Munich<br>Germany\n </address>',
u'<address>\n Fliederstr. 23<br>82131 Gauting<br>Germany\n </address>'],
Why doesn't it come out as {{name, address}{name, address}}, with one name and one address per item?
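Answer 0 (score: 1)
That's because your locators match multiple elements and are not context-specific. An XPath expression that starts with // searches the whole document even when it is called on obj, so every item receives every name and every address. To search only inside the current element, the expression should start with a dot. Fix it like this: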
def parse(self, response):
    for obj in response.css("ul.search-results li"):
        item = YelpItem()
        item['name'] = obj.xpath(".//div[@class='media-story']//h3//a/text()").extract()[0]
        item['address'] = ''.join(obj.xpath(".//div[@class='secondary-attributes']//address/text()").extract()).strip()
        yield item
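
To make the difference between a document-wide and a context-specific search concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors so it runs without extra dependencies, and the markup is a hypothetical, heavily simplified version of the search results page, not Yelp's real HTML. Searching from the root inside the loop plays the role of the // expressions in the question; searching from each li plays the role of the .// expressions above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (assumption: the real page is far more complex).
HTML = """
<ul>
  <li><h3>Hundesalon Popp</h3><address>Schluderstr. 40</address></li>
  <li><h3>Gassi Germering</h3><address>Fliederstr. 23</address></li>
</ul>
"""

root = ET.fromstring(HTML)

# Document-wide search repeated per item: every iteration sees ALL names,
# which mirrors the list-of-everything output in the question.
unscoped = [[h.text for h in root.findall(".//h3")] for _ in root.findall("li")]

# Search relative to the current <li>: one name and one address per item.
scoped = [
    {"name": li.findtext(".//h3"), "address": li.findtext(".//address")}
    for li in root.findall("li")
]

print(unscoped)
print(scoped)
```

With the scoped version each result is a single {name, address} pair, which is the structure the question asks for.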