I can't figure out what is wrong with the XPath expressions inside the `for` loop of the `parse_reviews` function. Why is there no output?
```python
import scrapy


class BookingSpider(scrapy.Spider):
    name = 'booking-hotel-spider'
    allowed_domains = ['booking.com']
    start_urls = [
        'https://www.booking.com/hotel/ch/vision-apartment-milita-rstrasse.de.html?aid=356980;label=gog235jc-1FCAIoLDgcSAdYA2gsiAEBmAEHuAEHyAEP2AEB6AEB-AECiAIBqAIDuAK7q7DyBcACAQ;sid=9132b14809ec97a2f9b60ecaf2954252;breadcrumb=hotel;srpvid=ca2699ebdc4e00cf&'
    ]

    # get reviews page of a hotel
    def parse(self, response):
        reviewsurl = response.xpath('//a[@class="hp_nav_reviews_link toggle_review track_review_link_zh"]/@href')
        url = response.urljoin(reviewsurl[0].extract())
        url = url.replace('blockdisplay4', 'tab-reviews')
        yield scrapy.Request(url, callback=self.parse_reviews)

    # parse its reviews
    def parse_reviews(self, response):
        for rev in response.xpath('//li[starts-with(@class,"review_list_new_item")]'):
            author = rev.xpath('.//span[@class="bui-avatar-block__title"]/text()').extract()
            print(author)
            authorcountry = rev.xpath('.//span[@class="bui-avatar-block__subtitle"]/text()').extract()
            print(authorcountry)
            title = rev.xpath('.//div[@class="bui-grid__column-10"]//h3/text()').extract()
            print(title)
```
Edit: bad output:
```
['Maike', 'Eduard', 'Andrearick', 'Alexander', 'Elena', 'Katia', 'Chris', 'Marianna', 'Kam', 'Rachel', 'Maike', 'Eduard', 'Andrearick', 'Alexander', 'Elena', 'Katia', 'Chris', 'Marianna', 'Kam', 'Rachel', 'Maike', 'Eduard', 'Andrearick', 'Alexander', 'Elena', 'Katia', 'Chris', 'Marianna', 'Kam', 'Rachel']
['Deutschland', 'Deutschland', 'Deutschland', 'Österreich', 'Bulgarien', 'Großbritannien', 'Großbritannien', 'Italien', 'Hongkong', 'Malaysia', 'Deutschland', 'Deutschland', 'Deutschland', 'Österreich', 'Bulgarien', 'Großbritannien', 'Großbritannien', 'Italien', 'Hongkong', 'Malaysia', 'Deutschland', 'Deutschland', 'Deutschland', 'Österreich', 'Bulgarien', 'Großbritannien', 'Großbritannien', 'Italien', 'Hongkong', 'Malaysia']
```
So far I haven't been able to find the right XPath for `title`.
Answer (score: 0)
Try changing your `title` to:

```python
title = rev.xpath('.//div[@class="c-review-block__row"]//h3/text()').extract()
```