Scrapy: how to use XPath inside a for loop

Date: 2020-03-11 16:31:48

Tags: xpath scrapy

I can't figure out what is wrong with the XPath expressions inside the for loop of the parse_reviews function. Why is there no output?

import scrapy

class BookingSpider(scrapy.Spider):
    name = 'booking-hotel-spider'
    allowed_domains = ['booking.com']
    start_urls = [
        'https://www.booking.com/hotel/ch/vision-apartment-milita-rstrasse.de.html?aid=356980;label=gog235jc-1FCAIoLDgcSAdYA2gsiAEBmAEHuAEHyAEP2AEB6AEB-AECiAIBqAIDuAK7q7DyBcACAQ;sid=9132b14809ec97a2f9b60ecaf2954252;breadcrumb=hotel;srpvid=ca2699ebdc4e00cf&'
    ]

    # get reviews page of a hotel
    def parse(self, response):

        reviewsurl = response.xpath('//a[@class="hp_nav_reviews_link toggle_review track_review_link_zh"]/@href')
        url = response.urljoin(reviewsurl[0].extract())
        url = url.replace('blockdisplay4', 'tab-reviews')
        yield scrapy.Request(url, callback=self.parse_reviews)

    # parse its reviews
    def parse_reviews(self, response):

        for rev in response.xpath('//li[starts-with(@class,"review_list_new_item")]'):
            author = rev.xpath('.//span[@class="bui-avatar-block__title"]/text()').extract()
            print(author)
            authorcountry = rev.xpath('.//span[@class="bui-avatar-block__subtitle"]/text()').extract()
            print(authorcountry)
            title = rev.xpath('.//div[@class="bui-grid__column-10"]//h3/text()').extract()
            print(title)

Edit: the unwanted output:

['Maike', 'Eduard', 'Andrearick', 'Alexander', 'Elena', 'Katia', 'Chris', 'Marianna', 'Kam', 'Rachel', 'Maike', 'Eduard', 'Andrearick', 'Alexander', 'Elena', 'Katia', 'Chris', 'Marianna', 'Kam', 'Rachel', 'Maike', 'Eduard', 'Andrearick', 'Alexander', 'Elena', 'Katia', 'Chris', 'Marianna', 'Kam', 'Rachel']
['Deutschland', 'Deutschland', 'Deutschland', 'Österreich', 'Bulgarien', 'Großbritannien', 'Großbritannien', 'Italien', 'Hongkong', 'Malaysia', 'Deutschland', 'Deutschland', 'Deutschland', 'Österreich', 'Bulgarien', 'Großbritannien', 'Großbritannien', 'Italien', 'Hongkong', 'Malaysia', 'Deutschland', 'Deutschland', 'Deutschland', 'Österreich', 'Bulgarien', 'Großbritannien', 'Großbritannien', 'Italien', 'Hongkong', 'Malaysia']

At the moment, I can't find the correct XPath for title.

1 Answer:

Answer 0: (score: 0)

Try changing your title to:

title = rev.xpath('.//div[@class="c-review-block__row"]//h3/text()').extract()