Question

I am using scrapy to extract a list of items with the following information into an array:

<div class="row">
    <div class="col-md-4">
        <p class="title">title info</p>
        <p class="content">txt info</p>
    </div>
    <div class="col-md-4">
        <p class="title">title info</p>
        <p class="content">txt info</p>
    </div>
</div>

Somehow my syntax seems to be wrong:

>>> response.xpath('//div[@class="row"]/div[@class="col-md-4"]/p/text()').extract()
[]

There may be another element with the class row before this item.

Answer 1

You are scraping the page https://www.watchmaster.com/de/bvlgari/automatic/bb38sl-auto/UELG3X5E7R.

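For example, to collect the details from the page, it is better to add an extra parent selector, like this: response.css("div#watch-details-tab div.row div ::text").extract(). That avoids collecting data from similar structures elsewhere on the page.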

If you need to collect it feature by feature, try:

for row in response.css('div#watch-details-tab div.row div'):
    k = row.css('p.title::text').get()
    v = row.css('p.content::text').get()
    # and then your logic for this data
