Question

I am trying to scrape a web page that has multiple blog entries on its first page. This is my code so far:

for rel in response.xpath('//*[@id="content"]/div[*]/div/comment()[2]'):
    item = Example()
    item['title'] = rel.xpath('//*[@id="content"]/div[*]/div/div/input/@value').extract()
    item['link'] = rel.xpath('//*[@id="content"]/div[*]/div/div/span[4]/a/@href').extract()
    yield item

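The problem is that when I use "*", I get a single title and a single link, each containing the values of all entries.
What I want is one title and one link per entry. I am new to Python and Scrapy and don't know how to retrieve the individual entries. The first entry starts at index 2 and each following entry is 3 further on, ending at 29 (2, 5, 8, ..., 29).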

Answer 1

Let me suggest a more explicit XPath. Something like this should get you closer to your goal:

for rel in response.xpath('//div[@class="beschreibung"]'):
    item = Example()  # create a fresh item for each entry
    # the leading "." keeps each query relative to the current entry
    item['title'] = rel.xpath('.//strong[contains(text(),"Release")]/following-sibling::*[1]/@value').extract()
    item['link'] = rel.xpath('.//span[@style="display:inline;"]//a[contains(text(),"Share")]/@href').extract()
    yield item
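
As for why the original loop returned everything at once: an XPath that starts with // is evaluated from the document root, even when it is called on rel, so every iteration matched all entries again. A path that starts with .// is evaluated relative to the current node. Here is a minimal sketch of that idea without Scrapy, using the standard library's xml.etree.ElementTree as a stand-in. The markup, titles, and links are made up for illustration, since the real page is not shown in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the blog page (an assumption, not the real site)
PAGE = """
<div id="content">
  <div class="beschreibung">
    <input value="First title" />
    <a href="/first">Share</a>
  </div>
  <div class="beschreibung">
    <input value="Second title" />
    <a href="/second">Share</a>
  </div>
</div>
"""

root = ET.fromstring(PAGE)

# Querying from the document root collects the values of every entry at once
all_titles = [node.get("value") for node in root.findall(".//input")]

# Querying relative to each entry returns only that entry's own values
items = []
for entry in root.findall(".//div[@class='beschreibung']"):
    items.append({
        "title": entry.find(".//input").get("value"),
        "link": entry.find(".//a").get("href"),
    })

print(all_titles)
print(items)
```

In Scrapy the same applies to rel.xpath(...): keep the leading dot on every path inside the loop, and there should be no need to step through the indices 2, 5, 8, ... by hand.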

Scrapy - response.xpath: separating items
