Question

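I am trying to scrape the headlines from https://time.com/

I only want to select the article links under the "The Brief" heading.

I tried to select the nested div with this code: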

for url in response.xpath('//div[@class="column text-align-left visible-desktop visible-mobile last-column"]/div[@class="column-tout"]/a/@href').extract():

But it does not work.

Can someone help me extract those specific articles?

Answer 1

You can locate the div by its text content and then take all of its following-sibling divs:

for url in response.xpath('//div[.="The Brief"]/following-sibling::div//a/@href').extract():
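
To make it clearer what that expression selects, here is a rough plain-Python emulation using only the standard library. It is a sketch, not Scrapy itself: the markup below is hypothetical (the real time.com page may be structured differently), and `links_after_heading` is a helper name made up for this illustration. It finds the div whose full text equals the heading, then collects every link inside the div elements that come after it under the same parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only; the real page may differ.
HTML = """
<section>
  <div>Other</div>
  <div><a href="/before">skip me</a></div>
  <div>The Brief</div>
  <div class="column-tout"><a href="/article-1">One</a></div>
  <div class="column-tout"><span><a href="/article-2">Two</a></span></div>
  <p><a href="/not-a-div">ignored</a></p>
</section>
"""

def links_after_heading(root, heading):
    """Emulate //div[.="heading"]/following-sibling::div//a/@href."""
    hrefs = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            # [.="heading"] compares the div's full text (all descendant text joined).
            if child.tag == "div" and "".join(child.itertext()) == heading:
                # following-sibling::div: later children of the same parent that are divs.
                for sibling in children[i + 1:]:
                    if sibling.tag == "div":
                        # //a/@href: links at any depth inside that sibling.
                        hrefs.extend(
                            a.get("href")
                            for a in sibling.iter("a")
                            if a.get("href") is not None
                        )
    return hrefs

root = ET.fromstring(HTML)
urls = links_after_heading(root, "The Brief")
print(urls)
```

Note that the comparison is an exact match on the div's text, so if the heading on the real page carries extra whitespace, the XPath may need something like `normalize-space(.)="The Brief"` instead.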
