Question

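I want to extract the full text of a web page. Unfortunately, my scraper is also capturing CSS code. How can I complete the code below so that it removes the CSS style code?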

page = " ".join(response.xpath('//body//descendant-or-self::*[not(self::script)]/text()').extract())

Answer 1

Try:

//body//descendant-or-self::*[not(self::script or self::style)]

I tested this and it works: the result no longer includes the STYLE and SCRIPT tags.
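For anyone who wants to check this outside a spider, here is a minimal sketch using lxml directly (Scrapy selectors are built on it, so the same expression should behave the same way in `response.xpath`). The `SAMPLE` markup and the `visible_text` helper are made up for illustration, and `/text()` is appended to the expression as in the question. Dropping whitespace-only chunks is an extra step that the original one-liner does not do.

```python
from lxml import html

# Hypothetical sample page, only for illustration.
SAMPLE = """<html>
  <head><style>h1 { color: red; }</style></head>
  <body>
    <h1>Title</h1>
    <style>p { margin: 0; }</style>
    <script>var x = 1;</script>
    <p>Hello <b>world</b></p>
  </body>
</html>"""

# Same predicate as in the answer, with /text() added as in the question.
XPATH = "//body//descendant-or-self::*[not(self::script or self::style)]/text()"


def visible_text(source):
    """Join the text nodes under <body>, skipping <script> and <style>."""
    tree = html.fromstring(source)
    # Each result is a string-like text node; strip it and drop empty ones.
    chunks = (node.strip() for node in tree.xpath(XPATH))
    return " ".join(chunk for chunk in chunks if chunk)


page = visible_text(SAMPLE)
print(page)
```

In a Scrapy callback the equivalent would presumably be `" ".join(response.xpath(XPATH).extract())`, with the same whitespace filtering applied if you want it.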