Question

Given the following markup:

<a class="title" href="the link">
Low price
<strong>computer</strong>
you should not miss
</a>

I scrape it with Scrapy using this XPath:

response.xpath('.//a[@class="title"]//text()[normalize-space()]').extract()

and I get the following result:

u'\n                  \n                  Low price ', u'computer', u' you should not miss'

Why were the two \n and the many spaces before "Low price" not removed by normalize-space()? What I want is u'Low price computer you should not miss'.

Another question: how do I combine the 3 parts into a single scraped item?

Answer 1

Please try this:

'normalize-space(.//a[@class="title"])'

Answer 2

I ran into the same problem. Please try the following:

ThisWorkbook

Answer 3

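Your call to normalize-space() is a predicate. That means you are selecting the text nodes for which (the effective boolean value of) normalize-space() is true, i.e. the text nodes that contain something other than whitespace. You are not selecting the result of normalize-space(): for that you would need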

.//a[@class="title"]//text()/normalize-space()

(requires XPath 2.0)
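The difference between the predicate form and the path-step form can be sketched in plain Python. This is only an illustration, not Scrapy code: normalize_space below is a hand-written approximation of the XPath function, and the fourth, whitespace-only text node is a hypothetical addition so that the filtering effect is visible. Note also that Scrapy selectors are built on lxml, which implements XPath 1.0, so the XPath 2.0 expression above will most likely not run there as written.

```python
import re


def normalize_space(text):
    """Approximate XPath normalize-space(): collapse whitespace runs, trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


# The three text nodes from the question, plus one hypothetical
# whitespace-only node added here for illustration.
text_nodes = [
    "\n                  \n                  Low price ",
    "computer",
    " you should not miss",
    "\n    ",
]

# Predicate form, text()[normalize-space()]: keep the nodes whose normalized
# value is non-empty, but return the nodes themselves, unchanged.
filtered = [node for node in text_nodes if normalize_space(node)]

# Path-step form, text()/normalize-space(): return the normalized values.
# A whitespace-only node maps to an empty string here.
mapped = [normalize_space(node) for node in text_nodes]

print(filtered)
print(mapped)
```

The predicate only drops the whitespace-only node and leaves the leading \n and spaces in place, which matches the result shown in the question.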

For the second part of the question: just use

string(.//a[@class="title"])

(assuming scrapy-spider allows you to use XPath expressions that return a string rather than nodes.)
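What string() does here, and why wrapping the element in normalize-space() as in Answer 1 gives a single clean value, can be sketched with the standard library alone. This is a rough emulation under stated assumptions, not what Scrapy runs internally: the string value of an element is the concatenation of all its descendant text nodes, which ElementTree exposes through itertext(), and " ".join(s.split()) stands in for normalize-space() (Python's split() treats a few more characters as whitespace than XPath does).

```python
import xml.etree.ElementTree as ET

# The markup from the question.
markup = """<a class="title" href="the link">
Low price
<strong>computer</strong>
you should not miss
</a>"""

anchor = ET.fromstring(markup)

# Emulates string(.//a[@class="title"]): concatenate every descendant text node.
string_value = "".join(anchor.itertext())

# Emulates normalize-space(...) applied to that single string.
title = " ".join(string_value.split())

print(repr(string_value))
print(repr(title))
```

Because the three parts are joined before the whitespace is collapsed, the result is one item rather than a list of three.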

XPath: why can't normalize-space remove spaces and \n?

3 answers: