Question

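I'm using Scrapy, XPath and Python to scrape a website. The results I get contain \r\n. Googling suggested that I need to use normalize-space() in my XPath, but when I do that, as shown below, it doesn't work.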

item ['runs'] = stats.select((normalize-space('//tr[@class="cell1"]/td[3]/text()')[count])).extract()

I get a "global name 'normalize' is not defined" error.

Any ideas?

Answer 1

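normalize-space is an XPath function, not a Python one, so no such function exists in Python or in any Python library. Written outside the XPath string, normalize-space(...) is read by Python as normalize - space(...), a subtraction involving an undefined name normalize, which is why you get the NameError. It has to go inside the XPath expression string, like this (sample only):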

stats.select('''//tr[normalize-space(td/text()) = 'User Name']''').extract()
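Applied to the XPath from the question, a minimal sketch might look like the following. It assumes lxml is installed (recent Scrapy selectors are built on it) and uses made-up markup in place of the real page. One thing to watch: in XPath 1.0, normalize-space() given a node-set only looks at the first node, so to get one cleaned value per row you call it on each cell.

```python
from lxml import html

# Hypothetical markup standing in for the scraped page: the third cell
# of each "cell1" row is padded with \r\n and spaces.
PAGE = """
<table>
  <tr class="cell1"><td>a</td><td>b</td><td>\r\n   12 \r\n</td></tr>
  <tr class="cell1"><td>c</td><td>d</td><td>\r\n 7\r\n</td></tr>
</table>
"""

tree = html.fromstring(PAGE)

# normalize-space() lives INSIDE the XPath string. Given a node-set it
# returns the normalized value of the first node only.
first_only = tree.xpath('normalize-space(//tr[@class="cell1"]/td[3])')
print(first_only)  # 12

# For one value per row, select the cells, then normalize each one.
runs = [td.xpath("normalize-space(.)")
        for td in tree.xpath('//tr[@class="cell1"]/td[3]')]
print(runs)  # ['12', '7']
```

The same expression strings should also work when passed to Scrapy's selector methods (select() in older versions, xpath() in newer ones), though this sketch does not use Scrapy itself.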

If you just want to remove whitespace from a string in Python, you can use str methods. For example, strip removes leading and trailing whitespace:

>>> '\r\n\rsample\r\n'.strip()
'sample'
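Since .extract() returns a list of strings, the usual pattern is to strip each element. A minimal sketch, with a made-up list standing in for the real extract() result:

```python
# Hypothetical stand-in for what .extract() returns: one raw string per text node.
extracted = ['\r\n  12\r\n', '\r\n7 \r\n', '\r\n\r\n']

# Strip each string and drop the ones that were nothing but whitespace.
runs = [s.strip() for s in extracted if s.strip()]
print(runs)  # ['12', '7']
```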

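For behaviour similar to normalize-space (strip both ends and collapse each internal run of whitespace into a single space):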

>>> ' '.join('\r\ns  am  \r\n ple\r\n'.split())
's am ple'
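The split()/join() trick is close to, but not exactly, normalize-space. str.split() with no arguments splits on any Unicode whitespace, including non-breaking spaces (&nbsp; in HTML), whereas XPath 1.0 normalize-space() only treats space, tab, carriage return and line feed as whitespace. If you need the XPath behaviour exactly, a small regex-based helper is one option; the name normalize_space below is just illustrative:

```python
import re

# XPath 1.0 whitespace is only space, tab, CR and LF (the XML "S" production).
_XPATH_WS = re.compile(r"[ \t\r\n]+")

def normalize_space(s: str) -> str:
    """Mimic XPath 1.0 normalize-space(): collapse each run of XML
    whitespace into one space, then trim the ends."""
    return _XPATH_WS.sub(" ", s).strip(" ")

print(normalize_space('\r\ns  am  \r\n ple\r\n'))  # s am ple

# Where the two approaches differ: a non-breaking space (\xa0).
print(repr(normalize_space('a\xa0 b')))   # 'a\xa0 b'  (kept, as in XPath)
print(repr(' '.join('a\xa0 b'.split())))  # 'a b'      (treated as whitespace)
```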