html:
<h2 class="reward__pledge-amount">
Pledge $1 or more
<div class="reward__currency-conversion">
<h5 class="regular grey-dark">
About <span>$1.00 USD</span>
</h5>
</div>
</h2>
<p class="reward__backer-count">
<span class="ksr-icon__backer-badge"></span>
2 backers
</p>
scrapy shell:
sites = sel.css(".reward__info")
for site in sites:
    a = site.xpath("./h2[@class='reward__pledge-amount']/text()").extract()
    b = site.xpath("./p[@class='reward__backer-count']/text()").extract()
    print a
    print b
    break
Result:
[u'\nPledge $1 or more\n', u'\n']
[u'\n', u'\n2 backers\n']
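As you can see, text() returns a list with an extra whitespace-only item in each case.
I think this is because the <h2> contains a <div> and the <p> contains a <span>.
How can I get just the text() of the <h2> and the <p>, without anything from the child nodes?
Something like: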
[u'\nPledge $1 or more\n']
[u'\n2 backers\n']
Answer 0 (score: 0)
You can try using normalize-space() in an XPath predicate on text() to filter out the whitespace-only text nodes, for example:
a = site.xpath("./h2[@class='reward__pledge-amount']/text()[normalize-space()]").extract()
b = site.xpath("./p[@class='reward__backer-count']/text()[normalize-space()]").extract()
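
To see where the extra '\n' items come from, here is a rough sketch that uses only the standard library's xml.etree.ElementTree rather than Scrapy. It assumes the snippet from the question sits inside a `.reward__info` wrapper (guessed from the selector in the question) and is well-formed enough to parse as XML. The helper names `direct_text_nodes` and `non_blank` are made up for illustration; they only mimic what `text()` and `text()[normalize-space()]` select.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in an assumed .reward__info container.
HTML = """<div class="reward__info">
<h2 class="reward__pledge-amount">
Pledge $1 or more
<div class="reward__currency-conversion">
<h5 class="regular grey-dark">
About <span>$1.00 USD</span>
</h5>
</div>
</h2>
<p class="reward__backer-count">
<span class="ksr-icon__backer-badge"></span>
2 backers
</p>
</div>"""


def direct_text_nodes(elem):
    """Text directly under elem: the text before the first child,
    plus the text that follows each child element (its tail)."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t is not None]


def non_blank(nodes):
    """Drop whitespace-only strings, similar in effect to [normalize-space()]."""
    return [t for t in nodes if t.strip()]


root = ET.fromstring(HTML)
h2 = root.find("./h2[@class='reward__pledge-amount']")
p = root.find("./p[@class='reward__backer-count']")

print(direct_text_nodes(h2))  # two nodes: the label and the newline after </div>
print(non_blank(direct_text_nodes(h2)))
print(non_blank(direct_text_nodes(p)))
```

The whitespace before or after each child element is a text node of its own, which is why the unfiltered list has two entries. The filtered strings still carry their leading and trailing newlines; if those are unwanted, calling `.strip()` on each extracted string should be enough.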