I need to parse the following snippets:
<span> Lekhwiya v <strong class="winner-strong">Zobahan</strong></span>
or
<span> <strong class="winner-strong">Sepahan</strong> v Al Nasr (UAE)</span>
The expected results are:
Lekhwiya v Zobahan
and
Sepahan v Al Nasr (UAE)
I tried parsing it like this:
team_1 = block.xpath('.//span/text()').extract()[:2]
team_1 = team_1[0].strip() + team_1[1].strip()
team_2 = block.xpath('.//span/strong/text()').extract()[0]
item['match'] = team_2.strip() + ' ' + team_1 if team_1[0] == 'v' else team_1 + ' ' + team_2.strip()
To me, this is an ugly solution. What is the best way to do this?
Answer 0 (score: 1)
You can use XPath's string() function, or even normalize-space():
In [1]: text = '''
...: <span> Lekhwiya v <strong class="winner-strong">Zobahan</strong></span>
...: <span> <strong class="winner-strong">Sepahan</strong> v Al Nasr (UAE)</span>
...: '''
In [2]: import scrapy
In [3]: selector = scrapy.Selector(text=text, type="html")
In [4]: for span in selector.xpath('//span'):
...:     print(span.xpath('string(.)').extract_first())
...:
Lekhwiya v Zobahan
Sepahan v Al Nasr (UAE)
In [5]: for span in selector.xpath('//span'):
...:     print(span.xpath('normalize-space(.)').extract_first())
...:
Lekhwiya v Zobahan
Sepahan v Al Nasr (UAE)
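The difference between the two functions is easy to miss in printed output: string(.) concatenates every descendant text node and keeps the surrounding whitespace, while normalize-space(.) also trims the ends and collapses internal runs of whitespace. The following is a minimal standard-library sketch of that behaviour, not Scrapy itself. The helper names string_value and normalized are hypothetical, and it assumes the snippets are well-formed enough for xml.etree.ElementTree to parse.

```python
import xml.etree.ElementTree as ET

SNIPPETS = [
    '<span> Lekhwiya v <strong class="winner-strong">Zobahan</strong></span>',
    '<span> <strong class="winner-strong">Sepahan</strong> v Al Nasr (UAE)</span>',
]


def string_value(element):
    """Concatenate all descendant text nodes, similar to XPath string(.)."""
    return "".join(element.itertext())


def normalized(element):
    """Trim and collapse whitespace, approximating XPath normalize-space(.).

    Note: str.split() treats any Unicode whitespace as a separator, whereas
    XPath 1.0 only considers space, tab, CR and LF, so this is an approximation.
    """
    return " ".join(string_value(element).split())


raw = [string_value(ET.fromstring(s)) for s in SNIPPETS]
clean = [normalized(ET.fromstring(s)) for s in SNIPPETS]

for before, after in zip(raw, clean):
    # repr() makes the leading whitespace kept by the string(.) equivalent visible
    print(repr(before), "->", repr(after))
```

In a spider, the same idea would presumably reduce the original four lines to a single normalize-space() call on the span, since the position of the strong element no longer matters.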