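I have been struggling with this problem for a long time. This is the table from which I need to extract the "Annual div" row
but not the "Annual div yield" row: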
<table class="horizontalTable col1of3 lastCol">
<tbody>
<tr class="first">
<th>Annual div <span class="sub">(TTM)</span></th>
<td>5.49 <span class="currencyCode">GBX</span></td>
</tr>
<tr>
<th>Annual div yield <span class="sub">(TTM)</span></th>
<td>6.04%</td>
</tr>
<tr>
<th>Div ex-date</th>
<td><span class="nowrap">Sep 25 2013</span></td>
</tr>
<tr class="last">
<th>Div pay-date</th>
<td><span class="nowrap">Nov 22 2013</span></td>
</tr>
</tbody>
</table>
I wrote this XPath query, but it returns both the "Annual div" row
and the "Annual div yield" row:
Annual_div = sel.xpath('//table[contains(@class, "horizontalTable col1of3")]/tbody/tr[th[contains(.,"Annual div")]]').extract()
Result:
[u'<tr class="first"><th>Annual div <span class="sub">(TTM)</span></th><td>5.49 <span class="currencyCode">GBX</span></td></tr>', u'<tr><th>Annual div yield <span class="sub">(TTM)</span></th><td>5.83%</td></tr>']
When I match on the exact text instead, the query returns nothing:
Annual_div = sel.xpath('//table[contains(@class, "horizontalTable col1of3")]/tbody/tr[th[text()="Annual div"]]').extract()
It seems to be related to the (TTM) span. I don't know how to combine "Annual div" and "(TTM)" to get an exact match.
Please help me.
Regards
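Answer 0 (score: 1)
To compare by exact match, you are missing a trailing space: the text node before the span is "Annual div ", not "Annual div". This should work: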
//table[contains(@class, "horizontalTable col1of3")]/tbody/tr[th/text() = "Annual div "]
However, if you want to strip leading and trailing whitespace before comparing, you can use normalize-space(), like this:
//table[contains(@class, "horizontalTable col1of3")]/tbody/tr[normalize-space(th/text()) = 'Annual div']
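To see why the trailing space matters, here is a minimal sketch that inspects the text nodes directly. It uses the standard library's xml.etree.ElementTree rather than the Scrapy selector from the question, which works here only because this snippet happens to be well-formed XML. The HTML constant is a trimmed copy of the table above, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table from the question (first two rows only).
HTML = """<table class="horizontalTable col1of3 lastCol">
<tbody>
<tr class="first">
<th>Annual div <span class="sub">(TTM)</span></th>
<td>5.49 <span class="currencyCode">GBX</span></td>
</tr>
<tr>
<th>Annual div yield <span class="sub">(TTM)</span></th>
<td>6.04%</td>
</tr>
</tbody>
</table>"""

table = ET.fromstring(HTML)

# th.text is the text that precedes the first child element (<span>),
# i.e. the same node that th/text() selects first in XPath.
first_text_nodes = [tr.find("th").text for tr in table.iter("tr")]
print(first_text_nodes)  # ['Annual div ', 'Annual div yield ']

# An exact comparison without the trailing space matches nothing ...
exact = [t for t in first_text_nodes if t == "Annual div"]

# ... while stripping whitespace first matches only the wanted row.
stripped = [t for t in first_text_nodes if t.strip() == "Annual div"]
```

The empty `exact` list mirrors the empty result in the question, and `stripped` mirrors what the normalize-space() variant selects.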
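Answer 1 (score: 1)
One option is to use XPath's normalize-space() function.
For example, comparing against the full string value of the th element, including the span text: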
Annual_div = sel.xpath('//table[contains(@class, "horizontalTable col1of3")]/tbody/tr[normalize-space(th)="Annual div (TTM)"]').extract()
Or, comparing against only the first text node of the th element:
Annual_div = sel.xpath('//table[contains(@class, "horizontalTable col1of3")]/tbody/tr[normalize-space(th/text())="Annual div"]')
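The difference between normalize-space(th) and normalize-space(th/text()) can be sketched in plain Python. This is an illustration only: normalize_space below is a hypothetical helper that approximates the XPath function with str.split(), and xml.etree.ElementTree stands in for the Scrapy selector because this row is well-formed XML.

```python
import xml.etree.ElementTree as ET

# The "Annual div" row from the question, on its own.
ROW = (
    '<tr class="first">'
    '<th>Annual div <span class="sub">(TTM)</span></th>'
    '<td>5.49 <span class="currencyCode">GBX</span></td>'
    '</tr>'
)


def normalize_space(s):
    """Approximate XPath normalize-space(): trim the ends and collapse
    runs of whitespace into single spaces (hypothetical helper)."""
    return " ".join(s.split())


tr = ET.fromstring(ROW)
th = tr.find("th")

# normalize-space(th) works on the string value of the whole element,
# so the text inside <span> is included.
whole_string_value = normalize_space("".join(th.itertext()))

# normalize-space(th/text()) works on the first text node only,
# so the span text is excluded and the trailing space is trimmed.
first_text_node = normalize_space(th.text or "")

print(whole_string_value)  # Annual div (TTM)
print(first_text_node)     # Annual div
```

Either comparison excludes the "Annual div yield" row, because its corresponding values are "Annual div yield (TTM)" and "Annual div yield".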