The source code of my table is HERE:
I want to get all the rows, which I can do with:
string-join($doc//*[@id='salaries']/tbody/tr/normalize-space(.), '
')
The resulting output is:
1985-86 Los Angeles Lakers NBA $2,030,000
1987-88 Los Angeles Lakers NBA $2,000,000
1988-89 Los Angeles Lakers NBA $3,000,000
My question is: how can I remove the third column (NBA in this example) from the final output to get this:
1985-86 Los Angeles Lakers $2,030,000
1987-88 Los Angeles Lakers $2,000,000
1988-89 Los Angeles Lakers $3,000,000
PS: I'm not sure that the column will always be in that position, but its anchor's href contains 'league': a[contains(@href, 'league')]
Answer 0 (score: 2)
To exclude the third column, use
tbody/tr/td[position()!=3]
To exclude the cells whose link href contains league, you can use
tbody/tr/td[not(contains(a/@href,'league'))]
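To see what the two predicates keep, here is a rough Python sketch using the standard library's xml.etree.ElementTree. ElementTree only implements a small XPath subset (no not() and no !=), so the filtering is done in Python rather than in the path expression. The one-row SAMPLE and the helper names (cell_text, by_position, by_league_link) are hypothetical, modeled on the question's table.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-row sample modeled on the question's table.
SAMPLE = """<tbody>
<tr><td>1985-86</td>
<td><a href="/teams/LAL/1986.html">Los Angeles Lakers</a></td>
<td><a href="/leagues/NBA_1986.html">NBA</a></td>
<td>$2,030,000</td></tr>
</tbody>"""


def cell_text(td):
    # Rough equivalent of XPath normalize-space(.): join all descendant
    # text nodes, then collapse runs of whitespace to single spaces.
    return " ".join("".join(td.itertext()).split())


def by_position(tr, excluded=3):
    # td[position()!=3]: XPath positions are 1-based, so enumerate from 1.
    return [cell_text(td)
            for pos, td in enumerate(tr.findall("td"), start=1)
            if pos != excluded]


def by_league_link(tr):
    # td[not(contains(a/@href,'league'))]: a cell with no <a> gives an
    # empty string, which never contains 'league', so that cell is kept.
    kept = []
    for td in tr.findall("td"):
        a = td.find("a")
        href = a.get("href", "") if a is not None else ""
        if "league" not in href:
            kept.append(cell_text(td))
    return kept


tr = ET.fromstring(SAMPLE).find("tr")
print(by_position(tr))     # ['1985-86', 'Los Angeles Lakers', '$2,030,000']
print(by_league_link(tr))  # ['1985-86', 'Los Angeles Lakers', '$2,030,000']
```

On this sample both approaches select the same cells; they only diverge when the league column is not third.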
Answer 1 (score: 2)
This XPath 2.0 expression:
for $i in 1 to count(/tbody/tr),
$r in /tbody/tr[$i],
$s in string-join($r/td[not(position() eq 3)]/normalize-space(.), ' ')
return
concat($s, '
')
when evaluated on the provided XML document:
<tbody>
<tr class="" data-row="0">
<td align="left">1985-86</td>
<td align="left"><a href="/teams/LAL/1986.html">Los Angeles Lakers</a></td>
<td align="left"><a href="/leagues/NBA_1986.html">NBA</a></td>
<td align="right" csk="2030000">$2,030,000</td>
</tr>
<tr class="" data-row="1">
<td align="left">1987-88</td>
<td align="left"><a href="/teams/LAL/1988.html">Los Angeles Lakers</a></td>
<td align="left"><a href="/leagues/NBA_1988.html">NBA</a></td>
<td align="right" csk="2000000">$2,000,000</td>
</tr>
<tr class="" data-row="2">
<td align="left">1988-89</td>
<td align="left"><a href="/teams/LAL/1989.html">Los Angeles Lakers</a></td>
<td align="left"><a href="/leagues/NBA_1989.html">NBA</a></td>
<td align="right" csk="3000000">$3,000,000</td>
</tr>
</tbody>
produces the wanted, correct result:
1985-86 Los Angeles Lakers $2,030,000
1987-88 Los Angeles Lakers $2,000,000
1988-89 Los Angeles Lakers $3,000,000
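The same row-by-row logic (for each tr, string-join the td cells other than position 3 with a space, one line per row) can be mirrored in a host language. This is a minimal Python sketch with xml.etree.ElementTree, not an XPath 2.0 processor; TBODY is a trimmed copy of the document above (align/csk attributes dropped, since they don't affect the text), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <tbody> above; align/csk attributes are omitted.
TBODY = """<tbody>
<tr><td>1985-86</td><td><a href="/teams/LAL/1986.html">Los Angeles Lakers</a></td>
<td><a href="/leagues/NBA_1986.html">NBA</a></td><td>$2,030,000</td></tr>
<tr><td>1987-88</td><td><a href="/teams/LAL/1988.html">Los Angeles Lakers</a></td>
<td><a href="/leagues/NBA_1988.html">NBA</a></td><td>$2,000,000</td></tr>
<tr><td>1988-89</td><td><a href="/teams/LAL/1989.html">Los Angeles Lakers</a></td>
<td><a href="/leagues/NBA_1989.html">NBA</a></td><td>$3,000,000</td></tr>
</tbody>"""


def normalize_space(el):
    # Mirrors XPath normalize-space(.) applied to an element.
    return " ".join("".join(el.itertext()).split())


def rows_without_column(tbody_xml, excluded=3):
    root = ET.fromstring(tbody_xml)
    lines = []
    for tr in root.findall("tr"):
        # string-join($r/td[not(position() eq 3)]/normalize-space(.), ' ')
        cells = [normalize_space(td)
                 for pos, td in enumerate(tr.findall("td"), start=1)
                 if pos != excluded]
        lines.append(" ".join(cells))
    # One output line per row, like concat($s, '\n') in the XPath version.
    return "\n".join(lines)


print(rows_without_column(TBODY))
```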
If the position of the column to exclude is not guaranteed, use:
for $i in 1 to count(/tbody/tr),
$r in /tbody/tr[$i],
$s in string-join($r/td[not(starts-with(a/@href,'/leagues'))]
/normalize-space(.), ' ')
return
concat($s, '
')
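To illustrate why the href test does not depend on column order, here is a small Python sketch (xml.etree.ElementTree, not XPath 2.0) run on a hypothetical row in which the league cell has been moved to the second column. Like starts-with(a/@href,'/leagues'), it looks only at the cell's first <a> child; the ROW data and helper names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical row: the league cell is deliberately moved to position 2.
ROW = """<tr><td>1985-86</td>
<td><a href="/leagues/NBA_1986.html">NBA</a></td>
<td><a href="/teams/LAL/1986.html">Los Angeles Lakers</a></td>
<td>$2,030,000</td></tr>"""


def normalize_space(el):
    return " ".join("".join(el.itertext()).split())


def is_league_cell(td):
    # starts-with(a/@href, '/leagues'): check the first <a> child;
    # a cell without a link is never treated as the league cell.
    a = td.find("a")
    return a is not None and a.get("href", "").startswith("/leagues")


def row_text(tr):
    return " ".join(normalize_space(td)
                    for td in tr.findall("td")
                    if not is_league_cell(td))


def row_text_by_position(tr, excluded=3):
    # The position-based filter, for comparison.
    return " ".join(normalize_space(td)
                    for pos, td in enumerate(tr.findall("td"), start=1)
                    if pos != excluded)


tr = ET.fromstring(ROW)
print(row_text(tr))              # 1985-86 Los Angeles Lakers $2,030,000
print(row_text_by_position(tr))  # 1985-86 NBA $2,030,000
```

On the shifted row, the position-based filter drops the team name instead of the league, while the href-based filter still removes the right cell.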