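I'm scraping a web page with XPath and need to store the deposit as a number. The deposit is the monthly rent multiplied by the number of months of prepaid rent, so in this case the result should be 15450.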
<table>
<tr>
<td>monthly rent: </td>
<td>5.150,00</td>
</tr>
<tr>
<td>deposit: </td>
<td>3 mdr.</td>
</tr>
</table>
I'm currently using the following XPath to find the information:
//td[contains(.,'Depositum') or contains(.,'Husleje ')]/following-sibling::td/text()
But I don't know how to strip " mdr." from the deposit, or how to multiply the two numbers so that only a single number is returned to the database.
Answer 0 (score: 3)
You can use the following query, which is compatible with XPath 1.0 and later:
substring-before(//td[contains(.,'deposit:')]/following-sibling::td/text(), ' mdr.') * translate(//td[contains(.,'monthly rent:')]/following-sibling::td/text(), ',.', '') div 100
Output:
15450
Step-by-step explanation:
// Get the deposit and remove mdr. from it using substring-before
substring-before(//td[contains(.,'deposit:')]/following-sibling::td/text(), ' mdr.')
// Arithmetic multiply operator
*
// The number format 5.150,00 can't be used for arithmetic calculations.
// Therefore we get the monthly rent and remove the . and , characters from it.
// Note that this is equivalent to multiplying it by a factor of 100, which is
// why we divide by 100 later on.
translate(//td[contains(.,'monthly rent:')]/following-sibling::td/text(), ',.', '')
// Divide by 100
div 100
For reference, see the List of Functions and Operators supported by XPath 1.0 and 2.0.
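To check the arithmetic outside an XPath engine, the same steps can be replayed on the two sample strings in plain Python. This is only a sketch: the function name `deposit_from_cells` is hypothetical, and the code assumes the deposit cell always contains the " mdr." suffix (XPath's substring-before would return an empty string if it were missing, whereas `str.partition` returns the whole input).

```python
def deposit_from_cells(rent_text, deposit_text):
    """Replay the XPath 1.0 expression on two cell strings (hypothetical helper)."""
    # substring-before(deposit_text, ' mdr.') keeps what precedes the suffix: "3"
    months = deposit_text.partition(" mdr.")[0]

    # translate(rent_text, ',.', '') deletes both separators: "515000",
    # which is the rent expressed in hundredths
    rent_in_hundredths = rent_text.translate(str.maketrans("", "", ",."))

    # months * rent_in_hundredths div 100
    return int(months) * int(rent_in_hundredths) // 100


print(deposit_from_cells("5.150,00", "3 mdr."))  # prints 15450
```

Integer arithmetic is used here because removing both separators leaves a whole number of hundredths, so no floating-point value is involved until the final division.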
Answer 1 (score: 0)
A pure XPath solution:
translate(
/table/tr/td[contains(., 'monthly rent')]/following-sibling::td[1],
',.',
'.'
)
*
substring-before(
/table/tr/td[contains(., 'deposit')]/following-sibling::td[1],
' mdr'
)
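It seems I arrived at a solution very similar to hek2mgl's correct answer, with two differences. First, there is no need to divide by 100, because the comma is converted to a dot and the existing dots are removed. Second, the <td> elements containing the numeric data are selected with a positional predicate, to avoid matching additional elements if the real table is not as simple as the given example. The XPath number format requires a dot as the decimal separator and does not allow a thousands separator.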
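The comma-to-dot variant can also be tried against the sample table using only Python's standard library. This is a rough sketch, not the answer's own code: `value_after` and `deposit` are hypothetical helper names, and because `xml.etree.ElementTree` does not implement XPath functions such as translate(), the lookup is done with ordinary Python while following the same idea (take only the first cell after the label, map `,` to `.`, delete `.`).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<table>
<tr><td>monthly rent: </td><td>5.150,00</td></tr>
<tr><td>deposit: </td><td>3 mdr.</td></tr>
</table>"""

# ',' becomes '.', and every original '.' is deleted:
# the same mapping as translate(x, ',.', '.')
TO_XPATH_NUMBER = str.maketrans(",", ".", ".")


def value_after(table, label):
    """Text of the first cell following the cell whose text contains label."""
    for row in table.iter("tr"):
        cells = row.findall("td")
        for i, cell in enumerate(cells[:-1]):
            if label in (cell.text or ""):
                return cells[i + 1].text or ""
    raise LookupError(label)


def deposit(table):
    rent = float(value_after(table, "monthly rent").translate(TO_XPATH_NUMBER))
    months = int(value_after(table, "deposit").partition(" mdr")[0])
    return rent * months


print(deposit(ET.fromstring(SAMPLE)))
```

Taking `cells[i + 1]` plays the role of the `[1]` positional predicate: only the cell immediately after the label is read, even if a row has more columns.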