From the given content

Time: 2015-05-09 18:49:05

Tags: python html xml xpath html-parsing

This is the HTML, laid out as table rows:

<tr><td style="width: 150px;">Development Name:</td><td><b>Bellewoods</b></td></tr>
<tr><td style="width: 150px;">Property Type:</td><td><b>Executive Condominium</b></td></tr>
<tr><td style="width: 150px;">Developer:</td><td><b>Qingjian Realty (Woodlands) Pte Ltd</b></td></tr>
<tr><td style="width: 150px;">Tenure:</td><td><b>99-year Leasehold</b></td></tr>
<tr><td style="width: 150px;"># of Floors:</td><td><b>30</b></td></tr>
<tr><td style="width: 150px;"># of Units:</td><td><b>561</b></td></tr>

I want to extract these columns into a CSV file:

Development Name,
Property Type,
Developer,
Tenure,
Floors,
Units

I am using this XPath, but it does not work:

'//tr//td[@style="width: 150px;" and text()="Development Name:"]//td//b'

1 answer:

Answer 0: (score: 1)

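Check the text of the first td and take its following td sibling. Your expression fails because the trailing //td//b looks for a td nested inside the label cell, whereas the value cell is a sibling of it: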

//tr/td[. = "Development Name:"]/following-sibling::td/b/text()
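To get from the rows to the CSV the question asks for, something like the following may work. It is only a sketch: it uses the standard library's xml.etree.ElementTree, which does not support the following-sibling axis, so it walks each row and reads the second td directly instead of evaluating the XPath above. The function names extract_fields and to_csv are made up for illustration, the fragment is wrapped in a table element so that it parses as a single root, and the "# of " prefix is stripped on the assumption that the columns should be named Floors and Units as listed in the question. Real-world HTML that is not well-formed would need a more forgiving parser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Rows copied from the question, wrapped in <table> to give the fragment one root.
HTML = """<table>
<tr><td style="width: 150px;">Development Name:</td><td><b>Bellewoods</b></td></tr>
<tr><td style="width: 150px;">Property Type:</td><td><b>Executive Condominium</b></td></tr>
<tr><td style="width: 150px;">Developer:</td><td><b>Qingjian Realty (Woodlands) Pte Ltd</b></td></tr>
<tr><td style="width: 150px;">Tenure:</td><td><b>99-year Leasehold</b></td></tr>
<tr><td style="width: 150px;"># of Floors:</td><td><b>30</b></td></tr>
<tr><td style="width: 150px;"># of Units:</td><td><b>561</b></td></tr>
</table>"""


def extract_fields(markup):
    """Map each label (first td) to the bold text in the td that follows it."""
    root = ET.fromstring(markup)
    fields = {}
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) < 2:
            continue  # skip rows without a label/value pair
        label = (cells[0].text or "").strip().rstrip(":")
        # Assumption: "# of Floors" should become the column name "Floors".
        if label.startswith("# of "):
            label = label[len("# of "):]
        fields[label] = cells[1].findtext("b", default="").strip()
    return fields


def to_csv(fields):
    """Render the fields as a header line plus one data line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields.keys())
    writer.writerow(fields.values())
    return buf.getvalue()


fields = extract_fields(HTML)
csv_text = to_csv(fields)
print(csv_text)
```

If lxml is available, the XPath from the answer can be passed to its xpath() method unchanged, which avoids the manual row walk.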