XPath following-sibling and grouping elements in a table

Asked: 2016-05-18 20:50:00

Tags: python-2.7 xpath

I have the following HTML from a page.

<tbody><tr>
<td align="center" class="column_heading" width="200" title="The following are the Endorsements for the above license.">Endorsements</td><td align="center" class="column_heading" width="150" title="See Authorization Level Codes with their description at the bottom of the page.">Authorization Level(s) *</td></tr>
<tr><td align="center" bgcolor="#8AFF8A" class="section_detail">Health Education</td>
<td align="center" bgcolor="#FFFFCC" class="section_detail">HS</td></tr><tr><td align="center" bgcolor="#8AFF8A" class="section_detail">Physical Education</td>
<td align="center" bgcolor="#FFFFCC" class="section_detail">ML/HS
</td></tr></tbody>

<tbody><tr>
<td align="center" class="column_heading" width="200" title="The following are the Endorsements for the above license.">Endorsements</td><td align="center" class="column_heading" width="150" title="See Authorization Level Codes with their description at the bottom of the page.">Authorization Level(s) *</td></tr>
<tr><td align="center" bgcolor="#8AFF8A" class="section_detail">School Counselor</td>
<td align="center" bgcolor="#FFFFCC" class="section_detail">ML/HS C
</td></tr></tbody>

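I want to put the information under the first Endorsements / Authorization Level(s) heading into a single zipped-together list, and keep it separate from the second table.

As lists it would look like this: ['Health Education', 'HS', 'Physical Education', 'ML/HS\r'], ['School Counselor', 'ML/HS C\r']

What I get now is: ['Health Education', 'HS'], ['Physical Education', 'ML/HS\r'], ['School Counselor', 'ML/HS C\r']

A short version of my code is: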

test2 = tree.xpath(".//tr[td = 'Endorsements']/following-sibling::tr")
endorse1.append(test2)

2 answers:

Answer 0: (score: 1)

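One way is to use the background color. Try this out; when you print everything, it should give you the information you want, one endorsement/authorization pair per row.

everything = []
for tr in tree.xpath("//tr[td[@class='section_detail']]"):
    row = {}
    # green cell holds the endorsement, yellow cell the authorization level
    row['endorsement'] = tr.xpath("td[@bgcolor='#8AFF8A']")
    row['auth'] = tr.xpath("td[@bgcolor='#FFFFCC']")
    everything.append(row)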

some()
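
For anyone who wants to try the background-color idea without the original page, here is a minimal sketch. It swaps lxml for the standard-library xml.etree.ElementTree so it runs with no extra installs, and it uses a trimmed, well-formed copy of the markup; the HTML constant and the rows_by_color helper are names made up for this illustration. It stores stripped text rather than element objects.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the page's markup (an assumption for this demo);
# both <tbody> blocks are wrapped in one <table> so the snippet is well-formed.
HTML = """
<table>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">Health Education</td>
<td bgcolor="#FFFFCC" class="section_detail">HS</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">Physical Education</td>
<td bgcolor="#FFFFCC" class="section_detail">ML/HS
</td></tr></tbody>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">School Counselor</td>
<td bgcolor="#FFFFCC" class="section_detail">ML/HS C
</td></tr></tbody>
</table>
"""


def rows_by_color(root):
    """Hypothetical helper: one (endorsement, auth) tuple per data row."""
    rows = []
    for tr in root.iter("tr"):
        endorsement = tr.find("td[@bgcolor='#8AFF8A']")
        auth = tr.find("td[@bgcolor='#FFFFCC']")
        if endorsement is None or auth is None:
            continue  # header rows have no colored cells
        # strip() drops the trailing line break seen in the question's output
        rows.append(((endorsement.text or "").strip(), (auth.text or "").strip()))
    return rows


pairs = rows_by_color(ET.fromstring(HTML.strip()))
print(pairs)
```

Note that this yields one pair per row across both tables, so on its own it does not separate the first table from the second.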

Answer 1: (score: 1)

You want the results grouped per table/tbody, so first get the list of tbody elements, then find the target td text within each tbody, for example:

>>> tables = tree.xpath("//tbody[tr/td = 'Endorsements']")
>>> result = [t.xpath("tr[td = 'Endorsements']/following-sibling::tr/td/text()") \
...             for t in tables]
... 
>>> print result
[['Health Education', 'HS', 'Physical Education', 'ML/HS'], ['School Counselor', 'ML/HS C']]
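
The same per-tbody grouping can be reproduced in a self-contained script. This is only a sketch under a few assumptions: it uses the standard-library xml.etree.ElementTree instead of lxml, and since ElementTree has no following-sibling axis, the "rows after the header" step is emulated with a flag in plain Python. The HTML constant and the is_header and group_by_table helpers are hypothetical names, and the markup is a trimmed copy of the question's.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the page's markup (an assumption for this demo).
HTML = """
<table>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td class="section_detail">Health Education</td>
<td class="section_detail">HS</td></tr>
<tr><td class="section_detail">Physical Education</td>
<td class="section_detail">ML/HS
</td></tr></tbody>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td class="section_detail">School Counselor</td>
<td class="section_detail">ML/HS C
</td></tr></tbody>
</table>
"""


def is_header(tr):
    """True if any cell in the row reads 'Endorsements'."""
    return any((td.text or "").strip() == "Endorsements" for td in tr.findall("td"))


def group_by_table(root):
    """Return one flat list of cell texts per tbody that has a header row."""
    groups = []
    for tbody in root.iter("tbody"):
        cells = []
        seen_header = False
        for tr in tbody.findall("tr"):
            if is_header(tr):
                seen_header = True
                continue
            if seen_header:  # stands in for following-sibling::tr
                cells.extend((td.text or "").strip() for td in tr.findall("td"))
        if seen_header:
            groups.append(cells)
    return groups


result = group_by_table(ET.fromstring(HTML.strip()))
print(result)
# [['Health Education', 'HS', 'Physical Education', 'ML/HS'], ['School Counselor', 'ML/HS C']]
```

Scoping the row search to each tbody is what keeps the two tables apart; the strip() calls remove the trailing line breaks that appear as '\r' in the question's output.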