I have the following HTML for a page:
<tbody><tr>
<td align="center" class="column_heading" width="200" title="The following are the Endorsements for the above license.">Endorsements</td><td align="center" class="column_heading" width="150" title="See Authorization Level Codes with their description at the bottom of the page.">Authorization Level(s) *</td></tr>
<tr><td align="center" bgcolor="#8AFF8A" class="section_detail">Health Education</td>
<td align="center" bgcolor="#FFFFCC" class="section_detail">HS</td></tr><tr><td align="center" bgcolor="#8AFF8A" class="section_detail">Physical Education</td>
<td align="center" bgcolor="#FFFFCC" class="section_detail">ML/HS
</td></tr></tbody>
<tbody><tr>
<td align="center" class="column_heading" width="200" title="The following are the Endorsements for the above license.">Endorsements</td><td align="center" class="column_heading" width="150" title="See Authorization Level Codes with their description at the bottom of the page.">Authorization Level(s) *</td></tr>
<tr><td align="center" bgcolor="#8AFF8A" class="section_detail">School Counselor</td>
<td align="center" bgcolor="#FFFFCC" class="section_detail">ML/HS C
</td></tr></tbody>
I want to put the information under the first table's Endorsements and Authorizations columns into one zipped-together list, kept separate from the second table.
As lists, it should look like this:
['Health Education', 'HS', 'Physical Education', 'ML/HS\r'], ['School Counselor', 'ML/HS C\r']
What I'm getting now is:
['Health Education', 'HS'], ['Physical Education', 'ML/HS\r'], ['School Counselor', 'ML/HS C\r']
The short version of my code is:
test2 = tree.xpath(".//tr[td = 'Endorsements']/following-sibling::tr")
endorse1.append(test2)
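To see why the results come back one row at a time, here is a minimal sketch that mimics the approach above. It assumes a trimmed copy of the sample markup wrapped in a single <table> so it parses as well-formed XML, and it uses Python's standard-library xml.etree.ElementTree instead of lxml so it runs without third-party packages; HTML and per_row are hypothetical names. Because rows are collected from the whole document, nothing records which tbody each row came from:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample markup (assumption: wrapped in <table> so it is one XML root)
HTML = """<table>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">Health Education</td>
<td bgcolor="#FFFFCC" class="section_detail">HS</td></tr><tr><td bgcolor="#8AFF8A" class="section_detail">Physical Education</td>
<td bgcolor="#FFFFCC" class="section_detail">ML/HS
</td></tr></tbody>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">School Counselor</td>
<td bgcolor="#FFFFCC" class="section_detail">ML/HS C
</td></tr></tbody>
</table>"""

root = ET.fromstring(HTML)

# Walk every <tr> in the document, building one list per row.
# The table (<tbody>) a row belongs to is never recorded.
per_row = []
for tr in root.iter("tr"):
    cells = [(td.text or "").strip() for td in tr.findall("td[@class='section_detail']")]
    if cells:  # header rows have no section_detail cells
        per_row.append(cells)

print(per_row)
# [['Health Education', 'HS'], ['Physical Education', 'ML/HS'], ['School Counselor', 'ML/HS C']]
```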
Answer 0 (score: 1)
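One way is to use the background color. Try trimming this down; when you print it, it should return the information you want, paired up per row:
everything = []
for tr in tree.xpath("//tr[td[@class='section_detail']]"):
    row = {}
    # green cells hold the endorsement name, yellow cells the authorization level
    row['endorsement'] = tr.xpath("td[@bgcolor='#8AFF8A']/text()")
    row['auth'] = tr.xpath("td[@bgcolor='#FFFFCC']/text()")
    everything.append(row)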
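Below is a self-contained sketch of the background-color idea, adapted (an assumption, not part of the answer above) to group rows by tbody and zip each endorsement with its authorization level, which gives the per-table pairing the question asks for. It uses xml.etree.ElementTree instead of lxml and a trimmed, well-formed copy of the sample markup; the color constants are taken from the sample's bgcolor attributes, and cell_texts is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample markup (assumption: wrapped in <table> so it is one XML root)
HTML = """<table>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">Health Education</td>
<td bgcolor="#FFFFCC" class="section_detail">HS</td></tr><tr><td bgcolor="#8AFF8A" class="section_detail">Physical Education</td>
<td bgcolor="#FFFFCC" class="section_detail">ML/HS
</td></tr></tbody>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">School Counselor</td>
<td bgcolor="#FFFFCC" class="section_detail">ML/HS C
</td></tr></tbody>
</table>"""

ENDORSEMENT_BG = "#8AFF8A"  # green cells: endorsement names
AUTH_BG = "#FFFFCC"         # yellow cells: authorization levels


def cell_texts(tbody, color):
    """Stripped text of every data <td> in this tbody with the given bgcolor."""
    return [(td.text or "").strip() for td in tbody.findall(f"tr/td[@bgcolor='{color}']")]


root = ET.fromstring(HTML)
tables = []
for tbody in root.iter("tbody"):
    endorsements = cell_texts(tbody, ENDORSEMENT_BG)
    auths = cell_texts(tbody, AUTH_BG)
    # Pair each endorsement with the authorization level from the same row
    tables.append(list(zip(endorsements, auths)))

print(tables)
# [[('Health Education', 'HS'), ('Physical Education', 'ML/HS')], [('School Counselor', 'ML/HS C')]]
```

zip() assumes every row has exactly one green and one yellow cell; if either can be missing, pairing the cells row by row inside each tbody is safer.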
Answer 1 (score: 1)
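You want to group the results per table (tbody), so first get the list of tbody elements, then, for each tbody, find the text of the target td cells. For example: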
>>> tables = tree.xpath("//tbody[tr/td = 'Endorsements']")
>>> result = [t.xpath("tr[td = 'Endorsements']/following-sibling::tr/td/text()")
...           for t in tables]
>>> print(result)
[['Health Education', 'HS', 'Physical Education', 'ML/HS'], ['School Counselor', 'ML/HS C']]
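The same per-tbody grouping can be sketched with only the standard library. ElementTree's limited XPath has no following-sibling axis, so this version (an assumption based on the sample markup) keeps the "tbody with an Endorsements header" check but picks the data cells by class="section_detail". It also calls .strip() on each string, which removes trailing line breaks such as the \r in the question. HTML and result are illustrative names:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample markup (assumption: wrapped in <table> so it is one XML root)
HTML = """<table>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">Health Education</td>
<td bgcolor="#FFFFCC" class="section_detail">HS</td></tr><tr><td bgcolor="#8AFF8A" class="section_detail">Physical Education</td>
<td bgcolor="#FFFFCC" class="section_detail">ML/HS
</td></tr></tbody>
<tbody><tr>
<td class="column_heading">Endorsements</td><td class="column_heading">Authorization Level(s) *</td></tr>
<tr><td bgcolor="#8AFF8A" class="section_detail">School Counselor</td>
<td bgcolor="#FFFFCC" class="section_detail">ML/HS C
</td></tr></tbody>
</table>"""

root = ET.fromstring(HTML)

# One inner list per <tbody> whose header row contains an "Endorsements" cell;
# the data cells are collected in document order and stripped of whitespace.
result = [
    [(td.text or "").strip() for td in tbody.findall("tr/td[@class='section_detail']")]
    for tbody in root.iter("tbody")
    if tbody.find("tr[td='Endorsements']") is not None
]

print(result)
# [['Health Education', 'HS', 'Physical Education', 'ML/HS'], ['School Counselor', 'ML/HS C']]
```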