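I'm working on some quick scripts to read table data. The page has several tables, which appear to be loaded dynamically via AJAX, and none of them has an id I could target with XPath. I need the date from the cell before, and the text from the cell after, a cell whose content I know is fixed: <td><span style="">First Last</span></td>
The table I need to pick out is the following.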
<table cellspacing="0" class="collections">
<thead>
<tr>
<td colspan="4" class="actionsWrapper">
<table cellpadding="0" cellspacing="0" width="100%">
<thead></thead>
<tbody>
<tr>
<td><span style="display: none;"><b>Current Group: </b> <span><select class="standard_input"></select></span> </span><span>(<font color="red"><span>2</span></font> Notes)</span><span style="display: none;"> <a href="javascript: void(null)"><font size="-2">Edit Group</font></a> | <span><a href="group_manager.php?type=12"><font id="create_group" size="-2">Create Group</font></a></span></span></td>
<td>
<div style="display: none;"><img src="include/images/loading_page.gif" height="70%"> <span style="font-size: .8em; font-weight: bold;">Retrieving Data...</span></div>
</td>
<td class="searchWrapper">
<table cellpadding="0" cellspacing="0">
<thead></thead>
<tbody>
<tr>
<td><input type="TEXT" class="keyword icon magnifying-glass unfocused"></td>
</tr>
<tr>
<td><span id="notesWrapper" style="display: none;"><label for="notesToggle">Search notes</label><input type="CHECKBOX" class="inpt_checkbox standard_input" id="notesToggle"></span></td>
</tr>
</tbody>
<tfoot></tfoot>
</table>
</td>
</tr>
</tbody>
<tfoot></tfoot>
</table>
</td>
</tr>
</thead>
<tbody>
<tr class="header">
<td class="utils"></td>
<td class="pointer bold" style="width: 200px;">Date</td>
<td class="pointer bold">Note</td>
<td class="pointer bold openArrow">Author</td>
</tr>
<tr class="data" style="cursor: default;">
<td class="actions"><input type="CHECKBOX" class="checkbox" style="display: none;"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>
<td style="width: 200px;"><span style="">8/24/2011 12:00 PM</span></td>
<td><span style="">First Last</span></td>
<td><span style="">No answer - went to answering machine</span></td>
</tr>
<tr class="detailWrapper" style="display: none;"></tr>
<tr class="data" style="cursor: default;">
<td class="actions"><input type="CHECKBOX" class="checkbox" style="display: none;"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>
<td style="width: 200px;"><span style="">8/26/2011 11:08 AM</span></td>
<td><span style="">First Last</span></td>
<td><span style="">Philip hardly comes into this store</span></td>
</tr>
<tr class="detailWrapper" style="display: none;"></tr>
</tbody>
<tfoot>
<tr style="display: none;"></tr>
<tr>
<td colspan="4">
<table width="100%" style="margin-top:5px;">
<tbody>
<tr>
<td align="left">
<div class="navigationPanel" style="display: none;"><a style="color: rgb(156, 156, 155); cursor: default;"><<</a> <a style="color: rgb(156, 156, 155); cursor: default;"><</a> Page: <input type="TEXT" class="inpt_text standard_input" size="2"><span> of 1 </span> <a style="cursor: default; color: rgb(156, 156, 155);">></a> <a style="cursor: default; color: rgb(156, 156, 155);">>></a></div>
</td>
<td align="right">
Entries Per Page:
<select>
<option value="10" selected="">10</option>
<option value="25">25</option>
<option value="50">50</option>
</select>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td colspan="4" align="left" style="margin-left: 2px;"><textarea style="width: 70%;"></textarea><input type="BUTTON" class="btn2" value="Add" style="width: 50px; margin-left: 10px;"></td>
</tr>
<tr style="display: none;">
<td colspan="4" class="groupActionsWrapper">
<div class="stepbar">Group Actions</div>
<br>
<table width="100%">
<tbody>
<tr>
<td style="padding-top:2px;width: 50px" align="right" valign="top">With </td>
<td style="width: 100px;" align="left" valign="top">
<select>
<option value="0">Selected</option>
<option value="1">All in group</option>
</select>
</td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tfoot>
</table>
Answer 0 (score: 0)
If you are not familiar with the re and/or html.parser modules, here is one of many possible ways to do it:
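# Remember the previous line so the date cell is available once the name cell is found.
line_prev = ''
with open('29740695.htm') as f:
    for line in f:
        # Compare the stripped line so indentation in the file does not matter.
        if line.strip() != '<td><span style="">First Last</span></td>':
            line_prev = line
            continue
        print(line_prev.strip())    # date cell: the line before the match
        print(next(f, '').strip())  # note cell: the line after the match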
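The line-by-line approach above depends on every cell sitting on its own line. For comparison, here is a minimal sketch of the same lookup using html.parser from the standard library. The names NoteRowParser and cells_around are made up for this illustration, and the sketch assumes that the rows of interest carry class "data" (as in the markup in the question) and contain no nested tables.

```python
from html.parser import HTMLParser


class NoteRowParser(HTMLParser):
    """Hypothetical helper: collect the cell texts of every <tr class="data"> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read, or None
        self._cell = None  # text pieces of the cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and "data" in classes:
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def cells_around(html, known="First Last"):
    """Return (cell before, cell after) for each data row containing `known`."""
    parser = NoteRowParser()
    parser.feed(html)
    parser.close()
    pairs = []
    for row in parser.rows:
        if known in row:
            i = row.index(known)
            # Skip rows where the known cell has no neighbour on one side.
            if 0 < i < len(row) - 1:
                pairs.append((row[i - 1], row[i + 1]))
    return pairs
```

To use it on the saved page, something like `cells_around(open('29740695.htm').read())` should work, provided the file holds the HTML as rendered after the AJAX calls have finished. The raw page source may not contain the table at all.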