Find a table by a known cell and retrieve the contents of the cells before and after it

Date: 2015-04-20 06:30:50

Tags: python html selenium selenium-webdriver

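I'm writing some quick scripts to read table data. The page has several tables; they appear to be loaded dynamically via Ajax and have no ids I could target with XPath. I know one cell, <td><span style="">First Last</span></td>, whose content is fixed, and I need the date from the cell before it and the text from the cell after it. The table I need to identify is: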

<table cellspacing="0" class="collections">
    <thead>
        <tr>
            <td colspan="4" class="actionsWrapper">
                <table cellpadding="0" cellspacing="0" width="100%">
                    <thead></thead>
                    <tbody>
                        <tr>
                            <td><span style="display: none;"><b>Current Group: </b> <span><select class="standard_input"></select></span>&nbsp;&nbsp; </span><span>(<font color="red"><span>2</span></font> Notes)</span><span style="display: none;">&nbsp;&nbsp;&nbsp;<a href="javascript: void(null)"><font size="-2">Edit Group</font></a> |  <span><a href="group_manager.php?type=12"><font id="create_group" size="-2">Create Group</font></a></span></span></td>
                            <td>
                                <div style="display: none;"><img src="include/images/loading_page.gif" height="70%"> <span style="font-size: .8em; font-weight: bold;">Retrieving Data...</span></div>
                            </td>
                            <td class="searchWrapper">
                                <table cellpadding="0" cellspacing="0">
                                    <thead></thead>
                                    <tbody>
                                        <tr>
                                            <td><input type="TEXT" class="keyword icon magnifying-glass unfocused"></td>
                                        </tr>
                                        <tr>
                                            <td><span id="notesWrapper" style="display: none;"><label for="notesToggle">Search notes</label><input type="CHECKBOX" class="inpt_checkbox standard_input" id="notesToggle"></span></td>
                                        </tr>
                                    </tbody>
                                    <tfoot></tfoot>
                                </table>
                            </td>
                        </tr>
                    </tbody>
                    <tfoot></tfoot>
                </table>
            </td>
        </tr>
    </thead>
    <tbody>
        <tr class="header">
            <td class="utils"></td>
            <td class="pointer bold" style="width: 200px;">Date</td>
            <td class="pointer bold">Note</td>
            <td class="pointer bold openArrow">Author</td>
        </tr>
        <tr class="data" style="cursor: default;">
            <td class="actions"><input type="CHECKBOX" class="checkbox" style="display: none;"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>
            <td style="width: 200px;"><span style="">8/24/2011 12:00 PM</span></td>
            <td><span style="">First Last</span></td>
            <td><span style="">No answer - went to answering machine</span></td>
        </tr>
        <tr class="detailWrapper" style="display: none;"></tr>
        <tr class="data" style="cursor: default;">
            <td class="actions"><input type="CHECKBOX" class="checkbox" style="display: none;"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>
            <td style="width: 200px;"><span style="">8/26/2011 11:08 AM</span></td>
            <td><span style="">First Last</span></td>
            <td><span style="">Philip hardly comes into this store</span></td>
        </tr>
        <tr class="detailWrapper" style="display: none;"></tr>
    </tbody>
    <tfoot>
        <tr style="display: none;"></tr>
        <tr>
            <td colspan="4">
                <table width="100%" style="margin-top:5px;">
                    <tbody>
                        <tr>
                            <td align="left">
                                <div class="navigationPanel" style="display: none;"><a style="color: rgb(156, 156, 155); cursor: default;">&lt;&lt;</a>  <a style="color: rgb(156, 156, 155); cursor: default;">&lt;</a>  Page: <input type="TEXT" class="inpt_text standard_input" size="2"><span> of 1 </span>  <a style="cursor: default; color: rgb(156, 156, 155);">&gt;</a>  <a style="cursor: default; color: rgb(156, 156, 155);">&gt;&gt;</a></div>
                            </td>
                            <td align="right">
                                Entries Per Page: 
                                <select>
                                    <option value="10" selected="">10</option>
                                    <option value="25">25</option>
                                    <option value="50">50</option>
                                </select>
                            </td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
        <tr>
            <td colspan="4" align="left" style="margin-left: 2px;"><textarea style="width: 70%;"></textarea><input type="BUTTON" class="btn2" value="Add" style="width: 50px; margin-left: 10px;"></td>
        </tr>
        <tr style="display: none;">
            <td colspan="4" class="groupActionsWrapper">
                <div class="stepbar">Group Actions</div>
                <br>
                <table width="100%">
                    <tbody>
                        <tr>
                            <td style="padding-top:2px;width: 50px" align="right" valign="top">With </td>
                            <td style="width: 100px;" align="left" valign="top">
                                <select>
                                    <option value="0">Selected</option>
                                    <option value="1">All in group</option>
                                </select>
                            </td>
                            <td></td>
                        </tr>
                    </tbody>
                </table>
            </td>
        </tr>
    </tfoot>
</table>
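Since the question is tagged selenium-webdriver, the known cell can be located in the live page by its text instead of by an id. A minimal sketch, assuming Selenium 4 and a `driver` that has already loaded the page and waited for the Ajax tables to appear: an XPath selects the `<td>` whose normalized text is "First Last" inside `table.collections`, and the `preceding-sibling::td[1]` and `following-sibling::td[1]` axes give the date and note cells. The name `neighbour_cells` is hypothetical; passing the string `"xpath"` as the locator strategy is equivalent to `By.XPATH` and keeps the sketch free of a selenium import.

```python
# Hypothetical helper; "xpath" is the string value of selenium's By.XPATH.
KNOWN_CELL_XPATH = (
    "//table[@class='collections']"
    "//tr[@class='data']/td[normalize-space(.)='First Last']"
)


def neighbour_cells(driver, cell_xpath=KNOWN_CELL_XPATH):
    """Return (before, after) text pairs around every cell matching cell_xpath.

    `driver` is assumed to be a Selenium 4 WebDriver with the page loaded.
    """
    pairs = []
    for cell in driver.find_elements("xpath", cell_xpath):
        # preceding-sibling is a reverse axis, so [1] is the nearest <td> before
        before = cell.find_element("xpath", "./preceding-sibling::td[1]")
        after = cell.find_element("xpath", "./following-sibling::td[1]")
        pairs.append((before.text, after.text))
    return pairs
```

Matching on the cell text rather than on position means the rows can be reordered or paginated without changing the query; only the table's `collections` class and the rows' `data` class are relied on.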

1 answer:

Answer 0 (score: 0)

If you are not familiar with the re and/or html.parser modules, here is one of many possible approaches:

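line_prev = ''
with open('29740695.htm') as f:  # the page source saved to a local file
    for line in f:
        # compare without surrounding whitespace so indentation does not matter
        if line.strip() != '<td><span style="">First Last</span></td>':
            line_prev = line  # remember the latest line as the "before" cell
            continue
        print(line_prev.strip())    # cell before the known one: the date
        print(next(f, '').strip())  # cell after the known one: the note text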
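The line-matching approach only works while every `<td>` sits on its own line. The answer also mentions `html.parser`; a minimal sketch of that route, assuming the HTML is available as a string and that data rows carry `class="data"` as in the question, collects each data row's cell texts and then reads the cells on either side of "First Last" by index. `DataRowParser` and `neighbours_from_html` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class DataRowParser(HTMLParser):
    """Collect the text of every <td> in each <tr class="data"> row.

    Hypothetical helper: assumes data rows contain no nested tables,
    which holds for the rows shown in the question.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of cell strings per data row
        self._row = None  # cells of the data row being read, or None
        self._cell = None  # text chunks of the current <td>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and ("class", "data") in attrs:
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # join the chunks and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def neighbours_from_html(html, known="First Last"):
    """Return (before, after) text pairs around every cell equal to `known`."""
    parser = DataRowParser()
    parser.feed(html)
    pairs = []
    for cells in parser.rows:
        for i, text in enumerate(cells):
            if text == known and 0 < i < len(cells) - 1:
                pairs.append((cells[i - 1], cells[i + 1]))
    return pairs
```

On the table from the question this should yield the two (date, note) pairs regardless of indentation or line breaks. With Selenium, `driver.page_source` could be passed in instead of reading a saved file.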