Selenium throws an error for td elements

Time: 2014-11-03 09:07:54

Tags: python html selenium scrapy

I am trying to process the following HTML:

<tr style="background-color:LightCyan;">
            <td>
            <a href="/ecy/publications/SummaryPages/1409009.html" id="SubContent_GridViewPublicationList_A1_0">L & L Exxon: Interim Action Plan and SEPA DNS Available for Review and Comment</a>
        </td><td>14-09-009</td><td>October 2014</td>
        </tr><tr style="background-color:White;">
            <td>
            <a href="/ecy/publications/SummaryPages/1407028.html" id="SubContent_GridViewPublicationList_A1_1">Public Comment Notice: RockTenn Notice of Construction and SEPA Determination of Non-significance</a>
        </td><td>14-07-028</td><td>October 2014</td>
        </tr><tr style="background-color:LightCyan;">
            <td>
            <a href="/ecy/publications/SummaryPages/1406013.html" id="SubContent_GridViewPublicationList_A1_2">Rule Implementation Plan - Chapter 197-11 WAC, State Environmental Policy Act (SEPA) Rules</a>
        </td><td>14-06-013</td><td>April 2014</td>
        </tr><tr style="background-color:White;">
            <td>
            <a href="/ecy/publications/SummaryPages/1406012.html" id="SubContent_GridViewPublicationList_A1_3">Concise Explanatory Statement - Chapter 197-11 WAC, State Environmental Policy Act (SEPA) Rules</a>
        </td><td>14-06-012</td><td>April 2014</td>
        </tr><tr style="background-color:LightCyan;">
            <td>
            <a href="/ecy/publications/SummaryPages/1406011.html" id="SubContent_GridViewPublicationList_A1_4">Rule Adoption Notice</a>
        </td><td>14-06-011</td><td>April 2014</td>
        </tr><tr style="background-color:White;">
            <td>
            <a href="/ecy/publications/SummaryPages/1406010.html" id="SubContent_GridViewPublicationList_A1_5">Final Cost - Benefit and Least Burdensome Alternative Analyses</a>
        </td><td>14-06-010</td><td>April 2014</td>
        </tr><tr style="background-color:LightCyan;">
            <td>
            <a href="/ecy/publications/SummaryPages/1410050.html" id="SubContent_GridViewPublicationList_A1_6">Final Environmental Impact Statement: Management of <i>Zostera Japonica</i> on Commercial Clam Beds in Willapa Bay, Washington</a>
        </td><td>14-10-050</td><td>March 2014</td>
        </tr><tr style="background-color:White;">
            <td>
            <a href="/ecy/publications/SummaryPages/1307049.html" id="SubContent_GridViewPublicationList_A1_7">Public Comment Notice: Weyerhaeuser, Longview Notice of Construction Order SEPA Determination of Non-Significance</a>
        </td><td>13-07-049</td><td>December 2013</td>
        </tr><tr style="background-color:LightCyan;">
            <td>
            <a href="/ecy/publications/SummaryPages/1306004.html" id="SubContent_GridViewPublicationList_A1_8">Focus on SEPA Rulemaking - Updating the State Environmental Policy Act</a>
        </td><td>13-06-004</td><td>March 2013</td>
        </tr><tr style="background-color:White;">
            <td>
            <a href="/ecy/publications/SummaryPages/1309112.html" id="SubContent_GridViewPublicationList_A1_9">Port of Tacoma Kaiser: Interim Cleanup Plans and SEPA Forms Available for Public Comment</a>
        </td><td>13-09-112</td><td>January 2013</td>
        </tr><tr style="background-color:LightCyan;">
            <td>
            <a href="/ecy/publications/SummaryPages/1206021.html" id="SubContent_GridViewPublicationList_A1_10">Final Cost-Benefit and Least Burdensome Alternative Analyses Chapter 197-11 WAC</a>
        </td><td>12-06-021</td><td>December 2012</td>
        </tr><tr style="background-color:White;">
            <td>
            <a href="/ecy/publications/SummaryPages/1206020.html" id="SubContent_GridViewPublicationList_A1_11">SEPA Rule Adoption Notice</a>
        </td><td>12-06-020</td><td>December 2012</td>
        </tr><tr style="background-color:LightCyan;">
            <td>
            <a href="/ecy/publications/SummaryPages/1206017.html" id="SubContent_GridViewPublicationList_A1_12">SEPA Rule Implementation Plan</a>
        </td><td>12-06-017</td><td>December 2012</td>
        </tr><tr style="background-color:White;">
            <td>
            <a href="/ecy/publications/SummaryPages/1206016.html" id="SubContent_GridViewPublicationList_A1_13">SEPA Rule - Concise Explanatory Statement </a>
        </td><td>12-06-016</td><td>December 2012</td>
        </tr><tr style="background-color:LightCyan;">
            <td>
            <a href="/ecy/publications/SummaryPages/1206013.html" id="SubContent_GridViewPublicationList_A1_14">Preliminary Cost-Benefit and Least Burdensome Alternative Analyses, Chapter 197-11 WAC SEPA Rules</a>
        </td><td>12-06-013</td><td>November 2012</td>
        </tr><tr style="background-color:White;">
            <td>
            <a href="/ecy/publications/SummaryPages/1206009.html" id="SubContent_GridViewPublicationList_A1_15">Rule Proposal Notice, State Environmental Protection Act (SEPA)</a>
        </td><td>12-06-009</td><td>October 2012</td>
        </tr>

I am trying to extract the td elements with Selenium in Python. I wrote this code:

def parse(self, response):
    self.driver.get("https://fortress.wa.gov/ecy/publications/UIPages/PublicationList.aspx?IndexTypeName=Topic&NameValue=SEPA+(State+Environmental+Policy+Act)&DocumentTypeName=Publication")
    # dropdown=Select(self.driver.find_element_by_id("industrydrop"))
    # dropdown.select_by_index(4)
    # sleep(10)
    items = []
    sel = Selector(response)
    sHelper = StringHelper.getStrinHelperObject()
    dHelper = DateHelper.getDateHelperObject()
    # one WebElement per <tr> in the publication table
    sites = self.driver.find_elements_by_css_selector("table#SubContent_GridViewPublicationList tr")
    count = 0
    for site in sites:
        item = EKSpiderItem()
        item['docNumber'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(2)").text)
        item['title'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(1)").text)
        item['publicationDate'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(3)").text)
        items.append(item)
    return items

But the program throws an error like this:

Message: u'Unable to locate element: {"method":"css selector","selector":"td:nth-child(2)"}' 

I tried the different solutions from "How to use find_element_by_link_text() properly to not raise NoSuchElementException?" and "Unable to locate using find element by link", but none of them worked in this case.

Any help would be sincerely appreciated. Thanks.

1 answer:

Answer 0 (score: 0)

The first element of sites is the table's header row, and it does not contain what you are looking for:

(Pdb) sites[0].text
u'Title (link to summary) Number Date (released or updated)'
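To see why the selector fails on that row, it helps to count the td cells in each tr. The sketch below uses Python's standard-library html.parser on a trimmed copy of the table. The header row markup is an assumption, since the question does not show it: ASP.NET GridView tables typically render header cells as th, and in that case td:nth-child(2) matches nothing, because the second child of the row is a th, not a td.

```python
from html.parser import HTMLParser

# Trimmed table. ASSUMPTION: the header row (not shown in the question) uses <th> cells.
TABLE_HTML = """
<table id="SubContent_GridViewPublicationList">
  <tr>
    <th scope="col">Title (link to summary)</th><th scope="col">Number</th><th scope="col">Date (released or updated)</th>
  </tr>
  <tr style="background-color:LightCyan;">
    <td><a href="/ecy/publications/SummaryPages/1406011.html">Rule Adoption Notice</a></td><td>14-06-011</td><td>April 2014</td>
  </tr>
</table>
"""


class CellCounter(HTMLParser):
    """Records, for each <tr>, how many <td> and <th> cells it contains."""

    def __init__(self):
        super().__init__()
        self.rows = []  # one {"td": n, "th": n} dict per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append({"td": 0, "th": 0})
        elif tag in ("td", "th") and self.rows:
            self.rows[-1][tag] += 1


counter = CellCounter()
counter.feed(TABLE_HTML)
for i, row in enumerate(counter.rows):
    print(i, row)
# 0 {'td': 0, 'th': 3}
# 1 {'td': 3, 'th': 0}
```

Row 0 has no td at all, so every td:nth-child(...) lookup on sites[0] raises, while the data rows have the three cells the spider expects.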

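Set the implicit wait to zero with self.driver.implicitly_wait(0), so that a lookup that finds nothing raises immediately instead of waiting for the timeout first.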
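An implicit wait of n seconds makes the driver keep retrying a failed find_element lookup for up to n seconds before raising NoSuchElementException, so a row that is expected to fail costs the full timeout. The following is a simplified pure-Python model of that polling behaviour, meant only to illustrate the difference; it is not Selenium's implementation, and find_with_implicit_wait, poll_interval and NoSuchElement are hypothetical names.

```python
import time


class NoSuchElement(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.NoSuchElementException."""


def find_with_implicit_wait(lookup, timeout, poll_interval=0.05):
    """Call lookup() until it returns something other than None or `timeout` seconds pass.

    Simplified model of an implicit wait: with timeout=0 the lookup is tried exactly once.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = lookup()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise NoSuchElement("Unable to locate element")
        time.sleep(poll_interval)


def missing_cell():
    # Models a header row: the td lookup never succeeds.
    return None


for timeout in (0, 0.5):
    start = time.monotonic()
    try:
        find_with_implicit_wait(missing_cell, timeout)
    except NoSuchElement:
        print(f"timeout={timeout}: gave up after {time.monotonic() - start:.2f}s")
```

With timeout 0 the failure is reported at once; with a non-zero timeout every failing row is delayed by roughly that many seconds.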

Then either skip the first element, or handle the exception it raises:

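    from selenium.common.exceptions import NoSuchElementException

    for site in sites:
        try:
            result = site.find_element_by_css_selector("td:nth-child(2)").text
            print(result)
        except NoSuchElementException as e:
            # the header row has no matching <td>: report it and move on
            print(e)
            continue
    import pdb; pdb.set_trace()  # optional: inspect sites interactively, as above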
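Instead of catching an exception for every row, another option is to ask for all td cells of a row at once and skip rows with fewer than three: find_elements (plural) returns an empty list rather than raising when nothing matches, so a header row simply yields no cells. This is a sketch, not the original poster's code; extract_rows is a hypothetical helper, the dict keys mirror the EKSpiderItem fields from the question, and CSS_SELECTOR is hard-coded to the same string value as Selenium's By.CSS_SELECTOR so the sketch runs without Selenium installed.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR; hard-coded
# so this sketch runs without Selenium installed.
CSS_SELECTOR = "css selector"


def extract_rows(rows):
    """Turn table rows into dicts, skipping rows without three <td> cells.

    `rows` is any iterable of objects with a Selenium-style
    find_elements(by, value) method, e.g. the WebElements returned for
    "table#SubContent_GridViewPublicationList tr".
    """
    items = []
    for row in rows:
        # find_elements returns [] instead of raising when nothing matches,
        # so a header row made of <th> cells simply yields no cells.
        cells = row.find_elements(CSS_SELECTOR, "td")
        if len(cells) < 3:
            continue
        items.append({
            "title": cells[0].text.strip(),
            "docNumber": cells[1].text.strip(),
            "publicationDate": cells[2].text.strip(),
        })
    return items
```

With a real driver this would be called roughly as extract_rows(self.driver.find_elements(By.CSS_SELECTOR, "table#SubContent_GridViewPublicationList tr")). Iterating over sites[1:] is the simpler choice if the header is always exactly the first row. Note that find_elements also honours the implicit wait when nothing matches, so the header row still costs the full timeout unless the implicit wait is set to 0.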