I am trying to process the following HTML:
<tr style="background-color:LightCyan;">
<td>
<a href="/ecy/publications/SummaryPages/1409009.html" id="SubContent_GridViewPublicationList_A1_0">L & L Exxon: Interim Action Plan and SEPA DNS Available for Review and Comment</a>
</td><td>14-09-009</td><td>October 2014</td>
</tr><tr style="background-color:White;">
<td>
<a href="/ecy/publications/SummaryPages/1407028.html" id="SubContent_GridViewPublicationList_A1_1">Public Comment Notice: RockTenn Notice of Construction and SEPA Determination of Non-significance</a>
</td><td>14-07-028</td><td>October 2014</td>
</tr><tr style="background-color:LightCyan;">
<td>
<a href="/ecy/publications/SummaryPages/1406013.html" id="SubContent_GridViewPublicationList_A1_2">Rule Implementation Plan - Chapter 197-11 WAC, State Environmental Policy Act (SEPA) Rules</a>
</td><td>14-06-013</td><td>April 2014</td>
</tr><tr style="background-color:White;">
<td>
<a href="/ecy/publications/SummaryPages/1406012.html" id="SubContent_GridViewPublicationList_A1_3">Concise Explanatory Statement - Chapter 197-11 WAC, State Environmental Policy Act (SEPA) Rules</a>
</td><td>14-06-012</td><td>April 2014</td>
</tr><tr style="background-color:LightCyan;">
<td>
<a href="/ecy/publications/SummaryPages/1406011.html" id="SubContent_GridViewPublicationList_A1_4">Rule Adoption Notice</a>
</td><td>14-06-011</td><td>April 2014</td>
</tr><tr style="background-color:White;">
<td>
<a href="/ecy/publications/SummaryPages/1406010.html" id="SubContent_GridViewPublicationList_A1_5">Final Cost - Benefit and Least Burdensome Alternative Analyses</a>
</td><td>14-06-010</td><td>April 2014</td>
</tr><tr style="background-color:LightCyan;">
<td>
<a href="/ecy/publications/SummaryPages/1410050.html" id="SubContent_GridViewPublicationList_A1_6">Final Environmental Impact Statement: Management of <i>Zostera Japonica</i> on Commercial Clam Beds in Willapa Bay, Washington</a>
</td><td>14-10-050</td><td>March 2014</td>
</tr><tr style="background-color:White;">
<td>
<a href="/ecy/publications/SummaryPages/1307049.html" id="SubContent_GridViewPublicationList_A1_7">Public Comment Notice: Weyerhaeuser, Longview Notice of Construction Order SEPA Determination of Non-Significance</a>
</td><td>13-07-049</td><td>December 2013</td>
</tr><tr style="background-color:LightCyan;">
<td>
<a href="/ecy/publications/SummaryPages/1306004.html" id="SubContent_GridViewPublicationList_A1_8">Focus on SEPA Rulemaking - Updating the State Environmental Policy Act</a>
</td><td>13-06-004</td><td>March 2013</td>
</tr><tr style="background-color:White;">
<td>
<a href="/ecy/publications/SummaryPages/1309112.html" id="SubContent_GridViewPublicationList_A1_9">Port of Tacoma Kaiser: Interim Cleanup Plans and SEPA Forms Available for Public Comment</a>
</td><td>13-09-112</td><td>January 2013</td>
</tr><tr style="background-color:LightCyan;">
<td>
<a href="/ecy/publications/SummaryPages/1206021.html" id="SubContent_GridViewPublicationList_A1_10">Final Cost-Benefit and Least Burdensome Alternative Analyses Chapter 197-11 WAC</a>
</td><td>12-06-021</td><td>December 2012</td>
</tr><tr style="background-color:White;">
<td>
<a href="/ecy/publications/SummaryPages/1206020.html" id="SubContent_GridViewPublicationList_A1_11">SEPA Rule Adoption Notice</a>
</td><td>12-06-020</td><td>December 2012</td>
</tr><tr style="background-color:LightCyan;">
<td>
<a href="/ecy/publications/SummaryPages/1206017.html" id="SubContent_GridViewPublicationList_A1_12">SEPA Rule Implementation Plan</a>
</td><td>12-06-017</td><td>December 2012</td>
</tr><tr style="background-color:White;">
<td>
<a href="/ecy/publications/SummaryPages/1206016.html" id="SubContent_GridViewPublicationList_A1_13">SEPA Rule - Concise Explanatory Statement </a>
</td><td>12-06-016</td><td>December 2012</td>
</tr><tr style="background-color:LightCyan;">
<td>
<a href="/ecy/publications/SummaryPages/1206013.html" id="SubContent_GridViewPublicationList_A1_14">Preliminary Cost-Benefit and Least Burdensome Alternative Analyses, Chapter 197-11 WAC SEPA Rules</a>
</td><td>12-06-013</td><td>November 2012</td>
</tr><tr style="background-color:White;">
<td>
<a href="/ecy/publications/SummaryPages/1206009.html" id="SubContent_GridViewPublicationList_A1_15">Rule Proposal Notice, State Environmental Protection Act (SEPA)</a>
</td><td>12-06-009</td><td>October 2012</td>
</tr>
I am trying to extract the td elements using Selenium with Python. I wrote this code:
def parse(self, response):
    self.driver.get("https://fortress.wa.gov/ecy/publications/UIPages/PublicationList.aspx?IndexTypeName=Topic&NameValue=SEPA+(State+Environmental+Policy+Act)&DocumentTypeName=Publication")
    # dropdown=Select(self.driver.find_element_by_id("industrydrop"))
    # dropdown.select_by_index(4)
    # sleep(10)
    items = []
    sel = Selector(response)
    sHelper = StringHelper.getStrinHelperObject()
    dHelper = DateHelper.getDateHelperObject()
    # Every <tr> of the publication table, including the header row
    sites = self.driver.find_elements_by_css_selector("table#SubContent_GridViewPublicationList tr")
    count = 0
    for site in sites:
        item = EKSpiderItem()
        item['docNumber'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(2)").text)
        item['title'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(1)").text)
        item['publicationDate'] = sHelper.processMyString(site.find_element_by_css_selector("td:nth-child(3)").text)
        items.append(item)
    return items
But the program throws an error like this:
Message: u'Unable to locate element: {"method":"css selector","selector":"td:nth-child(2)"}'
I tried different solutions from "How to use find_element_by_link_text() properly to not raise NoSuchElementException?" and "Unable to locate using find element by link", but none of them worked in this case.
Any help is sincerely appreciated. Thanks.
Answer 0 (score: 0)
The first element of sites does not contain what you are looking for:
(Pdb) sites[0].text
u'Title (link to summary) Number Date (released or updated)'
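Set the implicit wait to zero so that a failed lookup raises immediately instead of waiting: self.driver.implicitly_wait(0)
Then either skip the first element or handle the failure: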
from selenium.common.exceptions import NoSuchElementException

for site in sites:
    try:
        results = site.find_element_by_css_selector("td:nth-child(2)").text
    except NoSuchElementException as e:
        # The header row has no matching <td> cell, so the lookup fails; skip it.
        print(e)
        continue
    print(results)
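As an alternative to catching the exception, the rows without data cells can be filtered out up front. The following is only a minimal sketch: the function name extract_publications and the plain-dict result are illustrative assumptions, not part of the original spider. It relies on the plural find_elements_by_css_selector, which returns an empty list rather than raising when nothing matches (in newer Selenium releases the equivalent call is find_elements(By.CSS_SELECTOR, "td")).

```python
def extract_publications(rows):
    """Return one dict per data row; rows with fewer than three <td> cells are skipped.

    `rows` is assumed to be the list of <tr> WebElements stored in `sites`.
    """
    publications = []
    for row in rows:
        # The plural lookup returns [] instead of raising NoSuchElementException.
        cells = row.find_elements_by_css_selector("td")
        if len(cells) < 3:
            continue  # header row, or any other row without data cells
        publications.append({
            "title": cells[0].text.strip(),
            "docNumber": cells[1].text.strip(),
            "publicationDate": cells[2].text.strip(),
        })
    return publications
```

Inside parse this could be called as extract_publications(sites), with each resulting dict copied into an EKSpiderItem.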