I have the following code:
driver.get(<some url>)
for element in driver.find_elements_by_class_name('thumbnail'):
    element.find_element_by_xpath(".//a").click() #this works and navigates to new page
    element.find_element_by_link_text('Click here').click() #this doesn't
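I need to navigate the following HTML (simplified, of course) by clicking each thumbnail, which leads to a new page, and then clicking the Click here link on that new page: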
<!DOCTYPE html>
<html lang="en-US" prefix="og: http://iuytp.me/ns# fb: http://iuytp.me/ns/fb#">
<head>
<meta charset="UTF-8" />
<title>Releases</title>
</head>
<body class="archive category category-releases category-4 custom-background">
<div id="main">
<div id="container" class="one-column">
<div id="content" role="main">
<h1 class="page-title">Releases</h1>
<div id="thumbnail-post-display">
<div id="thumbnail-post" class="post-7158 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-designer category-releases category-uncategorized">
<div class="thumbnail"><a href="http://records.net/uncategorized/designer-7-inch-bufu-records-co-release/" title="Permanent link to Designer – 7 inch" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/05/dboypledge32-300x300.png" class="attachment-thumbnail wp-post-image" alt="dboypledge3" /></a></div>
<h2><a href="http://records.net/uncategorized/designer-7-inch-bufu-records-co-release/" title="Permanent link to Designer – 7 inch" rel="bookmark">Designer – 7 inch</a></h2>
</div>
</div><!--end thumbnail post display-->
<div id="thumbnail-post-display">
<div id="thumbnail-post" class="post-7107 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-releases">
<div class="thumbnail"><a href="http://records.net/releases/people-2014-tour-demos/" title="Permanent link to All My People – 2014 Tour Demos" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/04/01_Doubt-mp3-image-300x300.png" class="attachment-thumbnail wp-post-image" alt="" /></a></div>
<h2><a href="http://records.net/releases/people-2014-tour-demos/" title="Permanent link to All My People – 2014 Tour Demos" rel="bookmark">All My People – 2014 Tour Demos</a></h2>
</div>
</div><!--end thumbnail post display-->
<div id="thumbnail-post-display">
<div id="thumbnail-post" class="post-7089 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-releases">
<div class="thumbnail"><a href="http://records.net/releases/sirens-blossom-talk/" title="Permanent link to Syrins – Boss Talk" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/04/sirens_final_smaller-300x300.jpg" class="attachment-thumbnail wp-post-image" alt="sirens_final_smaller" /></a></div>
<h2><a href="http://records.net/releases/sirens-blossom-talk/" title="Permanent link to Syrins – Boss Talk" rel="bookmark">Syrins – Boss Talk</a></h2>
</div>
</div><!--end thumbnail post display-->
<div id="thumbnail-post-display">
<div id="thumbnail-post" class="post-7073 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-releases">
<div class="thumbnail"><a href="http://records.net/releases/worlds-strongest-man-scares/" title="Permanent link to World’s Tough Man – Sorry Scares You" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/03/a2312749950_10-300x300.jpg" class="attachment-thumbnail wp-post-image" alt="a2312749950_10" /></a></div>
<h2><a href="http://records.net/releases/worlds-strongest-man-scares/" title="Permanent link to World’s Tough Man – Sorry Scares You" rel="bookmark">World’s Tough Man – Sorry Scares You</a></h2>
</div>
</div><!--end thumbnail post display-->
<div id="thumbnail-post-display">
<div id="thumbnail-post" class="post-7046 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-releases">
<div class="thumbnail"><a href="http://records.net/releases/sundog-space-criminal/" title="Permanent link to Dog – Space Criminal" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/03/Sundog_cover_high_res-300x300.jpg" class="attachment-thumbnail wp-post-image" alt="dog_cover_high_res" /></a></div>
<h2><a href="http://records.net/releases/sundog-space-criminal/" title="Permanent link to Dog – Space Criminal" rel="bookmark">Dog – Space Criminal</a></h2>
</div>
</div><!--end thumbnail post display-->
<div style="clear:both"></div>
</div><!-- #container -->
</div><!-- #main -->
</div><!-- #wrapper -->
</div><!--#bg-wrapper-->
</body>
</html>
But my code spits out the following error:
Traceback (most recent call last):
...
  File "crawler.py", line 17, in main
    driver.find_element_by_link_text('Click here').click()
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 254, in find_element_by_link_text
    return self.find_element(by=By.LINK_TEXT, value=link_text)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 662, in find_element
    {'using': by, 'value': value})['value']
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 173, in execute
    self.error_handler.check_response(response)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 164, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: u'no such element\n (Session info: chrome=35.0.1916.153)\n (Driver info: chromedriver=2.10.267517,platform=Mac OS X 10.9.3 x86_64)'
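The problem seems to be that element is not updated with the contents of the new page. Replacing element with driver on the offending line doesn't work either. What am I doing wrong?
Note that I need to be able to do this for every thumbnail (hence the for loop).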
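Answer 0 (score: 9)
It turns out you need to store the links you want to navigate to ahead of time. This is what ended up working for me (I found this thread helpful):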
driver.get(<some url>)
# collect every target URL up front, while the listing page is still loaded
elements = driver.find_elements_by_xpath("//h2/a")
links = [element.get_attribute('href') for element in elements]
for link in links:
    print('navigating to: ' + link)
    driver.get(link)
    # do stuff within that page here...
    driver.back()
Answer 1 (score: 2)
Your first line:
for element in driver.find_elements_by_class_name('thumbnail'):
grabs all the matching elements on the first page. Your next line:
element.find_element_by_xpath(".//a").click() #this works and navigates to new page
navigates to a completely new page, as you point out in your comment. At that point element is gone, so the next line:
element.find_element_by_link_text('Click here').click() #this doesn't
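has no chance of doing anything, because it refers to something that no longer exists. That is exactly what the NoSuchElementException is telling you.
You need to start again from the top, searching from driver rather than from element, for example: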
driver.find_element_by_link_text('Click here').click()
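As a rough sketch of that advice, the lookup on the new page can go through driver and tolerate pages where the link is missing. The helper name `click_link_if_present` is hypothetical, not part of Selenium. The sketch assumes the same Selenium 2/3-style `find_elements_by_link_text` call used in the question (Selenium 4 removed these helpers in favour of `find_elements(By.LINK_TEXT, ...)`), and relies on the plural `find_elements_*` form returning an empty list instead of raising.

```python
def click_link_if_present(driver, link_text):
    """Click the first link with the given text on the current page.

    Returns True if a link was found and clicked, False otherwise.
    """
    # Search from the driver (the page that is loaded now), not from an
    # element that belonged to the previous page.
    matches = driver.find_elements_by_link_text(link_text)
    if not matches:
        # Nothing matched; the plural lookup returns an empty list
        # rather than raising NoSuchElementException.
        return False
    matches[0].click()
    return True
```

Right after the thumbnail click, this would be called as `click_link_if_present(driver, 'Click here')`. A False result suggests the link is not on that page, or has not been rendered yet.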
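Other answer:
To solve your iteration dilemma, you could take the following approach. Note that I'm not familiar with Python syntax! The code below is Groovy, so you'll have to adapt it to Python!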
// first count the number of links you are going to hit; no point in storing the WebElements,
// since they will be gone after we navigate to the first page
def linkCount = driver.findElements(By.className("thumbnail")).size()
// Start a loop based on the count. Inside the loop we are going to have to find each of
// the links again, based on this count. I am going to use XPath; this can probably be done
// with CSS as well. Remember that XPath is 1-based!
(1..linkCount).each {
    // find the element again; the parentheses make the index apply to the whole
    // result set, since each thumbnail div sits in its own parent
    driver.findElement(By.xpath("(//div[@class='thumbnail'])[$it]/a")).click()
    // do something on the new page ...
    // and go back
    driver.navigate().back()
}
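A possible Python adaptation of the Groovy loop above, offered as a sketch rather than a tested port. The names `click_each_thumbnail` and `on_page` are hypothetical. It assumes the Selenium 2/3-style `find_elements_by_class_name` and `find_element_by_xpath` calls from the question, and it indexes into a freshly fetched list instead of building a positional XPath.

```python
def click_each_thumbnail(driver, on_page=None):
    """Click every thumbnail link in turn, returning to the listing each time.

    on_page is an optional callback invoked with the driver once the
    thumbnail's target page has been opened. Returns the number of
    thumbnails that were clicked.
    """
    # Only the count is kept; the elements themselves go stale as soon as
    # the browser leaves the listing page.
    count = len(driver.find_elements_by_class_name('thumbnail'))
    for index in range(count):
        # Look the thumbnails up again on every pass so the reference
        # belongs to the page that is currently loaded.
        thumbnails = driver.find_elements_by_class_name('thumbnail')
        thumbnails[index].find_element_by_xpath('.//a').click()
        if on_page is not None:
            on_page(driver)
        # Return to the listing before the next iteration.
        driver.back()
    return count
```

If `on_page` itself follows another link, a single `driver.back()` will not return to the listing. In that case, reloading the listing URL with `driver.get(...)` at the end of each pass is a simple alternative.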