How do I navigate to a new web page in Selenium?

Time: 2014-07-16 08:36:07

Tags: python-2.7 selenium

I have the following code:

driver.get(<some url>)
for element in driver.find_elements_by_class_name('thumbnail'):
    element.find_element_by_xpath(".//a").click() #this works and navigates to new page
    element.find_element_by_link_text('Click here').click() #this doesn't

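I need to navigate the following HTML (simplified, of course) by clicking a thumbnail that leads to a new page, and then click the Click here link on that new page: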

<!DOCTYPE html>
<html lang="en-US" prefix="og: http://iuytp.me/ns# fb: http://iuytp.me/ns/fb#">
<head>
<meta charset="UTF-8" />
<title>Releases</title>
</head>

<body class="archive category category-releases category-4 custom-background">
    <div id="main">
        <div id="container" class="one-column">
            <div id="content" role="main">

                <h1 class="page-title">Releases</h1>

            <div id="thumbnail-post-display">
        <div id="thumbnail-post" class="post-7158 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-designer category-releases category-uncategorized">
            <div class="thumbnail"><a href="http://records.net/uncategorized/designer-7-inch-bufu-records-co-release/" title="Permanent link to Designer &#8211; 7 inch" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/05/dboypledge32-300x300.png" class="attachment-thumbnail wp-post-image" alt="dboypledge3" /></a></div>
            <h2><a href="http://records.net/uncategorized/designer-7-inch-bufu-records-co-release/" title="Permanent link to Designer &#8211; 7 inch" rel="bookmark">Designer &#8211; 7 inch</a></h2>
        </div>
    </div><!--end thumbnail post display-->

            <div id="thumbnail-post-display">
        <div id="thumbnail-post" class="post-7107 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-releases">
            <div class="thumbnail"><a href="http://records.net/releases/people-2014-tour-demos/" title="Permanent link to All My People &#8211; 2014 Tour Demos" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/04/01_Doubt-mp3-image-300x300.png" class="attachment-thumbnail wp-post-image" alt="" /></a></div>
            <h2><a href="http://records.net/releases/people-2014-tour-demos/" title="Permanent link to All My People &#8211; 2014 Tour Demos" rel="bookmark">All My People &#8211; 2014 Tour Demos</a></h2>
        </div>
    </div><!--end thumbnail post display-->

            <div id="thumbnail-post-display">
        <div id="thumbnail-post" class="post-7089 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-releases">
            <div class="thumbnail"><a href="http://records.net/releases/sirens-blossom-talk/" title="Permanent link to Syrins &#8211; Boss Talk" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/04/sirens_final_smaller-300x300.jpg" class="attachment-thumbnail wp-post-image" alt="sirens_final_smaller" /></a></div>
            <h2><a href="http://records.net/releases/sirens-blossom-talk/" title="Permanent link to Syrins &#8211; Boss Talk" rel="bookmark">Syrins &#8211; Boss Talk</a></h2>
        </div>
    </div><!--end thumbnail post display-->

            <div id="thumbnail-post-display">
        <div id="thumbnail-post" class="post-7073 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-releases">
            <div class="thumbnail"><a href="http://records.net/releases/worlds-strongest-man-scares/" title="Permanent link to World&#8217;s Tough Man &#8211; Sorry Scares You" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/03/a2312749950_10-300x300.jpg" class="attachment-thumbnail wp-post-image" alt="a2312749950_10" /></a></div>
            <h2><a href="http://records.net/releases/worlds-strongest-man-scares/" title="Permanent link to World&#8217;s Tough Man &#8211; Sorry Scares You" rel="bookmark">World&#8217;s Tough Man &#8211; Sorry Scares You</a></h2>
        </div>
    </div><!--end thumbnail post display-->

            <div id="thumbnail-post-display">
        <div id="thumbnail-post" class="post-7046 post type-post status-publish format-standard has-post-thumbnail hentry category-blog category-releases">
            <div class="thumbnail"><a href="http://records.net/releases/sundog-space-criminal/" title="Permanent link to Dog &#8211; Space Criminal" rel="bookmark"><img width="300" height="300" src="http://records.net/dev/wp-content/uploads/2014/03/Sundog_cover_high_res-300x300.jpg" class="attachment-thumbnail wp-post-image" alt="dog_cover_high_res" /></a></div>
            <h2><a href="http://records.net/releases/sundog-space-criminal/" title="Permanent link to Dog &#8211; Space Criminal" rel="bookmark">Dog &#8211; Space Criminal</a></h2>
        </div>
    </div><!--end thumbnail post display-->

<div style="clear:both"></div>


        </div><!-- #container -->

    </div><!-- #main -->
</div><!-- #wrapper -->

</div><!--#bg-wrapper-->

</body>
</html>

But my code throws the following error:

Traceback (most recent call last):
  ...
  File "crawler.py", line 17, in main
    driver.find_element_by_link_text('Click here').click()
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 254, in find_element_by_link_text
    return self.find_element(by=By.LINK_TEXT, value=link_text)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 662, in find_element
    {'using': by, 'value': value})['value']
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 173, in execute
    self.error_handler.check_response(response)
  File "/Library/Python/2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 164, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: u'no such element\n  (Session info: chrome=35.0.1916.153)\n  (Driver info: chromedriver=2.10.267517,platform=Mac OS X 10.9.3 x86_64)' 

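The problem seems to be that element is not updated with the contents of the new page. Replacing element with driver in the offending line does not work either. What am I doing wrong?

Note that I must be able to do this for all of the thumbnails (hence the for loop).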

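2 answers:

Answer 0 (score: 9)

It turns out you need to store the links you want to navigate to in advance. This is what ended up working for me (I found this thread helpful):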

driver.get(<some url>)
elements = driver.find_elements_by_xpath("//h2/a")

# Store the href strings up front; the WebElement references go stale after navigating
links = [element.get_attribute('href') for element in elements]

for link in links:
    print('navigating to: ' + link)
    driver.get(link)

    # do stuff within that page here...

    driver.back()
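
The same idea can be wrapped in two small functions. This is only a sketch, assuming driver is a Selenium WebDriver; the names collect_hrefs, visit_each and the on_page callback are hypothetical and not part of Selenium. The locator strategy is passed as the plain string "xpath", which is the value of By.XPATH, through the generic find_elements(by, value) call.

```python
def collect_hrefs(driver, xpath="//h2/a"):
    """Return the href of every element matched by xpath, as plain strings."""
    # Strings survive navigation; WebElement references do not.
    return [el.get_attribute("href") for el in driver.find_elements("xpath", xpath)]


def visit_each(driver, hrefs, on_page):
    """Load each URL in turn and collect whatever on_page(driver) returns."""
    results = []
    for href in hrefs:
        driver.get(href)
        results.append(on_page(driver))
    return results
```

A call might look like visit_each(driver, collect_hrefs(driver), lambda d: d.title). Because each page is loaded directly with driver.get, this version does not need driver.back().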

Answer 1 (score: 2)

Your first line:

for element in driver.find_elements_by_class_name('thumbnail'):

grabs all of the elements on the first page. Your next line:

element.find_element_by_xpath(".//a").click() #this works and navigates to new page

transitions to an entirely new page, as you point out in your comment. At that point element is gone, so the next line:

element.find_element_by_link_text('Click here').click() #this doesn't

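has no chance of doing anything, because it refers to something that no longer exists. That is exactly what NoSuchElementException is telling you.

You need to start from the top, with something like: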

driver.find_element_by_link_text('Click here').click()
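
One way to express "start from the top" is a helper that always searches from the driver, that is, from the page currently loaded, rather than from an element found on the previous page. This is a sketch under assumptions: click_link_if_present is a hypothetical name, driver is a Selenium WebDriver, and "link text" is the string value of By.LINK_TEXT. It uses find_elements (plural), which returns an empty list instead of raising NoSuchElementException when nothing matches.

```python
def click_link_if_present(driver, text="Click here"):
    """Click the first link on the current page whose text equals `text`.

    Returns True if a link was clicked, False if the page has no such link.
    """
    # Search from the driver so the lookup runs against the page loaded right now.
    matches = driver.find_elements("link text", text)
    if not matches:
        return False
    matches[0].click()
    return True
```

Returning False rather than raising lets a loop over many thumbnails skip pages that happen to lack the link.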

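Other answers:

To solve your iteration dilemma, you could take the following approach. Note that I am not familiar with Python syntax! Below is Groovy syntax, which you will have to adapt to Python!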

// first count the number of links you are going to hit; no point in storing this WebElement,
// since it will be gone after we navigate to the first page
def linkCount = driver.findElements(By.className("thumbnail")).size()
// Start a loop based on the count. Inside the loop we have to find each of the links
// again, based on this count. I am going to use XPath; this can probably be done
// with CSS as well. Remember that XPath is 1-based!
(1..linkCount).each {

    // find the element again; the parentheses make the index apply to the whole
    // result set rather than to each div's position within its own parent
    driver.findElement(By.xpath("(//div[@class='thumbnail'])[$it]/a")).click()

    // do something on the new page ...

    // and go back
    driver.navigate().back()
}
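
A rough Python adaptation of the Groovy approach above, offered as a sketch rather than a tested crawler. It assumes driver is a Selenium WebDriver and that driver.back() returns to the listing page with the same thumbnails in the same order. The names click_each_thumbnail and on_page are hypothetical. Instead of an indexed XPath it repeats the class-name lookup on every pass and picks the i-th result; "class name" and "xpath" are the string values of By.CLASS_NAME and By.XPATH.

```python
def click_each_thumbnail(driver, on_page, class_name="thumbnail"):
    """Click the link inside every thumbnail, re-locating elements after each navigation."""
    # Only the count is kept; the elements themselves go stale after the first click.
    count = len(driver.find_elements("class name", class_name))
    results = []
    for i in range(count):
        # Re-find the thumbnails on the freshly loaded listing page.
        thumbnails = driver.find_elements("class name", class_name)
        thumbnails[i].find_element("xpath", ".//a").click()
        # Do the per-page work (for example, clicking "Click here") in the callback.
        results.append(on_page(driver))
        driver.back()
    return results
```

Compared with storing the href values up front, this variant really clicks each thumbnail, which matters if the click triggers JavaScript rather than plain navigation.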