Looping through a list of Selenium WebDriver elements while switching pages

Time: 2016-05-19 21:27:08

Tags: jquery python selenium web-scraping

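I'm having trouble putting together the Selenium logic for walking through something like an unordered list: click a link that navigates to another page, go back to the original page, and continue with the next item in the list. At first I got errors saying the element was stale, and from what I could find, elements become stale when the browser destroys them on the current page. So I tried to work around the problem like this: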

list3 = driver.find_elements_by_xpath("//*[@id='treeContainer']//a[starts-with(@id, 's')]")
tempList3 = {}
for entry in list3:
    tempList3[entry.get_attribute("id")] = entry.text
    surveyNum = entry.get_attribute("id")
    print(entry.text, entry.tag_name)
    subList3 = driver.find_elements_by_css_selector("#listContainer > ul > *")
    print("sublist", len(subList3))
    tempMem = {}
    for each in subList3:
        print(each.get_attribute("id"), each.text)
        tempMem[each.get_attribute("id")] = each.text
        reportNum = each.get_attribute("id")
        execute_click(driver, "#listContainer > ul a")
        element = WebDriverWait(driver, 20).until(
            lambda s: s.execute_script("return jQuery.active == 0"))
        if element:
            element = WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.LINK_TEXT, "Export to CSV")))
            element.click()
        element = WebDriverWait(driver, 20).until(
            lambda s: s.execute_script("return jQuery.active == 0"))
        if element:
            csvRadio = driver.find_element_by_css_selector("#exportValuesLabelsCSV3.radio")
            csvRadio.click()
        else:
            continue
        csvDownload = driver.find_element_by_css_selector(
            "#butExportToCSV > table > tbody > tr > td:nth-child(1) > div > button")
        csvDownload.click()
        element = WebDriverWait(driver, 20).until(
           EC.text_to_be_present_in_element((By.CSS_SELECTOR, '#progress_csv'),
                                                  "Export completed! Please click here if nothing happens"))
        driver.find_element_by_xpath("//*[@id='emptySel']/a").click()
        subList3.clear()
        subList3 = driver.find_elements_by_css_selector("#listContainer > ul > *")
        for items in subList3:
            print("subList3", items.text)
            if reportNum in tempMem:
                if tempMem.get(reportNum) in items.text:
                    subList3.remove(items)
                    print("Item removed, items left:", len(subList3))
                else:
                    continue
            continue
        else:
            continue
    tempMem.clear()
    list3.clear()
    list3 = driver.find_elements_by_xpath("//*[@id='treeContainer']//a[starts-with(@id, 's')]")
    for listed in list3:
        print("list3", listed.text)
        if surveyNum in tempList3:
            if tempList3.get(surveyNum) in listed.text:
                list3.remove(listed)
            else:
                continue
        else:
            continue
    continue
tempList3.clear()

After running this I don't get any errors, but it doesn't seem to loop... What am I overlooking?

1 Answer:

Answer 0: (score: 1)

Without the full context, it's hard for me to follow your overall code. That said, based on your initial description, I would use a CSS path with nth-child():

Let's say you want to find every link on the Hacker News front page and click each one in turn, hitting the back button between clicks.

Hacker News' HTML looks like this (as of May 2016):

<tbody>
      <tr class="athing">
            <td align="right" valign="top" class="title">…</td>
            <td valign="top" class="votelinks">…</td>
            <td class="title">
                  <span class="deadmark"></span>
                  <a href="https://github.com/BYVoid/Batsh">A language …</a>
                  <span class="sitebit comhead">…</span>
            </td>
      </tr>
      <tr>…</tr>
      <tr class="spacer" style="height:5px"></tr>
      <tr class="athing">
            <td align="right" valign="top" class="title">…</td>
            <td valign="top" class="votelinks">…</td>
            <td class="title">
                  <span class="deadmark"></span>
                  <a href="https://chrome.googleblog.com/2016/05/the-google-play-store-coming-to.html">Play Store…</a>
                  <span class="sitebit comhead">…</span>
            </td>
      </tr>
      …
</tbody>

Sample code that finds the anchor elements and clicks them one by one:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://news.ycombinator.com/")

try:
    # The path to the links you want
    base_css_path = "tr.athing td.title > a"

    # Find and get a count for the number of links you will be clicking
    # Note the plural 'elements'
    # (find_elements(By..., ...) is used here because the find_elements_by_*
    # helpers were removed in Selenium 4)
    num_elems = len(driver.find_elements(By.CSS_SELECTOR, base_css_path))

    # CSS path for finding individual elements
    ind_css_path = "tbody tr:nth-child({0}) td.title > a"

    # Starting with an index of 1, we want every 3rd tr child
    # Looking at Hacker News' structure, we know there are 3 total tr elements
    # associated with each tr we actually want, so we must multiply our total
    # element count by 3, and then use a step size of 3
    for index in range(1, num_elems * 3, 3):
        # Use the direct css path to acquire the specific element and click it
        driver.find_element(By.CSS_SELECTOR, ind_css_path.format(index)).click()

        # Redirect happens
        # Do whatever you need to do here

        # Return to the previous page
        driver.back()
finally:
    driver.quit()
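
As for why the code in the question stops after one pass: one likely contributor, independent of Selenium, is that it calls list3.clear() and subList3.clear() inside the very loops that iterate over those lists. A Python for loop keeps iterating over the original list object, so emptying that object ends the loop, and rebinding the name to a freshly fetched list afterwards has no effect on the loop already running. The sketch below reproduces this with plain lists. The function names iterate_and_clear and iterate_by_index are hypothetical, chosen only for illustration, and the second function shows the same count-then-re-fetch-by-index idea the sample above relies on.

```python
def iterate_and_clear(items):
    """Mimic the question's pattern (hypothetical helper, plain lists only)."""
    visited = []
    for item in items:
        visited.append(item)
        items.clear()               # empties the list object the loop is iterating
        items = ["fresh", "list"]   # rebinding the name does not restart the loop
    return visited


def iterate_by_index(fetch_items):
    """Count once, then re-fetch the collection on every pass and pick by index.

    fetch_items is a hypothetical zero-argument callable that returns the
    current list of items each time it is called.
    """
    visited = []
    count = len(fetch_items())
    for index in range(count):
        current = fetch_items()[index]   # fresh lookup, nothing reused across passes
        visited.append(current)
    return visited


if __name__ == "__main__":
    print(iterate_and_clear(["a", "b", "c"]))          # loop body runs only once
    print(iterate_by_index(lambda: ["a", "b", "c"]))   # every item is visited
```

With Selenium, fetch_items could be something like lambda: driver.find_elements(By.CSS_SELECTOR, "#listContainer > ul > *"), so that each pass works with elements located after the most recent page load rather than with stale references. This assumes the page lists the same items in the same order every time you return to it.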