Question

I'm having trouble implementing the equivalent of a do-while loop.

Problem description

I'm scraping a website, and the result pages are paginated, i.e.

1, 2, 3, 4, 5, .... NEXT

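I use the presence of the NEXT link as the test condition for iterating through the pages. If there is only one page of results, there is no NEXT link, so I scrape just the first page. If there are multiple pages, the last page has no NEXT link either, so the scraper function must also run on that page. The scraping function is called findRecords().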
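The termination rule above (always scrape the current page, then continue only if a NEXT link exists) is a do-while loop. Python has no do-while statement; the usual idiom is `while True:` with the exit test at the end of the body. A minimal sketch of that shape that runs without a browser, assuming a hypothetical `pages` list as an in-memory model of the site, where advancing an index plays the role of clicking NEXT:

```python
def scrape_paginated(pages):
    """Collect results from every page using a do-while shaped loop.

    `pages` is a hypothetical in-memory model of the site: each inner
    list is one results page, and only the last page lacks a NEXT link.
    """
    scraped = []
    current = 0
    while True:
        # Body first: scrape the current page, so it always runs at least once.
        scraped.extend(pages[current])
        # Condition last: is there a NEXT link on this page?
        if current + 1 >= len(pages):
            break
        current += 1  # stands in for next_link.click()
    return scraped


print(scrape_paginated([["a", "b"], ["c", "d"], ["e"]]))  # → ['a', 'b', 'c', 'd', 'e']
print(scrape_paginated([["only"]]))  # → ['only']
```

The single-page case still scrapes once, because the exit test only runs after the body.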

So I isolated my NEXT link with:

next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")

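So I want a loop that does the scraping at least once (whether there is one results page or several). I also use the click() function to click the NEXT button. The code I have so far is: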

while True:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if not next_link:
        break
    next_link.click()

This doesn't work. Well, it runs and it scrapes, but when it reaches the last page it throws a NoSuchElementException:

Traceback (most recent call last):
  File "try.py", line 47, in <module>
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 752, in find_element
    'value': value})['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']"}
  (Session info: chrome=53.0.2785.89)
  (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Linux 3.13.0-92-generic x86_64)

I know the element doesn't exist on the last page because, as I said earlier, there is no NEXT element there.

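So how do I fix my while loop so that it scrapes a single-page result and/or the last page, and then breaks out of the loop gracefully once the condition is no longer true, without throwing this dreaded error?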
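One way to keep find_element is to treat the exception itself as the loop's exit condition: catch it and break. A minimal sketch, assuming hypothetical `FakePager` and local `NoSuchElementException` classes so it runs without Selenium; with a real driver you would instead import `NoSuchElementException` from `selenium.common.exceptions` and call `driver.find_element(...)` inside the `try`:

```python
class NoSuchElementException(Exception):
    """Local stand-in for selenium.common.exceptions.NoSuchElementException."""


class FakePager:
    """Hypothetical driver-like object: find_next() raises on the last page."""

    def __init__(self, page_count):
        self.page_count = page_count
        self.current = 1
        self.scraped_pages = []

    def scrape(self):
        # Stands in for findRecords(): record which page was scraped.
        self.scraped_pages.append(self.current)

    def find_next(self):
        # Behaves like find_element: raise instead of returning nothing.
        if self.current >= self.page_count:
            raise NoSuchElementException("no NEXT link on the last page")
        return self  # the "link"; its click() advances the page

    def click(self):
        self.current += 1


def scrape_all(pager):
    while True:
        pager.scrape()  # always scrape the current page first
        try:
            next_link = pager.find_next()
        except NoSuchElementException:
            break  # last (or only) page: leave the loop quietly
        next_link.click()
    return pager.scraped_pages


print(scrape_all(FakePager(3)))  # → [1, 2, 3]
print(scrape_all(FakePager(1)))  # → [1]
```

This keeps the single find_element call at the cost of using an exception for ordinary control flow; the find_elements approach avoids that.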

PS: Besides the while loop above, I also tried the following:

is_continue = True
while is_continue:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if next_link:
        is_continue = True
        next_link.click()
    else:
        is_continue = False

In case it helps, here is my scraper function findRecords():

from bs4 import BeautifulSoup

def findRecords():
    # `driver` and `letter` are globals defined elsewhere in the script.
    filename = "sam_" + letter + ".csv"
    bsObj = BeautifulSoup(driver.page_source, "html.parser")
    tableList = bsObj.find_all("table", {"class": "width100 menu_header_top_emr"})
    tdList = bsObj.find_all("td", {"class": "menu_header width100"})

    # Pair each results table with its header cell and write one line per pair.
    for table, td in zip(tableList, tdList):
        a = table.find_all("span", {"class": "results_body_text"})
        b = td.find_all("span", {"class": "results_body_text"})
        with open(filename, "a") as csv_file:
            csv_file.write(', '.join(tag.get_text().strip() for tag in a + b) + '\n')
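As a side note on findRecords(), joining fields with ', ' produces a malformed row whenever a scraped value itself contains a comma. Python's standard csv module quotes such fields automatically. A minimal Python 3 sketch; `write_row` is a hypothetical helper and the sample values are made up:

```python
import csv
import io


def write_row(csv_file, fields):
    """Write one scraped record as a properly quoted CSV row.

    `fields` is a list of already-stripped strings, e.g. the
    tag.get_text().strip() values collected in findRecords().
    """
    csv.writer(csv_file).writerow(fields)


buffer = io.StringIO()
write_row(buffer, ["ACME, Inc.", "Springfield", "2016"])
print(repr(buffer.getvalue()))  # → '"ACME, Inc.",Springfield,2016\r\n'
```

When writing to a real file in Python 3, open it with `open(filename, "a", newline="")`, as the csv module documentation recommends.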

Answer 1

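Change your code to search for the NEXT link with find_elements instead of find_element. It returns a list of size 1 if Next is present and an empty list (size 0) otherwise, without raising an exception.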

next_link = driver.find_elements(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")

You then need some logic to get the Next WebElement out of this list.
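A minimal sketch of that logic, assuming a hypothetical helper `first_or_none` and using plain lists in place of real find_elements() results:

```python
def first_or_none(elements):
    """Return the first element of a find_elements() result, or None.

    find_elements() returns a (possibly empty) list instead of raising,
    so an empty list simply means "no NEXT link on this page".
    """
    return elements[0] if elements else None


# Plain lists stand in for find_elements() results here.
print(first_or_none(["<next link>"]))  # → <next link>
print(first_or_none([]))  # → None
```

Inside the loop this would become `next_link = first_or_none(driver.find_elements(By.XPATH, ...))`, followed by `if next_link is None: break` and then `next_link.click()`.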

Answer 2

Try using find_elements, which returns either a list of WebElements or an empty list, and simply check its length:

while True:
    findRecords()
    next_link = driver.find_elements(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if len(next_link) == 0:
        break
    next_link[0].click()
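Because an empty list is falsy in Python, the question's original `if not next_link:` test also works once find_element is replaced by find_elements; only the click has to index the list. A minimal sketch, assuming hypothetical `FakeBrowser` and `FakeLink` classes that stand in for the Selenium driver and WebElement so it runs without a browser:

```python
class FakeLink:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self, browser):
        self.browser = browser

    def click(self):
        self.browser.page += 1


class FakeBrowser:
    """Hypothetical stand-in for the driver: NEXT exists on all but the last page."""

    def __init__(self, page_count):
        self.page_count = page_count
        self.page = 1

    def find_elements(self, by, value):
        # Mirrors the find_elements contract: a list, empty when nothing matches.
        return [FakeLink(self)] if self.page < self.page_count else []


def visit_all(browser):
    visited = []
    while True:
        visited.append(browser.page)  # stands in for findRecords()
        next_link = browser.find_elements("xpath", "//a[contains(text(),'Next')]")
        if not next_link:  # an empty list is falsy, so this check now works
            break
        next_link[0].click()
    return visited


print(visit_all(FakeBrowser(4)))  # → [1, 2, 3, 4]
print(visit_all(FakeBrowser(1)))  # → [1]
```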

Implementing a modified do-while loop in Python, i.e. one that runs at least once and one more time at the end of the loop?
