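I am scraping a website whose content is loaded dynamically. I go through every page of the site, and I want the page source of each page collected in a list. Below is my code that moves through all the pages and grabs each page source, but when the function finishes nothing is printed or returned. The same approach worked for other websites, just not this one. Please help me solve this. Thanks.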
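import time

from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


def get_html(driver):
    """Visit every results page and return a list of their HTML sources."""
    output = []
    keep_going = True
    while keep_going:
        # Pull page HTML
        try:
            output.append(driver.page_source)
        except TimeoutException:
            pass
        try:
            # Check to see if a "next page" link exists and is visible
            keep_going = driver.find_element(By.CLASS_NAME, 'next').is_displayed()
        except NoSuchElementException:
            keep_going = False
        if keep_going:
            try:
                # driver.wait is assumed to be a WebDriverWait created elsewhere
                driver.wait.until(EC.element_to_be_clickable(
                    (By.CLASS_NAME, 'next'))).click()
                time.sleep(3)  # give the next page time to load
            except TimeoutException:
                keep_going = False
    print(str(len(output)))
    return output


# driver is the Chrome webdriver instance created earlier in the script
raw_data = get_html(driver)
print(str(len(raw_data)) + " listings found")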
This is the error output I get:
> Entering search term number 1 out of 1
> Traceback (most recent call last):
>   File "E:/Harshitha/python learning/python/New/rough1.py", line 114, in <module>
>     raw_data = get_html(driver)
>   File "E:/Harshitha/python learning/python/New/rough1.py", line 65, in get_html
>     output = (driver.page_source).encode('utf-8')
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 670, in page_source
>     return self.execute(Command.GET_PAGE_SOURCE)['value']
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
>     self.error_handler.check_response(response)
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 237, in check_response
>     raise exception_class(message, screen, stacktrace)
> selenium.common.exceptions.WebDriverException: Message: chrome not reachable
>   (Session info: chrome=63.0.3239.132)
>   (Driver info: chromedriver=2.34.522940 (1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.16299 x86_64)
Answer 0 (score: 1)
I use the page_source property:
driver.page_source
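Reading `driver.page_source` is the right call, but the traceback in the question shows that call itself failing: `WebDriverException: chrome not reachable` means chromedriver could no longer talk to the Chrome window (typically because the browser crashed or was closed). The exception propagates out of `get_html`, so neither its `print` nor its `return` runs and every page collected so far is lost. Below is a minimal sketch, assuming a partial result is preferable to a crash. The loop is written against three hypothetical callables (`get_source`, `has_next`, `go_next`) so it runs without a browser; with Selenium they would wrap `driver.page_source`, the "next" link check and its `click()`, and `fatal_error` would be `selenium.common.exceptions.WebDriverException`. `collect_pages` and `FakeBrowser` are illustrative names, not part of any library.

```python
from typing import Callable, List, Optional, Type


def collect_pages(
    get_source: Callable[[], str],
    has_next: Callable[[], bool],
    go_next: Callable[[], None],
    fatal_error: Type[BaseException],
    max_pages: int = 1000,
) -> List[str]:
    """Collect page sources until there is no next page or the browser dies.

    Hypothetical helper: get_source, has_next and go_next stand in for
    driver.page_source, the "next" link check and its click().
    """
    pages: List[str] = []
    for _ in range(max_pages):  # safety cap against endless pagination
        try:
            pages.append(get_source())
            if not has_next():
                break
            go_next()
        except fatal_error as exc:
            # Keep the pages already collected instead of losing them all.
            print(f"stopping after {len(pages)} page(s): {exc}")
            break
    return pages


class FakeBrowser:
    """Hypothetical stand-in for a browser session with numbered pages."""

    def __init__(self, n_pages: int, crash_at: Optional[int] = None):
        self.page = 1
        self.n_pages = n_pages
        self.crash_at = crash_at  # page number whose source read fails

    def source(self) -> str:
        if self.page == self.crash_at:
            raise RuntimeError("chrome not reachable")
        return f"<html>page {self.page}</html>"

    def has_next(self) -> bool:
        return self.page < self.n_pages

    def click_next(self) -> None:
        self.page += 1


# Simulate a 5-page site whose browser dies while reading page 3.
browser = FakeBrowser(n_pages=5, crash_at=3)
pages = collect_pages(browser.source, browser.has_next, browser.click_next, RuntimeError)
print(len(pages), "pages kept")
```

Catching only the exception type passed in, rather than using a bare `except`, keeps genuine bugs visible while still returning what was scraped before the browser went away.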