Question

We are trying to access an HTML page and get its content with Python. We run into problems when the page loads its content through frames. The code is:

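import cookielib
import urllib
import urllib2

URL = "http://192.168.1.48/_pnt_log.html"
username = "11111"
password = "1"

# Share one cookie jar so the login cookie is sent with later requests.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username': username, 'j_password': password})
try:
    # Passing data makes this a POST request.
    opener.open('http://192.168.1.48/_top.html', login_data)
    resp = opener.open('http://192.168.1.48/_dept.html?dn=1')
    html = resp.read()
except urllib2.URLError as e:
    print(e)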

The HTML received is as follows:

<html>
<head>
 <meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
 <title>Remote UI<Additional Functions>:  : imageRUNNER2520</title>
</head>
<frameset cols="175,*" bordercolor="white" border="0" framespacing="0" frameborder="0">
 <frame src="index06_02.html" name="Menu" scrolling="AUTO" noresize>
 <frame src="dept.html?dn=1" name="body" noresize>
 <noframes>
  <body bgcolor="white">
  </body>
 </noframes>
</frameset>
</html>

I expected the content of dept.html?dn=1, but this request does not load it. Is there a way to get the content the way a browser does?

Answer 1

In the end, the "problem" turned out to be about how the Canon printer pages keep cookies and how to open a session with urllib2.
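
For reference, the cookie-and-session pattern in question could look roughly like this in Python 3, where cookielib and urllib2 became http.cookiejar and urllib.request. This is only a sketch under assumptions: the field names user_name and pwd are taken from the form inputs used in the Selenium code in this answer (the question posted username and j_password instead), and it is an assumption that the login form posts to _top.html. The helper fetch is a hypothetical name and needs the printer to be reachable.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

BASE = "http://192.168.1.48/"

# One opener for every request, so cookies set at login are sent afterwards.
cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))

# Assumption: the form fields are named as in the Selenium snippet.
form = urlencode({"user_name": "11111", "pwd": "1"}).encode("ascii")
# A Request that carries data is sent as POST.
login_request = Request(BASE + "_top.html", data=form)


def fetch(path):
    """Log in, then fetch `path` with the same opener (hypothetical helper)."""
    opener.open(login_request, timeout=10).close()
    with opener.open(BASE + path, timeout=10) as resp:
        # The page declares charset=iso-8859-1 in its meta tag.
        return resp.read().decode("iso-8859-1")
```

Calling fetch("dept.html?dn=1") would then reuse the login cookie, provided the printer accepts a scripted login at all.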

I solved it with the Selenium Python library: http://selenium-python.readthedocs.io/

With Selenium I get the HTML from the browser and sidestep the permissions problem, because everything runs in the same browser session.

import io

from selenium import webdriver
from selenium.webdriver.common.by import By

## OPEN BROWSER ##
driver = webdriver.Firefox()
## LOGIN ##
driver.get("http://192.168.1.48/_top.html")
# find_element(By...) also works on Selenium releases that dropped find_element_by_*.
driver.find_element(By.NAME, 'user_name').send_keys("11111")
driver.find_element(By.NAME, 'pwd').send_keys("1")
# Click the login image button.
driver.find_element(By.XPATH, "/html/body/form/center/p[1]/table/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr[13]/td[3]/a/img").click()
# Open the frame page directly instead of the frameset that wraps it.
driver.get("http://192.168.1.48/dept.html?dn=1")
## GET HTML ##
elem = driver.find_element(By.XPATH, "//*")
source_code = elem.get_attribute("outerHTML")

## SAVE HTML ##
# io.open writes text as UTF-8 on both Python 2 and Python 3.
with io.open('/home/itsoum/PrinterProject/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)

driver.quit()
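
The frameset in the question only names its frames. A plain HTTP client does not follow them, so each frame page has to be requested separately. The following is a minimal sketch in Python 3, using only the standard library, of how the frame URLs could be worked out from the returned HTML. FrameSrcParser and frame_urls are hypothetical names, and FRAMESET is a trimmed copy of the HTML from the question. Whether the printer then serves those pages to a script still depends on the cookie and session handling this answer describes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(page_url, html):
    """Return absolute URLs for all frames found in `html`."""
    parser = FrameSrcParser()
    parser.feed(html)
    # Frame sources are relative to the page that contains the frameset.
    return [urljoin(page_url, src) for src in parser.sources]


# Trimmed copy of the frameset shown in the question.
FRAMESET = """
<frameset cols="175,*" bordercolor="white" border="0" framespacing="0" frameborder="0">
 <frame src="index06_02.html" name="Menu" scrolling="AUTO" noresize>
 <frame src="dept.html?dn=1" name="body" noresize>
</frameset>
"""

urls = frame_urls("http://192.168.1.48/_dept.html?dn=1", FRAMESET)
for url in urls:
    print(url)
```

The second URL it prints is the dept.html?dn=1 page that the Selenium code opens directly.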
