mechanize

Question

I'm trying to fetch data/forms from a website. I tried both mechanize and selenium, and both failed.

mechanize

The script looks like this:

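import sys
import mechanize

url = 'xxx'
# The original snippet used br without creating it first.
br = mechanize.Browser()
response2 = br.open(url)
request = br.request
print(response2.info())   # response headers
print(response2.read())   # raw HTML body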

Output:

Cache-Control: no-store, must-revalidate, no-cache, max-age=0
Content-Type: text/html
Connection: close
Vary: Accept-Encoding
Pragma: no-cache
Expires: -1
CacheControl: no-cache
X-UA-Compatible: IE=edge
Content-Type: text/html; charset=utf-8

... more content ...

<noscript>Please enable JavaScript to view the page content.</noscript>
</head><body>
</body></html>

selenium

So I thought maybe I could use selenium to run the JavaScript, like this:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
url= 'xxx'
driver.get(url)

print(driver.context)
print(driver.title)

print(driver.page_source)
driver.close()

But I failed again, and the result is almost the same:

...
<noscript>Please enable JavaScript to view the page content.</noscript>
...

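I just want to get the correct content/forms from the website and then submit or POST data/forms to the server, simulating what a web browser does.

I'm out of ideas now, and I don't really understand how selenium works. Waiting for your help, thanks in advance.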

Answer 1

Try this:
Enable Flash using the following profile.

from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

firefoxProfile = FirefoxProfile()

# Enable Flash
firefoxProfile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so',
                              'true')

# Start Firefox with the customized profile
driver = webdriver.Firefox(firefoxProfile)

If it still doesn't work, use chromedriver instead of Firefox; it seems to work by default in chromedriver.

https://chromedriver.storage.googleapis.com/index.html?path=2.30/
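For the chromedriver route, here is a minimal sketch rather than a tested solution for this particular site. It assumes selenium and a matching chromedriver are installed and on PATH, and that the page only needs some time for its JavaScript to fill in the body. The helper names body_is_empty and fetch_rendered_html are made up for illustration, and the 15 second timeout is an arbitrary choice.

```python
import re


def body_is_empty(html):
    """Return True if the <body> element is missing or contains only whitespace."""
    match = re.search(r"<body[^>]*>(.*?)</body>", html, re.IGNORECASE | re.DOTALL)
    return match is None or match.group(1).strip() == ""


def fetch_rendered_html(url, timeout=15):
    """Load url in Chrome and return the page source once the body has content.

    Assumption: selenium and chromedriver are installed; raises selenium's
    TimeoutException if the body is still empty after `timeout` seconds.
    """
    # Imported here so body_is_empty can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll the DOM until the page's JavaScript has put something in <body>.
        WebDriverWait(driver, timeout).until(
            lambda d: not body_is_empty(d.page_source)
        )
        return driver.page_source
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

You would call it as html = fetch_rendered_html('xxx') with your real URL. The body_is_empty check is just one way to tell the "Please enable JavaScript" placeholder apart from the real page; waiting for a specific element you know is on the page would be more reliable if you have one.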

Fetching a web page that requires JavaScript to view the page content
