selenium.common.exceptions.WebDriverException: Message: TypeError: p[0] is undefined

Time: 2017-12-13 00:25:10

Tags: javascript python selenium

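I am trying to develop a web scraper. I have a Python script and some JavaScript code, and the Python script calls the JavaScript. The JavaScript retrieves the relevant content from the web page and returns it to the Python script. The JavaScript works fine when I run it manually in the browser. Here is my JS code: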

var doc = ""
var path1 = document.getElementsByClassName("entry-header")[0]
doc = doc + path1.innerText
doc = doc + "\n"
var path2 = document.getElementsByClassName("entry-content")[0]
var cont = path2.getElementsByTagName("p")
for (var i=0; i<cont.length; i++)
{
   doc = doc+cont[i].innerText
   doc = doc+ "\n"
}

res()

function res()
{
  return doc
}

Here is my Python code:

from selenium import webdriver
js = open("generalized.js", "r").read()
driver = webdriver.Firefox()
browser = webdriver.Firefox()
browser.get("http://www.geeksforgeeks.org/branch-and-bound-set-1-introduction-with-01-knapsack/")
result = driver.execute_script(js)
print result

But when it is called from Python, I get the following error:

Traceback (most recent call last):
File "sample.py", line 7, in <module>
result = driver.execute_script(js)
File "/home/sagar/anaconda2/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 543, in execute_script
'args': converted_args})['value']
File "/home/sagar/anaconda2/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 308, in execute
self.error_handler.check_response(response)
File "/home/sagar/anaconda2/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: TypeError: p[0] is undefined

Please help me solve this problem. Or is there another way to scrape web pages?

1 Answer:

Answer 0: (score: 0)

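For some reason you start two browsers: the page is loaded in browser, but the script is executed in driver, which is still showing a blank page, so the elements the script looks up do not exist there. This works for me: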

from selenium import webdriver
import time

js = open("generalized.js", "r").read()

browser = webdriver.Firefox()
browser.get("http://www.geeksforgeeks.org/branch-and-bound-set-1-introduction-with-01-knapsack/")

time.sleep(1)  # try to replace with an Explicit Wait
result = browser.execute_script(js)
print(result)
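The comment on time.sleep(1) suggests an explicit wait. A minimal sketch of what that could look like, assuming the article body is ready once an element with class "entry-content" exists (the class name is taken from the question's script). The names ARTICLE_READY_JS, article_is_present and run_script_when_ready are made up for illustration; only WebDriverWait and execute_script are Selenium's own.

```python
# JavaScript condition: true once the article body from the question is in the DOM.
ARTICLE_READY_JS = 'return document.getElementsByClassName("entry-content").length > 0;'


def article_is_present(driver):
    """Wait condition: True once the page contains an "entry-content" element."""
    return bool(driver.execute_script(ARTICLE_READY_JS))


def run_script_when_ready(browser, url, js, timeout=10):
    """Load url, wait for the article to appear, then run js (hypothetical helper)."""
    # Imported lazily so article_is_present can be tried without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait

    browser.get(url)
    # until() polls the callable with the driver until it returns something truthy,
    # and raises TimeoutException once `timeout` seconds have elapsed.
    WebDriverWait(browser, timeout).until(article_is_present)
    return browser.execute_script(js)
```

With browser and js defined as in the snippet above, result = run_script_when_ready(browser, url, js) would stand in for the get, sleep and execute_script lines. Unlike a fixed sleep, it returns as soon as the element shows up and fails loudly if it never does.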

With a modified script that uses a top-level return doc:

var doc = "";
var path1 = document.getElementsByClassName("entry-header")[0];
doc = doc + path1.innerText;
doc = doc + "\n";
var path2 = document.getElementsByClassName("entry-content")[0];
var cont = path2.getElementsByTagName("p");
for (var i=0; i<cont.length; i++)
{
   doc = doc+cont[i].innerText;
   doc = doc+ "\n";
}

return doc;
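The top-level return matters because Selenium runs the script as the body of an anonymous function: only a value returned from that body reaches Python, whereas a trailing call such as res() is evaluated and thrown away. As a rough sketch, the script could also guard against the elements being missing (for example on a blank page) and return null, which arrives in Python as None. GUARDED_JS and extract_article are hypothetical names, and the guard is an addition, not part of the script above.

```python
# Hypothetical guarded variant of generalized.js; class names come from the question.
GUARDED_JS = """
var header = document.getElementsByClassName("entry-header")[0];
var content = document.getElementsByClassName("entry-content")[0];
if (!header || !content) { return null; }
var doc = header.innerText + "\\n";
var paragraphs = content.getElementsByTagName("p");
for (var i = 0; i < paragraphs.length; i++) {
    doc += paragraphs[i].innerText + "\\n";
}
return doc;
"""


def extract_article(browser, script=GUARDED_JS):
    """Run the script in the given browser and return the article text."""
    result = browser.execute_script(script)
    if result is None:
        # JavaScript null/undefined comes back as None: the page has no article,
        # e.g. because the script ran in a window that never loaded the URL.
        raise RuntimeError("entry-header/entry-content not found on the current page")
    return result
```

Called as extract_article(browser) after browser.get(...), this would report a missing article with a readable message instead of a TypeError raised from inside the JavaScript.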