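I have been trying to use html5lib with lxml on Python 2.7 in Google App Engine. When I run the code below, however, it fails with "NameError: global name 'etree' is not defined". Is it not possible to use lxml.etree on Google App Engine, or am I missing something?
app.yaml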
application: testsite
version: 1
runtime: python27
api_version: 1
threadsafe: false
handlers:
- url: /.*
  script: index.py
libraries:
- name: lxml
  version: "2.3" # I thought this would allow me to use lxml.etree
index.py
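# webapp must be imported here too; index.py uses it directly
from google.appengine.ext import webapp
from testhandler import TestHandler
application = webapp.WSGIApplication([('/', TestHandler)], debug=True)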
testhandler.py
import urllib2
import html5lib
from html5lib import treebuilders

try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")

from google.appengine.ext import webapp

class TestHandler(webapp.RequestHandler):
    def get(self):
        f = urllib2.urlopen("http://www.yahoo.com/").read()
        doc = html5lib.parse(f, treebuilder='lxml')
        elems = doc.xpath("//*[local-name() = 'a']")
        self.response.out.write(len(elems))
Error
running with cElementTree on Python 2.5+
Status: 500 Internal Server Error
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 769
<pre>Traceback (most recent call last):
  File "/usr/local/bin/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
    handler.get(*groups)
  File "/home/test/testhandler.py", line 38, in get
    parser = html5lib.HTMLParser(tree= treebuilders.getTreeBuilder('lxml'))
  File "/home/test/html5lib/html5parser.py", line 68, in __init__
    self.tree = tree(namespaceHTMLElements)
  File "/home/test/html5lib/treebuilders/etree_lxml.py", line 176, in __init__
    builder = etree_builders.getETreeModule(etree, fullTree=fullTree)
NameError: global name 'etree' is not defined
</pre>
Added
No, I tried several ways of creating the doc object, but had no luck. In one of them I tried the import `from lxml.html import document_fromstring`, which gave me this error.
Traceback (most recent call last):
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4143, in _HandleRequest
    self._Dispatch(dispatcher, self.rfile, outfile, env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4049, in _Dispatch
    base_env_dict=env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 616, in Dispatch
    base_env_dict=base_env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3120, in Dispatch
    self._module_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3024, in ExecuteCGI
    reset_modules = exec_script(handler_path, cgi_path, hook)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2887, in ExecuteOrImportScript
    exec module_code in script_module.__dict__
  File "/home/yoo/eclipse_workspace/website_checker/src/index.py", line 5, in <module>
    from handlers.updatecheck import UpdateCheckHandler
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
    return self.FindAndLoadModule(submodule, fullname, search_path)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
    description)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
    description)
  File "/home/test/updatecheck.py", line 4, in <module>
    from lxml.html import document_fromstring
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
    return self.FindAndLoadModule(submodule, fullname, search_path)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
    description)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
    description)
  File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 12, in <module>
    from lxml import etree
ImportError: cannot import name etree
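Judging by the error, App Engine for some reason does not let me load the etree module. I wanted to use XPath with lxml, but I can't spend much time figuring out what is going on here, and I don't know Python well enough. So I tried to find an approach that uses the 'simpletree' version instead.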
f = urllib2.urlopen("http://www.yahoo.com/").read()
p = html5lib.HTMLParser()
doc = p.parse(f)
# do something with doc.childNodes
self.response.out.write(len(doc.childNodes))
It is not a good approach, but at least it worked when I tested it on the live Google App Engine.
Answer 0 (score: 1)
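Do you have lxml installed locally? I ran into the same error before, where the import failed. You can download lxml here: http://pypi.python.org/pypi/lxml/
lxml works on GAE, which is great. But there really isn't any documentation or any examples about this yet.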
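To check whether the local interpreter can actually import lxml.etree (the import that failed in the question), a small diagnostic along these lines may help. This is only a sketch: `lxml_status` is a hypothetical helper name, not something from the question or from lxml, and it only reports on the Python you run it with.

```python
from __future__ import print_function


def lxml_status():
    """Return (available, detail) for lxml.etree in the current interpreter.

    Hypothetical helper for illustration only.
    """
    try:
        from lxml import etree
    except ImportError as exc:
        # Same failure mode as in the question: lxml (or its compiled
        # etree extension) cannot be loaded by this interpreter.
        return False, str(exc)
    # LXML_VERSION is a tuple of integers exposed by lxml.etree.
    return True, "lxml %s" % ".".join(str(n) for n in etree.LXML_VERSION)


available, detail = lxml_status()
print("lxml.etree importable:", available, "-", detail)
```

If this reports False when run with the same Python that the development server uses, the local install is the likely culprit rather than the app.yaml entry.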
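Answer 1 (score: 1)
On Windows I ran into this problem because the Python 2.7 distribution does not include lxml. You can use the easy_install script, but then you have to compile the source code, which gave me trouble.
Use this post I found on Google Groups: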
https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.python/Q8YeOIbn5Ds
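However, if you want to save yourself the pain of building it from source, just install a precompiled binary, for example: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml
Just download the executable from the site above and run the *.exe, and it will install all the necessary code.
Answer 2 (score: 0)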
Try
import lxml
at the top of testhandler
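A guarded import at the top of testhandler.py could look like the sketch below. It binds the import result to a name that is always defined, so a missing lxml is handled where it is used instead of surfacing later as a NameError. The names `count_links` and `_AnchorCounter` are hypothetical, and the standard-library fallback is an assumption added for illustration; it is not part of the original code.

```python
from __future__ import print_function

try:
    from lxml import html as lxml_html
except ImportError:
    # Keep the name defined so later code can test for it explicitly.
    lxml_html = None

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class _AnchorCounter(HTMLParser):
    """Fallback (illustrative): count <a> start tags without lxml."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.count += 1


def count_links(markup):
    """Count anchor elements, using lxml XPath when lxml is importable."""
    if lxml_html is not None:
        return len(lxml_html.document_fromstring(markup).xpath("//a"))
    counter = _AnchorCounter()
    counter.feed(markup)
    counter.close()
    return counter.count


sample = '<html><body><a href="/x">x</a><p><a href="/y">y</a></p></body></html>'
print(count_links(sample))
```

Inside the handler's get method, the fetched page body would be passed to `count_links` in place of the html5lib call.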
Answer 3 (score: 0)
Install it with pip: pip install lxml