I'm using Google App Engine with Python, and I'm trying to fetch a gzipped XML file and parse it with lxml's iterparse. I used the example from lxml.de to write the following code:
import gzip, base64, StringIO
from lxml import etree
from google.appengine.ext import webapp
from google.appengine.api.urlfetch import fetch

class Catalog(webapp.RequestHandler):
    user = xxx
    password = yyy
    catalog = fetch('url',
                    headers={"Authorization":
                             "Basic %s" % base64.b64encode(user + ':' + password)}, deadline=600)
    items = etree.iterparse(StringIO.StringIO(catalog), tag='product')
    for _, element in items:
        print('%s -- %s' % (element.findtext('name'), element[1].text))
        element.clear()
When I run it, I get the following error:
    for _, element in coupons:
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml\lxml.etree.c:98565)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml\lxml.etree.c:99086)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml\lxml.etree.c:74791)
XMLSyntaxError: Specification mandate value for attribute object, line 1, column 53
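What does this error mean? I suspect the XML file is malformed, but I don't know where to look for the problem. Any help would be greatly appreciated!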
Answer 0 (score: 1)
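I solved it by handling the fetch / gzip part differently, enabling asynchronous requests, and switching to webapp2. With all of that in place it works :) The essential change in the parsing path is that the response body is read from `.content` and decompressed with gzip before it is handed to iterparse. Here is the code: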
from google.appengine.api.urlfetch import fetch
import gzip, webapp2, base64, StringIO, datetime
from credentials import CJCredentials
from lxml import etree

class Catalog(webapp2.RequestHandler):
    def get(self):
        user = xxx
        password = yyy
        url = 'some_url'
        catalogResponse = fetch(url, headers={
            "Authorization": "Basic %s" % base64.b64encode(user + ':' + password)
        }, deadline=10000000)
        # Wrap the raw response body (not the response object) in a file-like object
        f = StringIO.StringIO(catalogResponse.content)
        # Decompress the gzipped payload
        c = gzip.GzipFile(fileobj=f)
        content = c.read()
        xml = StringIO.StringIO(content)
        tree = etree.iterparse(xml, tag='product')
        for event, element in tree:
            # lxml elements have no .name attribute; read the <name> child as in the question
            print element.findtext('name')
            element.clear()  # release the element once it has been processed
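As for what the original error likely meant: in Python 2, `StringIO.StringIO` converts a non-string argument with `str()`, so passing the fetch result itself probably fed the parser text along the lines of `<google.appengine.api.urlfetch._URLFetchResult object at 0x...>`. The parser would read `object` as an attribute with no value, which matches the message and the reported column. This is an inference from the traceback, not something I verified on App Engine.

The decompress-then-parse step can also be tried outside App Engine. The following is a minimal sketch, assuming the catalog is gzip-compressed XML with `<product>` elements that each contain a `<name>` child. The helper name `iter_product_names` and the sample data are made up for illustration, and the stdlib fallback is there only so the sketch runs where lxml is not installed. It filters on the tag manually because the stdlib `iterparse` has no `tag` argument.

```python
import gzip
import io

try:
    from lxml import etree  # the parser used in this thread
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree


def iter_product_names(gzipped_bytes):
    """Yield the <name> text of each <product> in a gzip-compressed XML payload.

    Hypothetical helper: `gzipped_bytes` stands in for the raw response body
    (for example the `.content` of the fetch result), not the response object.
    """
    compressed = io.BytesIO(gzipped_bytes)
    # GzipFile decompresses lazily while iterparse reads from it,
    # so the whole decompressed document is never held as one string.
    with gzip.GzipFile(fileobj=compressed) as xml_stream:
        for _, element in etree.iterparse(xml_stream, events=('end',)):
            if element.tag == 'product':
                yield element.findtext('name')
                element.clear()  # free the subtree once it has been handled


# Made-up sample standing in for the downloaded catalog.
sample_xml = (b"<catalog>"
              b"<product><name>Widget</name><price>1.00</price></product>"
              b"<product><name>Gadget</name><price>2.50</price></product>"
              b"</catalog>")
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as gz:
    gz.write(sample_xml)
sample_gz = buffer.getvalue()

names = list(iter_product_names(sample_gz))
print(names)
```

Passing the `GzipFile` straight to `iterparse` skips the intermediate `c.read()` and second `StringIO`, which may matter for a large catalog on a memory-limited instance.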