Create an XSD document from a downloaded file

Time: 2015-05-25 22:15:01

Tags: python amazon-s3 xsd lxml

I'm trying to load an XSD document stored on S3, but it gives me the following error:

>>> import requests
>>> from lxml import etree
>>> xsd_url = 'https://s3-us-west-1.amazonaws.com/premiere-avails/movie.xsd.xml'
>>> node = etree.fromstring(requests.get(xsd_url).text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 3092, in lxml.etree.fromstring (src/lxml/lxml.etree.c:70473)
  File "parser.pxi", line 1823, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:106272)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

I have confirmed that the file itself is valid and loads fine locally. How can I load it from S3?

2 answers:

Answer 0 (score: 1)

Use .content, which returns the raw response body as bytes:
>>> import requests
>>> from lxml import etree
>>> xsd_url = 'https://s3-us-west-1.amazonaws.com/premiere-avails/movie.xsd.xml'
>>> node = etree.fromstring(requests.get(xsd_url).content)

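The problem is that your XML file declares its own encoding, so decoding it is the XML parser's job. Your code, however, uses .text, which asks requests to decode the body into a Unicode string first.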

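That is normally the right thing for requests to do, but the XML parser refuses to accept text that has already been decoded and then also carries an instruction on how to decode it, so it raises the exception you saw. The fix is to not let requests decode it: pass the raw bytes instead.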
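The behaviour described above can be reproduced offline. The sketch below assumes lxml is installed and uses a tiny hypothetical schema (XSD_TEXT) in place of the real movie.xsd; the helper names parse_str and load_schema are illustrative, not part of any library. Passing a str that starts with an encoding declaration fails the same way .text does, while passing bytes (as .content does) parses and can be turned into an etree.XMLSchema.

```python
from lxml import etree

# Hypothetical stand-in for movie.xsd: note the XML declaration with an
# explicit encoding, just like the file on S3.
XSD_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="movie" type="xs:string"/>
</xs:schema>
"""


def parse_str(text):
    """Mimic `.text`: hand lxml an already-decoded str and report the result."""
    try:
        etree.fromstring(text)
    except ValueError as exc:  # lxml rejects str input with an encoding declaration
        return str(exc)
    return "parsed"


def load_schema(raw):
    """Mimic `.content`: hand lxml raw bytes so it honours the declared encoding."""
    return etree.XMLSchema(etree.fromstring(raw))


error = parse_str(XSD_TEXT)
schema = load_schema(XSD_TEXT.encode("utf-8"))

print(error)
print(schema.validate(etree.fromstring(b"<movie>Heat</movie>")))
```

With the real file, the same load_schema call would simply receive requests.get(xsd_url).content.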

Answer 1 (score: -1)

You can use urllib2 and try something like this:

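    import urllib2  # Python 2 only; Python 3 moved this to urllib.request
    from lxml import etree
    xsd_url = 'https://s3-us-west-1.amazonaws.com/premiere-avails/movie.xsd.xml'
    # read() returns undecoded bytes, so lxml handles the declared encoding itself
    xsd_contents = urllib2.urlopen(xsd_url).read()
    xmlschema_doc = etree.fromstring(xsd_contents)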
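urllib2 does not exist in Python 3. A minimal Python 3 sketch of the same idea, assuming lxml is installed, uses urllib.request.urlopen, whose read() also returns bytes. The function name load_xsd is hypothetical; it additionally wraps the parsed document in etree.XMLSchema so the result can validate documents directly.

```python
import urllib.request

from lxml import etree


def load_xsd(url):
    """Fetch a schema from `url` and build an lxml XMLSchema from the raw bytes."""
    with urllib.request.urlopen(url) as response:
        # Keep the body as bytes: lxml reads the encoding declaration itself.
        raw = response.read()
    return etree.XMLSchema(etree.fromstring(raw))
```

Usage would be schema = load_xsd(xsd_url) followed by schema.validate(doc). Because urlopen also accepts file:// URLs, the same function works for a local copy of the schema.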