Question


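I'm trying to write a program that downloads XML files to a cache and then opens them using objectify. If I download the files using urlopen(), I can read them just fine with objectify.fromstring():

    from urllib.request import urlopen
    from lxml import objectify

    r = urlopen(my_url)
    o = objectify.fromstring(r.read())

However, if I download them and write them to a file, I end up with an encoding declaration at the top of the file that objectify doesn't like. That is:

    # download the file
    my_file = 'foo.xml'
    r = urlopen(my_url)

    # save locally
    with open(my_file, 'wb') as fp:
        fp.write(r.read())

    # open saved copy
    with open(my_file, 'r') as fp:
        o1 = objectify.fromstring(fp.read())

This results in:

    ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

If I use objectify.parse(fp) instead, it works fine. I could go through and change all the client code to use parse(), but I feel that's not the right approach. I have other XML files stored locally for which .fromstring() works fine; based on a cursory review, they seem to have utf-8 encoding.

I just don't know what the solution is here. Should I change the encoding when I save the file? Should I strip the encoding declaration? Should I litter my code with try/except clauses? Please advise.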

Answer 1

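The file needs to be opened in binary mode, not text mode:

    open(my_file, 'rb')  # b stands for binary

As the exception says: ... Please use bytes input ...

In text mode, fp.read() returns an already-decoded str, and lxml refuses a str that still carries an encoding declaration. In binary mode it returns bytes, and the parser applies the declared encoding itself.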
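For the caching use case in the question, one way to keep everything as bytes from download to parse might look like the sketch below. The names cached_xml_bytes, parse_cached, and the fetch parameter are hypothetical helpers made up for illustration, not part of lxml or urllib; the only lxml call used is objectify.fromstring(), which the question already relies on.

```python
from pathlib import Path
from urllib.request import urlopen


def cached_xml_bytes(url, cache_path, fetch=None):
    """Return the raw XML bytes for url, downloading only on a cache miss.

    fetch is a hypothetical hook (a callable taking the URL and returning
    bytes) so the download step can be replaced, e.g. in tests.
    """
    path = Path(cache_path)
    if not path.exists():
        if fetch is None:
            # Default: download the document over the network.
            with urlopen(url) as response:
                data = response.read()
        else:
            data = fetch(url)
        # Write bytes, so the encoding declaration is stored untouched.
        path.write_bytes(data)
    # Read bytes back (equivalent to open(path, 'rb').read()),
    # so nothing is decoded to str before the parser sees it.
    return path.read_bytes()


def parse_cached(url, cache_path, fetch=None):
    """Parse the cached copy with objectify, passing bytes rather than str."""
    # Imported here so the caching helper above works without lxml installed.
    from lxml import objectify

    return objectify.fromstring(cached_xml_bytes(url, cache_path, fetch))
```

With this, client code can keep calling objectify.fromstring() style parsing through parse_cached(my_url, 'foo.xml') and never has to strip the declaration or catch ValueError, because the parser only ever receives bytes.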

python: download and cache XML files - how to handle the encoding declaration?
