Using lxml with fileinput in Python

Date: 2013-12-18 08:38:09

Tags: python file-io lxml

I have a simple XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
    <child>abc</child>
</root>

I want to parse it from a file, and this works:

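import lxml.etree

# Binary mode lets the parser honour the encoding declared in the XML itself
with open('tst.xml', 'rb') as test_xml:
    for _, element in lxml.etree.iterparse(test_xml, tag='child'):
        print(element.text)  # prints abc as expected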

However, when I tried to change the script so that it could parse XML from either a file or stdin, it failed:

import fileinput
import lxml.etree

fi = fileinput.input('tst.xml')
for _, element in lxml.etree.iterparse(fi, tag='child'):
    print(element.text)

# File "iterparse.pxi", line 371, in lxml.etree.iterparse.__init__ (src/lxml/lxml.etree.c:97283)
# File "apihelpers.pxi", line 1411, in lxml.etree._encodeFilename (src/lxml/lxml.etree.c:22515)
# TypeError: Argument must be string or unicode.

I'm not sure what I'm doing wrong. Isn't a FileInput object a file-like object in Python?

2 Answers:

Answer 0 (score: 1)

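I haven't investigated in depth, but the exception seems to be raised because the FileInput class does not provide a read method. The traceback ends in _encodeFilename, which suggests lxml fell back to treating the object as a file name. To get what I wanted, I ended up writing my own wrapper for now: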

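import sys

class FileInput(object):
    """Context manager yielding an open file, or sys.stdin when filename is missing or "-"."""
    def __init__(self, filename=None, *args, **kwargs):
        # Extra arguments (for example the mode) are passed straight to open()
        self.file = open(filename, *args, **kwargs) if filename and filename != "-" else sys.stdin

    def __enter__(self):
        return self.file

    def __exit__(self, type, value, traceback):
        # Never close stdin; only close files this wrapper opened
        if self.file is not sys.stdin:
            self.file.close()

    def __getattr__(self, name):
        # Delegate everything else (read, readline, ...) to the underlying file
        return getattr(self.file, name)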

I'll wait for a better answer.
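
The observation about the missing read method can be checked without lxml. The snippet below is only a rough sketch: has_read is a hypothetical helper introduced for illustration, and tst.xml is the file name from the question (the file is never opened here, because fileinput only opens files once iteration starts).

```python
import fileinput
import io


def has_read(obj):
    """Return True if obj exposes a callable read() method (hypothetical helper)."""
    return callable(getattr(obj, "read", None))


# fileinput.input() returns a FileInput object with a line-oriented API
stream = fileinput.input(files=["tst.xml"])
print(has_read(stream))                   # FileInput defines readline(), not read()
print(hasattr(stream, "readline"))
stream.close()

# An ordinary file-like object, here an in-memory one, does provide read()
print(has_read(io.BytesIO(b"<root/>")))
```

A parser that decides between "file object" and "file name" by looking for read() would therefore take the file-name path for a FileInput object, which matches the TypeError in the question.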

Answer 1 (score: 0)

You shouldn't try to use the fileinput module here. Instead, pick the input stream directly:

import sys

# filename is assumed to come from the caller, e.g. a command-line argument
if filename == '-':   # or, if we don't have a filename argument
    f = sys.stdin
else:
    f = open(filename, 'r')
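
Put together with the parsing loop from the question, this approach might look roughly like the sketch below. The names open_xml_source and print_children are hypothetical, not part of lxml or the original answer. The sketch also assumes the parser should receive bytes, so it opens files in binary mode and uses sys.stdin.buffer where that attribute exists (Python 3).

```python
import sys


def open_xml_source(filename=None):
    """Return a binary stream: stdin when filename is missing or "-", else the file."""
    if not filename or filename == "-":
        # Python 3 exposes the byte stream as sys.stdin.buffer;
        # fall back to sys.stdin itself where that attribute is absent
        return getattr(sys.stdin, "buffer", sys.stdin)
    return open(filename, "rb")


def print_children(filename=None, tag="child"):
    """Print the text of every matching element, as in the question's loop."""
    import lxml.etree  # imported lazily so open_xml_source works without lxml

    source = open_xml_source(filename)
    try:
        for _, element in lxml.etree.iterparse(source, tag=tag):
            print(element.text)
    finally:
        # Only close what we opened; leave stdin alone
        if filename and filename != "-":
            source.close()
```

Calling print_children('tst.xml') would read the file, while print_children('-') or print_children() would read from stdin, for example when the script is fed through a shell pipe.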