Parsing every file in a directory in Python?

Time: 2015-11-18 02:31:52

Tags: python xml parsing xml-parsing

So I have this code:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()

for segment in root.iter("s"):
    for word in segment.iter("w"):
        print word.text,
    print "\n"

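This parses the XML file test.xml and prints the text of each word, one segment at a time. However, I have a large number of these XML files in a directory that all need to be parsed. How can I modify the code so that it goes through every file in the directory and applies this function to each one?

Thanks!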

2 Answers:

Answer 0 (score: 0)

This should work:

import xml.etree.ElementTree as ET

def printParsed(filename):
    tree = ET.parse(filename)
    root = tree.getroot()

    for segment in root.iter("s"):
        for word in segment.iter("w"):
            print word.text,
        print "\n"

if __name__ == "__main__":
    from os import listdir
    from os.path import isfile, join
    mypath = 'path/to/your/xml/files'
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    for f in onlyfiles:
        # only process files whose names end in .xml
        if f.endswith('.xml'):
            # listdir returns bare names, so prepend the directory
            printParsed(join(mypath, f))

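You can save the file as parser.py and then run it with python parser.py. If you want, you can also remove the if __name__ == "__main__" line, as long as you dedent the code under it.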
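For readers on Python 3, the same idea might look roughly like the sketch below. It is not the answerer's code: the function names sentences_in_file and sentences_in_dir are made up for illustration, glob is swapped in for the listdir/isfile filter, and the functions return the words instead of printing them so the result is easier to reuse.

```python
import glob
import os
import xml.etree.ElementTree as ET


def sentences_in_file(filename):
    """Return one list of word texts per <s> element in the file."""
    root = ET.parse(filename).getroot()
    # word.text is None for an empty <w/>, so fall back to ""
    return [[w.text or "" for w in s.iter("w")] for s in root.iter("s")]


def sentences_in_dir(directory):
    """Map each *.xml path in `directory` to its parsed sentences."""
    pattern = os.path.join(directory, "*.xml")
    # glob returns paths that already include the directory part
    return {p: sentences_in_file(p) for p in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # placeholder path taken from the answer above
    for path, sentences in sentences_in_dir("path/to/your/xml/files").items():
        for words in sentences:
            print(" ".join(words))
```

Because glob already matches on the *.xml pattern and returns joined paths, the separate extension check and the join call are not needed in this variant.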

Answer 1 (score: 0)

Use os.listdir(path)

It returns a list of the names of all entries (files and subdirectories) in the directory.

Code:

import xml.etree.ElementTree as ET
import os
listofxml = os.listdir("./")
for xml in listofxml:
    tree = ET.parse(xml)
    root = tree.getroot()

    for segment in root.iter("s"):
        for word in segment.iter("w"):
            print word.text,
        print "\n"
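Passing the bare names straight to ET.parse works here only because the directory being listed is the current one. For any other directory, the names would have to be joined back onto the directory path first. A minimal sketch of that step, assuming Python 3 and using a hypothetical helper name files_in:

```python
import os


def files_in(directory):
    """Return full paths of the regular files directly inside `directory`."""
    paths = []
    for name in os.listdir(directory):
        # os.listdir yields bare names, so rebuild the full path
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories
            paths.append(path)
    return sorted(paths)
```

Each path returned this way can be handed to ET.parse regardless of the current working directory.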

If not all of the files are XML, you can split the file name and check the extension:

import xml.etree.ElementTree as ET
import os
listofxml = os.listdir("./")
for xml in listofxml:
    # the last dot-separated piece of the name is the extension
    format = xml.split('.')
    if format[-1] == 'xml':
        tree = ET.parse(xml)
        root = tree.getroot()

        for segment in root.iter("s"):
            for word in segment.iter("w"):
                print word.text,
            print "\n"
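One edge case with the split check: a file named just xml, with no dot at all, also passes, because "xml".split('.')[-1] is 'xml'. A possible alternative, sketched here for Python 3 with hypothetical helper names is_xml_name and parse_or_none, is to use os.path.splitext and to skip files that turn out not to be well-formed XML:

```python
import os
import xml.etree.ElementTree as ET


def is_xml_name(name):
    """True if the file name has an .xml extension (case-insensitive)."""
    return os.path.splitext(name)[1].lower() == ".xml"


def parse_or_none(path):
    """Return the root element, or None if the file is not well-formed XML."""
    try:
        return ET.parse(path).getroot()
    except ET.ParseError:
        return None
```

In the loop above, is_xml_name(xml) would take the place of the format[-1] == 'xml' test, and a None result from parse_or_none would mean the file should be skipped.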