Reading multiple XML files in a specific folder - Python

Posted: 2012-08-23 11:34:25

Tags: python xml file directory

I need your help.

I'm new to programming, so don't expect much from my code.

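Here's the problem: I need to parse a bunch of XML files in a folder and write the results to an .xls or .csv file. So far I have parsed the XML and written it to a .txt file, but the file I'm working with sits in the same folder as the program.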

Here is the code:

from xml.dom import minidom
from datetime import datetime

ano = int(input("Year: "))
mes = int(input("Month: "))
dia = int(input("Day: "))
dt_obj = datetime(ano, mes, dia)
date_str = dt_obj.strftime("%Y-%m-%d")

# Extracting the information from the XML nodes
xmldoc = minidom.parse("NAME OF THE FILE.XML")
NFe = xmldoc.getElementsByTagName("NFe")[0]
infNFe = NFe.getElementsByTagName("infNFe")[0]
ide = infNFe.getElementsByTagName("ide")[0]
nNF = ide.getElementsByTagName("nNF")[0].firstChild.data
dEmi = ide.getElementsByTagName("dEmi")[0].firstChild.data
serie = ide.getElementsByTagName("serie")[0].firstChild.data
emit = infNFe.getElementsByTagName("emit")[0]
cnpj = emit.getElementsByTagName("CNPJ")[0].firstChild.data
nfeProc = xmldoc.getElementsByTagName("nfeProc")[0]
chNFe = nfeProc.getElementsByTagName("chNFe")[0].firstChild.data


try:
    # This will create a new file or **overwrite an existing file**.
    # "with" closes the file even if a write fails.
    with open(date_str + ".txt", "w") as f:
        f.write("CNPJ: " + cnpj)  # Write a string to a file
        f.write("\nNUMERO DA NOTA: " + nNF)
        f.write("\nDATA DE EMISSAO: " + dEmi)
        f.write("\nSERIE: " + serie)
        f.write("\nCHAVE ELETRONICA: " + chNFe)
except IOError as e:
    print("Could not write the output file:", e)
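A first step toward handling many files is to move the extraction into a function that takes a parsed document and returns the fields, so it can be called once per file. This is a minimal sketch: the helper names text_of and extract_nfe_fields are hypothetical, and the sample XML is a made-up, stripped-down document that only mirrors the tag names used above, not a real NF-e file.

```python
from xml.dom import minidom


def text_of(parent, tag):
    """Return the text of the first <tag> below parent, or "" if it is empty."""
    node = parent.getElementsByTagName(tag)[0]
    return node.firstChild.data if node.firstChild is not None else ""


def extract_nfe_fields(xmldoc):
    """Collect the fields the question writes out (hypothetical helper name)."""
    nfe = xmldoc.getElementsByTagName("NFe")[0]
    inf_nfe = nfe.getElementsByTagName("infNFe")[0]
    ide = inf_nfe.getElementsByTagName("ide")[0]
    emit = inf_nfe.getElementsByTagName("emit")[0]
    nfe_proc = xmldoc.getElementsByTagName("nfeProc")[0]
    return {
        "CNPJ": text_of(emit, "CNPJ"),
        "nNF": text_of(ide, "nNF"),
        "dEmi": text_of(ide, "dEmi"),
        "serie": text_of(ide, "serie"),
        "chNFe": text_of(nfe_proc, "chNFe"),
    }


# Made-up sample that only mirrors the tag layout used in the question.
SAMPLE = """<nfeProc>
  <NFe><infNFe>
    <ide><nNF>123</nNF><dEmi>2012-08-01</dEmi><serie>1</serie></ide>
    <emit><CNPJ>00000000000000</CNPJ></emit>
  </infNFe></NFe>
  <protNFe><infProt><chNFe>0000</chNFe></infProt></protNFe>
</nfeProc>"""

# minidom.parseString is used here for the demo; minidom.parse(path) works the same way.
fields = extract_nfe_fields(minidom.parseString(SAMPLE))
```

With a real file you would call extract_nfe_fields(minidom.parse(path)) for each path and write out the returned dict.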

I've managed to read the XML, parse it, and write out the information from the nodes I need.

What I need now is to read all the files in a folder and write the results to an .XLS file.

Anyone?

2 answers:

Answer 0 (score: 0)

Try this on for size:

import os
from xml.dom import minidom
from datetime import datetime

ano = int(input("Year: "))
mes = int(input("Month: "))
dia = int(input("Day: "))
dt_obj = datetime(ano, mes, dia)
date_str = dt_obj.strftime("%Y-%m-%d")

DIRECTORY = "path/to/xml/folder"  # placeholder: set this to the folder holding your XML files

# Extracting the information from the XML nodes

def get_files(d):
    # Full paths of the regular files (not subfolders) directly inside d
    return [os.path.join(d, f) for f in os.listdir(d) if os.path.isfile(os.path.join(d, f))]

def parse(files):
    for xml_file in files:
        xmldoc = minidom.parse(xml_file)
        NFe = xmldoc.getElementsByTagName("NFe")[0]
        infNFe = NFe.getElementsByTagName("infNFe")[0]
        ide = infNFe.getElementsByTagName("ide")[0]
        nNF = ide.getElementsByTagName("nNF")[0].firstChild.data
        dEmi = ide.getElementsByTagName("dEmi")[0].firstChild.data
        serie = ide.getElementsByTagName("serie")[0].firstChild.data
        emit = infNFe.getElementsByTagName("emit")[0]
        cnpj = emit.getElementsByTagName("CNPJ")[0].firstChild.data
        # now whatever you want...

parse(get_files(DIRECTORY))

DIRECTORY is the path of the folder where the XML files are.

Since this is only part of the code, you'll need to fill in the rest yourself. You haven't said exactly what you want to write, or in what format...
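One way to fill in the "# now whatever you want..." part is to have the parse step build one row per file instead of discarding the values. This is a sketch under a few assumptions: the function names get_xml_files and parse_to_rows are hypothetical, only files ending in .xml are kept, file names are sorted so the output order is stable, and each row is a list of strings in the column order CNPJ, nNF, dEmi, serie so it can be handed straight to a CSV writer.

```python
import os
from xml.dom import minidom


def get_xml_files(directory):
    """Return the full paths of the .xml files directly inside directory, sorted."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".xml")
        and os.path.isfile(os.path.join(directory, name))
    )


def parse_to_rows(files):
    """Parse each file and return one row [CNPJ, nNF, dEmi, serie] per file."""
    rows = []
    for xml_file in files:
        xmldoc = minidom.parse(xml_file)
        inf_nfe = xmldoc.getElementsByTagName("infNFe")[0]
        ide = inf_nfe.getElementsByTagName("ide")[0]
        emit = inf_nfe.getElementsByTagName("emit")[0]
        rows.append([
            emit.getElementsByTagName("CNPJ")[0].firstChild.data,
            ide.getElementsByTagName("nNF")[0].firstChild.data,
            ide.getElementsByTagName("dEmi")[0].firstChild.data,
            ide.getElementsByTagName("serie")[0].firstChild.data,
        ])
    return rows
```

Usage would be rows = parse_to_rows(get_xml_files(DIRECTORY)), after which rows can be passed to a CSV-writing function.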

This can help you write the CSV file:

# csv_location is the path of a *.csv file, and contents is a list of lists:
# [["row1 item1", "row1 item2", "row1 item3"], ["row2 item1", "row2 item2", "row2 item3"]]
def write_csv(csv_location, contents):
    with open(csv_location, "w") as file_writer:
        file_writer.write("Header,Items,Here\n")  # if you have no need for a header, remove this line.
        for line in contents:
            file_writer.write("%s\n" % ",".join(line))
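Joining fields with "," produces a broken row as soon as a field itself contains a comma or a double quote. Python's standard csv module quotes such fields for you. The sketch below swaps it in; the header names are still placeholders, as in the function above, and the header parameter is an addition of this sketch. Opening the file with newline="" follows the csv module documentation so the writer controls line endings itself.

```python
import csv


def write_csv(csv_location, contents, header=("Header", "Items", "Here")):
    """Write an optional header row, then each row of contents, to csv_location."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_location, "w", newline="") as file_writer:
        writer = csv.writer(file_writer)
        if header:
            writer.writerow(header)  # pass header=None to skip the header row
        writer.writerows(contents)  # fields with commas or quotes get quoted
```

Excel can open the resulting .csv file directly; producing a genuine .xls file would need a third-party package, since the standard library has no .xls writer.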

Answer 1 (score: 0)

If the XML files are all in a single folder, you can do something like this:

import os
import sys

def select_files_in_folder(dir, ext):
    for file in os.listdir(dir):
        if file.endswith('.%s' % ext):
            yield os.path.join(dir, file)

# process_xml_file is your own function that parses and writes out one file
for file in select_files_in_folder(sys.argv[1], 'xml'):
    process_xml_file(file)
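The standard glob module can do the same filtering with a wildcard pattern. This is a swapped-in alternative, not what the answer uses, and it returns a sorted list rather than a generator. glob.escape protects folder names that happen to contain wildcard characters such as "[".

```python
import glob
import os


def select_files_in_folder(directory, ext):
    """Return the sorted paths in directory whose names end in .ext (no subfolders)."""
    pattern = os.path.join(glob.escape(directory), "*." + ext)
    return sorted(glob.glob(pattern))
```

It is used the same way as the generator: for path in select_files_in_folder(sys.argv[1], "xml"): process_xml_file(path). Note that "*" in glob skips names that start with a dot.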

Or, if the files can be in subfolders, use:

def select_files_in_subfolders(dir, ext):
    for root, dirs, files in os.walk(dir):
        for file in files:
            if file.endswith('.%s' % ext):
                # join with root (the folder currently being walked), not the top-level dir
                yield os.path.join(root, file)
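On Python 3.4 and later, pathlib's Path.rglob searches all subfolders and yields full paths, so there is no separate root/file pair to join. This is an alternative to the os.walk generator rather than part of the answer; yielding plain strings is a choice made here so the results can be passed to minidom.parse unchanged.

```python
from pathlib import Path


def select_files_in_subfolders(directory, ext):
    """Yield paths (as strings) of files under directory, at any depth, ending in .ext."""
    for path in sorted(Path(directory).rglob("*." + ext)):
        if path.is_file():  # skip any folder whose name happens to end in .ext
            yield str(path)
```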