Question

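I am trying to convert XML files to CSV, and the code below already does that. However, I also want to include the file name in the extract, and I cannot work out how to add it in this code.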

# Assumed imports (not shown in the original post)
import xml.etree.ElementTree as ET
import pandas as pd

df = pd.DataFrame()
for file in allFiles:
    def iter_docs(cis):
        for docall in cis:
            doc_dict = {}
            for doc in docall:
                tag = [elem.tag for elem in doc]
                txt = [elem.text for elem in doc]
                if len(tag) > 0:
                    doc_dict.update(dict(zip(tag, txt)))
                else:
                    doc_dict[doc.tag] = doc.text
            yield doc_dict
    etree = ET.parse(file)
    df = df.append(pd.DataFrame(list(iter_docs(etree.getroot()))))

Answer 1

Try

df = df.append(pd.DataFrame([file] + list(iter_docs(etree.getroot()))))

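to get a column with the file name added.

By the way, this approach will perform poorly, because the DataFrame is rebuilt on every loop iteration.

A better approach is to collect the per-file DataFrames in a list and combine them into one large DataFrame at the end.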
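If prepending `file` to the list of row dictionaries does not produce a separate column (pandas may treat the string as just another row), one alternative is to build the frame from the dictionaries first and then add the name with `DataFrame.assign`. The following is a minimal sketch; the column name `filename`, the sample rows, and the file name are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Hypothetical rows, standing in for list(iter_docs(etree.getroot()))
rows = [{"id": "1", "name": "alpha"}, {"id": "2", "name": "beta"}]
file = "example.xml"  # hypothetical file name

# Build the frame from the row dicts, then add the file name as a constant column
frame = pd.DataFrame(rows).assign(filename=file)
print(frame)
```

Every row of `frame` then carries the name of the file it came from, which survives a later `pd.concat` or `to_csv`.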

list_of_df = []

for file in allFiles:
    # define iter_docs(cis) and parse the file into etree here, as in your code

    list_of_df.append(pd.DataFrame([file] + list(iter_docs(etree.getroot()))))

# at the end
df = pd.concat(list_of_df)
df = pd.concat(list_of_df)
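Putting the pieces together, the collect-then-concatenate idea could look like the sketch below. It assumes the standard-library `xml.etree.ElementTree` parser and the same two-level record layout that the question's `iter_docs` handles; the function name `xml_files_to_frame` and the `filename` column are hypothetical. It also avoids `DataFrame.append`, which was removed in pandas 2.0.

```python
import xml.etree.ElementTree as ET

import pandas as pd


def iter_docs(root):
    """Yield one dict per record under root, flattening one level of nesting."""
    for record in root:
        row = {}
        for field in record:
            children = list(field)
            if children:
                # Nested field: take each child's tag and text
                row.update({child.tag: child.text for child in children})
            else:
                # Leaf field: take its own tag and text
                row[field.tag] = field.text
        yield row


def xml_files_to_frame(paths):
    """Parse each XML file and return one DataFrame with a 'filename' column."""
    frames = []
    for path in paths:
        root = ET.parse(path).getroot()
        frame = pd.DataFrame(list(iter_docs(root))).assign(filename=str(path))
        frames.append(frame)
    # Concatenate once at the end instead of growing a DataFrame in the loop
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the question's `allFiles` list, writing the CSV would then be something like `xml_files_to_frame(allFiles).to_csv("output.csv", index=False)`, where `output.csv` is a placeholder name.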
