Automating XML parsing to output a single CSV / DataFrame

Asked: 2020-10-27 09:37:25

Tags: python xml pandas xml-parsing

I found this great piece of code, which is exactly what I need to convert XML into a DataFrame.

import xml.etree.ElementTree as ET

import pandas as pd


def parse_XML(xml_file, df_cols):
    """Parse the input XML file and store the result in a pandas
    DataFrame with the given columns.

    The first element of df_cols is supposed to be the identifier
    variable, which is an attribute of each node element in the
    XML data; other features will be parsed from the text content
    of each sub-element.
    """
    xtree = ET.parse(xml_file)
    xroot = xtree.getroot()
    rows = []

    for node in xroot:
        # NOTE: the identifier is read from the root element here, not from node
        res = [xroot.attrib[df_cols[0]]]
        for el in df_cols[1:]:
            child = node.find(el)
            # Use the string 'None' as a placeholder for missing sub-elements
            res.append(child.text if child is not None else 'None')
        rows.append(dict(zip(df_cols, res)))

    out_df = pd.DataFrame(rows, columns=df_cols)
    # Fill placeholders in columns 1-4 with the value from the first row,
    # then drop that first row
    for col in out_df.columns[1:5]:
        out_df[col] = out_df[col].replace('None', out_df[col].iloc[0])
    out_df = out_df.drop(index=0)

    return out_df

How can I use this code in a script that walks my directory tree, like this:

import os

for subdir, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith(".xml"):
            ...  # parse this file and collect its rows

The goal is to collect every XML file in the tree, build one large DataFrame, and save it as a CSV.
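
One possible way to do this, as a minimal sketch rather than a definitive solution: parse each file into its own DataFrame, collect the frames in a list, and concatenate once at the end. The names `collect_xml_to_csv` and `parse_xml_simple`, and the extra `source_file` column, are assumptions made for illustration. `parse_xml_simple` is a simplified stand-in that reads the identifier from each node, so the block runs on its own; any function that takes `(xml_file, df_cols)` and returns a DataFrame can be passed as `parser` instead.

```python
import os
import xml.etree.ElementTree as ET

import pandas as pd


def parse_xml_simple(xml_file, df_cols):
    """Hypothetical stand-in parser: one row per child of the root.

    The first column comes from an attribute of each child node, the
    remaining columns from the text of its sub-elements.
    """
    xroot = ET.parse(xml_file).getroot()
    rows = []
    for node in xroot:
        row = {df_cols[0]: node.attrib.get(df_cols[0])}
        for el in df_cols[1:]:
            child = node.find(el)
            row[el] = child.text if child is not None else None
        rows.append(row)
    return pd.DataFrame(rows, columns=df_cols)


def collect_xml_to_csv(root_dir, df_cols, out_csv, parser=parse_xml_simple):
    """Walk root_dir, parse every .xml file, and write one combined CSV."""
    frames = []
    for subdir, _dirs, files in os.walk(root_dir):
        for name in sorted(files):
            if not name.lower().endswith(".xml"):
                continue
            path = os.path.join(subdir, name)
            df = parser(path, df_cols)
            if df.empty:
                continue  # nothing to add from this file
            df["source_file"] = path  # assumed extra column to trace each row
            frames.append(df)

    if frames:
        # Concatenate once at the end instead of growing a frame in the loop
        combined = pd.concat(frames, ignore_index=True)
    else:
        # No usable XML found: keep the expected columns so the CSV has a header
        combined = pd.DataFrame(columns=list(df_cols) + ["source_file"])

    combined.to_csv(out_csv, index=False)
    return combined
```

With the question's function, a call might look like `collect_xml_to_csv(os.getcwd(), df_cols, "all_xml.csv", parser=parse_XML)`, where `df_cols` is whatever column list that function already receives and `"all_xml.csv"` is a placeholder file name. Building the list first and calling `pd.concat` once avoids repeatedly copying a growing DataFrame.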

0 answers:

No answers yet.