I found this really nice piece of code, which is exactly what I need to convert XML into a DataFrame.
import xml.etree.ElementTree as ET
import pandas as pd

def parse_XML(xml_file, df_cols):
    """Parse the input XML file and store the result in a pandas
    DataFrame with the given columns.
    The first element of df_cols is supposed to be the identifier
    variable, which is an attribute of each node element in the
    XML data; other features will be parsed from the text content
    of each sub-element.
    """
    xtree = ET.parse(xml_file)
    xroot = xtree.getroot()
    rows = []
    for node in xroot:
        res = []
        # the identifier is read from the root element's attributes here
        res.append(xroot.attrib[str(df_cols[0])])
        for el in df_cols[1:]:
            if node is not None and node.find(el) is not None:
                res.append(node.find(el).text)
            else:
                # placeholder string for a missing sub-element
                res.append('None')
        rows.append({df_cols[i]: res[i]
                     for i, _ in enumerate(df_cols)})
    out_df = pd.DataFrame(rows, columns=df_cols)
    # fill the 'None' placeholders in columns 1-4 with the value from the first row
    for col in out_df.columns[1:5]:
        out_df[col] = out_df[col].replace(to_replace=['None'], value=out_df[col].iloc[0])
    # the first row only carried those shared values, so drop it
    out_df.drop([0], inplace=True)
    return out_df
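How can I use this code in a script that walks my directory tree, like this:
for subdir, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith(".xml"):
and collects all the XML files in the tree, builds one big DataFrame, and saves it as a CSV?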
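To make the goal concrete, here is a rough sketch of the kind of script I have in mind. It is only one possible approach, not a tested solution: `collect_xml_to_csv` and its parameters `root_dir`, `out_csv` and `parse_func` are hypothetical names I made up, and it assumes that every XML file in the tree has the same layout so that `parse_XML` can be passed in as `parse_func` with the same `df_cols` for all of them.

```python
import os

import pandas as pd


def collect_xml_to_csv(root_dir, df_cols, out_csv, parse_func):
    """Walk root_dir, parse every .xml file with parse_func, write one CSV.

    parse_func is assumed to behave like parse_XML above:
    parse_func(path, df_cols) -> pandas.DataFrame
    """
    frames = []
    for subdir, dirs, files in os.walk(root_dir):
        for file in sorted(files):
            if file.endswith(".xml"):
                # os.walk yields bare file names, so join them with the folder
                path = os.path.join(subdir, file)
                frames.append(parse_func(path, df_cols))

    if frames:
        # stack the per-file frames and renumber the rows from 0
        big_df = pd.concat(frames, ignore_index=True)
    else:
        # no XML files found: empty frame with the expected columns
        big_df = pd.DataFrame(columns=df_cols)

    big_df.to_csv(out_csv, index=False)
    return big_df
```

It would be called as something like `collect_xml_to_csv(os.getcwd(), df_cols, "all_xml.csv", parse_XML)`, where `df_cols` is the same column list already passed to `parse_XML` and the output file name is just a placeholder.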