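I managed to convert my XML data into a pd.DataFrame, but I'm stuck on one column: it holds dictionaries, and they are not expanded into separate columns.
I'm working with the following XML data (excerpt):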
xml = '''
<FMPReport link="Privs_XML.xml" creationTime="10:03:45 AM" creationDate="5/28/2019" type="Report" version="17.0.6">
<File name="PrivilegeSet">
<PrivilegesCatalog>
<PrivilegeSet comment="access to everything" id="1" allowModifyPassword="True" managedExtended="True" menu="All" idleDisconnect="False" overrideValidationWarning="True" exporting="True" printing="True" name="Full Access">
<Records value="CreateEditDelete"/>
<Layouts value="Modifiable" allowCreation="True"/>
<ValueLists value="Modifiable" allowCreation="True"/>
<Scripts value="Modifiable" allowCreation="True"/>
</PrivilegeSet>
<PrivilegeSet comment="write access to all records, no design access" id="2" allowModifyPassword="True" managedExtended="False" menu="All" idleDisconnect="True" overrideValidationWarning="False" exporting="True" printing="True" name="Data Entry Only">
<Records value="CreateEditDelete"/>
<Layouts value="ViewOnly" allowCreation="False"/>
<ValueLists value="ViewOnly" allowCreation="False"/>
<Scripts value="ExecutableOnly" allowCreation="False"/>
</PrivilegeSet>
</PrivilegesCatalog>
</File>
</FMPReport>'''
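I iterate over the child elements of each PrivilegeSet in the XML and collect the information into a pd.DataFrame. My code produces three columns, and the column named "Attribute" shows up as a dictionary instead of being expanded into separate columns. The column names are defined in df_cols and passed in when the DataFrame is created.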
import pandas as pd
import xml.etree.ElementTree as ET

root = ET.fromstring(xml)
df_cols = ['Name', 'Tag', 'Attribute']

# Collect one row per child element of each PrivilegeSet
rows = []
for parent in root.iter('PrivilegeSet'):
    pname = parent.attrib.get('name')
    for child in parent:
        # child.attrib is a dict, so the 'Attribute' column ends up holding dicts
        rows.append([pname, child.tag, child.attrib])

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
out_df = pd.DataFrame(rows, columns=df_cols)
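I want the "Name" and "Tag" columns to stay as they are, and the "Attribute" column to be expanded into one column per attribute.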
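For reference, here is a minimal sketch of one possible way to get that shape, not necessarily the best one. It assumes every value in "Attribute" is a plain dict. The shortened `sample_xml` and the names `expanded` and `result` are made up for this illustration and are not part of my actual data or code.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Shortened, hypothetical sample with the same shape as the XML above
sample_xml = '''
<PrivilegesCatalog>
<PrivilegeSet id="1" name="Full Access">
<Records value="CreateEditDelete"/>
<Layouts value="Modifiable" allowCreation="True"/>
</PrivilegeSet>
<PrivilegeSet id="2" name="Data Entry Only">
<Records value="CreateEditDelete"/>
<Layouts value="ViewOnly" allowCreation="False"/>
</PrivilegeSet>
</PrivilegesCatalog>'''

root = ET.fromstring(sample_xml)

# One row per child element: parent name, child tag, child attributes as a dict
rows = [
    [parent.attrib.get('name'), child.tag, dict(child.attrib)]
    for parent in root.iter('PrivilegeSet')
    for child in parent
]
out_df = pd.DataFrame(rows, columns=['Name', 'Tag', 'Attribute'])

# Turn each dict into its own row of columns; keys missing from a dict become NaN
expanded = pd.DataFrame(out_df['Attribute'].tolist(), index=out_df.index)

# Keep 'Name' and 'Tag', drop the dict column, and attach the expanded columns
result = pd.concat([out_df.drop(columns='Attribute'), expanded], axis=1)
print(result)
```

With this approach, elements that lack an attribute (for example Records has no allowCreation) end up with NaN in that column, which may or may not be what is wanted.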