Unable to expand XML child attributes into a Python pandas DataFrame

Time: 2019-05-30 08:42:53

Tags: python xml pandas dataframe

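I successfully converted my XML data into a pd.DataFrame, but I am stuck on one column of that DataFrame: it holds dictionaries and is not expanded into separate columns.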

I am working with the following XML data (excerpt):

xml = '''
<FMPReport link="Privs_XML.xml" creationTime="10:03:45 AM" creationDate="5/28/2019" type="Report" version="17.0.6">
    <File name="PrivilegeSet">
        <PrivilegesCatalog>
            <PrivilegeSet comment="access to everything" id="1" allowModifyPassword="True" managedExtended="True" menu="All" idleDisconnect="False" overrideValidationWarning="True" exporting="True" printing="True" name="Full Access">
                <Records value="CreateEditDelete"/>
                <Layouts value="Modifiable" allowCreation="True"/>
                <ValueLists value="Modifiable" allowCreation="True"/>
                <Scripts value="Modifiable" allowCreation="True"/>
            </PrivilegeSet>
            <PrivilegeSet comment="write access to all records, no design access" id="2" allowModifyPassword="True" managedExtended="False" menu="All" idleDisconnect="True" overrideValidationWarning="False" exporting="True" printing="True" name="Data Entry Only">
                <Records value="CreateEditDelete"/>
                <Layouts value="ViewOnly" allowCreation="False"/>
                <ValueLists value="ViewOnly" allowCreation="False"/>
                <Scripts value="ExecutableOnly" allowCreation="False"/>
            </PrivilegeSet>
        </PrivilegesCatalog>
    </File>
</FMPReport>'''

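I iterate over the child elements of each PrivilegeSet in the XML and collect the information into a pd.DataFrame. My code produces three columns; the column named "Attribute" shows up as a dictionary and is not expanded into separate columns. The column names come from the df_cols list.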

import pandas as pd
import xml.etree.ElementTree as ET

root = ET.fromstring(xml)

df_cols = ['Name', 'Tag', 'Attribute']
rows = []

for parent in root.iter('PrivilegeSet'):
    pname = parent.attrib.get('name')
    for child in parent:
        # child.attrib is a dict, so the 'Attribute' column ends up holding dicts
        rows.append([pname, child.tag, child.attrib])

# DataFrame.append was removed in pandas 2.0, so the frame is built once
# from the collected rows instead of being grown row by row
out_df = pd.DataFrame(rows, columns=df_cols)
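
The reason the column stays collapsed is that `child.attrib` is a plain dict, and pandas stores each dict as a single object in its cell rather than spreading its keys across columns. A minimal sketch of that behaviour, using a shortened stand-in for the XML above (`sample_xml` and `demo_df` are illustrative names, not part of the original code):

```python
import pandas as pd
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the question (assumption: same structure).
sample_xml = '''
<PrivilegesCatalog>
    <PrivilegeSet id="1" name="Full Access">
        <Records value="CreateEditDelete"/>
        <Layouts value="Modifiable" allowCreation="True"/>
    </PrivilegeSet>
</PrivilegesCatalog>'''

root = ET.fromstring(sample_xml)
rows = [[p.attrib.get('name'), c.tag, c.attrib]
        for p in root.iter('PrivilegeSet') for c in p]
demo_df = pd.DataFrame(rows, columns=['Name', 'Tag', 'Attribute'])

# Each cell in 'Attribute' holds a whole dict; pandas does not unpack it.
print(demo_df['Attribute'].map(type).unique())
print(demo_df['Attribute'].iloc[1])
```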

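I want the "Name" and "Tag" columns to stay as they are, and the "Attribute" column to be expanded into one column per attribute.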
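
One possible way to get that shape, offered as a sketch under assumptions rather than a confirmed answer: merge the two fixed fields with each child's attribute dict before building the DataFrame, so that every attribute key becomes its own column. The shortened `sample_xml` and the names `records` and `expanded_df` are illustrative only.

```python
import pandas as pd
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the question (assumption: same structure).
sample_xml = '''
<PrivilegesCatalog>
    <PrivilegeSet id="1" name="Full Access">
        <Records value="CreateEditDelete"/>
        <Layouts value="Modifiable" allowCreation="True"/>
    </PrivilegeSet>
    <PrivilegeSet id="2" name="Data Entry Only">
        <Records value="CreateEditDelete"/>
        <Layouts value="ViewOnly" allowCreation="False"/>
    </PrivilegeSet>
</PrivilegesCatalog>'''

root = ET.fromstring(sample_xml)

records = []
for parent in root.iter('PrivilegeSet'):
    pname = parent.attrib.get('name')
    for child in parent:
        # One flat dict per child: the fixed fields first, then its attributes.
        records.append({'Name': pname, 'Tag': child.tag, **child.attrib})

# A list of dicts yields one column per key; keys missing from a row become NaN.
expanded_df = pd.DataFrame(records)
print(expanded_df)
```

If the DataFrame with the dict column already exists and has the default integer index, something like `pd.concat([out_df.drop(columns='Attribute'), pd.DataFrame(out_df['Attribute'].tolist())], axis=1)` should give a similar result. Note that an attribute named "Name" or "Tag" on a child element would overwrite the fixed fields in this sketch.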

0 answers:

No answers