Extract XML to a pandas DataFrame with an unknown number of nodes

Time: 2019-04-17 02:18:00

Tags: python xml pandas dataframe

The code sample in the question linked below works if there is only one node. However, in our use case we don't know how many nodes we will receive.

Convert a xml to pandas data frame python

A sample is below. How can we parse this into a DataFrame? In particular, we don't know how many nodes we will receive in the feed file.

<?xml version = '1.0' encoding = 'UTF-8'?>
<EVENT spec="IDL:com/RfcCallEvents:1.0#Z_BAPI_UPDT_SERV_NOTIFICATION">
   <eventHeader>
      <objectName/>
      <objectKey/>
      <eventName/>
      <eventId/>
   </eventHeader>
   <TAB_DETAIL_DATA>
      <ZNEWFLAG>X</ZNEWFLAG>
      <FENUM>2</FENUM>
      <BAUTL>661-01727</BAUTL>
      <OTEIL/>
      <FECOD>KBB</FECOD>
      <URCOD>B08</URCOD>
      <ZCOMPMDF>A</ZCOMPMDF>
      <ZOPREPL/>
      <ZWRNCOV>LP</ZWRNCOV>
      <ZWRNREF/>
      <ZNEWPS>C07XMAAEJCLD</ZNEWPS>
      <ZOLDPN/>
      <ZOLDPD/>
      <ZOLDPS>C07XMAACJCLD</ZOLDPS>
      <MAILINFECOD/>
      <ZUNITPR/>
      <ZNEWPD/>
      <ZNEWPN/>
      <ZABUSE/>
      <ZRPS>S</ZRPS>
      <ZEXKGB/>
      <ZKGBMM/>
      <ZINSTS>000</ZINSTS>
      <ZACKBB/>
      <ZCHKOVR/>
      <ZSNDB/>
      <ZNOTAFISCAL/>
      <ZCONSGMT/>
      <ZPRTCONS/>
      <ZZRTNTRNO/>
      <ZZRTNCAR/>
      <ZZINSPECT/>
      <ZZPR_OPT/>
   </TAB_DETAIL_DATA>
   <TAB_DETAIL_DATA>
      <ZNEWFLAG>X</ZNEWFLAG>
      <FENUM>1</FENUM>
      <BAUTL>661-01727</BAUTL>
      <OTEIL/>
      <FECOD>KBB</FECOD>
      <URCOD>B08</URCOD>
      <ZCOMPMDF>A</ZCOMPMDF>
      <ZOPREPL/>
      <ZWRNCOV>LP</ZWRNCOV>
      <ZWRNREF/>
      <ZNEWPS>C07XMAAEJCLD</ZNEWPS>
      <ZOLDPN/>
      <ZOLDPD/>
      <ZOLDPS>C07XMAACJCLD</ZOLDPS>
      <MAILINFECOD/>
      <ZUNITPR/>
      <ZNEWPD/>
      <ZNEWPN/>
      <ZABUSE/>
      <ZRPS>S</ZRPS>
      <ZEXKGB/>
      <ZKGBMM/>
      <ZINSTS>000</ZINSTS>
      <ZACKBB/>
      <ZCHKOVR/>
      <ZSNDB/>
      <ZNOTAFISCAL/>
      <ZCONSGMT/>
      <ZPRTCONS/>
      <ZZRTNTRNO/>
      <ZZRTNCAR/>
      <ZZINSPECT/>
      <ZZPR_OPT/>
   </TAB_DETAIL_DATA>
   <TAB_HEADER_DATA>
      <QMNUM>030334920069</QMNUM>
      <ZGSXREF>CONSUMER</ZGSXREF>
      <ZVANTREF>G338005317</ZVANTREF>
      <ZSHIPER/>
      <ZSHPRNO/>
      <ZRVREF/>
      <ZTECHID>4HQ2OD6C19</ZTECHID>
      <ZADREPAIR/>
      <ZZKATR7/>
   </TAB_HEADER_DATA>
</EVENT>

1 answer:

Answer 0 (score: 0)

I suspect you need to parse the XML data into multiple DataFrames, for example like this:

import pandas as pd
import xmltodict  # third-party module; install it first (pip install xmltodict)
data = """<?xml version = '1.0' encoding = 'UTF-8'?>
<EVENT spec="IDL:com/RfcCallEvents:1.0#Z_BAPI_UPDT_SERV_NOTIFICATION">
   <eventHeader>
      <objectName/>
      <objectKey/>
      <eventName/>
      <eventId/>
   </eventHeader>
   <TAB_DETAIL_DATA>
      <ZNEWFLAG>X</ZNEWFLAG>
      <FENUM>2</FENUM>
      <BAUTL>661-01727</BAUTL>
      <OTEIL/>
      <FECOD>KBB</FECOD>
      <URCOD>B08</URCOD>
      <ZCOMPMDF>A</ZCOMPMDF>
      <ZOPREPL/>
      <ZWRNCOV>LP</ZWRNCOV>
      <ZWRNREF/>
      <ZNEWPS>C07XMAAEJCLD</ZNEWPS>
      <ZOLDPN/>
      <ZOLDPD/>
      <ZOLDPS>C07XMAACJCLD</ZOLDPS>
      <MAILINFECOD/>
      <ZUNITPR/>
      <ZNEWPD/>
      <ZNEWPN/>
      <ZABUSE/>
      <ZRPS>S</ZRPS>
      <ZEXKGB/>
      <ZKGBMM/>
      <ZINSTS>000</ZINSTS>
      <ZACKBB/>
      <ZCHKOVR/>
      <ZSNDB/>
      <ZNOTAFISCAL/>
      <ZCONSGMT/>
      <ZPRTCONS/>
      <ZZRTNTRNO/>
      <ZZRTNCAR/>
      <ZZINSPECT/>
      <ZZPR_OPT/>
   </TAB_DETAIL_DATA>
   <TAB_DETAIL_DATA>
      <ZNEWFLAG>X</ZNEWFLAG>
      <FENUM>1</FENUM>
      <BAUTL>661-01727</BAUTL>
      <OTEIL/>
      <FECOD>KBB</FECOD>
      <URCOD>B08</URCOD>
      <ZCOMPMDF>A</ZCOMPMDF>
      <ZOPREPL/>
      <ZWRNCOV>LP</ZWRNCOV>
      <ZWRNREF/>
      <ZNEWPS>C07XMAAEJCLD</ZNEWPS>
      <ZOLDPN/>
      <ZOLDPD/>
      <ZOLDPS>C07XMAACJCLD</ZOLDPS>
      <MAILINFECOD/>
      <ZUNITPR/>
      <ZNEWPD/>
      <ZNEWPN/>
      <ZABUSE/>
      <ZRPS>S</ZRPS>
      <ZEXKGB/>
      <ZKGBMM/>
      <ZINSTS>000</ZINSTS>
      <ZACKBB/>
      <ZCHKOVR/>
      <ZSNDB/>
      <ZNOTAFISCAL/>
      <ZCONSGMT/>
      <ZPRTCONS/>
      <ZZRTNTRNO/>
      <ZZRTNCAR/>
      <ZZINSPECT/>
      <ZZPR_OPT/>
   </TAB_DETAIL_DATA>
   <TAB_HEADER_DATA>
      <QMNUM>030334920069</QMNUM>
      <ZGSXREF>CONSUMER</ZGSXREF>
      <ZVANTREF>G338005317</ZVANTREF>
      <ZSHIPER/>
      <ZSHPRNO/>
      <ZRVREF/>
      <ZTECHID>4HQ2OD6C19</ZTECHID>
      <ZADREPAIR/>
      <ZZKATR7/>
   </TAB_HEADER_DATA>
</EVENT>"""

dct = xmltodict.parse(data)

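def make_df(name="TAB_DETAIL_DATA", dct=dct):
    # xmltodict yields a list when a tag repeats and a single dict when it occurs once,
    # so normalise both cases to a list of records
    section = dct['EVENT'][name]
    records = section if isinstance(section, list) else [section]
    # one frame per record: index = child tag names, 'value' = tag text (None for empty tags)
    frames = [pd.DataFrame({'value': list(rec.values())}, index=list(rec.keys()))
              for rec in records]
    # stack all records into a single long DataFrame
    return pd.concat(frames)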

Now you can try out the parser:

make_df(name="TAB_HEADER_DATA") # produces single df

make_df(name="TAB_DETAIL_DATA") # concatenates the content of all TAB_DETAIL_DATA sections and returns a single df
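
If you would rather have one row per TAB_DETAIL_DATA node (and one column per child tag) instead of one long column, something along these lines might work. This is only a sketch using the standard library's xml.etree.ElementTree rather than xmltodict; the names SAMPLE, node_rows and to_frame are made up for illustration, and SAMPLE is a shortened stand-in for the real feed.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed file (hypothetical, trimmed from the question's XML)
SAMPLE = """<EVENT>
   <TAB_DETAIL_DATA><FENUM>2</FENUM><FECOD>KBB</FECOD><OTEIL/></TAB_DETAIL_DATA>
   <TAB_DETAIL_DATA><FENUM>1</FENUM><FECOD>KBB</FECOD><OTEIL/></TAB_DETAIL_DATA>
   <TAB_HEADER_DATA><QMNUM>030334920069</QMNUM></TAB_HEADER_DATA>
</EVENT>"""


def node_rows(xml_text, tag):
    """Return one dict per direct child of the root named `tag`.

    findall() always returns a list, so zero, one or many nodes are
    handled the same way. Empty tags such as <OTEIL/> map to None.
    """
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in node}
            for node in root.findall(tag)]


def to_frame(rows):
    """Build a DataFrame with one row per node (needs pandas installed)."""
    import pandas as pd  # imported here so node_rows works without pandas
    return pd.DataFrame(rows)


detail_rows = node_rows(SAMPLE, "TAB_DETAIL_DATA")   # two dicts, one per node
header_rows = node_rows(SAMPLE, "TAB_HEADER_DATA")   # a single dict in a list
```

Calling to_frame(detail_rows) should then give a frame with as many rows as there are TAB_DETAIL_DATA nodes in the file, however many arrive. Note that all values come back as strings (or None), so numeric columns would still need converting.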