How to capture a function's output inside a 'for' loop and use it to build a DataFrame?

Date: 2017-05-25 19:36:09

Tags: python dataframe xml-parsing iterator api-design

I'm writing a script to do some exploratory analysis. The script calls an API for IDs, and the API responds with XML output (no child objects).

Script:

import requests
import xml.etree.ElementTree as et


xml = '''<?xml version="1.0" encoding="UTF-8"?>
<YM>
   <Version>xxx</Version>
   <ApiKey>xxx</ApiKey>
   <CallID>xxx</CallID>
   <></>
   <SaPasscode>xxxx</SaPasscode>
   <Call Method = "GetIDs">

   </Call>
</YM>
'''        
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
r = requests.post('url', data=xml, headers=headers) 
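As an aside, the request body could also be built with ElementTree instead of a hand-written string. That escapes values such as & or <, and keeps the XML declaration at the very start of the body (a declaration preceded by whitespace is not well-formed XML). This is a sketch, not the API's documented format: the helper name build_call and its parameters are hypothetical, and the redacted <></> element is left out because its real tag name isn't shown.

```python
import xml.etree.ElementTree as et


def build_call(method, credentials, params=None):
    """Hypothetical helper: build a request body shaped like the one above.

    ElementTree escapes text values, and the XML declaration is prepended
    so that it is the very first thing in the body.
    """
    root = et.Element("YourMembership")
    for tag, value in credentials.items():
        et.SubElement(root, tag).text = value
    # Attributes can be passed as keyword arguments to SubElement
    call = et.SubElement(root, "Call", Method=method)
    for tag, value in (params or {}).items():
        et.SubElement(call, tag).text = str(value)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + et.tostring(root, encoding="unicode")


# Placeholder credentials, as in the question
body = build_call(
    "Profile.Get",
    {"Version": "xxx", "ApiKey": "xxx", "CallID": "xxx", "SaPasscode": "xxx"},
    {"ID": 1234},
)
print(body)
```

The result would then be sent the same way as before, e.g. requests.post('url', data=body, headers=headers).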

Sample output:

<YourMembership_Response>
<Sa.Members.All.GetIDs>
<Members>
<ID>1234</ID>
<ID>4321</ID>
</Members>
</Sa.Members.All.GetIDs>
</YourMembership_Response>
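Pulling the IDs out of a response like this takes a single ElementTree query. The sketch below assumes a well-formed version of the sample (opening and closing tags nested consistently); ids_response is a stand-in for r.text.

```python
import xml.etree.ElementTree as et

# Stand-in for r.text: the sample response with consistently nested tags
ids_response = """<YourMembership_Response>
<Sa.Members.All.GetIDs>
<Members>
<ID>1234</ID>
<ID>4321</ID>
</Members>
</Sa.Members.All.GetIDs>
</YourMembership_Response>"""

dom = et.fromstring(ids_response)
# './/ID' matches <ID> elements at any depth below the root
ids = [element.text for element in dom.iterfind(".//ID")]
print(ids)  # ['1234', '4321']
```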

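I take these IDs and plug them into another API call in the same script to get more information about each one. An iterating function passes each ID parsed from the API call above into a second API call that fetches the details for that ID: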

Script:

def xml_event_info(eventID):       
    xml = '''<?xml version="1.0" encoding="UTF-8"?>
    <YourMembership>
       <Version>xxx</Version>
       <ApiKey>xxx</ApiKey>
       <CallID>xxx</CallID>
       <></>
       <SaPasscode>xxx</SaPasscode>
       <Call Method = "Profile.Get">
           <ID>{}</ID>
       </Call>
    </YourMembership>        
    '''        
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    r = requests.post('url', 
                      data=xml.format(eventID), headers=headers)        
    print(r.text)      




# BUILD XML TREE OBJECT    
dom = et.fromstring(r.text)

# PARSE EVENT ID TEXT AND PASS INTO FUNCTION
for i in dom.iterfind('.//ID'):
    xml_event_info(i.text)
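The loop above calls xml_event_info() but never keeps anything from it. A minimal sketch of the usual pattern, assuming the function returns the response text: append each return value to a list. fetch_profile is a hypothetical offline stand-in so the example runs without the API.

```python
import xml.etree.ElementTree as et


def fetch_profile(member_id):
    # Hypothetical stand-in for xml_event_info(): rather than calling the API
    # it builds a fake Profile.Get response, but it *returns* the text
    # instead of only printing it.
    return ("<Response><ErrCode>0</ErrCode><Profile.Get>"
            "<ID>{}</ID></Profile.Get></Response>").format(member_id)


dom = et.fromstring("<Members><ID>1234</ID><ID>4321</ID></Members>")

# Keep each return value; a bare call inside the loop throws the result away
responses = []
for element in dom.iterfind(".//ID"):
    responses.append(fetch_profile(element.text))

print(len(responses))  # 2
```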

Sample output (followed by more XML objects):

<?xml version="1.0" encoding="utf-8" ?>

<Response>
<ErrCode>xxx</ErrCode>
<ExtendedErrorInfo>xxx</ExtendedErrorInfo>
<Profile.Get>
<ID>xxxx</ID>
<WebsiteID>xxxx</WebsiteID>
<EmailBounced>xxx</EmailBounced>
<NamePrefix>xxx</NamePrefix>
<FirstName>xxx</FirstName>
</Profile.Get>
</Response>
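To turn one response like this into a single row, one option is to flatten it into a {tag: text} dict: leaf elements such as ErrCode are taken directly, and blocks such as Profile.Get are expanded into their children. This is a sketch with placeholder values; response_to_record is a hypothetical name, and the flattening rule is an assumption about the response layout.

```python
import xml.etree.ElementTree as et

# Placeholder values in the structure of the sample output above
profile_xml = b"""<?xml version="1.0" encoding="utf-8" ?>
<Response>
<ErrCode>0</ErrCode>
<ExtendedErrorInfo></ExtendedErrorInfo>
<Profile.Get>
<ID>1234</ID>
<FirstName>Jane</FirstName>
</Profile.Get>
</Response>"""


def response_to_record(xml_text):
    """Flatten one response into {tag: text}; nested blocks such as
    <Profile.Get> contribute their child elements instead of themselves."""
    root = et.fromstring(xml_text)
    record = {}
    for child in root:
        if len(child):  # element with children, e.g. <Profile.Get>
            for grandchild in child:
                record[grandchild.tag] = grandchild.text
        else:  # leaf element, e.g. <ErrCode>; empty elements give None
            record[child.tag] = child.text
    return record


print(response_to_record(profile_xml))
# {'ErrCode': '0', 'ExtendedErrorInfo': None, 'ID': '1234', 'FirstName': 'Jane'}
```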

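I want to take the output of the second API call shown above, with its many XML elements, and map it into a pandas DataFrame. The problem is that when I call the function (xml_event_info(i.text)) inside the for loop below, I can't keep the output of the second API call: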

# PARSE EVENT ID TEXT AND PASS INTO FUNCTION
for i in dom.iterfind('.//ID'):
    xml_event_info(i.text)

I'm trying to map the XML into a DataFrame, and I keep getting the error 'TypeError: Parse() argument 1 must be string or read-only buffer, not None'.
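That error is what the XML parser raises when it is handed None instead of a string, and a function with no return statement evaluates to None. The sketch below reproduces it under Python 3, where the message wording differs from the Python 2 message quoted above but the exception is still a TypeError. print_only is a hypothetical stand-in for xml_event_info().

```python
import xml.etree.ElementTree as et


def print_only(member_id):
    # Mirrors the question's xml_event_info(): it prints the XML but has no
    # return statement, so every call evaluates to None
    print("<Response><ID>{}</ID></Response>".format(member_id))


result = print_only("1234")
print(result is None)  # True

try:
    et.fromstring(result)  # parsing None instead of an XML string
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
```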

How can I parse the XML output of multiple API calls into a pandas DataFrame in which each XML tag is a column header?

Example:

(index) | ErrCode | ExtendedErrorInfo | ID | FirstName | ...
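Assuming each response has already been flattened into a {tag: text} dict, pandas can build that table directly from a list of dicts, using the keys as column headers. The records below are placeholders, not real API output.

```python
import pandas as pd

# One dict per API response, keyed by XML tag (placeholder values)
records = [
    {"ErrCode": "0", "ExtendedErrorInfo": None, "ID": "1234", "FirstName": "Jane"},
    {"ErrCode": "0", "ExtendedErrorInfo": None, "ID": "4321"},
]

# The union of all keys becomes the columns; a tag missing from one
# response shows up as NaN in that row instead of shifting other values
df = pd.DataFrame(records)
print(df)
```

Collecting dicts and calling pd.DataFrame once at the end is usually simpler than building a DataFrame inside the loop.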

The script I'm referring to for getting this done comes from this site: http://www.austintaylor.io/lxml/python/pandas/xml/dataframe/2016/07/08/convert-xml-to-pandas-dataframe/

Script:

import xml.etree.ElementTree as et
import pandas as pd

def xml2df(xml_data):
    # fromstring() already returns the root Element, which has no getroot()
    root = et.fromstring(xml_data)
    all_records = []
    for child in root:
        # One dict per record keeps each value paired with its own tag,
        # so records with different tags cannot shift into the wrong column
        record = {}
        for subchild in child:
            record[subchild.tag] = subchild.text
        all_records.append(record)
    return pd.DataFrame(all_records)
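An alternative to collecting records is to convert each response into its own small DataFrame and stack them with pd.concat. xml_to_frame is a hypothetical converter in the spirit of xml2df(), and the responses are placeholders rather than real API output.

```python
import xml.etree.ElementTree as et

import pandas as pd


def xml_to_frame(xml_text):
    # Hypothetical converter: one row per child of the root that has
    # sub-elements (e.g. <Profile.Get>), one column per sub-element tag
    root = et.fromstring(xml_text)
    rows = [{sub.tag: sub.text for sub in child} for child in root if len(child)]
    return pd.DataFrame(rows)


# Placeholder responses standing in for the output of several API calls
responses = [
    "<Response><ErrCode>0</ErrCode><Profile.Get><ID>1234</ID></Profile.Get></Response>",
    "<Response><ErrCode>0</ErrCode><Profile.Get><ID>4321</ID></Profile.Get></Response>",
]

# Stack the per-response frames into one table with a fresh 0..n-1 index
combined = pd.concat([xml_to_frame(text) for text in responses], ignore_index=True)
print(combined)
```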

Full script:

import requests
import xml.etree.ElementTree as et
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<YourMembership>
   <Version>xxx</Version>
   <ApiKey>xxxx</ApiKey>
   <CallID>xxx</CallID>
   <></>
   <SaPasscode>xxx</SaPasscode>
   <Call Method = "Events.All.GetIDs">
       <StartDate>2017/01/1</StartDate>
       <EndDate>2017/01/31</EndDate>
   </Call>
</YourMembership>
'''        
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
r = requests.post('url', data=xml, headers=headers)


def xml_event_info(eventID):       
    xml = '''<?xml version="1.0" encoding="UTF-8"?>
    <YourMembership>
       <Version>xxx</Version>
       <ApiKey>xxx</ApiKey>
       <CallID>xxx</CallID>
       <></>
       <SaPasscode>xxx</SaPasscode>
       <Call Method = "Event.Get">
           <EventID>{}</EventID>
       </Call>
    </YourMembership>        
    '''        
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    r = requests.post('url', 
                      data=xml.format(eventID), headers=headers)        
    print(r.text)
    return r.text      




# BUILD XML TREE OBJECT
dom = et.fromstring(r.text)

# Collect one record per event across all API calls
all_records = []

# PARSE EVENT ID TEXT AND PASS INTO FUNCTION
for event_id in dom.iterfind('.//EventID'):
    y = xml_event_info(event_id.text)

    # y is a single XML string: parse it once rather than looping over its characters
    tree = et.fromstring(y)
    for child in tree:  # fromstring() already returns the root element
        record = {}
        for subchild in child:
            record[subchild.tag] = subchild.text
        if record:
            all_records.append(record)

# Build the DataFrame once, after every response has been collected
print(pd.DataFrame(all_records))

EDIT:

TLDR:

How can I map the output of the function below into a DataFrame, with the XML elements as the DataFrame's column headers?

import requests
import xml.etree.ElementTree as et
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<YourMembership>
   <Version>xxx</Version>
   <ApiKey>xxxx</ApiKey>
   <CallID>xxx</CallID>
   <></>
   <SaPasscode>xxxx</SaPasscode>
   <Call Method = "GetIDs">

   </Call>
</YourMembership>
'''        
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
r = requests.post('url', data=xml, headers=headers)

def xml_event_info(eventID):       
    xml = '''<?xml version="1.0" encoding="UTF-8"?>
    <YourMembership>
       <Version>xxx</Version>
       <ApiKey>xxx</ApiKey>
       <CallID>xxx</CallID>
       <></>
       <SaPasscode>xxx</SaPasscode>
       <Call Method = "Profile.Get">
           <ID>{}</ID>
       </Call>
    </YourMembership>        
    '''        
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    r = requests.post('url', 
                      data=xml.format(eventID), headers=headers)        
    print(r.text)      

Output:

<?xml version="1.0" encoding="utf-8" ?>

<Response>
<ErrCode>xxx</ErrCode>
<ExtendedErrorInfo>xxx</ExtendedErrorInfo>
<Profile.Get>
<ID>xxxx</ID>
<WebsiteID>xxxx</WebsiteID>
<EmailBounced>xxx</EmailBounced>
<NamePrefix>xxx</NamePrefix>
<FirstName>xxx</FirstName>
</Profile.Get>
</Response>

1 answer:

Answer 0 (score: 1)

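The xml_event_info(eventID) function doesn't return anything. Add a return statement at the end of it, so the loop receives the response text instead of None, and try again:

return r.text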
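Putting the answer together with the rest of the question, a minimal end-to-end sketch could look like this. profiles_to_frame, response_to_record and fake_fetch are hypothetical names; fake_fetch replaces the real API call so the example runs offline, and expanding nested blocks like <Profile.Get> into columns is an assumption about the response layout.

```python
import xml.etree.ElementTree as et

import pandas as pd


def response_to_record(xml_text):
    """Flatten one response into {tag: text}, expanding nested blocks
    such as <Profile.Get> into their child elements."""
    record = {}
    for child in et.fromstring(xml_text):
        if len(child):
            for grandchild in child:
                record[grandchild.tag] = grandchild.text
        else:
            record[child.tag] = child.text
    return record


def profiles_to_frame(id_response, fetch):
    """Call fetch(member_id) once per <ID> in id_response and return one
    DataFrame row per response. fetch must return the response text."""
    records = []
    for element in et.fromstring(id_response).iterfind(".//ID"):
        text = fetch(element.text)
        if text is None:
            # The symptom from the question: the function printed but did not return
            raise ValueError("fetch returned None; is 'return r.text' missing?")
        records.append(response_to_record(text))
    # Build the DataFrame once, after all calls have finished
    return pd.DataFrame(records)


def fake_fetch(member_id):
    # Hypothetical offline stand-in for xml_event_info(); the real one would
    # send the request with requests.post(...) and end with 'return r.text'
    return ("<Response><ErrCode>0</ErrCode><Profile.Get><ID>{0}</ID>"
            "<FirstName>Name{0}</FirstName></Profile.Get></Response>").format(member_id)


ids_xml = "<Members><ID>1234</ID><ID>4321</ID></Members>"
df = profiles_to_frame(ids_xml, fake_fetch)
print(df)
```

With the corrected xml_event_info() (ending in return r.text), the real call would be profiles_to_frame(r.text, xml_event_info), where r is the response to the first GetIDs request.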