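I'm writing a script to do some exploratory analysis. The script calls an API that returns IDs, and the API responds with XML output (no child objects).
Script: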
import requests
import xml.etree.ElementTree as et
xml ='''
<?xml version="1.0" encoding="UTF-8"?>
<YM>
<Version>xxx</Version>
<ApiKey>xxx</ApiKey>
<CallID>xxx</CallID>
<></>
<SaPasscode>xxxx</SaPasscode>
<Call Method = "GetIDs">
</Call>
</YM>
'''
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
r = requests.post('url', data=xml, headers=headers)
Sample output:
<YourMembership_Response>
<Sa.Members.All.GetIDs>
<Members>
<ID>1234</ID>
<ID>4321</ID>
</Members>
</Sa.Members.All.GetIDs>
</YourMembership_Response>
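For reference, here is a minimal sketch of pulling the IDs out of a response shaped like the sample above, using only the standard library's ElementTree. The SAMPLE_IDS_RESPONSE string and the extract_ids helper are hypothetical, and the real response may differ.

```python
import xml.etree.ElementTree as et

# Hypothetical response, modelled on the sample output above (tags properly nested).
SAMPLE_IDS_RESPONSE = """<YourMembership_Response>
<Sa.Members.All.GetIDs>
<Members>
<ID>1234</ID>
<ID>4321</ID>
</Members>
</Sa.Members.All.GetIDs>
</YourMembership_Response>"""


def extract_ids(xml_text):
    """Return the text of every <ID> element, at any depth, as a list of strings."""
    # strip() removes leading whitespace; expat rejects any whitespace before an <?xml ...?> declaration.
    root = et.fromstring(xml_text.strip())
    return [el.text for el in root.iterfind(".//ID")]


print(extract_ids(SAMPLE_IDS_RESPONSE))  # ['1234', '4321']
```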
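I take these IDs and plug them into another API call in the same script to get more information about each ID. I do this with a loop that parses the IDs out of the response above and passes each one to a function that makes the second API call:
Script: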
def xml_event_info(eventID):
    xml ='''
<?xml version="1.0" encoding="UTF-8"?>
<YourMembership>
<Version>xxx</Version>
<ApiKey>xxx</ApiKey>
<CallID>xxx</CallID>
<></>
<SaPasscode>xxx</SaPasscode>
<Call Method = "Profile.Get">
<ID>{}</ID>
</Call>
</YourMembership>
'''
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    r = requests.post('url',
                      data=xml.format(eventID), headers=headers)
    print(r.text)

# BUILD XML TREE OBJECT
dom = et.fromstring(r.text)
# PARSE EVENT ID TEXT AND PASS INTO FUNCTION
for i in dom.iterfind('.//ID'):
    xml_event_info(i.text)
Sample output (followed by more XML elements):
<?xml version="1.0" encoding="utf-8" ?>
<Response>
<ErrCode>xxx</ErrCode>
<ExtendedErrorInfo>xxx</ExtendedErrorInfo>
<Profile.Get>
<ID>xxxx</ID>
<WebsiteID>xxxx</WebsiteID>
<EmailBounced>xxx</EmailBounced>
<NamePrefix>xxx</NamePrefix>
<FirstName>xxx</FirstName>
</Profile.Get>
</Response>
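I want to take the output above from the second API call, with its many XML elements, and map it into a pandas DataFrame. The problem I'm running into is saving the second API call's output when I call the function (xml_event_info(i.text)) inside the for loop here: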
# PARSE EVENT ID TEXT AND PASS INTO FUNCTION
for i in dom.iterfind('.//ID'):
    xml_event_info(i.text)
I'm trying to map the XML into a DataFrame, and I keep getting the error 'TypeError: Parse() argument 1 must be string or read-only buffer, not None'.
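That error is what the parser raises when it is handed None: a Python function without a return statement returns None, so parsing the result of xml_event_info(...) amounts to et.fromstring(None). Below is a minimal reproduction, with hypothetical stand-in functions in place of the real API call. The exact message differs between Python 2 and Python 3, but both raise TypeError.

```python
import xml.etree.ElementTree as et


def fetch_without_return(event_id):
    # Hypothetical stand-in for xml_event_info(): builds a response but only prints it.
    text = "<Response><ID>{}</ID></Response>".format(event_id)
    print(text)  # no return statement, so callers receive None


def fetch_with_return(event_id):
    # Same stand-in, but handing the XML string back to the caller.
    text = "<Response><ID>{}</ID></Response>".format(event_id)
    return text


result = fetch_without_return("1234")
print(result is None)  # True

try:
    et.fromstring(result)  # this is et.fromstring(None)
except TypeError as exc:
    print("TypeError:", exc)

root = et.fromstring(fetch_with_return("1234"))
print(root.find("ID").text)  # 1234
```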
How can I parse the XML output of multiple API calls into a pandas DataFrame where each XML tag is a column header?
Example of the desired column headers:
ErrCode | ExtendedErrorInfo | ID | FirstName | ...
The script and website I'm referring to, which does this job, can be found here (http://www.austintaylor.io/lxml/python/pandas/xml/dataframe/2016/07/08/convert-xml-to-pandas-dataframe/)
Script:
def xml2df():
    tree = et.fromstring(xml_event_info(i.text))
    root = tree.getroot()
    all_records = []
    headers = []
    for i, child in enumerate(root):
        record = []
        for subchild in child:
            record.append(subchild.text)
            if subchild.tag not in headers:
                headers.append(subchild.tag)
        all_records.append(record)
    return pd.DataFrame(all_records, columns=headers)
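Besides the None issue, two things keep xml2df above from working as written. First, it reads i from outside the function instead of taking the response as a parameter. Second, et.fromstring() already returns the root Element, which has no getroot() method; getroot() belongs to ElementTree objects such as the one et.parse() returns.

Below is a minimal corrected sketch. It assumes every response has the layout of the Profile.Get sample: leaf elements under the root, plus one nested block. It takes a list of response strings and stores each record as a dict, so pandas derives the column headers from the tags.

```python
import xml.etree.ElementTree as et

import pandas as pd


def xml2df(xml_texts):
    """Turn a list of response strings into a DataFrame, one row per response."""
    records = []
    for xml_text in xml_texts:
        # et.fromstring() returns the root Element itself; there is no getroot() to call.
        root = et.fromstring(xml_text.strip())
        record = {}
        for child in root:
            if len(child):  # e.g. <Profile.Get>: copy its sub-elements
                for subchild in child:
                    record[subchild.tag] = subchild.text
            else:  # e.g. <ErrCode>: copy the element itself
                record[child.tag] = child.text
        records.append(record)
    # With dict records, the tags become column headers; a tag missing from one response becomes NaN.
    return pd.DataFrame(records)
```

With dicts instead of positional lists, rows whose responses contain different sets of tags still line up under the right headers.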
Full script:
import requests
import xml.etree.ElementTree as et
import pandas as pd
from lxml import etree
xml ='''
<?xml version="1.0" encoding="UTF-8"?>
<YourMembership>
<Version>xxx</Version>
<ApiKey>xxxx</ApiKey>
<CallID>xxx</CallID>
<></>
<SaPasscode>xxx</SaPasscode>
<Call Method = "Events.All.GetIDs">
<StartDate>2017/01/1</StartDate>
<EndDate>2017/01/31</EndDate>
</Call>
</YourMembership>
'''
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
r = requests.post('url', data=xml, headers=headers)
def xml_event_info(eventID):
    xml ='''
<?xml version="1.0" encoding="UTF-8"?>
<YourMembership>
<Version>xxx</Version>
<ApiKey>xxx</ApiKey>
<CallID>xxx</CallID>
<></>
<SaPasscode>xxx</SaPasscode>
<Call Method = "Event.Get">
<EventID>{}</EventID>
</Call>
</YourMembership>
'''
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    r = requests.post('url',
                      data=xml.format(eventID), headers=headers)
    print(r.text)
    return r.text

# BUILD XML TREE OBJECT
dom = et.fromstring(r.text)
# PARSE EVENT ID TEXT AND PASS INTO FUNCTION
for i in dom.iterfind('.//EventID'):
    y = xml_event_info(i.text)
    for xml in y:
        tree = et.fromstring(y)
        root = tree.getchildren()
        all_records = []
        headers = []
        for i , child in enumerate(root):
            record = []
            for subchild in child:
                record.append(subchild.text)
                if subchild.tag not in headers:
                    headers.append(subchild.tag)
            all_records.append(record)
        #print all_records
        print pd.DataFrame(all_records, columns=headers)
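Several things in the full script work against building a single table. The loop for xml in y iterates over the characters of the response string while re-parsing the same y each time. all_records appears to be reset on every pass. A DataFrame is printed once per ID instead of being built once.

Below is a minimal sketch of the loop restructured. It assumes xml_event_info() returns the response text, as it does in the full script, and that each response is a root with leaf elements plus one nested block. The fetch parameter, fake_fetch, and the Event.Get layout fake_fetch returns are hypothetical.

```python
import xml.etree.ElementTree as et

import pandas as pd


def build_event_frame(ids_xml, fetch):
    """Call fetch(event_id) for each <EventID> in ids_xml and return one DataFrame.

    fetch is any callable returning one response as an XML string,
    e.g. xml_event_info() as long as it ends with "return r.text".
    """
    rows = []  # created once, before the loop, so every response is kept
    for el in et.fromstring(ids_xml.strip()).iterfind(".//EventID"):
        response_text = fetch(el.text)
        # Parse the whole string once; "for xml in y" would visit it character by character.
        root = et.fromstring(response_text.strip())
        row = {}
        for child in root:
            # Flatten one level: a child with sub-elements contributes those, a leaf contributes itself.
            for leaf in list(child) or [child]:
                row[leaf.tag] = leaf.text
        rows.append(row)
    # Build the DataFrame once, after the loop; a tag missing from one response becomes NaN.
    return pd.DataFrame(rows)


def fake_fetch(event_id):
    # Hypothetical stand-in for the Event.Get call; the real response layout may differ.
    return ("<Response><ErrCode>0</ErrCode><Event.Get>"
            "<EventID>{}</EventID><Name>xxx</Name>"
            "</Event.Get></Response>").format(event_id)


ids_xml = "<Response><EventID>1</EventID><EventID>2</EventID></Response>"
print(build_event_frame(ids_xml, fake_fetch))
```

In the real script, fetch would be xml_event_info and ids_xml would be the text of the Events.All.GetIDs response.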
Edit:
TLDR:
How can I map the output of the function below into a DataFrame, with the XML elements as the DataFrame's column headers:
import requests
import xml.etree.ElementTree as et
import pandas as pd
xml ='''
<?xml version="1.0" encoding="UTF-8"?>
<YourMembership>
<Version>xxx</Version>
<ApiKey>xxxx</ApiKey>
<CallID>xxx</CallID>
<></>
<SaPasscode>xxxx</SaPasscode>
<Call Method = "GetIDs">
</Call>
</YourMembership>
'''
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
r = requests.post('url', data=xml, headers=headers)
def xml_event_info(eventID):
    xml ='''
<?xml version="1.0" encoding="UTF-8"?>
<YourMembership>
<Version>xxx</Version>
<ApiKey>xxx</ApiKey>
<CallID>xxx</CallID>
<></>
<SaPasscode>xxx</SaPasscode>
<Call Method = "Profile.Get">
<ID>{}</ID>
</Call>
</YourMembership>
'''
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    r = requests.post('url',
                      data=xml.format(eventID), headers=headers)
    print(r.text)
Output:
<?xml version="1.0" encoding="utf-8" ?>
<Response>
<ErrCode>xxx</ErrCode>
<ExtendedErrorInfo>xxx</ExtendedErrorInfo>
<Profile.Get>
<ID>xxxx</ID>
<WebsiteID>xxxx</WebsiteID>
<EmailBounced>xxx</EmailBounced>
<NamePrefix>xxx</NamePrefix>
<FirstName>xxx</FirstName>
</Profile.Get>
</Response>
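Answer:
Your function xml_event_info(eventID) doesn't return anything, so the loop ends up passing None to the parser. Add a return statement at the end of the function (for example, return r.text) and try again.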
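Applied to the TLDR function, the answer's fix might look like the minimal sketch below. The post and url parameters are hypothetical additions that let the function run without the live API; in the real script you would pass requests.post and the API endpoint. The payload is abbreviated to a few elements, and the remaining elements from the question are omitted.

```python
HEADERS = {'Content-Type': 'application/x-www-form-urlencoded'}

# Abbreviated payload; the other elements from the question are left out here.
PROFILE_GET_TEMPLATE = '''<?xml version="1.0" encoding="UTF-8"?>
<YourMembership>
<Version>xxx</Version>
<Call Method="Profile.Get">
<ID>{}</ID>
</Call>
</YourMembership>'''


def xml_event_info(eventID, post, url):
    # post is expected to behave like requests.post and return an object with a .text attribute.
    r = post(url, data=PROFILE_GET_TEMPLATE.format(eventID), headers=HEADERS)
    print(r.text)
    return r.text  # the missing statement: hand the response text back to the caller


# Usage in the real script (requests and dom as in the question):
# responses = [xml_event_info(i.text, requests.post, 'url') for i in dom.iterfind('.//ID')]
```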