This is the API output:
{
"resultLength":133710,
"resultList" : [
{
"date" : 1510872659568,
"requestParameters" : [
"datesAsStringsFormat=dd-MMM-yyyy",
"datesAsStrings=true",
"outputFormat=xlsx",
"requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6"
],
"score" : 1,
"totalRequestTime" : 1261,
"userId" : 167895
},
{
"date" : 1510872659679,
"requestParameters" : [
"datesAsStringsFormat=dd-MMM-yyyy",
"datesAsStrings=true",
"outputFormat=xlsx",
"requestId=14e7aa1f-680f-49d0-8e76-cfd797b9be78"
],
"score" : 1,
"totalRequestTime" : 1255,
"userId" : 452669
}
]
}
I am trying to extract the requestId for each userId, but for some reason I can't. This is what I tried:
req = requests.get(url=url, auth=(user,password))
out = req.json()
results = out['resultList']
solr_df = pd.DataFrame()
for record in results:
    requestId = pd.DataFrame(record['requestParameters'][3],columns=['requestId'])
    df = pd.DataFrame(requestId)
    df['userId'] = record['userId']
    solr_df = solr_df.append(df)
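However, it raises ValueError: DataFrame constructor not properly called!
Can you help me correct the error?
If extracting only the requestId row from each requestParameters list is too hard, could you instead help me drop every row unrelated to requestId after the for loop has run?
Edit:
When I run it with record['requestParameters'] it succeeds, but it pulls every requestParameters row for each userId.
I just tried slicing: portfolioId = pd.DataFrame(record['requestParameters'][-1:0])
and it does return results (unlike the single-index call), but for both the outputFormat and requestId rows.
Then I tried portfolioId = pd.DataFrame(record['requestParameters'][1:2])
and it returned results for requestId and datesAsStrings.
It looks like the requestParameters rows are not consistently indexed (I'm not sure why that would be). Is there still a way to get a specific row from the list?
Thanks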
Answer 0 (score: 0)
According to the documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
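import pandas as pd
import requests

req = requests.get(url=url, auth=(user, password))
out = req.json()
results = out['resultList']
frames = []
for record in results:
    # Wrap the string in a list: a dict of bare scalars needs an explicit index
    df = pd.DataFrame({"requestId": [record['requestParameters'][3]]})
    df['userId'] = record['userId']
    frames.append(df)
# Combine the one-row frames instead of overwriting df on every pass
solr_df = pd.concat(frames, ignore_index=True)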
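For context on the error in the question, here is a minimal sketch of what the DataFrame constructor accepts. The sample string is copied from the API output above; the helper `raises_value_error` is a made-up name used only for this illustration. A bare string is a scalar rather than a list-like, so pandas rejects it, and a dict whose values are all scalars is rejected as well unless an index is supplied.

```python
import pandas as pd

# Sample value taken from the API output in the question.
param = "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6"


def raises_value_error(build):
    """Hypothetical helper: report whether calling build() raises ValueError."""
    try:
        build()
    except ValueError:
        return True
    return False


# A bare string as data: this is the call pattern from the question.
scalar_fails = raises_value_error(
    lambda: pd.DataFrame(param, columns=["requestId"])
)

# A dict of bare scalars with no index is rejected too.
dict_scalar_fails = raises_value_error(
    lambda: pd.DataFrame({"requestId": param})
)

# Wrapping the string in a list gives pandas one row to work with.
ok = pd.DataFrame([param], columns=["requestId"])
print(scalar_fails, dict_scalar_fails, ok.shape)
```

Slices such as `record['requestParameters'][1:2]` work in the constructor for the same reason: a slice of a list is still a list.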
Answer 1 (score: 0)
To extract the IDs, you can try:
import requests

req = requests.get(url=url, auth=(user, password))
out = req.json()
results = out['resultList']
# results is already the list of records, so iterate over it directly
ids = [i["requestParameters"][-1] for i in results]
Output:
['requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6', 'requestId=14e7aa1f-680f-49d0-8e76-cfd797b9be78']
Or, if you only want the value itself:
import re
import requests

req = requests.get(url=url, auth=(user, password))
out = req.json()
results = out['resultList']
# Raw string for the regex; the lookbehind keeps only the text after "requestId="
final_val = [re.findall(r'(?<=requestId=)[\w-]+', i["requestParameters"][-1])[0] for i in results]
Output:
['14e7aa1f-680f-49d0-8e76-cfd797b9b6b6', '14e7aa1f-680f-49d0-8e76-cfd797b9be78']
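Both snippets above rely on requestId being the last entry of the list. If the position cannot be trusted, one option is to look the parameter up by its key instead. The following is only a sketch under that assumption: `get_param` is a hypothetical helper, and `results` is hard-coded from the sample output in the question rather than fetched from the API.

```python
import pandas as pd

# Records copied from the sample API output in the question.
results = [
    {
        "userId": 167895,
        "requestParameters": [
            "datesAsStringsFormat=dd-MMM-yyyy",
            "datesAsStrings=true",
            "outputFormat=xlsx",
            "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6",
        ],
    },
    {
        "userId": 452669,
        "requestParameters": [
            "datesAsStringsFormat=dd-MMM-yyyy",
            "datesAsStrings=true",
            "outputFormat=xlsx",
            "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9be78",
        ],
    },
]


def get_param(params, key):
    """Hypothetical helper: return the value of 'key=value' in params, or None."""
    prefix = key + "="
    for item in params:
        if item.startswith(prefix):
            return item[len(prefix):]
    return None


# One plain dict per record, then a single DataFrame construction at the end.
rows = [
    {
        "userId": record["userId"],
        "requestId": get_param(record["requestParameters"], "requestId"),
    }
    for record in results
]
df = pd.DataFrame(rows)
print(df)
```

Building a list of dicts first and constructing the DataFrame once also avoids growing a DataFrame inside the loop.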
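Answer 2 (score: 0)
Why not try parsing the JSON string in JSON Editor Online (http://www.jsoneditoronline.org/)? Checking whether your JSON string is valid may help.
You are missing a comma after line 12 ("totalRequestTime" : 1261). I think that may be the first problem.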
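The same validity check can be done locally with Python's standard `json` module. This is a small sketch, not part of the original answer: `check_json` is a hypothetical helper, and the two strings are shortened stand-ins for the API output, one with the comma after `1261` and one without.

```python
import json


def check_json(text):
    """Hypothetical helper: return None if text parses, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg point at where parsing stopped and why
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


valid = '{"score": 1, "totalRequestTime": 1261, "userId": 167895}'
broken = '{"score": 1, "totalRequestTime": 1261 "userId": 167895}'

print(check_json(valid))
print(check_json(broken))
```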
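Answer 3 (score: 0)
I was able to answer my own question. Because the rows in requestParameters are not consistently indexed (an explanation of why this happens would be nice), I had to pull all the rows from requestParameters, keep only those containing requestId, and then extract everything after the "=".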
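import pandas as pd

frames = []
for record in results:
    # One row per request parameter, all placed in a single 'requestId' column
    df = pd.DataFrame(record['requestParameters'], columns=['requestId'])
    df['userEmail'] = record['userEmail']
    frames.append(df)
# pd.concat combines the per-record frames (DataFrame.append is gone in pandas 2.0)
solr_df = pd.concat(frames, ignore_index=True)
# Keep only the requestId rows, then take the text after '='
solr_df = solr_df[solr_df.requestId.str.contains('requestId')]
solr_df['requestId'] = solr_df['requestId'].str.split('=').str.get(1)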
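The same filter-then-split idea can be written without a Python loop by using `DataFrame.explode`. This is only a sketch on made-up input: the first record comes from the sample output in the question, while the second record's parameter order is an assumption invented here to show that the result does not depend on position. It uses `userId` because that is the field shown in the sample output.

```python
import pandas as pd

results = [
    {
        "userId": 167895,
        "requestParameters": [
            "datesAsStringsFormat=dd-MMM-yyyy",
            "datesAsStrings=true",
            "outputFormat=xlsx",
            "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6",
        ],
    },
    {
        # Hypothetical ordering: requestId comes first in this record.
        "userId": 452669,
        "requestParameters": [
            "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9be78",
            "outputFormat=xlsx",
        ],
    },
]

# One row per record, with the parameter list held in a single column.
df = pd.DataFrame(results)[["userId", "requestParameters"]]

# explode turns each list element into its own row, repeating userId.
long_df = df.explode("requestParameters")

# Keep only rows that start with the requestId key, then strip the key.
mask = long_df["requestParameters"].str.startswith("requestId=")
solr_df = (
    long_df[mask]
    .rename(columns={"requestParameters": "requestId"})
    .assign(requestId=lambda d: d["requestId"].str.split("=", n=1).str.get(1))
    .reset_index(drop=True)
)
print(solr_df)
```

Matching on the `requestId=` prefix rather than on a substring avoids picking up a different parameter whose value happens to contain the text "requestId".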