Getting a specific row from a list fails

Date: 2017-11-17 00:03:25

Tags: python

This is the API output:

{
    "resultLength":133710,
    "resultList" : [
            {
            "date" :  1510872659568,
            "requestParameters" : [
              "datesAsStringsFormat=dd-MMM-yyyy",
              "datesAsStrings=true",
              "outputFormat=xlsx",
              "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6"
            ],
            "score" :  1,
            "totalRequestTime" :  1261,
            "userId" :  167895
            },
            {
            "date" :  1510872659679,
            "requestParameters" : [
              "datesAsStringsFormat=dd-MMM-yyyy",
              "datesAsStrings=true",
              "outputFormat=xlsx",
              "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9be78"
            ],
            "score" :  1,
            "totalRequestTime" :  1255,
            "userId" :  452669
            }
    ]
}

I am trying to extract the requestId for each userId, but for some reason I can't. This is what I tried:

req = requests.get(url=url, auth=(user,password))
out = req.json()
results = out['resultList']

solr_df = pd.DataFrame()
for record in results:
    requestId = pd.DataFrame(record['requestParameters'][3],columns=['requestId'])
    df = pd.DataFrame(requestId)
    df['userId'] = record['userId']
    solr_df = solr_df.append(df)

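However, it raises ValueError: DataFrame constructor not properly called! Can you help fix the error?

If extracting only the requestId row from each requestParameters list is too hard, perhaps you could help remove all the rows unrelated to requestId after the for statement has run?

Edit:

When I run it with record['requestParameters'], it runs successfully, but it extracts all the requestParameters rows for each userId.

I just tried slicing: portfolioId = pd.DataFrame(record['requestParameters'][-1:0]) does return a result (unlike a single-index call), but it returns the outputFormat and requestId rows. Then I tried portfolioId = pd.DataFrame(record['requestParameters'][1:2]), which returned the requestId and datesAsStrings rows.

It looks like the requestParameters rows don't have a consistent index (not sure why that would be). Is there still a way to get a specific row from the list?

Thanks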

4 answers:

Answer 0 (score: 0)

According to the documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html

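import pandas as pd
import requests

req = requests.get(url=url, auth=(user, password))
out = req.json()
results = out['resultList']

frames = []
for record in results:
    # Wrap the string in a list: a dict of bare scalars needs an explicit index.
    df = pd.DataFrame({"requestId": [record['requestParameters'][3]]})
    df['userId'] = record['userId']
    frames.append(df)
# Combine the per-record frames once, instead of overwriting df on each pass.
solr_df = pd.concat(frames, ignore_index=True)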
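
For context, here is a minimal sketch of why the constructor call in the question fails. It assumes only that the value taken from requestParameters is a plain string, as in the sample output; the variable names are illustrative. A bare string is a scalar rather than row data, so pandas rejects it unless it is wrapped in a list.

```python
import pandas as pd

value = "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6"

# A bare string is a scalar, not row data, so the constructor rejects it.
try:
    pd.DataFrame(value, columns=["requestId"])
    scalar_failed = False
except ValueError:
    scalar_failed = True

# Wrapping the string in a list gives the constructor exactly one row.
df = pd.DataFrame([value], columns=["requestId"])
```

The same reasoning applies to the dict form: each dict value should be a list (or other sequence) of row values, not a single string.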

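Answer 1 (score: 0)

To extract the IDs, you can try:

import requests

req = requests.get(url=url, auth=(user, password))
out = req.json()
results = out['resultList']
# results is already the list of records, so iterate over it directly.
ids = [i["requestParameters"][-1] for i in results]

Output: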

['requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6', 'requestId=14e7aa1f-680f-49d0-8e76-cfd797b9be78']

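Or, if you only want the values:

import re
import requests

req = requests.get(url=url, auth=(user, password))
out = req.json()
results = out['resultList']
# Raw string avoids invalid-escape warnings; the lookbehind skips the "requestId=" prefix.
final_val = [re.findall(r'(?<=requestId=)[\w-]+', i["requestParameters"][-1])[0] for i in results]

Output: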

['14e7aa1f-680f-49d0-8e76-cfd797b9b6b6', '14e7aa1f-680f-49d0-8e76-cfd797b9be78']
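
Indexing with [-1] assumes requestId is always the last parameter. If the order can vary, as the question suggests, one option is to search each list by prefix instead. The following is only a sketch: find_request_id is a hypothetical helper, and the sample records are abbreviated from the API output in the question, with the order of the second one changed on purpose.

```python
def find_request_id(parameters):
    """Return the value of the first 'requestId=...' entry, or None if absent."""
    prefix = "requestId="
    for item in parameters:
        if item.startswith(prefix):
            return item[len(prefix):]
    return None


# Abbreviated sample records; the second one lists requestId first.
results = [
    {"userId": 167895, "requestParameters": [
        "datesAsStrings=true",
        "outputFormat=xlsx",
        "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6"]},
    {"userId": 452669, "requestParameters": [
        "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9be78",
        "outputFormat=xlsx"]},
]

# Map each userId to its requestId, regardless of position in the list.
ids = {r["userId"]: find_request_id(r["requestParameters"]) for r in results}
```

Returning None for a missing requestId keeps the lookup from raising, at the cost of having to check for None later.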

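Answer 2 (score: 0)

Why not try parsing the JSON string in JSON Editor Online (http://www.jsoneditoronline.org/)? It may help you check whether your JSON string is valid.

You are missing a comma after line 12 ("totalRequestTime" : 1261). I think this may be the first problem.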
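
The same validity check can also be done locally with Python's standard json module. This is a minimal sketch: check_json is a hypothetical helper, and the two strings are made-up fragments, one of them with a comma removed on purpose.

```python
import json


def check_json(text):
    """Return None if text parses as JSON, else a short description of the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


valid = '{"totalRequestTime": 1261, "userId": 167895}'
broken = '{"totalRequestTime": 1261 "userId": 167895}'  # comma removed on purpose

valid_error = check_json(valid)
broken_error = check_json(broken)
```

Note that req.json() in the question already parses the response, so a malformed payload would fail at that call rather than later in the DataFrame code.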

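Answer 3 (score: 0)

I was able to answer my own question. Because the rows in requestParameters are not consistently indexed (an explanation of why this happens would be nice), I had to pull every row from requestParameters, keep only the rows containing requestId, and then extract everything after the "=".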

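import pandas as pd

frames = []
for record in results:
    # One row per parameter string; filtered down to the requestId rows below.
    df = pd.DataFrame(record['requestParameters'], columns=['requestId'])
    df['userEmail'] = record['userEmail']
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; concatenate once instead.
solr_df = pd.concat(frames, ignore_index=True)

# Keep only the requestId rows, then keep the text after "=".
solr_df = solr_df[solr_df.requestId.str.contains('requestId')]
solr_df['requestId'] = solr_df['requestId'].str.split('=').str.get(1)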
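
The same "take every row, filter, then split" idea can be sketched without an explicit loop by using DataFrame.explode. This is an illustration rather than the approach used above: it assumes pandas 0.25 or later, uses the userId field from the sample API output, and the sample records are abbreviated with their order varied on purpose.

```python
import pandas as pd

# Abbreviated sample records mirroring the API output in the question.
results = [
    {"userId": 167895, "requestParameters": [
        "datesAsStringsFormat=dd-MMM-yyyy", "datesAsStrings=true",
        "outputFormat=xlsx", "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9b6b6"]},
    {"userId": 452669, "requestParameters": [
        "requestId=14e7aa1f-680f-49d0-8e76-cfd797b9be78",
        "datesAsStrings=true", "outputFormat=xlsx"]},
]

# One row per (userId, parameter) pair.
params = (
    pd.DataFrame(results)[["userId", "requestParameters"]]
    .explode("requestParameters")
    .reset_index(drop=True)
)

# Keep only the requestId rows, then take everything after the first "=".
mask = params["requestParameters"].str.startswith("requestId=")
solr_df = params.loc[mask].rename(columns={"requestParameters": "requestId"})
solr_df["requestId"] = solr_df["requestId"].str.split("=", n=1).str.get(1)
solr_df = solr_df.reset_index(drop=True)
```

Matching on the "requestId=" prefix rather than on a substring avoids picking up other parameters whose values happen to contain the text requestId.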