Pandas: Combine DataFrames with nested arrays, or merge the JSON outputs

Time: 2018-08-03 15:40:48

Tags: python json python-2.7 pandas dictionary

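I am working with a standard DataFrame and creating several subset DataFrames of summary data, each of which uses nested arrays. I then need to combine the subset DataFrames in a way that gives me my expected JSON output. (I used MaxU's answer to format most of the code: Convert Pandas Dataframe to nested JSON.)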

The first few rows of my standard DataFrame, df (I can provide all 58 rows of this example if necessary):

    ID         PRI_AFF   PRI_DEP      LOA    STATE
0   5571             M              Basic        A
1   5030             T  14700000     Blue        A
2   5030             T  14700000     Blue        A
3   5030             T  14700000     Blue        A
4   4014             T  14700000     Blue        A
5   2230             T  14700000      UFM        A
6   2230             T  14700000      UFM        A
7   2150             F  95011000   Bronze        A
8   2150             F  95011000   Bronze        A
9   2150             F  95011000   Bronze        A
10  2150             F  95011000   Bronze        A

From here, I use the following Python:

 import pandas as pd

 # Count distinct IDs per PRI_DEP, with one column per category value (missing combinations become 0)
 PAFF_df = df.groupby(['PRI_DEP','PRI_AFF'])['ID'].nunique().unstack().reset_index().fillna(0)
 LOA_df = df.groupby(['PRI_DEP','LOA'])['ID'].nunique().unstack().reset_index().fillna(0)
 ST_df = df.groupby(['PRI_DEP','STATE'])['ID'].nunique().unstack().reset_index().fillna(0)

 # For each PRI_DEP, nest the counts into a one-element list of dicts
 Nested_PAFF_df = (PAFF_df.groupby(['PRI_DEP'], as_index=True)
       .apply(lambda x: x[['A','E','F','L','M','T']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'Primary_Affiliation'}))

 Nested_LOA_df = (LOA_df.groupby(['PRI_DEP'], as_index=True)
       .apply(lambda x: x[['Basic','Blue','Bronze','Invalid','UFM']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'LOA'}))

 Nested_ST_df = (ST_df.groupby(['PRI_DEP'], as_index=True)
       .apply(lambda x: x[['A','E']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'STATE'}))

Calling .to_json(orient='records') on each of these gives me the properly nested JSON.

Primary Affiliation JSON:

[{"PRI_DEP":" ","Primary_Affiliation":[{"A":0.0,"E":0.0,"F":0.0,"M":2.0,"L":0.0,"T":0.0}]},{"PRI_DEP":"14700000","Primary_Affiliation":[{"A":0.0,"E":3.0,"F":0.0,"M":1.0,"L":1.0,"T":19.0}]},{"PRI_DEP":"95011000","Primary_Affiliation":[{"A":0.0,"E":0.0,"F":1.0,"M":0.0,"L":0.0,"T":0.0}]},{"PRI_DEP":"Null","Primary_Affiliation":[{"A":0.0,"E":1.0,"F":0.0,"M":0.0,"L":0.0,"T":0.0}]},{"PRI_DEP":"ST010000","Primary_Affiliation":[{"A":1.0,"E":0.0,"F":0.0,"M":0.0,"L":0.0,"T":1.0}]}] 

LOA JSON:

[{"PRI_DEP":" ","LOA":[{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":1.0}]},{"PRI_DEP":"14700000","LOA":[{"Blue":14.0,"UFM":5.0,"Invalid":1.0,"Bronze":4.0,"Basic":0.0}]},{"PRI_DEP":"95011000","LOA":[{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":0.0}]},{"PRI_DEP":"Null","LOA":[{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":0.0}]},{"PRI_DEP":"ST010000","LOA":[{"Blue":0.0,"UFM":0.0,"Invalid":1.0,"Bronze":0.0,"Basic":1.0}]}] 

State JSON:

[{"PRI_DEP":" ","STATE":[{"A":2.0,"E":0.0}]},{"PRI_DEP":"14700000","STATE":[{"A":23.0,"E":1.0}]},{"PRI_DEP":"95011000","STATE":[{"A":1.0,"E":0.0}]},{"PRI_DEP":"Null","STATE":[{"A":1.0,"E":0.0}]},{"PRI_DEP":"ST010000","STATE":[{"A":2.0,"E":0.0}]}] 
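
For anyone who wants to reproduce the steps above without the full data set, here is a minimal sketch of the LOA branch only. The rows are made up to resemble the sample table (they are not the real 58 rows), and the lowercase names loa_df, value_cols and nested_loa are illustrative. The value columns are derived from the frame rather than hard-coded, which is an assumption of convenience and not part of the original code.

```python
import json
import pandas as pd

# Made-up rows that mimic the question's columns; not the real data.
df = pd.DataFrame({
    "ID":      [5571, 5030, 5030, 4014, 2150],
    "PRI_DEP": [" ", "14700000", "14700000", "14700000", "95011000"],
    "LOA":     ["Basic", "Blue", "Blue", "Blue", "Bronze"],
})

# Count distinct IDs per (PRI_DEP, LOA), pivot LOA values into columns,
# and replace the missing combinations with 0.
loa_df = (df.groupby(["PRI_DEP", "LOA"])["ID"]
            .nunique()
            .unstack()
            .reset_index()
            .fillna(0))

# Every column except the key holds a count.
value_cols = [c for c in loa_df.columns if c != "PRI_DEP"]

# Collapse each PRI_DEP group into a one-element list of dicts.
nested_loa = (loa_df.groupby("PRI_DEP")
                    .apply(lambda g: g[value_cols].to_dict("records"))
                    .reset_index()
                    .rename(columns={0: "LOA"}))

loa_json = nested_loa.to_json(orient="records")
print(loa_json)
```

Each record in the printed JSON should have a PRI_DEP key and an LOA key holding a single-element array, the same shape as the outputs listed above.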

Now I would like to somehow combine all of these into a single JSON, keyed by PRI_DEP.

So the desired JSON would look like this (reformatted for readability):

[{"PRI_DEP":" ",
    "Primary_Affiliation":
        [{"A":0.0,"E":0.0,"F":0.0,"M":2.0,"L":0.0,"T":0.0}],
    "LOA": 
        [{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":1.0}],
    "STATE":
        [{"A":2.0,"E":0.0}]},
 {"PRI_DEP":"14700000",
    "Primary_Affiliation": 
        [{"A":0.0,"E":3.0,"F":0.0,"M":1.0,"L":1.0,"T":19.0}],
    "LOA": 
        [{"Blue":14.0,"UFM":5.0,"Invalid":1.0,"Bronze":4.0,"Basic":0.0}],
    "STATE":
        [{"A":23.0,"E":1.0}]}, 
 {"PRI_DEP":"95011000",
    "Primary_Affiliation":
        [{"A":0.0,"E":0.0,"F":1.0,"M":0.0,"L":0.0,"T":0.0}],
    "LOA":
        [{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":0.0}],
    "STATE":
        [{"A":1.0,"E":0.0}]},
 {"PRI_DEP":"Null",
    "Primary_Affiliation": 
        [{"A":0.0,"E":1.0,"F":0.0,"M":0.0,"L":0.0,"T":0.0}],
    "LOA":
        [{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":0.0}],
    "STATE":
        [{"A":1.0,"E":0.0}]},
 {"PRI_DEP":"ST010000",
    "Primary_Affiliation":
        [{"A":1.0,"E":0.0,"F":0.0,"M":0.0,"L":0.0,"T":1.0}],
    "LOA":
        [{"Blue":0.0,"UFM":0.0,"Invalid":1.0,"Bronze":0.0,"Basic":1.0}],
    "STATE":
        [{"A":2.0,"E":0.0}]}]

1 Answer:

Answer 0: (score: 0)

I just kept experimenting with different ways of combining the DataFrames, and I think I have found the answer.

After the Python code in my original post (the part that sets up the nested groups), I did the following:

Group_frames = [Nested_PAFF_df.set_index('PRI_DEP'), Nested_LOA_df.set_index('PRI_DEP'), Nested_ST_df.set_index('PRI_DEP')]
result = pd.concat(Group_frames, axis=1).reset_index()
print(result.to_json(orient='records'))
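
To see why this works, here is a small self-contained sketch of the same concat step. The two frames below are hypothetical stand-ins for the nested frames (with shortened dicts), and their rows are deliberately in different orders. Once PRI_DEP is the index, pd.concat(..., axis=1) aligns rows by that index rather than by position.

```python
import json
import pandas as pd

# Hypothetical stand-ins for two of the nested frames built in the question.
nested_loa = pd.DataFrame({
    "PRI_DEP": ["14700000", "95011000"],
    "LOA": [[{"Blue": 14.0, "UFM": 5.0}], [{"Blue": 0.0, "UFM": 0.0}]],
})
nested_state = pd.DataFrame({
    "PRI_DEP": ["95011000", "14700000"],  # deliberately in a different order
    "STATE": [[{"A": 1.0, "E": 0.0}], [{"A": 23.0, "E": 1.0}]],
})

# Index every frame by the shared key so concat can align on it.
frames = [f.set_index("PRI_DEP") for f in (nested_loa, nested_state)]

# axis=1 places the frames side by side, matching rows by PRI_DEP.
combined = pd.concat(frames, axis=1).reset_index()

combined_json = combined.to_json(orient="records")
print(combined_json)
```

Note that concat performs an outer join by default, so a PRI_DEP that is missing from one of the frames would still appear in the result, with null in the corresponding field.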