Pandas: convert multiple columns to dictionaries after groupby

Time: 2020-06-17 14:10:53

Tags: python pandas

I have a DataFrame:

import pandas as pd
df = pd.DataFrame({"a":[1,1,1,2,2,2,3,3], "b":["a","a","a","b","b","b","c","c"], "c":[0,0,1,0,1,1,0,1], "d":["x","y","z","x","y","y","z","x"]})


    a   b   c   d
0   1   a   0   x
1   1   a   0   y
2   1   a   1   z
3   2   b   0   x
4   2   b   1   y
5   2   b   1   y
6   3   c   0   z
7   3   c   1   x

I want to group by columns a and b to get the following output:

    a   b   e
0   1   a   [{'c': 0, 'd': 'x'}, {'c': 0, 'd': 'y'}, {'c': 1, 'd': 'z'}]
1   2   b   [{'c': 0, 'd': 'x'}, {'c': 1, 'd': 'y'}, {'c': 1, 'd': 'y'}]
2   3   c   [{'c': 0, 'd': 'z'}, {'c': 1, 'd': 'x'}]

My solution:

new_df = df.groupby(["a","b"])["c","d"].apply(lambda x: x.to_dict(orient="records")).reset_index(name="e")

The problem is that it behaves inconsistently; sometimes I get the following error:

reset_index() got an unexpected keyword argument 'name'

It would be helpful if someone could point out the problem with the above solution or suggest another approach.

2 Answers:

Answer 0 (score: 2)

You can do:

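
On the error itself: `name` is a parameter of `Series.reset_index`, while `DataFrame.reset_index` has no such parameter. The message in the question therefore suggests that `groupby(...).apply(...)` sometimes returned a DataFrame rather than a Series. An empty input frame is one situation where this may happen, though that is an assumption and not something the question confirms. The sketch below reproduces the message on a small hand-made Series, then runs the question's approach with list-style column selection (`[["c", "d"]]`) on the sample data. The names `s`, `as_frame`, `error_message` and `grouped` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3],
    "b": ["a", "a", "a", "b", "b", "b", "c", "c"],
    "c": [0, 0, 1, 0, 1, 1, 0, 1],
    "d": ["x", "y", "z", "x", "y", "y", "z", "x"],
})

# Series.reset_index accepts `name`; DataFrame.reset_index does not.
s = pd.Series([10, 20], index=pd.Index(["x", "y"], name="key"))
as_frame = s.reset_index(name="value")  # works: columns are key, value

try:
    s.to_frame("value").reset_index(name="value")
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # ends with: unexpected keyword argument 'name'

# The question's approach, selecting the columns with a list.
grouped = df.groupby(["a", "b"])[["c", "d"]].apply(
    lambda g: g.to_dict(orient="records")
)

# Defensive guard: only pass `name` when the result really is a Series.
if isinstance(grouped, pd.Series):
    new_df = grouped.reset_index(name="e")
else:
    # Fallback for a DataFrame result; no "e" column is created here.
    new_df = grouped.reset_index()
```

On the sample data `grouped` is a Series, so the first branch runs and `new_df` has the columns `a`, `b` and `e`.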

Answer 1 (score: 1)

Alternatively, we can do:

df['e'] = df[['c', 'd']].agg(lambda s: dict(zip(s.index, s.values)), axis=1)
df1 = df.groupby(['a', 'b'])['e'].agg(list).reset_index()

# print(df1)
   a  b                                                  e
0  1  a  [{'c': 0, 'd': 'x'}, {'c': 0, 'd': 'y'}, {'c':...
1  2  b  [{'c': 0, 'd': 'x'}, {'c': 1, 'd': 'y'}, {'c':...
2  3  c           [{'c': 0, 'd': 'z'}, {'c': 1, 'd': 'x'}]
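
The row-wise `agg(..., axis=1)` above calls the lambda once per row and adds a helper column `e` to `df`. A possible variant, given here as a sketch rather than a benchmarked improvement, builds the row dicts in a single `to_dict(orient="records")` call and leaves the original frame unchanged. The names `records` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3],
    "b": ["a", "a", "a", "b", "b", "b", "c", "c"],
    "c": [0, 0, 1, 0, 1, 1, 0, 1],
    "d": ["x", "y", "z", "x", "y", "y", "z", "x"],
})

# One dict per row, in row order: [{'c': 0, 'd': 'x'}, {'c': 0, 'd': 'y'}, ...]
records = df[["c", "d"]].to_dict(orient="records")

out = (
    df[["a", "b"]]
    .assign(e=records)        # attach the dicts positionally to a copy
    .groupby(["a", "b"])["e"]
    .agg(list)                # collect each group's dicts into one list
    .reset_index()            # the column keeps the name "e"
)
```

Because the grouped column is already named `e`, a plain `reset_index()` is enough and the `name` argument from the question is not needed.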