I have a Pandas DataFrame, and I am trying to group its rows by column values and then merge some of the rows into lists. Let me elaborate.
The DataFrame I have looks like this:
industry index entities
cars 0 ['Norway', 'it']
cars 0 ['Mercedes', 'they']
cars 0 ['it', 'EV', 'its']
nature 1 ['fox', 'it']
nature 1 ['them', 'rabbits']
nature 2 ['whale', 'it']
The desired DataFrame should look like this:
industry index entities
cars 0 [ ['Norway', 'it'], ['Mercedes', 'they'], ['it', 'EV', 'its'] ]
nature 1 [ ['fox', 'it'], ['them', 'rabbits'] ]
nature 2 ['whale', 'it']
Basically, I am trying to group the rows by industry and index while merging the values of the entities column into a list.
I have already tried
df.groupby('industry')['index'].apply(list)
but that gives me a completely different result.
How can I accomplish what I want? Thanks.
Answer 0: (score: 5)
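After the groupby you need to select entities instead of index, so that the entities column is the one collected into lists, and group by both columns by passing ['industry', 'index'] to groupby: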
df = df.groupby(['industry', 'index'])['entities'].apply(list).reset_index()
print (df)
industry index entities
0 cars 0 [[Norway, it], [Mercedes, they], [it, EV, its]]
1 nature 1 [[fox, it], [them, rabbits]]
2 nature 2 [[whale, it]]
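For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question (assuming the entities values are real Python lists), and the names attempt and grouped are only illustrative. It also shows why the attempt from the question differs: it groups by industry alone and collects the index column rather than entities.

```python
import pandas as pd

# Sample data rebuilt from the question; entities holds real Python lists.
df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})

# The attempt from the question: one group per industry, and the values
# collected are those of the 'index' column, not 'entities'.
attempt = df.groupby("industry")["index"].apply(list)

# Grouping by both keys and collecting 'entities' instead.
grouped = df.groupby(["industry", "index"])["entities"].apply(list).reset_index()
print(grouped)
```

Here attempt holds [0, 0, 0] for cars and [1, 1, 2] for nature, which is why the result looked unrelated to the desired output.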
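If the last value should not end up in a nested list, you can use a lambda function with an if-else so that a group containing only one value returns that value itself: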
df1 = (df.groupby(['industry', 'index'])['entities']
.apply(lambda x: x.tolist() if len(x) != 1 else x.iat[0])
.reset_index())
print (df1)
industry index entities
0 cars 0 [[Norway, it], [Mercedes, they], [it, EV, its]]
1 nature 1 [[fox, it], [them, rabbits]]
2 nature 2 [whale, it]
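The same idea as a runnable sketch, with the lambda written as a named helper (collect is a hypothetical name, and the data is again rebuilt from the question). One thing to keep in mind with this variant is that the column then mixes two shapes, a list of lists and a flat list, so later code may need to check which one it is looking at.

```python
import pandas as pd

# Sample data rebuilt from the question; entities holds real Python lists.
df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})


def collect(group: pd.Series):
    """Return the value itself for one-row groups, else a list of all values."""
    return group.iat[0] if len(group) == 1 else group.tolist()


df1 = df.groupby(["industry", "index"])["entities"].apply(collect).reset_index()

# Telling the two shapes apart afterwards: nested rows have a list as first item.
is_nested = df1["entities"].apply(lambda value: isinstance(value[0], list))
print(df1.assign(is_nested=is_nested))
```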
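Edit:
If the entities column contains only string representations of lists, you can convert the values to lists with the ast module before applying the solutions above: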
print (type(df['entities'].iat[0]))
<class 'str'>
import ast
df['entities'] = df['entities'].apply(ast.literal_eval)
print (type(df['entities'].iat[0]))
<class 'list'>
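A small sketch of that conversion on its own. The question does not say where the strings come from; a CSV round trip is assumed here purely for illustration, and csv_text is made-up sample data. In that case the converters parameter of read_csv can apply ast.literal_eval while loading, as an alternative to converting afterwards.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content: the lists were written out as quoted strings.
csv_text = (
    "industry,index,entities\n"
    "cars,0,\"['Norway', 'it']\"\n"
    "nature,2,\"['whale', 'it']\"\n"
)

# Option 1: load as plain strings, then convert the column afterwards.
raw = pd.read_csv(io.StringIO(csv_text))
raw_type = type(raw["entities"].iat[0])  # str before conversion
raw["entities"] = raw["entities"].apply(ast.literal_eval)

# Option 2: convert while reading, via the converters parameter.
parsed = pd.read_csv(io.StringIO(csv_text), converters={"entities": ast.literal_eval})

print(raw_type, type(raw["entities"].iat[0]), type(parsed["entities"].iat[0]))
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for this kind of input.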
Answer 1: (score: 4)
Assuming the elements of entities are lists:
df.groupby(['industry', 'index'])['entities'].apply(lambda x: [l for l in x]).reset_index()
Output:
industry index entities
0 cars 0 [[Norway, it], [Mercedes, they], [it, EV, its]]
1 nature 1 [[fox, it], [them, rabbits]]
2 nature 2 [[whale, it]]
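As a side note, the comprehension [l for l in x] builds the same list as list(x), so this produces the same result as the first answer. The sketch below compares the two on the sample data (rebuilt from the question) and adds agg(list) as a third spelling; the variable names are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question; entities holds real Python lists.
df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})

groups = df.groupby(["industry", "index"])["entities"]

via_comprehension = groups.apply(lambda x: [l for l in x]).reset_index()
via_list = groups.apply(list).reset_index()
via_agg = groups.agg(list).reset_index()

# All three spellings collect the same nested lists per group.
same = (
    via_comprehension["entities"].tolist()
    == via_list["entities"].tolist()
    == via_agg["entities"].tolist()
)
print(same)
```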