Pandas: group by multiple columns and keep multiple columns

Time: 2020-03-05 14:06:14

Tags: python pandas

I have a DataFrame:

    action  person_id                       frame_no        path
0   boxing  person12_boxing_d2_uncomp.avi   image_0128.jpg  ../../../datasets/kth/train/boxing/person12_bo...
1   boxing  person12_boxing_d2_uncomp.avi   image_0129.jpg  ../../../datasets/kth/train/boxing/person12_bo...
2   walking person13_boxing_d2_uncomp.avi   image_0130.jpg  ../../../datasets/kth/train/walking/person13_b...
3   walking person13_boxing_d2_uncomp.avi   image_0131.jpg  ../../../datasets/kth/train/walking/person13_b...
4   running person13_boxing_d2_uncomp.avi   image_0132.jpg  ../../../datasets/kth/train/running/person13_b.

I am trying to merge the rows that share the same person_id. Rows with the same person_id are guaranteed to have the same action. This is what I have so far:

df = pd.DataFrame(data_filtered, columns=["action","person_id","frame_no","path"])
#df = pd.DataFrame(df.groupby(["action","person_id"])['frame_no'].apply(list)).reset_index()
df.head()

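However, the resulting DataFrame loses the path column. I'm not sure how to tell pandas to group the remaining columns as well, and searching Google didn't help because I don't even know what to search for. Apologies if this has been asked before.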

@Aditya

I tried

df = pd.DataFrame(df.groupby(["action","person_id"])[['frame_no', 'path']].apply(list)).reset_index()

but this is what I get:

    action  person_id                       0
0   boxing  person12_boxing_d2_uncomp.avi   [frame_no, path]
1   running person13_boxing_d2_uncomp.avi   [frame_no, path]
2   walking person13_boxing_d2_uncomp.avi   [frame_no, path]

2 answers:

Answer 0 (score: 1)

# pd.__version__ == 0.25.1
import pandas as pd

# Small toy frame: column A is the group key, B and C are collected per group
d=[['hello',1,'GOOD','long.kw'],
   ['chipotle',2,'GOOD','bingo'],
   ['hello',3,"BAD", "lm"]]
t=pd.DataFrame(data=d, columns=['A','B','C','D'])

The output is

t.groupby('A')[['B','C']].agg(lambda x: tuple(x)).applymap(list)
               B            C
A
chipotle     [2]       [GOOD]
hello     [1, 3]  [GOOD, BAD]

Answer 1 (score: 1)

Just change GroupBy.apply to GroupBy.agg, so that each selected column is converted to a list separately:

print (df)
    action                      person_id        frame_no         path
0   boxing  person12_boxing_d2_uncomp.avi  image_0128.jpg  person12_bo
1   boxing  person12_boxing_d2_uncomp.avi  image_0129.jpg  person12_bo
2  walking  person13_boxing_d2_uncomp.avi  image_0130.jpg   person13_b
3  walking  person13_boxing_d2_uncomp.avi  image_0131.jpg   person13_b
4  running  person13_boxing_d2_uncomp.avi  image_0132.jpg   person13_b

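# Select several columns with a list (double brackets); newer pandas versions reject a bare tuple of keys here
df = df.groupby(["action","person_id"])[['frame_no', 'path']].agg(list)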
print (df)
                                                               frame_no  \
action  person_id                                                         
boxing  person12_boxing_d2_uncomp.avi  [image_0128.jpg, image_0129.jpg]   
running person13_boxing_d2_uncomp.avi                  [image_0132.jpg]   
walking person13_boxing_d2_uncomp.avi  [image_0130.jpg, image_0131.jpg]   

                                                             path  
action  person_id                                                  
boxing  person12_boxing_d2_uncomp.avi  [person12_bo, person12_bo]  
running person13_boxing_d2_uncomp.avi                [person13_b]  
walking person13_boxing_d2_uncomp.avi    [person13_b, person13_b]
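
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample rows printed above (the shortened path values are kept exactly as shown; the real paths are not known), and the variable names column_labels and grouped are only illustrative. It also shows the likely reason the apply(list) attempt in the question returned [frame_no, path]: iterating over a DataFrame yields its column labels, not its rows. The final reset_index() is an optional extra step, assuming the group keys are wanted back as ordinary columns.

```python
import pandas as pd

# Sample rows copied from the printed frame above; path values stay shortened.
df = pd.DataFrame(
    {
        "action": ["boxing", "boxing", "walking", "walking", "running"],
        "person_id": [
            "person12_boxing_d2_uncomp.avi",
            "person12_boxing_d2_uncomp.avi",
            "person13_boxing_d2_uncomp.avi",
            "person13_boxing_d2_uncomp.avi",
            "person13_boxing_d2_uncomp.avi",
        ],
        "frame_no": [
            "image_0128.jpg",
            "image_0129.jpg",
            "image_0130.jpg",
            "image_0131.jpg",
            "image_0132.jpg",
        ],
        "path": ["person12_bo", "person12_bo", "person13_b", "person13_b", "person13_b"],
    }
)

# Iterating over a DataFrame yields its column labels, so list(sub_frame)
# gives the labels rather than the values in each column.
column_labels = list(df[["frame_no", "path"]])

# agg(list) is applied to each selected column separately;
# reset_index() turns the group keys back into ordinary columns.
grouped = (
    df.groupby(["action", "person_id"])[["frame_no", "path"]]
    .agg(list)
    .reset_index()
)
print(grouped)
```

The result has one row per (action, person_id) pair, with frame_no and path each holding a Python list of the values from the merged rows.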