Group by one column and get the value of another column

Time: 2015-01-29 17:11:28

Tags: python pandas

Here is the seed dataset:

In[1]: import pandas as pd

       # One record per product per month; 'client' is constant for each product_s_n
       my_data = [
           {'client':'A','product_s_n':'1','status':'in_store','month':'Jan'},
           {'client':'A','product_s_n':'1','status':'sending', 'month':'Feb'},
           {'client':'A','product_s_n':'2','status':'in_store','month':'Jan'},
           {'client':'A','product_s_n':'2','status':'in_store','month':'Feb'},
           {'client':'B','product_s_n':'3','status':'in_store','month':'Jan'},
           {'client':'B','product_s_n':'3','status':'sending', 'month':'Feb'},
           {'client':'B','product_s_n':'4','status':'in_store','month':'Jan'},
           {'client':'B','product_s_n':'4','status':'in_store','month':'Feb'},
           {'client':'C','product_s_n':'5','status':'in_store','month':'Jan'},
           {'client':'C','product_s_n':'5','status':'sending', 'month':'Feb'},
       ]
       df = pd.DataFrame(my_data)
       df

Out[1]:
      client    month   product_s_n   status
0       A       Jan     1             in_store
1       A       Feb     1             sending
2       A       Jan     2             in_store
3       A       Feb     2             in_store
4       B       Jan     3             in_store
5       B       Feb     3             sending
6       B       Jan     4             in_store
7       B       Feb     4             in_store
8       C       Jan     5             in_store
9       C       Feb     5             sending

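The question I want to ask of this data is: which client does each product_serial_number (the product_s_n column) belong to? Based on the data in this example, this is what the resulting DataFrame should look like (I need a new DataFrame):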

    product_s_n    client   
0        1            A
1        2            A
2        3            B
3        4            B
4        5            C

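As you may have noticed, the 'status' and 'month' fields are only there to give meaning and structure to the data in this sample dataset. I tried using groupby, without success. Any ideas?

Thanks!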

1 answer:

Answer 0: (score: 2)

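After calling df.groupby(['product_s_n']), you can restrict attention to a particular column by indexing with ['client']. You can then call first() to select the first value of client from each group.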

>>> df.groupby(['product_s_n'])['client'].first()    
product_s_n
1              A
2              A
3              B
4              B
5              C
Name: client, dtype: object
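
Note that the result above is a Series indexed by product_s_n, while the question asks for a new DataFrame. A minimal sketch of one way to get there, assuming each product_s_n maps to exactly one client (as it does in the sample data): call reset_index() on the grouped result. The names result, alt and clients_per_product are only illustrative.

```python
import pandas as pd

# Same seed data as in the question.
my_data = [
    {'client': 'A', 'product_s_n': '1', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'A', 'product_s_n': '1', 'status': 'sending',  'month': 'Feb'},
    {'client': 'A', 'product_s_n': '2', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'A', 'product_s_n': '2', 'status': 'in_store', 'month': 'Feb'},
    {'client': 'B', 'product_s_n': '3', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'B', 'product_s_n': '3', 'status': 'sending',  'month': 'Feb'},
    {'client': 'B', 'product_s_n': '4', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'B', 'product_s_n': '4', 'status': 'in_store', 'month': 'Feb'},
    {'client': 'C', 'product_s_n': '5', 'status': 'in_store', 'month': 'Jan'},
    {'client': 'C', 'product_s_n': '5', 'status': 'sending',  'month': 'Feb'},
]
df = pd.DataFrame(my_data)

# first() yields a Series indexed by product_s_n;
# reset_index() moves that index back into an ordinary column,
# giving a DataFrame with columns product_s_n and client.
result = df.groupby('product_s_n')['client'].first().reset_index()

# Alternative sketch without groupby: keep the two columns of interest
# and drop repeated (product_s_n, client) pairs. This matches `result`
# only under the assumption that each product has a single client.
alt = (df[['product_s_n', 'client']]
       .drop_duplicates()
       .reset_index(drop=True))

# Optional sanity check of that assumption: count the distinct clients
# seen for each product.
clients_per_product = df.groupby('product_s_n')['client'].nunique()

print(result)
```

If clients_per_product contains any value greater than 1, first() would silently pick one of several clients for that product, so it may be worth checking before relying on either variant.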