I have the following DataFrame, my_df:
name timestamp color
---------------------------
John 2017-01-01 blue
John 2017-01-02 blue
John 2017-01-03 blue
John 2017-01-04 yellow
John 2017-01-05 red
John 2017-01-06 red
Ann 2017-01-04 green
Ann 2017-01-05 orange
Ann 2017-01-06 orange
Ann 2017-01-07 red
Ann 2017-01-08 black
Dan 2017-01-11 blue
Dan 2017-01-12 blue
Dan 2017-01-13 green
Dan 2017-01-14 yellow
I then use the following code to get each person's list of colors:
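# Named aggregation (pandas >= 0.25); passing a renaming dict to .agg on a
# single column raises SpecificationError in recent pandas versions.
new_df = my_df.groupby('name', as_index=False).agg(color_list=('color', list))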
new_df then looks like:
name color_list
-----------------------------------------------
John blue, blue, blue, yellow, red, red
Ann green, orange, orange, red, black
Dan blue, blue, green, yellow
But what if I want to create a color_seq (with no consecutive duplicate colors) instead of color_list, as shown below? How should I modify the code above? Thanks!
name color_seq
-----------------------------------------------
John blue, yellow, red
Ann green, orange, red, black
Dan blue, green, yellow
Answer 0 (score: 1)
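Since non-consecutive repeats are allowed (a color may legitimately reappear later in the sequence), simply dropping duplicates will not work; you have to filter carefully. One way: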
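def drop_consecutive(colors):
    # Pad a copy with a sentinel so the last color always has a differing neighbor;
    # the caller's list is left unmodified.
    padded = colors + [None]
    # Keep a color only when the next one differs, i.e. the last item of each run.
    return ','.join(x for i, x in enumerate(colors) if x != padded[i + 1])

out = my_df.groupby('name')['color'].apply(list).apply(drop_consecutive)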
which gives:
name
Ann green,orange,red,black
Dan blue,green,yellow
John blue,yellow,red
Name: color, dtype: object
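For comparison, the same filtering can probably be done without building intermediate lists, by comparing each row's color with the previous row's color for the same person. The sketch below is only one possible variant, not the approach used above. It assumes the rows are already in timestamp order within each name (sort by timestamp first if they are not), and it rebuilds my_df from the table in the question so that it runs on its own. The names prev_color, deduped and color_seq_df are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
my_df = pd.DataFrame({
    "name": ["John"] * 6 + ["Ann"] * 5 + ["Dan"] * 4,
    "timestamp": pd.to_datetime([
        "2017-01-01", "2017-01-02", "2017-01-03",
        "2017-01-04", "2017-01-05", "2017-01-06",
        "2017-01-04", "2017-01-05", "2017-01-06",
        "2017-01-07", "2017-01-08",
        "2017-01-11", "2017-01-12", "2017-01-13", "2017-01-14",
    ]),
    "color": ["blue", "blue", "blue", "yellow", "red", "red",
              "green", "orange", "orange", "red", "black",
              "blue", "blue", "green", "yellow"],
})

# Previous color within the same person; NaN on each person's first row.
prev_color = my_df.groupby("name")["color"].shift()

# Keep a row only when its color differs from that person's previous color.
# A first row is always kept, because a string never equals NaN.
deduped = my_df[my_df["color"] != prev_color]

# sort=False keeps the names in order of first appearance (John, Ann, Dan).
color_seq_df = (
    deduped.groupby("name", sort=False)["color"]
    .agg(", ".join)
    .reset_index(name="color_seq")
)
print(color_seq_df)
```

Because only adjacent rows are compared, a color that comes back later (for example blue, red, blue) is kept both times, which matches the "no consecutive duplicates" requirement in the question.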