I'd like to do the following.
Given a DataFrame like this:
df = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/24/2014","06/25/2014","06/23/2014","07/02/1999","07/02/1999"], "value": ["3","5","1","7","8"] })
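I want to group together all observations whose dates are within 2 days of each other. For example, the first 3 rows would form one group, and the last two rows would form another.
So far, I have thought about using something like: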
df.groupby(df['date'].map(lambda x: x.month))
What is the general approach to this kind of "fuzzy grouping"? Thanks,
Answer 0: (score: 5)
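You can sort the rows by date, then take the difference between consecutive dates.
Test whether each difference is greater than 2 days. Taking the cumulative sum of that test assigns the desired group numbers: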
import pandas as pd
df = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/24/2014","06/25/2014","06/23/2014","07/02/1999","07/02/1999"], "value": ["3","5","1","7","8"] })
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date')
df['group'] = (df['date'].diff() > pd.Timedelta(days=2)).cumsum()
print(df)
yields
  ID       date value  group
3  B 1999-07-02     7      0
4  B 1999-07-02     8      0
2  C 2014-06-23     1      1
0  A 2014-06-24     3      1
1  A 2014-06-25     5      1
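
One property of this approach is worth seeing in isolation: a new group starts only when the gap between neighbouring dates (in sorted order) exceeds the threshold, so groups can chain. The sketch below wraps the same sort / diff / cumsum steps in a small helper. The names assign_date_groups and max_gap_days are hypothetical and chosen for illustration, not part of pandas.

```python
import pandas as pd


def assign_date_groups(dates, max_gap_days=2):
    """Label each date with a group number (hypothetical helper).

    A new group starts whenever the gap to the previous date, in sorted
    order, is strictly greater than max_gap_days.
    """
    ordered = pd.to_datetime(pd.Series(dates)).sort_values()
    # The first diff is NaT, which compares as False, so the first row
    # always falls into group 0.
    new_group = ordered.diff() > pd.Timedelta(days=max_gap_days)
    # cumsum turns the True/False "gap" flags into running group numbers;
    # sort_index restores the caller's original row order.
    return new_group.cumsum().sort_index()


# Dates spaced exactly 2 days apart chain into one group, even though the
# first and last are 4 days apart; the 10-day jump starts a new group.
chained = assign_date_groups(
    ["2014-06-01", "2014-06-03", "2014-06-05", "2014-06-15"]
)
print(chained.tolist())  # prints [0, 0, 0, 1]
```

If you need every pair of dates inside a group to be within 2 days of each other, this chaining behaviour may not be what you want, and a different rule (for example, measuring from the first date of each group) would be needed.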