I have a pandas DataFrame df:
df
StartDate EndDate Value \
0 2015-03-25 12:25:43.999994 2015-03-25 13:23:43.979992 0
1 2015-03-25 13:23:43.999998 2015-03-25 13:24:43.979998 1
2 2015-03-25 13:24:43.999994 2015-03-25 13:25:43.979995 0
3 2015-03-26 13:25:44.000001 2015-03-26 13:47:43.979996 0
4 2015-03-26 13:47:43.999992 2015-03-26 13:48:43.979993 1
5 2015-03-26 13:48:43.999999 2015-03-26 14:25:43.980001 0
6 2015-03-27 14:25:43.999997 2015-03-27 15:25:43.979998 0
7 2015-03-27 15:25:43.999994 2015-03-27 15:28:43.979997 0
8 2015-03-27 15:28:43.999993 2015-03-27 15:29:43.979994 1
9 2015-03-27 15:29:44.000000 2015-03-27 15:59:43.979997 0
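I want to run some computations day by day. To do that, I want to extract a sub-DataFrame containing only the rows that belong to the first day, then the rows for the second day, and so on.
My plan is to use a for loop and select the rows for one specific date on each iteration.
First I compute the unique days: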
unique_days = df['StartDate'].map(lambda t: t.date()).unique()
Then I start the loop:
# for each day compute operation
for i in unique_days:
    print(i)
    # keep only the rows whose StartDate falls on day i
    df_day = df[df['StartDate'].map(lambda t: t.date()) == i]
    df2 = func(df_day, parameters)
Answer 0 (score: 3)
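I think it is better to groupby on the date part of StartDate (dt.date) and then apply an aggregation such as mean or sum, or use apply with a custom function: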
df1 = df.groupby(df['StartDate'].dt.date).mean()
df2 = df.groupby(df['StartDate'].dt.date).apply(func)
Sample:
import pandas as pd

# some sample function
def func(df_day, parameters):
    # print each group
    print(df_day)
    # shift every StartDate in the group back by `parameters` days
    return df_day['StartDate'] - pd.Timedelta(parameters, unit='d')

df2 = df.groupby(df['StartDate'].dt.date).apply(lambda x: func(x, 1))
# less readable
# df2 = df.groupby(df['StartDate'].dt.date).apply(func, 1)
print(df2)
StartDate
2015-03-25 0 2015-03-24 12:25:43.999994
1 2015-03-24 13:23:43.999998
2 2015-03-24 13:24:43.999994
2015-03-26 3 2015-03-25 13:25:44.000001
4 2015-03-25 13:47:43.999992
5 2015-03-25 13:48:43.999999
2015-03-27 6 2015-03-26 14:25:43.999997
7 2015-03-26 15:25:43.999994
8 2015-03-26 15:28:43.999993
9 2015-03-26 15:29:44.000000
Name: StartDate, dtype: datetime64[ns]
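If you really do need each day's rows as a separate sub-DataFrame, as the question describes, you can also iterate over the groupby object directly instead of filtering inside a loop. The following is only a minimal sketch: the small df is an illustrative stand-in for the question's data, and rows_per_day is a hypothetical helper name, not something from the question or from pandas.

```python
import pandas as pd

# Illustrative stand-in for the question's df (values are made up).
df = pd.DataFrame({
    "StartDate": pd.to_datetime([
        "2015-03-25 12:25:43", "2015-03-25 13:23:43",
        "2015-03-26 13:25:44",
        "2015-03-27 14:25:43", "2015-03-27 15:25:43",
    ]),
    "Value": [0, 1, 0, 0, 1],
})

def rows_per_day(frame):
    """Hypothetical helper: map each calendar day to its sub-DataFrame."""
    # Iterating a groupby yields (key, sub-DataFrame) pairs, one per day.
    return {day: df_day for day, df_day in frame.groupby(frame["StartDate"].dt.date)}

per_day = rows_per_day(df)
for day, df_day in per_day.items():
    # df_day holds only the rows whose StartDate falls on `day`
    print(day, len(df_day), df_day["Value"].sum())
```

Each df_day can then be passed to your own function, which is what the loop in the question does, but without recomputing the date of every row on each iteration.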