Question

I have a pandas DataFrame df:

df
                   StartDate                    EndDate  Value
0 2015-03-25 12:25:43.999994 2015-03-25 13:23:43.979992      0
1 2015-03-25 13:23:43.999998 2015-03-25 13:24:43.979998      1
2 2015-03-25 13:24:43.999994 2015-03-25 13:25:43.979995      0
3 2015-03-26 13:25:44.000001 2015-03-26 13:47:43.979996      0
4 2015-03-26 13:47:43.999992 2015-03-26 13:48:43.979993      1
5 2015-03-26 13:48:43.999999 2015-03-26 14:25:43.980001      0
6 2015-03-27 14:25:43.999997 2015-03-27 15:25:43.979998      0
7 2015-03-27 15:25:43.999994 2015-03-27 15:28:43.979997      0
8 2015-03-27 15:28:43.999993 2015-03-27 15:29:43.979994      1
9 2015-03-27 15:29:44.000000 2015-03-27 15:59:43.979997      0

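I want to run some operations day by day. To do that, I want to extract a sub-DataFrame containing only the rows that belong to the first day, then the rows for the second day, and so on.

My plan is to use a for loop and select the rows for one specific date on each iteration.

First I compute the unique days: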

unique_days = df['StartDate'].map(lambda t: t.date()).unique()

Then I start the loop:

# for each day, select its rows and compute the operation
for i in unique_days:
    print(i)
    # keep only the rows whose StartDate falls on day i
    df_day = df[df['StartDate'].map(lambda t: t.date()) == i]

    df2 = func(df_day, parameters)

Answer 1

I think it is better to use groupby on the date and then either call an aggregation such as mean or sum, or use apply with a custom function:

df1 = df.groupby(df['StartDate'].dt.date).mean()

df2 = df.groupby(df['StartDate'].dt.date).apply(func)
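
As a minimal, self-contained sketch of the aggregation form, the snippet below rebuilds a few rows of the question's data (only StartDate and Value, with the timestamps truncated to whole seconds for brevity) and sums Value per calendar day. The name daily_sum is just an illustrative choice, not something from the question.

```python
import pandas as pd

# A few rows taken from the question's frame (timestamps truncated to seconds)
df = pd.DataFrame({
    "StartDate": pd.to_datetime([
        "2015-03-25 12:25:43",
        "2015-03-25 13:23:43",
        "2015-03-26 13:25:44",
        "2015-03-26 13:47:43",
        "2015-03-27 14:25:43",
    ]),
    "Value": [0, 1, 0, 1, 0],
})

# Group on the calendar date of StartDate, then sum Value within each day
daily_sum = df.groupby(df["StartDate"].dt.date)["Value"].sum()
print(daily_sum)
```

Selecting the Value column before aggregating keeps the result to the one column of interest; other aggregations such as mean can be swapped in the same place.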

Sample:

import pandas as pd

# some sample function: shift each StartDate back by `parameters` days
def func(df_day, parameters):
    # print each group
    print(df_day)

    return df_day['StartDate'] - pd.Timedelta(parameters, unit='d')

# the lambda passes the extra argument to func explicitly
df2 = df.groupby(df['StartDate'].dt.date).apply(lambda x: func(x, 1))
# less readable
# df2 = df.groupby(df['StartDate'].dt.date).apply(func, 1)
print(df2)
StartDate    
2015-03-25  0   2015-03-24 12:25:43.999994
            1   2015-03-24 13:23:43.999998
            2   2015-03-24 13:24:43.999994
2015-03-26  3   2015-03-25 13:25:44.000001
            4   2015-03-25 13:47:43.999992
            5   2015-03-25 13:48:43.999999
2015-03-27  6   2015-03-26 14:25:43.999997
            7   2015-03-26 15:25:43.999994
            8   2015-03-26 15:28:43.999993
            9   2015-03-26 15:29:44.000000
Name: StartDate, dtype: datetime64[ns]
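
If an explicit loop is still preferred, as in the question, the groupby object can also be iterated directly, which yields one (day, sub-DataFrame) pair per date without filtering the frame again on every pass. The sketch below assumes a few rows of the question's data (timestamps truncated to seconds) and uses a hypothetical summarize function as a stand-in for the real per-day operation.

```python
import pandas as pd

# A few rows taken from the question's frame (timestamps truncated to seconds)
df = pd.DataFrame({
    "StartDate": pd.to_datetime([
        "2015-03-25 12:25:43",
        "2015-03-25 13:23:43",
        "2015-03-26 13:25:44",
        "2015-03-26 13:47:43",
        "2015-03-27 14:25:43",
    ]),
    "Value": [0, 1, 0, 1, 0],
})

def summarize(df_day):
    # Hypothetical placeholder for the per-day operation: count the rows
    return len(df_day)

results = {}
# Each iteration yields the group key (a date) and that day's rows
for day, df_day in df.groupby(df["StartDate"].dt.date):
    results[day] = summarize(df_day)

print(results)
```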
