Question

I have a DataFrame df that looks like this:

   DateTime                  Value
   2011-01-01 01:00:00        5
   2011-01-01 01:30:00        5.5
   2011-01-01 02:00:00        6.7
   2011-01-01 02:30:00        6.9
   .
   .
   2011-01-30 23:45:00        86.5

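I want to reset Value to 0.0 at 8:45 every morning. I also want to add a new column, Difference, which holds the difference between each row's Value and the previous row's Value, starting from the second row. For example, 5.5 - 5 = 0.5.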

So my output should look like this:

   DateTime                  Value    Difference
   2011-01-01 01:00:00        5          0
   2011-01-01 01:30:00        5.5        0.5
   2011-01-01 02:00:00        6.7        1.2
   2011-01-01 02:30:00        6.9        0.2
   .
   .
   2011-01-01 08:25:00        10.5       5.0
   2011-01-01 08:30:00        12.5       2.0
   2011-01-01 08:45:00        0.00       0.0
   2011-01-01 09:00:00        9.0        9.0
   .
   2011-01-30 23:45:00        86.5       2.5

How can I do this?

Answer 1

First, create a column that marks where each day starts (08:45):

import pandas as pd

# assuming your DataFrame is named "df"
# convert df['DateTime'] to datetime in case it is still stored as strings
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['myDate'] = (df['DateTime'].dt.strftime("%H:%M") == "08:45").cumsum()

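This marks every row whose time is 08:45 as True and every other row as False. When we take the cumulative sum of those booleans, each row from the i-th 08:45 up to just before the next 08:45 gets the value i, the following block gets i + 1, and so on. Now, to get the Difference you described, we just need to do this: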

df['Difference'] = df.groupby('myDate')['Value'].diff().fillna(0)
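To see how the cumulative sum turns the 08:45 markers into group labels, here is a minimal sketch. The timestamps are hypothetical sample data, not rows from the question, and the names `times`, `is_reset` and `labels` are only for illustration.

```python
import pandas as pd

# Hypothetical timestamps spanning two 08:45 resets (assumed data).
times = pd.Series(pd.to_datetime([
    "2011-01-01 08:30:00",
    "2011-01-01 08:45:00",
    "2011-01-01 09:00:00",
    "2011-01-02 08:30:00",
    "2011-01-02 08:45:00",
]))

is_reset = times.dt.strftime("%H:%M") == "08:45"  # True only on 08:45 rows
labels = is_reset.cumsum()                        # goes up by one at each 08:45

print(pd.DataFrame({"time": times, "is_reset": is_reset, "label": labels}))
```

Rows before the first 08:45 share one label, and every later 08:45 row starts a new one, which is what lets groupby restart the diff there.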

You can drop myDate afterwards with df.drop('myDate', axis=1, inplace=True), or, if you want to do it in one line, skip the temporary column assignment entirely:

# perhaps too long for one line :)
df['Difference'] = df.groupby((pd.to_datetime(df['DateTime']).dt.strftime("%H:%M") == "08:45").cumsum())['Value'].diff().fillna(0)

Output (assuming you keep the temporary column):

              DateTime  Value  myDate  Difference
1  2011-01-01 01:00:00    5.0       0         0.0
2  2011-01-01 01:30:00    5.5       0         0.5
3  2011-01-01 02:00:00    6.7       0         1.2
4  2011-01-01 02:30:00    6.9       0         0.2
5  2011-01-01 08:25:00   10.5       0         3.6
6  2011-01-01 08:30:00   12.5       0         2.0
7  2011-01-01 08:45:00    0.0       1         0.0
8  2011-01-01 09:00:00    9.0       1         9.0
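The output above already has Value equal to 0.0 at 08:45. If your data does not, the reset the question asks for could be done with a boolean mask before taking the diff. This is only a sketch under that assumption: the sample rows are made up (the 08:45 value is deliberately non-zero) and `at_reset` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical sample rows; the 08:45 Value is non-zero on purpose.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2011-01-01 08:25:00",
        "2011-01-01 08:30:00",
        "2011-01-01 08:45:00",
        "2011-01-01 09:00:00",
    ]),
    "Value": [10.5, 12.5, 14.0, 9.0],
})

at_reset = df["DateTime"].dt.strftime("%H:%M") == "08:45"
df.loc[at_reset, "Value"] = 0.0  # reset Value on every 08:45 row

# Diff within each block that starts at 08:45; the first row of a block gets 0.
df["Difference"] = df.groupby(at_reset.cumsum())["Value"].diff().fillna(0)
print(df)
```

Resetting first matters here: the row after 08:45 is then diffed against 0.0 rather than against the original reading.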

Answer 2

A possible solution:

>>> df
     0
0  5.5
1  6.7
2  3.4
3  8.9
>>> df[1] = df[0]
>>> df.loc[0, 1] = 0.0
>>> df.loc[1:, 1] = [df[0][i] - df[0][i-1] for i in range(1, len(df[0]))]
>>> df
     0    1
0  5.5  0.0
1  6.7  1.2
2  3.4 -3.3
3  8.9  5.5

But @cmaher's solution is much better!
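For what it's worth, the row-by-row subtraction in this answer can probably be written without the explicit loop by using Series.diff(). A minimal sketch on the same four values (it ignores the 08:45 grouping, just like the session above):

```python
import pandas as pd

df = pd.DataFrame({0: [5.5, 6.7, 3.4, 8.9]})  # same values as in the session above

# diff() subtracts the previous row; the first row has no predecessor, so fill it with 0.
df[1] = df[0].diff().fillna(0.0)
print(df.round(1))
```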
