I have a DataFrame df that looks like this:
DateTime Value
2011-01-01 01:00:00 5
2011-01-01 01:30:00 5.5
2011-01-01 02:00:00 6.7
2011-01-01 02:30:00 6.9
.
.
2011-01-30 23:45:00 86.5
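I want Value to be reset to 0.0 at 8:45 every morning. I also want to add a new column, Difference, which holds the difference between each row's Value and the previous row's Value, starting from the second row. For example, 5.5 - 5 = 0.5.
So my output should look like this: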
DateTime Value Difference
2011-01-01 01:00:00 5 0
2011-01-01 01:30:00 5.5 0.5
2011-01-01 02:00:00 6.7 1.2
2011-01-01 02:30:00 6.9 0.2
.
.
2011-01-01 08:25:00 10.5 5.0
2011-01-01 08:30:00 12.5 2.0
2011-01-01 08:45:00 0.00 0.0
2011-01-01 09:00:00 9.0 9.0
.
2011-01-30 23:45:00 86.5 2.5
How can I do this?
Answer 0 (score: 3)
First, create a column that marks where each day begins (08:45):
# assuming your DataFrame is named "df" and pandas is imported as pd
# convert df['DateTime'] in case it is not yet stored as datetime values
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['myDate'] = (df['DateTime'].dt.strftime("%H:%M") == "08:45").cumsum()
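This marks every row whose time is 08:45 as True and every other row as False. Taking the cumulative sum of these values then gives every row from one 08:45 up to (but not including) the next the same number: i for day i, i + 1 for the following day, and so on. Now, to get the Difference column you described, we just need to do this: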
df['Difference'] = df.groupby('myDate')['Value'].diff().fillna(0)
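To see how the cumulative sum turns the 08:45 markers into group labels, here is a minimal sketch. The timestamps are made up for illustration and are not from the question's data; `is_reset` and `labels` are just names I picked.

```python
import pandas as pd

# Hypothetical timestamps that straddle two 08:45 resets.
s = pd.Series(pd.to_datetime([
    "2011-01-01 08:30:00",
    "2011-01-01 08:45:00",
    "2011-01-01 09:00:00",
    "2011-01-02 08:30:00",
    "2011-01-02 08:45:00",
    "2011-01-02 09:00:00",
]))

# True only on rows whose clock time is exactly 08:45.
is_reset = s.dt.strftime("%H:%M") == "08:45"

# Each True bumps the running total, so every row after a reset
# shares that reset's label until the next reset arrives.
labels = is_reset.cumsum()

print(labels.tolist())  # [0, 1, 1, 1, 2, 2]
```

Rows before the first 08:45 get label 0, which is why the first rows of the DataFrame form their own group.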
Once Difference is computed, you can drop myDate with df.drop('myDate', axis=1, inplace=True), or, if you would rather make it a one-liner, you can skip the temporary column altogether:
# perhaps too long for one line :)
df['Difference'] = df.groupby((pd.to_datetime(df['DateTime']).dt.strftime("%H:%M") == "08:45").cumsum())['Value'].diff().fillna(0)
Output (assuming you keep the temporary column):
              DateTime  Value  myDate  Difference
1  2011-01-01 01:00:00    5.0       0         0.0
2  2011-01-01 01:30:00    5.5       0         0.5
3  2011-01-01 02:00:00    6.7       0         1.2
4  2011-01-01 02:30:00    6.9       0         0.2
5  2011-01-01 08:25:00   10.5       0         3.6
6  2011-01-01 08:30:00   12.5       0         2.0
7  2011-01-01 08:45:00    0.0       1         0.0
8  2011-01-01 09:00:00    9.0       1         9.0
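The code above only computes Difference; it does not itself set Value to 0.0 at 08:45. If "reset" means overwriting Value on the 08:45 rows (that is my assumption about the question), a rough end-to-end sketch could look like this. The sample rows are hypothetical, and the 08:45 row deliberately starts with a non-zero Value (13.0) so the reset has something to do.

```python
import pandas as pd

# Hypothetical sample; 13.0 at 08:45 is invented to demonstrate the reset.
df = pd.DataFrame({
    "DateTime": ["2011-01-01 08:25:00", "2011-01-01 08:30:00",
                 "2011-01-01 08:45:00", "2011-01-01 09:00:00"],
    "Value": [10.5, 12.5, 13.0, 9.0],
})
df["DateTime"] = pd.to_datetime(df["DateTime"])

# Rows whose clock time is exactly 08:45.
is_reset = df["DateTime"].dt.strftime("%H:%M") == "08:45"

# Overwrite Value with 0.0 on those rows.
df.loc[is_reset, "Value"] = 0.0

# Row-to-row difference inside each 08:45-to-08:45 block. The first row
# of a block has no predecessor, so its NaN is replaced with 0.
df["Difference"] = df.groupby(is_reset.cumsum())["Value"].diff().fillna(0)

print(df)
```

Doing the reset before the diff matters: the 09:00 row then gets 9.0 - 0.0 = 9.0, which matches the expected output in the question.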
Answer 1 (score: 1)
A solution could be:
>>> df
     0
0  5.5
1  6.7
2  3.4
3  8.9
>>> df[1] = df[0]
>>> df.loc[0, 1] = 0.0
>>> df.loc[1:, 1] = [df[0][i] - df[0][i-1] for i in range(1, len(df[0]))]
>>> df
     0    1
0  5.5  0.0
1  6.7  1.2
2  3.4 -3.3
3  8.9  5.5
But @cmaher's solution is much better!
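For what it's worth, the list comprehension above can probably be replaced by pandas' built-in Series.diff(), which does the same row-to-row subtraction without a Python loop. A small sketch using the same four values (the integer column labels 0 and 1 are kept only to match the session above):

```python
import pandas as pd

# Same four values as in the session above.
df = pd.DataFrame({0: [5.5, 6.7, 3.4, 8.9]})

# diff() subtracts the previous row from each row; the first row has no
# predecessor and comes back as NaN, which fillna(0.0) turns into 0.0.
df[1] = df[0].diff().fillna(0.0)

# Rounded only for display, to hide floating-point noise.
print(df.round(1))
```

This does not handle the 08:45 grouping on its own; for that, the groupby approach in the other answer is still needed.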