I have a pandas DataFrame of timedeltas, with the cumulative sum of those deltas stored in a separate column, expressed in milliseconds. An example is shown below:
Transaction_ID Time TimeDelta CumSum[ms]
1 00:00:04.500 00:00:00.000 000
2 00:00:04.600 00:00:00.100 100
3 00:00:04.762 00:00:00.162 262
4 00:00:05.543 00:00:00.781 1043
5 00:00:09.567 00:00:04.024 5067
6 00:00:10.654 00:00:01.087 6154
7 00:00:14.300 00:00:03.646 9800
8 00:00:14.532 00:00:00.232 10032
9 00:00:16.500 00:00:01.968 12000
10 00:00:17.543 00:00:01.043 13043
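I would like to be able to supply a maximum value for CumSum[ms], after which the cumulative sum starts over at 0. For example, if the maximum were 3000 in the example above, the result would look like this: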
Transaction_ID Time TimeDelta CumSum[ms]
1 00:00:04.500 00:00:00.000 000
2 00:00:04.600 00:00:00.100 100
3 00:00:04.762 00:00:00.162 262
4 00:00:05.543 00:00:00.781 1043
5 00:00:09.567 00:00:04.024 0
6 00:00:10.654 00:00:01.087 1087
7 00:00:14.300 00:00:03.646 0
8 00:00:14.532 00:00:00.232 232
9 00:00:16.500 00:00:01.968 2200
10 00:00:17.543 00:00:01.043 0
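I have explored using the modulo operator, but I only manage to reset to zero when the resulting cumsum is exactly equal to the supplied limit (i.e. a cumsum[ms] of 500 % 500 equals zero).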
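To illustrate the problem, here is a small sketch of the modulo approach applied to the CumSum[ms] values from the example (the names `cumsum_ms`, `limit` and `wrapped` are only for this sketch). The remainder carries over past the limit instead of restarting from zero, so row 5 comes out as 2067 rather than 0:

```python
import pandas as pd

# CumSum[ms] values copied from the example table
cumsum_ms = pd.Series([0, 100, 262, 1043, 5067, 6154, 9800, 10032, 12000, 13043])

limit = 3000

# Modulo keeps whatever is left over after subtracting multiples of the limit
wrapped = cumsum_ms % limit
print(wrapped.tolist())
# [0, 100, 262, 1043, 2067, 154, 800, 1032, 0, 1043]
```

Only row 9 (12000, an exact multiple of 3000) lands on zero, which is not the behaviour I am after.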
Thanks in advance for any ideas, and let me know if I can provide more information.
Answer 0: (score: 7)
Here is an example of how to do this by iterating over each row of the DataFrame. For simplicity, I created new data for the example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'TimeDelta': np.random.normal(900, 60, size=100)})
print(df.head())
TimeDelta
0 971.021295
1 734.359861
2 867.000397
3 992.166539
4 853.281131
So let's run the accumulator loop with your desired maximum of 3000:
maxvalue = 3000
lastvalue = 0
newcum = []
for _, row in df.iterrows():
    # add this row's delta to the running total
    thisvalue = row['TimeDelta'] + lastvalue
    # restart from zero once the total exceeds the maximum
    if thisvalue > maxvalue:
        thisvalue = 0
    newcum.append(thisvalue)
    lastvalue = thisvalue
Then put the newcum list into the DataFrame:
df['newcum'] = newcum
print(df.head())
TimeDelta newcum
0 801.977678 801.977678
1 893.296429 1695.274107
2 935.303566 2630.577673
3 850.719497 0.000000
4 951.554206 951.554206
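As a rough check against the data in the question, the same accumulator idea can be wrapped in a small helper and run on the original TimeDelta column. This is only a sketch: the function name `cumsum_with_reset` and the variable `delta_ms` are made up for illustration, and it assumes the TimeDelta values can be parsed with `pd.to_timedelta` and converted to whole milliseconds by floor division.

```python
import pandas as pd


def cumsum_with_reset(values, maxvalue):
    """Running sum that drops back to 0 whenever it would exceed maxvalue."""
    out = []
    running = 0
    for v in values:
        running += v
        if running > maxvalue:
            running = 0  # the row that crosses the limit is recorded as 0
        out.append(running)
    return out


# TimeDelta column from the question, parsed into real timedeltas
df = pd.DataFrame({
    'TimeDelta': pd.to_timedelta([
        '00:00:00.000', '00:00:00.100', '00:00:00.162', '00:00:00.781',
        '00:00:04.024', '00:00:01.087', '00:00:03.646', '00:00:00.232',
        '00:00:01.968', '00:00:01.043',
    ])
})

# Floor-dividing by one millisecond gives whole milliseconds as integers
delta_ms = df['TimeDelta'] // pd.Timedelta(milliseconds=1)
df['CumSum[ms]'] = cumsum_with_reset(delta_ms.tolist(), 3000)
print(df)
```

With a maximum of 3000 this should reproduce the CumSum[ms] column from the desired result in the question: 0, 100, 262, 1043, 0, 1087, 0, 232, 2200, 0. Iterating over a plain list of values also avoids the per-row overhead of iterrows, though it is still a Python-level loop.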