I have a data frame containing precipitation data like this:
Date Time, Raw Measurement, Site ID, Previous Raw Measurement, Raw - Previous
2020-05-06 14:15:00,12.56,8085,12.56,0.0
2020-05-06 14:30:00,12.56,8085,12.56,0.0
2020-05-06 14:45:00,12.56,8085,12.56,0.0
2020-05-06 15:00:00,2.48,8085,12.56,-10.08
2020-05-06 15:30:00,2.48,8085,2.47,0.01
2020-05-06 15:45:00,2.48,8085,2.48,0.0
2020-05-06 16:00:00,2.50,8085,2.48,0.02
2020-05-06 16:15:00,2.50,8085,2.50,0.0
2020-05-06 16:30:00,2.50,8085,2.50,0.0
2020-05-06 16:45:00,2.51,8085,2.50,0.01
2020-05-06 17:00:00,2.51,8085,2.51,0.0
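I want to use the last column, "Raw - Previous", which is simply the difference between the latest observation and the previous one, to build a running total of the positive changes as an accumulation column. I empty the rain gauge from time to time, so "Raw - Previous" is negative when that happens. I want to filter those rows out of the df while keeping the running total intact. I have come across
df.sum()
but as far as I can tell, it only gives the sum of the whole column, not a running sum after each row.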
My overall goal is to end up with something like this:
Date Time, Raw Measurement, Site ID, Previous Raw Measurement, Raw - Previous, Total Accumulation
2020-05-06 14:15:00,12.56,8085,12.56,0.0,12.56
2020-05-06 14:30:00,12.56,8085,12.56,0.0,12.56
2020-05-06 14:45:00,12.56,8085,12.56,0.0,12.56
2020-05-06 15:00:00,2.48,8085,12.56,-10.08,12.56
2020-05-06 15:15:00,2.47,8085,2.48,-0.01,12.56
2020-05-06 15:30:00,2.48,8085,2.47,0.01,12.57
2020-05-06 15:45:00,2.48,8085,2.48,0.0,12.57
2020-05-06 16:00:00,2.50,8085,2.48,0.02,12.59
2020-05-06 16:15:00,2.50,8085,2.50,0.0,12.59
2020-05-06 16:30:00,2.50,8085,2.50,0.0,12.59
2020-05-06 16:45:00,2.51,8085,2.50,0.01,12.60
2020-05-06 17:00:00,2.51,8085,2.51,0.0,12.60
Edit: changed the title to better reflect what the question has become.
Answer 0 (score: 2)
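np.where will also work: keep the positive differences, replace everything else with 0, then take the cumulative sum and add the first reading.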
import pandas as pd, numpy as np
df['Total Accumulation'] = np.where((df['Raw - Previous'] > 0), df['Raw - Previous'], 0).cumsum() + df.iloc[0,3]
df
Output:
Date Time Raw Measurement Site ID Previous Raw Measurement Raw - Previous Total Accumulation
0 2020-05-06 14:15:00 12.56 8085 12.56 0.00 12.56
1 2020-05-06 14:30:00 12.56 8085 12.56 0.00 12.56
2 2020-05-06 14:45:00 12.56 8085 12.56 0.00 12.56
3 2020-05-06 15:00:00 2.48 8085 12.56 -10.08 12.56
4 2020-05-06 15:30:00 2.48 8085 2.47 0.01 12.57
5 2020-05-06 15:45:00 2.48 8085 2.48 0.00 12.57
6 2020-05-06 16:00:00 2.50 8085 2.48 0.02 12.59
7 2020-05-06 16:15:00 2.50 8085 2.50 0.00 12.59
8 2020-05-06 16:30:00 2.50 8085 2.50 0.00 12.59
9 2020-05-06 16:45:00 2.51 8085 2.50 0.01 12.60
10 2020-05-06 17:00:00 2.51 8085 2.51 0.00 12.60
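For anyone who wants to reproduce this without the original file, here is a minimal self-contained sketch. It assumes the sample rows from the question are typed in by hand (only the columns the calculation needs), and the final rounding to two decimals is my addition to hide floating-point noise, not part of the one-liner above.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (assumption: only these columns are needed).
df = pd.DataFrame({
    "Previous Raw Measurement": [12.56, 12.56, 12.56, 12.56, 2.47, 2.48,
                                 2.48, 2.50, 2.50, 2.50, 2.51],
    "Raw - Previous": [0.0, 0.0, 0.0, -10.08, 0.01, 0.0,
                       0.02, 0.0, 0.0, 0.01, 0.0],
})

# Keep positive changes; replace zero and negative changes (gauge emptied) with 0.
positive_only = np.where(df["Raw - Previous"] > 0, df["Raw - Previous"], 0)

# Running total of the positive changes, offset by the first reading.
start = df["Previous Raw Measurement"].iloc[0]
df["Total Accumulation"] = (positive_only.cumsum() + start).round(2)

print(df)
```

Selecting the starting value by column name rather than by position (`df.iloc[0, 3]`) keeps the code working if the column order changes.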
Answer 1 (score: 1)
You can use clip() to set the negative values to zero, then cumsum to take the running sum of the differences:
df['Total'] = df['Raw - Previous'].clip(lower=0).cumsum() + df['Raw Measurement'].iloc[0]
Output:
0 12.56
1 12.56
2 12.56
3 12.56
4 12.56
5 12.57
6 12.57
7 12.59
8 12.59
9 12.59
10 12.60
11 12.60
Name: Raw - Previous, dtype: float64
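The question also asks to filter the negative rows out of the df while keeping the total. A possible sketch of that step, assuming a shortened hand-typed sample and a hypothetical `filtered` variable: compute the running total on the full frame first, then drop the rows where the gauge was emptied, so the accumulation is unaffected by the filtering.

```python
import pandas as pd

# Shortened sample in the spirit of the question's data (assumption: hand-typed).
df = pd.DataFrame({
    "Raw Measurement": [12.56, 2.48, 2.47, 2.48, 2.50],
    "Raw - Previous": [0.0, -10.08, -0.01, 0.01, 0.02],
})

# Negative differences become 0, so they add nothing to the running total.
gains = df["Raw - Previous"].clip(lower=0)
df["Total"] = (gains.cumsum() + df["Raw Measurement"].iloc[0]).round(2)

# Drop the rows with a negative difference only after the total exists.
filtered = df[df["Raw - Previous"] >= 0].reset_index(drop=True)

print(filtered)
```

Doing the filter last matters only for readability here, since clipped rows contribute 0 either way, but it keeps the original index available until the total has been checked.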