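I am looking for the most Pythonic way to sum values that follow one another in consecutive hours, grouping each run of values together. Example: I have the following DataFrame with "date" and "rainfall" columns, and I have added a column (RE) showing the desired result: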
date            rainfall  RE
31/12/17 23:00  0.88      None
01/01/18 00:00  0.38      1.26
01/01/18 01:00  0         None
01/01/18 02:00  0.22      0.22
01/01/18 03:00  0         None
01/01/18 04:00  0         None
01/01/18 13:00  0         None
01/01/18 14:00  0         None
01/01/18 15:00  0.55      0.55
01/01/18 16:00  0         None
01/01/18 17:00  1.31      1.31
01/01/18 18:00  0         None
01/01/18 19:00  0.49      0.49
01/01/18 20:00  0         None
01/01/18 21:00  0         None
01/01/18 22:00  0         None
01/01/18 23:00  0         None
02/01/18 00:00  0.7       None
02/01/18 01:00  0.22      None
02/01/18 02:00  0.61      None
02/01/18 03:00  0.42      1.95
02/01/18 04:00  0         None
02/01/18 05:00  1.69      1.69
02/01/18 06:00  0         None
02/01/18 07:00  0         None
02/01/18 08:00  0         None
I hope this is clear.
Thank you very much for your help,
Rémy
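Answer 0 (score: 0)
If I understand correctly, what you want is a rolling sum. The following returns a pandas Series holding the rolling sum of "rainfall" over two periods: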
df['rainfall'].rolling(2).sum()
Of course, your DataFrame must be sorted in the order shown in your example.
To add it to the DataFrame as a column:
df['rainfall_rolling_sum'] = df['rainfall'].rolling(2).sum()
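To see what the two-period window produces, here is a minimal sketch on the first six rainfall values from the question. The small DataFrame is rebuilt by hand for illustration only.

```python
import pandas as pd

# Illustrative sample: the first six rainfall values from the question.
df = pd.DataFrame({"rainfall": [0.88, 0.38, 0.0, 0.22, 0.0, 0.0]})

# Each row holds the sum of the current value and the one before it.
# The first row has no predecessor, so its result is NaN.
df["rainfall_rolling_sum"] = df["rainfall"].rolling(2).sum()
print(df)
```

Note that a fixed window of two only matches the RE column when a wet spell lasts at most two hours. The third row, for instance, gets 0.38 here where the question expects None, and a four-hour spell would need a window of four.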
EDIT1:
If you just want the cumulative rainfall, use:
df['rainfall_cumsum'] = df['rainfall'].cumsum()
The cumsum() function computes the cumulative sum.
EDIT2:
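# Running total that resets to zero at every dry hour
rf_not_zero = df['rainfall'] != 0
rf_cumsum = df['rainfall'].cumsum()
# Subtract the cumulative sum reached at the most recent dry hour;
# the offset stays float so fractional rainfall is not truncated
df['rainfall_accum'] = rf_cumsum - rf_cumsum.where(~rf_not_zero).ffill().fillna(0)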
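The RE column in the question holds one total per wet spell, placed on the last wet hour. One way to get that, sketched here as an alternative rather than as this answer's method, is to label the spells and sum within each label. The names spell_id, spell_total and is_last_wet are made up for this example, and the sample values are taken from the question.

```python
import pandas as pd

# Illustrative sample built from rainfall values in the question.
df = pd.DataFrame(
    {"rainfall": [0.88, 0.38, 0.0, 0.22, 0.0, 0.0, 0.7, 0.22, 0.61, 0.42, 0.0]}
)

wet = df["rainfall"] != 0

# Every dry hour bumps the counter, so the hours of one wet spell share an id
# (together with the dry hour just before them, which contributes 0).
spell_id = (~wet).cumsum()

# Total rainfall of each spell, repeated on every row of that spell.
spell_total = df["rainfall"].groupby(spell_id).transform("sum")

# A spell ends on a wet hour whose next hour is dry or does not exist.
is_last_wet = wet & ~wet.shift(-1, fill_value=False)

# Keep the total only on the last wet hour; every other row becomes NaN.
df["RE"] = spell_total.where(is_last_wet)
print(df)
```

This assumes the rows are sorted by date and that consecutive rows are consecutive hours. A gap in the timestamps, like the one between 04:00 and 13:00 in the question, is not detected.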
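Answer 1 (score: 0)
This is not the best way, but it may be a simple one when you have a large DataFrame. (If it is raining at the very start or end of the data, this example ignores it. See the warning below.)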
import pandas as pd
import numpy as np
# generate pseudo dataframe
rng = pd.date_range('1/1/2012', periods=20, freq='H')
rain = np.random.choice([0,0.5,1,2], size=20, p=[0.4,0.2,0.2,0.2])
df = pd.DataFrame()
df['data'] = rng
df['rain'] = rain
# convert rain to boolean
df['is_rain'] = df['rain'] > 0
# shift rain by one slot to recognize whether the state has changed
# (each row now sees the previous row's state; the first row sees itself)
df['is_rain_prev'] = df['is_rain'].shift(1, fill_value=df['is_rain'].iloc[0])
# get start and end points of rain (the end has to be excluded!
# it is always the next period, when it isn't raining anymore)
df['rain_start'] = df['is_rain'] & ~df['is_rain_prev']
df['rain_end'] = ~df['is_rain'] & df['is_rain_prev']
# these are the starts and ends; you can use them to get the groups from the dataframe
df[df['rain_start']].index
df[df['rain_end']].index
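Warning: the two arrays may differ in length, because the state before the first row and after the last row is unknown. So check whether the first element of rain_end is smaller than the first element of rain_start; if it is, you can delete it. The same applies if the last element of rain_start is greater than the last element of rain_end (which I think is logical).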
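The check described in the warning could be sketched as follows. The data is a fixed, made-up series that is wet at both edges, and it assumes a default integer index so that index labels can be used as positions. Here a start is the first wet row of a spell and an end is the first dry row after it.

```python
import pandas as pd

# Made-up hourly series that is raining at both the first and the last row.
df = pd.DataFrame({"rain": [1.0, 0.0, 0.5, 2.0, 0.0, 0.0, 1.0]})

is_rain = df["rain"] > 0
# State of the previous row; the first row is compared with itself.
prev = is_rain.shift(1, fill_value=is_rain.iloc[0])

starts = list(df.index[is_rain & ~prev])  # first wet row of each spell
ends = list(df.index[~is_rain & prev])    # first dry row after each spell

# Drop an end that has no start before it (rain already falling at row 0).
if ends and (not starts or ends[0] < starts[0]):
    ends = ends[1:]
# Drop a start that has no end after it (rain still falling at the last row).
if starts and (not ends or starts[-1] > ends[-1]):
    starts = starts[:-1]

# Ends are exclusive, so a positional slice covers exactly the wet rows.
totals = [df["rain"].iloc[s:e].sum() for s, e in zip(starts, ends)]
print(list(zip(starts, ends)), totals)
```

Only the spell that is fully contained in the data survives. The spells cut off at either edge are discarded, which matches the limitation stated in the answer.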