Pandas - Groupby cumulative time

Time: 2019-03-21 09:59:35

Tags: python pandas

Here is my problem: imagine a dataframe indexed by time.

import pandas as pd

df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"],
                        "col2": [4, 4, 4, 6, 6, 7],
                        "col3": [2, 17, 2, 2, 3, 50]})

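I now want to apply a function and group the data by cumulative time in 15-second steps, i.e. for the timestamps between 00:00:00-00:00:15, 00:00:00-00:00:30, 00:00:00-00:00:45, and so on.

For example, taking the rows in each interval where the value in col1 is "a", I want to sum all values of col2 and of col3 and divide one by the other (col2 / col3).

The output should look like: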

         output
00:00:15    2
00:00:30    2.3333
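To make the "cumulative time" windows concrete, here is a brute-force sketch that recomputes the expected output directly: for each window end (15 s, 30 s, ...), it keeps the a rows whose timestamp is before that end and divides the sum of col2 by the sum of col3. It assumes half-open windows [00:00:00, end), which the question does not state explicitly (no timestamp in this data falls exactly on a boundary), and the names is_a, step and results are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={"col1": ["a", "b", "a", "a", "c", "d"],
          "col2": [4, 4, 4, 6, 6, 7],
          "col3": [2, 17, 2, 2, 3, 50]},
)

times = pd.to_timedelta(df.index)          # "00:00:08" -> 8 seconds
is_a = df["col1"].eq("a").to_numpy()       # rows that take part in the ratio
step = pd.Timedelta(seconds=15)
last_a = times[is_a].max()                 # latest timestamp of an 'a' row

results = {}
end = step
# Grow the window [00:00:00, end) in 15 s steps until it has passed the last 'a' row
while end - step <= last_a:
    in_window = is_a & (times < end)
    window = df[in_window]
    results[end] = window["col2"].sum() / window["col3"].sum()
    end += step

output = pd.Series(results, name="output")
print(output)
```

This is quadratic in the number of windows, so it is only meant as a reference to check a faster resample/cumsum solution against.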

Any help is appreciated!

2 answers:

Answer 0 (score: 3)

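First convert the index to timedeltas with to_timedelta and add 15 seconds to shift it, so that the bin labels become the interval end times. Then keep only the a rows with boolean indexing, using Series.eq (equivalent to ==). Next, aggregate with DataFrame.resample and sum, accumulate with DataFrame.cumsum, and finally divide the columns:

df.index = pd.to_timedelta(df.index) + pd.Timedelta(15, unit='s')
# select only the numeric columns so the string column col1 is not summed
df = df.loc[df['col1'].eq('a'), ['col2', 'col3']].resample('15s').sum().cumsum()
df['out'] = df['col2'].div(df['col3'])
print(df)
          col2  col3       out
00:00:15     8     4  2.000000
00:00:30    14     6  2.333333

An alternative is to convert the index to datetime instead.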
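Shifting by 15 seconds and resampling effectively assigns each row the end of the 15-second interval it falls into. As a sketch of the same idea with a different technique (explicit integer bucketing plus groupby, not resample, and not the answer author's code), assuming half-open intervals [start, end), the interval-end labels can be computed directly. The names a, secs, interval_end, per_interval and all_ends are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={"col1": ["a", "b", "a", "a", "c", "d"],
          "col2": [4, 4, 4, 6, 6, 7],
          "col3": [2, 17, 2, 2, 3, 50]},
)

a = df[df["col1"].eq("a")]
secs = pd.to_timedelta(a.index).total_seconds()
# End of the 15 s interval each row falls into: [0, 15) -> 15 s, [15, 30) -> 30 s, ...
interval_end = pd.to_timedelta((secs // 15 + 1) * 15, unit="s")

# Per-interval sums; groupby only creates groups for intervals that contain rows
per_interval = a[["col2", "col3"]].groupby(interval_end).sum()

# Re-insert empty intervals with zero sums so the running total covers every step
all_ends = pd.timedelta_range(start="15s", end=per_interval.index.max(), freq="15s")
cum = per_interval.reindex(all_ends, fill_value=0).cumsum()
cum["output"] = cum["col2"] / cum["col3"]
print(cum)
```

Unlike resample, groupby skips intervals with no rows, which is why the reindex step is needed before cumsum.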

Answer 1 (score: 1)

import pandas as pd
df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"], "col2": [4, 4, 4, 6, 6, 7], "col3": [2, 17, 2, 2, 3, 50]})
df.index = pd.to_datetime(df.index, format='%H:%M:%S')
# how='sum' is no longer accepted by resample; select the numeric columns and call .sum() instead
df = df.loc[df['col1'] == 'a', ['col2', 'col3']].resample('15s').sum().cumsum()
df['output'] = df['col2'] / df['col3']
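With a datetime index, resample labels each bin by its start by default, and to_datetime fills in the placeholder date 1900-01-01. To get a result shaped like the question's expected table, one option (a sketch, not the answer author's code) is to pass label='right' to resample and format the index with strftime. The name res is illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={"col1": ["a", "b", "a", "a", "c", "d"],
          "col2": [4, 4, 4, 6, 6, 7],
          "col3": [2, 17, 2, 2, 3, 50]},
)
df.index = pd.to_datetime(df.index, format="%H:%M:%S")  # date defaults to 1900-01-01

res = (
    df.loc[df["col1"] == "a", ["col2", "col3"]]
      .resample("15s", label="right")  # label each [start, end) bin by its end
      .sum()
      .cumsum()
)
res["output"] = res["col2"] / res["col3"]
# Drop the placeholder date and keep only the time of day
res.index = res.index.strftime("%H:%M:%S")
print(res[["output"]])
```

Bins stay closed on the left (the default for a seconds frequency), so a row at exactly 00:00:15 would be counted in the second interval.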