Here is my problem: imagine a DataFrame indexed by time.
import pandas as pd
df = pd.DataFrame(index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
                  data={"col1": ["a", "b", "a", "a", "c", "d"],
                        "col2": [4, 4, 4, 6, 6, 7],
                        "col3": [2, 17, 2, 2, 3, 50]})
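I now want to apply a function and group the data by cumulative time in 15-second steps, i.e. for timestamps between 00:00:00-00:00:15, 00:00:00-00:00:30, 00:00:00-00:00:45, and so on.
For example, for the rows where col1 is "a", I want to sum all values of col2 and of col3 within each cumulative interval and divide the col2 sum by the col3 sum.
The output should look like: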
output
00:00:15 2
00:00:30 2.3333
Thanks for any help!
Answer 0 (score: 3)
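First convert the index to timedeltas with to_timedelta and add 15 seconds to shift it, so that each bin is labelled with the end of its interval. Then keep only the "a" rows with boolean indexing (Series.eq is equivalent to ==). Next apply DataFrame.resample with sum, then DataFrame.cumsum, and finally divide the columns: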
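df.index = pd.to_timedelta(df.index) + pd.Timedelta(15, unit='s')
# keep only the 'a' rows and the numeric columns before summing
df = df.loc[df['col1'].eq('a'), ['col2', 'col3']].resample('15s').sum().cumsum()
df['out'] = df['col2'].div(df['col3'])
print (df)
          col2  col3       out
00:00:15     8     4  2.000000
00:00:30    14     6  2.333333
An alternative is to convert the index to datetime instead of timedelta.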
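As a cross-check on what the resample and cumsum steps compute, the same numbers can be reproduced by hand. This is only a minimal sketch in plain Python, not part of the answer: the helper names `to_seconds`, `label` and `cumulative_ratios` are made up for illustration, and windows in which the col3 sum is zero are simply skipped (an assumption; pandas would show NaN there instead).

```python
# Sample rows from the question: (timestamp, col1, col2, col3)
rows = [
    ("00:00:00", "a", 4, 2),
    ("00:00:08", "b", 4, 17),
    ("00:00:14", "a", 4, 2),
    ("00:00:21", "a", 6, 2),
    ("00:00:23", "c", 6, 3),
    ("00:00:49", "d", 7, 50),
]


def to_seconds(stamp):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def label(seconds):
    """Format a number of seconds as a zero-padded HH:MM:SS string."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def cumulative_ratios(rows, key="a", step=15):
    """Return sum(col2) / sum(col3) over [0, end) for end = step, 2*step, ..."""
    picked = [(to_seconds(t), c2, c3) for t, c1, c2, c3 in rows if c1 == key]
    if not picked:
        return {}
    last = max(t for t, _, _ in picked)
    result = {}
    end = step
    # Stop once the window that contains the last matching row is covered.
    while end - step <= last:
        sum2 = sum(c2 for t, c2, _ in picked if t < end)
        sum3 = sum(c3 for t, _, c3 in picked if t < end)
        if sum3 != 0:  # assumption: skip windows without a usable denominator
            result[label(end)] = sum2 / sum3
        end += step
    return result


ratios = cumulative_ratios(rows)
for end, value in ratios.items():
    print(end, round(value, 4))
```

For the sample data this gives 8/4 for the window ending at 00:00:15 and 14/6 for the window ending at 00:00:30, which agrees with the output requested in the question.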
Answer 1 (score: 1)
df = pd.DataFrame(index=["00:00:00", "00:00:08","00:00:14","00:00:21","00:00:23","00:00:49"],data={"col1":["a","b","a","a", "c", "d"], "col2":[4,4,4,6,6,7], "col3":[2,17,2,2,3,50]})
df.index = pd.to_datetime(df.index, format='%H:%M:%S')
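# the how= argument was removed from resample(); call .sum() on the resampler instead
df = df.loc[df['col1']=='a', ['col2', 'col3']].resample('15s').sum().cumsum()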
df['output'] = df['col2']/df['col3']
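Note that with a datetime index and no shift, each bin is labelled by the start of its interval (00:00:00, 00:00:15) rather than by its end as the question asks. One possible way around this, sketched below and not taken from either answer, is to pass `label="right"` to `resample`. The variable names `numeric` and `out` and the final `strftime` formatting step are my own additions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    index=["00:00:00", "00:00:08", "00:00:14", "00:00:21", "00:00:23", "00:00:49"],
    data={
        "col1": ["a", "b", "a", "a", "c", "d"],
        "col2": [4, 4, 4, 6, 6, 7],
        "col3": [2, 17, 2, 2, 3, 50],
    },
)
df.index = pd.to_datetime(df.index, format="%H:%M:%S")

# Keep only the 'a' rows and the two numeric columns.
numeric = df.loc[df["col1"] == "a", ["col2", "col3"]]

# label="right" names each 15-second bin by its end instead of its start;
# the bins themselves stay closed on the left, e.g. [00:00:00, 00:00:15).
out = numeric.resample("15s", label="right").sum().cumsum()
out["output"] = out["col2"] / out["col3"]

# Drop the placeholder date that to_datetime attaches to the index.
out.index = out.index.strftime("%H:%M:%S")
print(out)
```

The cumulative sums are the same as in the first answer; only the way the interval labels are produced differs.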