pandas: averaging data between timestamps

Time: 2020-05-06 14:17:20

Tags: python pandas

Suppose I have some data (a 24-hour time series) read into pandas:

import pandas as pd
import numpy as np


#read CSV file
df = pd.read_csv('https://raw.githubusercontent.com/bbartling/Building-Demand-Electrical-Load-Profiles/master/july15.csv', 
                 index_col='Date', parse_dates=True)

How can I average the kW values in df between these timestamps into a new, separate pandas df?

bkps_timestamps_kW = [
'2013-06-19 00:15:00',
'2013-06-19 05:15:00',
'2013-06-19 16:30:00',
'2014-06-18 00:00:00']

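The column names of the new pandas df could be something like avg_kw1, avg_kw2, avg_kw3, each holding the average of the data between consecutive timestamps in bkps_timestamps_kW.

Thanks for any help/tips.

1 Answer:

Answer 0: (score: 0)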

I think you need cut, with the list converted to datetimes so it can serve as the bin edges, and then aggregate with mean:

d = [
'2013-06-19 00:00:00',
'2013-06-19 00:15:00',
'2013-06-19 01:15:00',
'2013-06-19 05:15:00',
'2013-06-19 07:15:00',
'2013-06-19 16:30:00',
'2013-06-20 16:30:00',
'2014-06-18 00:00:00',
'2015-06-18 00:00:00']
df = pd.DataFrame({'Date':range(len(d))}, index=pd.to_datetime(d))
print (df)
                     Date
2013-06-19 00:00:00     0
2013-06-19 00:15:00     1
2013-06-19 01:15:00     2
2013-06-19 05:15:00     3
2013-06-19 07:15:00     4
2013-06-19 16:30:00     5
2013-06-20 16:30:00     6
2014-06-18 00:00:00     7
2015-06-18 00:00:00     8

bkps_timestamps_kW = [
'2013-06-19 00:15:00',
'2013-06-19 05:15:00',
'2013-06-19 16:30:00',
'2014-06-18 00:00:00']


b = pd.to_datetime(bkps_timestamps_kW)
labels = [f'{i}-{j}' for i, j in zip(bkps_timestamps_kW[:-1], bkps_timestamps_kW[1:])] 

# Store the result under a new name so the original df can still be regrouped later
df1 = df.groupby(pd.cut(df.index, bins=b, labels=labels)).mean()
print (df1)
                                         Date
2013-06-19 00:15:00-2013-06-19 05:15:00   2.5
2013-06-19 05:15:00-2013-06-19 16:30:00   4.5
2013-06-19 16:30:00-2014-06-18 00:00:00   6.5
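
To see where the values 2.5, 4.5 and 6.5 come from, it can help to look at which bin each timestamp is assigned to. The following is a minimal sketch that reuses the sample index from above; the short labels bin1, bin2, bin3 are made up for readability and are not part of the answer. With the default right=True each bin is open on the left and closed on the right, so a timestamp equal to the first bin edge, or outside the edges entirely, gets no bin (NaN) and is left out of the mean.

```python
import pandas as pd

# Same sample timestamps as in the answer above.
d = [
    '2013-06-19 00:00:00',
    '2013-06-19 00:15:00',
    '2013-06-19 01:15:00',
    '2013-06-19 05:15:00',
    '2013-06-19 07:15:00',
    '2013-06-19 16:30:00',
    '2013-06-20 16:30:00',
    '2014-06-18 00:00:00',
    '2015-06-18 00:00:00']
idx = pd.to_datetime(d)

# Bin edges from the question.
b = pd.to_datetime([
    '2013-06-19 00:15:00',
    '2013-06-19 05:15:00',
    '2013-06-19 16:30:00',
    '2014-06-18 00:00:00'])

# Hypothetical short labels, used only for this illustration.
short_labels = ['bin1', 'bin2', 'bin3']

# Default right=True: each bin is (start, end].
assigned = pd.Series(pd.cut(idx, bins=b, labels=short_labels), index=idx)
print(assigned)
```

Rows shown as NaN in this Series are the ones that do not contribute to any of the three averages.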

If you need the intervals in cut to be closed on the left instead:

df = df.groupby(pd.cut(df.index, bins=b, labels=labels, right=False)).mean()
print (df)
                                         Date
2013-06-19 00:15:00-2013-06-19 05:15:00   1.5
2013-06-19 05:15:00-2013-06-19 16:30:00   3.5
2013-06-19 16:30:00-2014-06-18 00:00:00   5.5
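
The question asks for columns named avg_kw1, avg_kw2, avg_kw3 rather than interval labels. Here is a minimal sketch of one way to get that shape with the same cut and groupby approach. It assumes the data has a numeric column called kW; the small frame below uses made-up values and is not taken from the CSV. The names data, groups, means and avg_df are illustrative only.

```python
import pandas as pd

# Stand-in data: the 'kW' column name and its values are assumptions.
idx = pd.to_datetime([
    '2013-06-19 00:15:00',
    '2013-06-19 01:15:00',
    '2013-06-19 05:15:00',
    '2013-06-19 07:15:00',
    '2013-06-19 16:30:00',
    '2013-06-20 16:30:00'])
data = pd.DataFrame({'kW': [10.0, 20.0, 30.0, 50.0, 70.0, 90.0]}, index=idx)

bkps_timestamps_kW = [
    '2013-06-19 00:15:00',
    '2013-06-19 05:15:00',
    '2013-06-19 16:30:00',
    '2014-06-18 00:00:00']
b = pd.to_datetime(bkps_timestamps_kW)

# One label per interval: avg_kw1, avg_kw2, ...
labels = [f'avg_kw{i}' for i in range(1, len(b))]

# Left-closed bins [start, end), so a row exactly on a breakpoint
# belongs to the interval that starts there.
groups = pd.cut(data.index, bins=b, labels=labels, right=False)

# observed=False keeps every interval in the result, even an empty one.
means = data['kW'].groupby(groups, observed=False).mean()

# Put the interval means side by side in a single-row DataFrame.
avg_df = pd.DataFrame([means.to_numpy()], columns=labels)
print(avg_df)
```

Whether right=False or the default right=True is the better fit depends on which interval a reading taken exactly at a breakpoint should count toward.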