Suppose I have some data (a 24-hour time series) read into pandas:
import pandas as pd
import numpy as np
#read CSV file
df = pd.read_csv('https://raw.githubusercontent.com/bbartling/Building-Demand-Electrical-Load-Profiles/master/july15.csv',
index_col='Date', parse_dates=True)
How can I average the kW column of df between the timestamps below and put the results into a new, separate pandas DataFrame?
bkps_timestamps_kW = [
'2013-06-19 00:15:00',
'2013-06-19 05:15:00',
'2013-06-19 16:30:00',
'2014-06-18 00:00:00']
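The column names of the new DataFrame could be something like avg_kw1, avg_kw2, avg_kw3, each holding the mean of the data between consecutive timestamps in bkps_timestamps_kW.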
Thanks for any help or tips.
Answer 0: (score: 0)
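I think you need cut, with the list of breakpoints converted to datetimes, to bin the index, and then group by those bins and aggregate with mean: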
d = [
'2013-06-19 00:00:00',
'2013-06-19 00:15:00',
'2013-06-19 01:15:00',
'2013-06-19 05:15:00',
'2013-06-19 07:15:00',
'2013-06-19 16:30:00',
'2013-06-20 16:30:00',
'2014-06-18 00:00:00',
'2015-06-18 00:00:00']
df = pd.DataFrame({'Date':range(len(d))}, index=pd.to_datetime(d))
print (df)
Date
2013-06-19 00:00:00 0
2013-06-19 00:15:00 1
2013-06-19 01:15:00 2
2013-06-19 05:15:00 3
2013-06-19 07:15:00 4
2013-06-19 16:30:00 5
2013-06-20 16:30:00 6
2014-06-18 00:00:00 7
2015-06-18 00:00:00 8
bkps_timestamps_kW = [
'2013-06-19 00:15:00',
'2013-06-19 05:15:00',
'2013-06-19 16:30:00',
'2014-06-18 00:00:00']
b = pd.to_datetime(bkps_timestamps_kW)
labels = [f'{i}-{j}' for i, j in zip(bkps_timestamps_kW[:-1], bkps_timestamps_kW[1:])]
# assign to a new name so the original df stays available for the variant further down
df1 = df.groupby(pd.cut(df.index, bins=b, labels=labels)).mean()
print (df1)
Date
2013-06-19 00:15:00-2013-06-19 05:15:00 2.5
2013-06-19 05:15:00-2013-06-19 16:30:00 4.5
2013-06-19 16:30:00-2014-06-18 00:00:00 6.5
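To see where 2.5 comes from: by default cut builds right-closed intervals, so the first bin is (00:15, 05:15]. It contains the rows at 01:15 and 05:15 (values 2 and 3) but not the row at 00:15 itself. A minimal sketch that rebuilds the sample frame above and lists which bin each row lands in; the names `bins`, `members` and `summary` are only for illustration, and `observed=False` is passed explicitly on the assumption that a reasonably recent pandas is in use:

```python
import pandas as pd

d = ['2013-06-19 00:00:00', '2013-06-19 00:15:00', '2013-06-19 01:15:00',
     '2013-06-19 05:15:00', '2013-06-19 07:15:00', '2013-06-19 16:30:00',
     '2013-06-20 16:30:00', '2014-06-18 00:00:00', '2015-06-18 00:00:00']
df = pd.DataFrame({'Date': range(len(d))}, index=pd.to_datetime(d))

b = pd.to_datetime(['2013-06-19 00:15:00', '2013-06-19 05:15:00',
                    '2013-06-19 16:30:00', '2014-06-18 00:00:00'])

# default right=True: every bin is (left, right]
bins = pd.cut(df.index, bins=b)

# rows outside all bins (here 00:00, 00:15 and 2015-06-18) get NaN
members = df.assign(bin=bins)
print(members)

# number of rows and mean per bin
summary = df.groupby(bins, observed=False)['Date'].agg(['count', 'mean'])
print(summary)
```

Rows that fall outside every interval are simply dropped from the grouped result.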
If you need the intervals closed on the left instead, pass right=False to cut:
# group the original sample frame, not an already aggregated result
df2 = df.groupby(pd.cut(df.index, bins=b, labels=labels, right=False)).mean()
print (df2)
Date
2013-06-19 00:15:00-2013-06-19 05:15:00 1.5
2013-06-19 05:15:00-2013-06-19 16:30:00 3.5
2013-06-19 16:30:00-2014-06-18 00:00:00 5.5
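The question asked for a separate DataFrame with columns named avg_kw1, avg_kw2, avg_kw3. One possible way to get there, sketched on synthetic data rather than the real CSV: the kW values, the one-day index and the last breakpoint below are made up for illustration, and `avg_df` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: one day of 15-minute kW readings (made-up values)
idx = pd.date_range('2013-06-19 00:00', periods=96, freq='15min')
df = pd.DataFrame({'kW': np.arange(96, dtype=float)}, index=idx)
df.index.name = 'Date'

# Hypothetical breakpoints that all fall inside that day
bkps = pd.to_datetime(['2013-06-19 00:15:00', '2013-06-19 05:15:00',
                       '2013-06-19 16:30:00', '2013-06-19 23:45:00'])

# One label per interval: avg_kw1, avg_kw2, ...
labels = [f'avg_kw{i}' for i in range(1, len(bkps))]

# Left-closed bins [start, end); mean of kW within each bin
binned = pd.cut(df.index, bins=bkps, labels=labels, right=False)
means = df['kW'].groupby(binned, observed=False).mean()

# Reshape into a single-row DataFrame with one column per interval
avg_df = pd.DataFrame([means.to_numpy()], columns=labels)
print(avg_df)
```

With the real data, `df` would come from the read_csv call in the question and `bkps` from bkps_timestamps_kW; whether to use right=False depends on which side of each interval should include its breakpoint.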