基本上,我需要的是现有表格的每小时平均值(见下文)。
输入文件最初的粒度为15分钟。输出应显示每天的平均值和每天上午8点到晚上8点的平均时间。
DateTime Value Date Output Average (entire day) Average (8am - 8pm)
09/01/2017 00:00 5055.414058 -> 09/01/2017
09/01/2017 00:15 5055.414058 10/01/2017
09/01/2017 00:30 5055.414058 11/01/2017
09/01/2017 00:45 5055.414058 12/01/2017
09/01/2017 01:00 5986.204028 13/01/2017
09/01/2017 01:15 5986.204028 14/01/2017
09/01/2017 01:30 5986.204028 15/01/2017
09/01/2017 01:45 5986.204028 16/01/2017
09/01/2017 02:00 7199.824865 17/01/2017
09/01/2017 02:15 7199.824865 18/01/2017
09/01/2017 02:30 7199.824865 19/01/2017
09/01/2017 02:45 7199.824865 20/01/2017
09/01/2017 03:00 9185.008333 21/01/2017
09/01/2017 03:15 9185.008333 22/01/2017
09/01/2017 03:30 9185.008333 23/01/2017
…
13/01/2017 22:00 94080.58174
13/01/2017 22:15 94080.58174
13/01/2017 22:30 94080.58174
13/01/2017 22:45 94080.58174
13/01/2017 23:00 93231.23486
13/01/2017 23:15 93231.23486
13/01/2017 23:30 93231.23486
13/01/2017 23:45 93231.23486
14/01/2017 00:00 91619.33743
14/01/2017 00:15 91619.33743
14/01/2017 00:30 91619.33743
14/01/2017 00:45 91619.33743
14/01/2017 01:00 89894.48751
14/01/2017 01:15 89894.48751
14/01/2017 01:30 89894.48751
…
import pandas as pd
import datetime
import numpy as np
import glob
import csv
# Local path and name of the excel file.
path = 'W:/myfolder/'
sheetname = "Forecast_" + datetime.datetime.today().strftime('%d.%m.%Y-%H')
filename = path + sheetname + ".csv"
#Create data frame of data
df = pd.read_csv(filename ,delimiter=',',engine = 'python', encoding='latin-1', index_col = False)
print(df)
table = df.groupby([df["DateTime"].dt.day, df["DateTime"].dt.hour]).mean()
print(table)
答案 0 :(得分:1)
只需要创建包含日期的第一列(表示为日期)并执行groupby +聚合。
我会尽快发布代码:)
编辑:正如承诺的那样,from datetime import datetime
import pandas as pd
import os
folderPath = "data/"
#Put all dataframes together
def folderIterator(folderPath):
for item in os.listdir(folderPath):
yield pd.read_csv("{:s}{:s}".format(folderPath, item))
dfIterator = folderIterator(folderPath)
fullDataFrame = dfIterator.next()
for df in dfIterator:
fullDataFrame = fullDataFrame.append(df)
#Create date column
fullDataFrame["DayCol"] = fullDataFrame["DateTime"].map(lambda x: datetime.strptime(x, '%d/%m/%Y %H:%M')).map(lambda x: x.strftime("%d/%m/%Y"))
finalDF = fullDataFrame.groupby("DayCol").mean()
print finalDF
询问有关代码的任何问题:)
答案 1 :(得分:0)
使用pd.TimeGrouper
,query
和pd.concat
tidx = pd.date_range('2016-03-30', '2016-04-01', freq='2H')
df = pd.DataFrame(dict(value=np.random.rand(len(tidx))), tidx)
from8to8 = df.assign(hour=df.index.hour).query('8 >= hour < 9') \
.groupby(pd.TimeGrouper('D')).value.mean().rename('8to8')
daily = df.groupby(pd.TimeGrouper('D')).value.mean().rename('day')
pd.concat([df.value, daily, from8to8], axis=1).ffill()
value day 8to8
2016-03-30 00:00:00 0.671287 0.565916 0.704173
2016-03-30 02:00:00 0.997307 0.565916 0.704173
2016-03-30 04:00:00 0.335283 0.565916 0.704173
2016-03-30 06:00:00 0.722650 0.565916 0.704173
2016-03-30 08:00:00 0.794335 0.565916 0.704173
2016-03-30 10:00:00 0.992366 0.565916 0.704173
2016-03-30 12:00:00 0.206157 0.565916 0.704173
2016-03-30 14:00:00 0.480467 0.565916 0.704173
2016-03-30 16:00:00 0.389169 0.565916 0.704173
2016-03-30 18:00:00 0.326746 0.565916 0.704173
2016-03-30 20:00:00 0.458807 0.565916 0.704173
2016-03-30 22:00:00 0.416415 0.565916 0.704173
2016-03-31 00:00:00 0.344517 0.487147 0.409475
2016-03-31 02:00:00 0.095404 0.487147 0.409475
2016-03-31 04:00:00 0.412321 0.487147 0.409475
2016-03-31 06:00:00 0.384827 0.487147 0.409475
2016-03-31 08:00:00 0.810305 0.487147 0.409475
2016-03-31 10:00:00 0.052873 0.487147 0.409475
2016-03-31 12:00:00 0.936284 0.487147 0.409475
2016-03-31 14:00:00 0.303524 0.487147 0.409475
2016-03-31 16:00:00 0.347630 0.487147 0.409475
2016-03-31 18:00:00 0.787372 0.487147 0.409475
2016-03-31 20:00:00 0.989716 0.487147 0.409475
2016-03-31 22:00:00 0.380994 0.487147 0.409475
2016-04-01 00:00:00 0.091029 0.091029 0.091029