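I currently have a DataFrame that looks like the one below. I have already aggregated the average temperature and consumption for each day; the results are in the `temperature_day` and `consomation_day` columns: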
                 DATE_LOCAL  consomation  temperature  site         day  month  \
278455  2012-11-27 23:10:00           34          5.6  ID18  2012-11-27     11
278456  2012-11-27 23:20:00           40          5.6  ID18  2012-11-27     11
278457  2012-11-27 23:30:00           33          5.6  ID18  2012-11-27     11
278458  2012-11-27 23:40:00           22          5.6  ID18  2012-11-27     11
278459  2012-11-27 23:50:00           35          5.6  ID18  2012-11-27     11

        week_day  hour  NAF code  consomation_day  temperature_day
278455         1    23    Hotels        29.465278             6.75
278456         1    23    Hotels        29.465278             6.75
278457         1    23    Hotels        29.465278             6.75
278458         1    23    Hotels        29.465278             6.75
278459         1    23    Hotels        29.465278             6.75
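My goal now is to do the same for `_previous_day`, `_day_of_previous_week` and `weekly_average`. These turn out to be trickier, and I would appreciate any push in the right direction. If I need to phrase my question better, please let me know! I am fairly new here.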
Answer 0 (score: 2)
Is this a step in the right direction? Here is a deliberately simplified example.
import io

import pandas as pd

data = '''\
DATE_LOCAL co temperature site NAFcode
2012-11-26T23:10:00 34 5.6 ID18 Hotels
2012-11-27T23:10:00 34 5.6 ID18 Hotels
2012-11-28T23:20:00 40 5.6 ID18 Hotels
2012-11-28T23:30:00 33 5.6 ID18 Hotels
2012-11-29T23:40:00 22 5.6 ID18 Hotels
2012-12-02T23:50:00 35 5.6 ID18 Hotels
2012-12-03T23:50:00 35 5.6 ID18 Hotels
2012-12-04T23:50:00 35 5.6 ID18 Hotels'''
# pd.compat.StringIO no longer exists in current pandas; io.StringIO does the same job
df = pd.read_csv(io.StringIO(data), sep=r'\s+', parse_dates=['DATE_LOCAL'])
# Lookup dates: the previous day and the same weekday one week earlier
df['PD_date'] = (df['DATE_LOCAL'] - pd.Timedelta(hours=24)).dt.date
df['PW_date'] = (df['DATE_LOCAL'] - pd.Timedelta(days=7)).dt.date
# Assign the remaining helper columns
df['date'] = df['DATE_LOCAL'].dt.date
# Series.dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it
df['week'] = df['DATE_LOCAL'].dt.year.map(str) + df['DATE_LOCAL'].dt.isocalendar().week.map(str)
# Build a {(date, site): mean co} lookup dict and map it onto each row
m = df.groupby(['date','site'])['co'].mean().to_dict()
df['co_day'] = df[['date','site']].apply(tuple, 1).map(m)
df['co_pday'] = df[['PD_date','site']].apply(tuple, 1).map(m)
df['co_pweek'] = df[['PW_date','site']].apply(tuple, 1).map(m)
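# Rolling mean over the last 7 daily means. Note: this window counts rows
# (days that have data), not calendar days, and is not restarted per site.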
m = df.groupby(['date','site'])['co'].mean().rolling(7, min_periods=1).mean().to_dict()
df['co_week'] = df[['date','site']].apply(tuple,1).map(m)
# Drop the helper columns
df = df.set_index('date').drop(
    ['DATE_LOCAL', 'PD_date', 'NAFcode', 'PW_date', 'week'], axis=1)
print(df)
Returns:
            co  temperature  site  co_day  co_pday  co_pweek    co_week
date
2012-11-26  34          5.6  ID18    34.0      NaN       NaN  34.000000
2012-11-27  34          5.6  ID18    34.0     34.0       NaN  34.000000
2012-11-28  40          5.6  ID18    36.5     34.0       NaN  34.833333
2012-11-28  33          5.6  ID18    36.5     34.0       NaN  34.833333
2012-11-29  22          5.6  ID18    22.0     36.5       NaN  31.625000
2012-12-02  35          5.6  ID18    35.0      NaN       NaN  32.300000
2012-12-03  35          5.6  ID18    35.0     35.0      34.0  32.750000
2012-12-04  35          5.6  ID18    35.0     35.0      34.0  33.071429
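
Note that `co_week` above averages the last seven rows of daily means, so days with no data (2012-11-30 and 2012-12-01 here) do not shorten the window. If a true seven-calendar-day window per site is wanted, one possible variant is sketched below. It is only a sketch under assumptions: the helper `lagged` and the variable names `daily`, `weekly` and `result` are made up for illustration, and merges replace the dict lookup used above.

```python
import io

import pandas as pd

data = """\
DATE_LOCAL co site
2012-11-26T23:10:00 34 ID18
2012-11-27T23:10:00 34 ID18
2012-11-28T23:20:00 40 ID18
2012-11-28T23:30:00 33 ID18
2012-11-29T23:40:00 22 ID18
2012-12-02T23:50:00 35 ID18
2012-12-03T23:50:00 35 ID18
2012-12-04T23:50:00 35 ID18"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+", parse_dates=["DATE_LOCAL"])
# Midnight timestamps keep the datetime64 dtype, which time-based rolling needs
df["date"] = df["DATE_LOCAL"].dt.normalize()

# One row per (site, date) holding the daily mean
daily = (
    df.groupby(["site", "date"], as_index=False)["co"]
    .mean()
    .rename(columns={"co": "co_day"})
)


def lagged(days, name):
    """Hypothetical helper: daily means re-keyed to the date `days` later."""
    out = daily.rename(columns={"co_day": name})
    out["date"] = out["date"] + pd.Timedelta(days=days)
    return out


# Calendar-based weekly mean per site: the current day plus the 6 days before it
weekly = (
    daily.set_index("date")
    .groupby("site")["co_day"]
    .rolling("7D")
    .mean()
    .rename("co_week")
    .reset_index()
)

# Left merges keep every original row; missing lookups become NaN
result = (
    df.merge(daily, on=["site", "date"], how="left")
    .merge(lagged(1, "co_pday"), on=["site", "date"], how="left")
    .merge(lagged(7, "co_pweek"), on=["site", "date"], how="left")
    .merge(weekly, on=["site", "date"], how="left")
)
print(result[["date", "co", "co_day", "co_pday", "co_pweek", "co_week"]])
```

With this variant `co_day`, `co_pday` and `co_pweek` match the table above, while `co_week` differs once the gap falls inside the window: for 2012-12-03 the window runs from 2012-11-27 to 2012-12-03, giving 32.5 instead of 32.75.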