Pandas datetime aggregation

Time: 2018-05-14 20:53:50

Tags: python pandas datetime

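I currently have a dataframe that looks like the one below. I have already aggregated the average temperature and consumption for each date (the consomation_day and temperature_day columns):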

    DATE_LOCAL  consomation  temperature  site         day  month  \
278455 2012-11-27 23:10:00           34          5.6  ID18  2012-11-27     11   
278456 2012-11-27 23:20:00           40          5.6  ID18  2012-11-27     11   
278457 2012-11-27 23:30:00           33          5.6  ID18  2012-11-27     11   
278458 2012-11-27 23:40:00           22          5.6  ID18  2012-11-27     11   
278459 2012-11-27 23:50:00           35          5.6  ID18  2012-11-27     11   

        week_day  hour NAF code  consomation_day  temperature_day  
278455         1    23   Hotels        29.465278             6.75  
278456         1    23   Hotels        29.465278             6.75  
278457         1    23   Hotels        29.465278             6.75  
278458         1    23   Hotels        29.465278             6.75  
278459         1    23   Hotels        29.465278             6.75 

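My goal now is to do the same for _previous_day, _day_of_previous_week and weekly_average. These have turned out to be trickier, and I would appreciate any push in the right direction. Let me know if I need to phrase my question better! I am fairly new here.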

1 Answer:

Answer 0 (score: 2)

Is this a step in the right direction? Here is a fully simplified example.

import io

import pandas as pd

data = '''\
DATE_LOCAL                    co  temperature  site   NAFcode
2012-11-26T23:10:00           34          5.6  ID18    Hotels
2012-11-27T23:10:00           34          5.6  ID18    Hotels
2012-11-28T23:20:00           40          5.6  ID18    Hotels
2012-11-28T23:30:00           33          5.6  ID18    Hotels
2012-11-29T23:40:00           22          5.6  ID18    Hotels
2012-12-02T23:50:00           35          5.6  ID18    Hotels
2012-12-03T23:50:00           35          5.6  ID18    Hotels
2012-12-04T23:50:00           35          5.6  ID18    Hotels'''

# pd.compat.StringIO was removed in pandas 1.0; io.StringIO is the stdlib equivalent
df = pd.read_csv(io.StringIO(data), sep=r'\s+', parse_dates=['DATE_LOCAL'])

df['PD_date'] = (df['DATE_LOCAL'] - pd.Timedelta(hours=24)).dt.date
df['PW_date'] = (df['DATE_LOCAL'] - pd.Timedelta(days=7)).dt.date

# Assign new helper columns
df['date'] = df['DATE_LOCAL'].dt.date
# Series.dt.week was removed in pandas 2.0; isocalendar() needs pandas >= 1.1
iso = df['DATE_LOCAL'].dt.isocalendar()
df['week'] = iso['year'].map(str) + iso['week'].map(str)

# Build a (date, site) -> daily mean lookup dict and map it onto each row
m = df.groupby(['date','site'])['co'].mean().to_dict()
df['co_day'] = df[['date','site']].apply(tuple, 1).map(m)
df['co_pday'] = df[['PD_date','site']].apply(tuple, 1).map(m)
df['co_pweek'] = df[['PW_date','site']].apply(tuple, 1).map(m)

# Rolling mean over the last 7 daily means (7 rows, not 7 calendar days)
m = df.groupby(['date','site'])['co'].mean().rolling(7, min_periods=1).mean().to_dict()
df['co_week'] = df[['date','site']].apply(tuple,1).map(m)

# Drop help-columns
df = df.set_index('date').drop(
    ['DATE_LOCAL','PD_date','NAFcode','PW_date','week'],axis=1)

print(df)

Returns:

            co  temperature  site  co_day  co_pday  co_pweek    co_week
date                                                                   
2012-11-26  34          5.6  ID18    34.0      NaN       NaN  34.000000
2012-11-27  34          5.6  ID18    34.0     34.0       NaN  34.000000
2012-11-28  40          5.6  ID18    36.5     34.0       NaN  34.833333
2012-11-28  33          5.6  ID18    36.5     34.0       NaN  34.833333
2012-11-29  22          5.6  ID18    22.0     36.5       NaN  31.625000
2012-12-02  35          5.6  ID18    35.0      NaN       NaN  32.300000
2012-12-03  35          5.6  ID18    35.0     35.0      34.0  32.750000
2012-12-04  35          5.6  ID18    35.0     35.0      34.0  33.071429
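
One thing to watch in the output above: rolling(7) counts rows of daily means, so when days are missing (2012-11-30 and 2012-12-01 here) the window reaches back more than a calendar week. A possible variation, offered only as a sketch and not as the answer's method, is to build one table of daily means per site, merge shifted copies of it for the previous day and the same day of the previous week, and use a time-based '7D' window for the weekly average. The helper name lagged and the columns day and co_day are made up for this illustration, and the sample data is trimmed to the columns that matter.

```python
import io

import pandas as pd

data = """\
DATE_LOCAL,co,site
2012-11-26T23:10:00,34,ID18
2012-11-27T23:10:00,34,ID18
2012-11-28T23:20:00,40,ID18
2012-11-28T23:30:00,33,ID18
2012-11-29T23:40:00,22,ID18
2012-12-02T23:50:00,35,ID18
2012-12-03T23:50:00,35,ID18
2012-12-04T23:50:00,35,ID18
"""
df = pd.read_csv(io.StringIO(data), parse_dates=["DATE_LOCAL"])

# One row per (site, calendar day) holding that day's mean consumption.
df["day"] = df["DATE_LOCAL"].dt.normalize()
daily = (df.groupby(["site", "day"], as_index=False)["co"].mean()
           .rename(columns={"co": "co_day"}))


def lagged(daily_means, days, name):
    """Hypothetical helper: re-key daily means so they join `days` later."""
    out = daily_means[["site", "day", "co_day"]].rename(columns={"co_day": name})
    out["day"] = out["day"] + pd.Timedelta(days=days)
    return out


# Previous day and same weekday of the previous week, via left merges.
# Days with no data one day / seven days earlier end up as NaN.
daily = (daily
         .merge(lagged(daily, 1, "co_pday"), on=["site", "day"], how="left")
         .merge(lagged(daily, 7, "co_pweek"), on=["site", "day"], how="left"))

# Mean of the daily means over the trailing 7 calendar days, per site.
weekly = (daily.set_index("day")
               .groupby("site")["co_day"]
               .rolling("7D")
               .mean()
               .rename("co_week")
               .reset_index())
daily = daily.merge(weekly, on=["site", "day"], how="left")

# Attach the per-day features back onto the original 10-minute rows.
out = df.merge(daily, on=["site", "day"], how="left")
print(out.drop(columns="day"))
```

With this sample, co_pday and co_pweek should match the table above, while co_week differs on the last two rows because the '7D' window drops days older than a week. Grouping by site before rolling also keeps one site's values out of another site's window, which matters once the frame holds more than one site.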