Question

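I have a dataset of hourly temperature, precipitation, and other numeric fields, with timestamps going back 10 years.

I would like to add a 10-year "average" column for each field.

I am able to group by month-day and get the corresponding means for each day, but I don't know how to add those means back to the original dataframe.

Here is my code: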

http://cl.ly/WWRn

http://cl.ly/WWJW

Any tips?

Edit:

The answer below is correct if, instead of converting to t.date, you build a month-day key:

    df['datetime'].apply(lambda t: "%d-%d" % (t.month, t.day) )
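
As a rough sketch of how that month-day key could produce a multi-year average per row: the sample data, the `temp` column, and the `month_day` / `temp_avg` names below are hypothetical and only illustrate the idea. Rows from different years that share a month and day fall into the same group, and the group mean is merged back onto every row.

```python
import pandas as pd

# Hypothetical sample spanning two years (not the asker's data)
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-01 01:00", "2011-01-01 02:00",
        "2012-01-01 01:00", "2012-01-01 02:00",
        "2011-01-02 01:00", "2012-01-02 01:00",
    ]),
    "temp": [1.0, 3.0, 5.0, 7.0, 10.0, 20.0],
})

# Key that ignores the year, so the same calendar day matches across years
df["month_day"] = df["datetime"].apply(lambda t: "%d-%d" % (t.month, t.day))

# Mean per calendar day over all years
means = df.groupby("month_day")[["temp"]].mean()

# Attach the multi-year mean to every original row
new_df = pd.merge(df, means, left_on="month_day", right_index=True,
                  suffixes=("", "_avg"))

print(new_df)
```

Every row keeps its original `temp` value and gains a `temp_avg` column holding the mean for its calendar day across all years.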

Answer 1

I did something like this; maybe it helps you (or maybe not).

import pandas as pd

df = pd.DataFrame( [
    ['2011-01-01 01:00', 1, 2, 3],
    ['2011-01-01 02:00', 10, 20, 30],
    ['2011-01-01 03:00', 100, 200, 300],
    ['2011-01-02 01:00', 4, 5, 6],
    ['2011-01-02 02:00', 40, 50, 60],
    ['2011-01-02 03:00', 400, 500, 600],
], columns=['datetime','a','b','c'])

# convert the datetime strings to datetime objects
df['datetime'] = pd.to_datetime(df['datetime'])

# the example dataframe is now ready

# create a column holding only the date
df['date'] = df['datetime'].apply(lambda t: t.date())

# group by date and average the numeric columns
# (selected explicitly so the `datetime` column is not averaged too)
g = df.groupby('date')[['a', 'b', 'c']].mean()

# change `date` from index to normal column
g2 = g.reset_index()

# merge on the `date` column
new_df = pd.merge(left=df, right=g2, on='date', suffixes=('_df','_group') )

print(df)
print(g)
print(g2)
print(new_df)

df:

             datetime    a    b    c        date
0 2011-01-01 01:00:00    1    2    3  2011-01-01
1 2011-01-01 02:00:00   10   20   30  2011-01-01
2 2011-01-01 03:00:00  100  200  300  2011-01-01
3 2011-01-02 01:00:00    4    5    6  2011-01-02
4 2011-01-02 02:00:00   40   50   60  2011-01-02
5 2011-01-02 03:00:00  400  500  600  2011-01-02

g:

              a    b    c
date                     
2011-01-01   37   74  111
2011-01-02  148  185  222

g2:

         date    a    b    c
0  2011-01-01   37   74  111
1  2011-01-02  148  185  222

new_df:

             datetime  a_df  b_df  c_df        date  a_group  b_group  c_group
0 2011-01-01 01:00:00     1     2     3  2011-01-01       37       74      111
1 2011-01-01 02:00:00    10    20    30  2011-01-01       37       74      111
2 2011-01-01 03:00:00   100   200   300  2011-01-01       37       74      111
3 2011-01-02 01:00:00     4     5     6  2011-01-02      148      185      222
4 2011-01-02 02:00:00    40    50    60  2011-01-02      148      185      222
5 2011-01-02 03:00:00   400   500   600  2011-01-02      148      185      222

Edit

With left_on='date', right_index=True there is no need for reset_index(); merge directly against the grouped frame g:

# change `date` from index to normal column
#g2 = g.reset_index()

# merge on the `date` column
#new_df = pd.merge(left=df, right=g2, on='date', suffixes=('_df','_group') )
new_df = pd.merge(left=df, right=g, left_on='date', right_index=True, suffixes=('_df','_group') )

print(new_df)
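
For comparison, here is a sketch of a different technique that is not part of the answer above: `groupby(...).transform('mean')` returns the group mean already aligned to the original rows, so no merge is needed. It reuses the example data from the answer; the `group_means` name is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ['2011-01-01 01:00', 1, 2, 3],
    ['2011-01-01 02:00', 10, 20, 30],
    ['2011-01-01 03:00', 100, 200, 300],
    ['2011-01-02 01:00', 4, 5, 6],
    ['2011-01-02 02:00', 40, 50, 60],
    ['2011-01-02 03:00', 400, 500, 600],
], columns=['datetime', 'a', 'b', 'c'])

df['datetime'] = pd.to_datetime(df['datetime'])
df['date'] = df['datetime'].apply(lambda t: t.date())

# transform('mean') yields one mean per original row, keeping df's index
group_means = df.groupby('date')[['a', 'b', 'c']].transform('mean')

# join on the shared index; the suffix marks the averaged columns
new_df = df.join(group_means.add_suffix('_group'))

print(new_df)
```

Unlike the merge version, the original columns keep their names `a`, `b`, `c` here, and only the averaged columns get the `_group` suffix.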
