Question

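I have a DataFrame with 3 columns: ID, Date, Data_Value. It holds temperature records (Data_Value) reported by different weather stations (ID) over a given time period (Date, one entry per day). What I need is to 'group by' each day and compute the average temperature for that day, for example: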

ID      |   Date       | Data_Value
------------------------------------
12345   |   02-05-2017 |  22
12346   |   02-05-2017 |  24
12347   |   02-05-2017 |  20
12348   |   01-05-2017 |  18
12349   |   01-05-2017 |  16

should become:

ID      |   Date       | Data_Value
------------------------------------
.....   |   02-05-2017 | 22
.....   |   01-05-2017 | 17

Can anyone help me with this?
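
To reproduce the example, here is a minimal sketch that rebuilds the sample frame from the table above. It assumes Date is stored as a plain string, exactly as shown; the real data may use a different dtype.

```python
import pandas as pd

# Rebuild the sample data from the question; Date is kept as a plain string (assumption)
df = pd.DataFrame({
    'ID': [12345, 12346, 12347, 12348, 12349],
    'Date': ['02-05-2017', '02-05-2017', '02-05-2017', '01-05-2017', '01-05-2017'],
    'Data_Value': [22, 24, 20, 18, 16],
})
print(df)
```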

Answer 1

I think you need groupby with mean (sort=False keeps the dates in the order they first appear):

df1 = df.groupby('Date', as_index=False, sort=False)['Data_Value'].mean()
print (df1)
         Date  Data_Value
0  02-05-2017        22.0
1  01-05-2017        17.0
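
For comparison, a small sketch (rebuilding the sample data so it runs on its own) showing that `as_index=False` is equivalent to grouping with Date as the index and then calling `reset_index()`. The variable names `a` and `b` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12345, 12346, 12347, 12348, 12349],
    'Date': ['02-05-2017', '02-05-2017', '02-05-2017', '01-05-2017', '01-05-2017'],
    'Data_Value': [22, 24, 20, 18, 16],
})

# as_index=False keeps 'Date' as a regular column in the result
a = df.groupby('Date', as_index=False, sort=False)['Data_Value'].mean()

# Same result: group with 'Date' as the index, then turn the index back into a column
b = df.groupby('Date', sort=False)['Data_Value'].mean().reset_index()

print(a.equals(b))
```

Note that mean always returns floats here, even though the input temperatures are integers.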

Then, if you also need the ID values, use agg:

# parentheses let the method chain span several lines
df2 = (df.groupby('Date', as_index=False, sort=False)
         .agg({'Data_Value':'mean', 'ID':lambda x: ','.join(x.astype(str))})
         # reorder the columns; reindex_axis was removed in pandas 1.0
         .reindex(columns=['ID','Date','Data_Value']))
print (df2)
                  ID        Date  Data_Value
0  12345,12346,12347  02-05-2017        22.0
1        12348,12349  01-05-2017        17.0
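
On newer pandas (0.25 and later), the same result can be sketched with named aggregation, where each keyword gives the output column name together with its (source column, aggregation) pair. This is an alternative spelling rather than the method used above; the final column selection just puts the columns in the requested order.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12345, 12346, 12347, 12348, 12349],
    'Date': ['02-05-2017', '02-05-2017', '02-05-2017', '01-05-2017', '01-05-2017'],
    'Data_Value': [22, 24, 20, 18, 16],
})

# Named aggregation: output_name=(source_column, aggregation)
out = (df.groupby('Date', sort=False)
         .agg(ID=('ID', lambda s: ','.join(s.astype(str))),
              Data_Value=('Data_Value', 'mean'))
         .reset_index()                      # move 'Date' from the index back to a column
         [['ID', 'Date', 'Data_Value']])     # put the columns in the requested order
print(out)
```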

Or, to keep only the first ID value for each date, aggregate ID with first:

df3 = (df.groupby('Date', as_index=False, sort=False)
         .agg({'Data_Value':'mean', 'ID':'first'})
         .reindex(columns=['ID','Date','Data_Value']))
print (df3)
      ID        Date  Data_Value
0  12345  02-05-2017        22.0
1  12348  01-05-2017        17.0
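
Because Date is a string, sorting the groups would be lexical rather than chronological (in day-first format, "01-06-2017" sorts before "02-05-2017"); sort=False only preserves the input order. If chronological order matters, one option is to parse the dates first. This sketch assumes the strings are day-first (DD-MM-YYYY), which the question does not state explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12345, 12346, 12347, 12348, 12349],
    'Date': ['02-05-2017', '02-05-2017', '02-05-2017', '01-05-2017', '01-05-2017'],
    'Data_Value': [22, 24, 20, 18, 16],
})

# Assumption: day-first dates; use format='%m-%d-%Y' if they are month-first
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

# With real datetimes, the default sort=True orders the groups chronologically
out = (df.groupby('Date', as_index=False)
         .agg({'Data_Value': 'mean', 'ID': 'first'})
         [['ID', 'Date', 'Data_Value']])
print(out)
```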

