How to calculate the average of the data across groups

Time: 2019-06-10 12:12:55

Tags: python pandas

I have this pandas DataFrame df:

Station   DateTime               Record
A         2017-01-01 00:00:00    20
A         2017-01-01 01:00:00    22  
A         2017-01-01 02:00:00    20
A         2017-01-01 03:00:00    18
B         2017-01-01 00:00:00    22
B         2017-01-01 01:00:00    24

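I want to compute the average Record per DateTime (basically per hour) across stations A and B. If either A or B has no record for a given DateTime, the Record value for that station should be treated as 0.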

It can be assumed that every DateTime has a record from at least one Station.

This is the expected result:

DateTime               Avg_Record
2017-01-01 00:00:00    21
2017-01-01 01:00:00    23  
2017-01-01 02:00:00    10
2017-01-01 03:00:00    9

1 answer:

Answer 0 (score: 2)

Here is a solution:

g = df.groupby('DateTime')['Record']
df_out = g.mean()
m = g.count() == 1
df_out.loc[m] = df_out.loc[m] / 2
df_out = df_out.reset_index()
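
The halving step works because a missing station contributes 0 to the sum, so the desired average is simply the per-hour sum divided by the number of stations. Below is a minimal sketch of that idea on the question's data. It is not part of the original answer, and it assumes every station that should count appears at least once in df; the names `n_stations` and `avg` are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Station': ['A', 'A', 'A', 'A', 'B', 'B'],
    'DateTime': pd.to_datetime([
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
        '2017-01-01 02:00:00', '2017-01-01 03:00:00',
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
    ]),
    'Record': [20, 22, 20, 18, 22, 24],
})

# Assumption: every station that should count appears at least once in df
n_stations = df['Station'].nunique()

# A missing station adds 0 to the sum, so sum / n_stations is the
# average with missing records treated as 0
avg = (
    df.groupby('DateTime')['Record'].sum() / n_stations
).rename('Avg_Record').reset_index()

print(avg)
```

Unlike the fixed division by 2, this version should also cope with more than two stations, as long as the assumption above holds.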

Or as an uglier one-liner:

df = df.groupby('DateTime')['Record'].apply(
      lambda x: x.mean() if x.size == 2 else x.values[0]/2
      ).reset_index()
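
Another way to express "a missing record counts as 0" is to reshape the data to one column per station, fill the gaps with 0, and average across the columns. This is only a sketch on the question's data, not the answerer's method. It assumes there is at most one Record per (DateTime, Station) pair, since `pivot` raises an error on duplicates; `wide` and `avg_wide` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Station': ['A', 'A', 'A', 'A', 'B', 'B'],
    'DateTime': pd.to_datetime([
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
        '2017-01-01 02:00:00', '2017-01-01 03:00:00',
        '2017-01-01 00:00:00', '2017-01-01 01:00:00',
    ]),
    'Record': [20, 22, 20, 18, 22, 24],
})

# One column per station; a missing (DateTime, Station) pair becomes NaN.
# Assumption: no duplicate (DateTime, Station) pairs in df.
wide = df.pivot(index='DateTime', columns='Station', values='Record')

# Treat missing records as 0, then average across stations row by row
avg_wide = wide.fillna(0).mean(axis=1).rename('Avg_Record').reset_index()

print(avg_wide)
```

The intermediate `wide` table makes it easy to see which station is missing at which hour before the zeros are filled in.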

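Proof (note that the sample data here differs slightly from the question: station B has records at 01:00 and 02:00):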

from io import StringIO  # pd.compat.StringIO is no longer available in recent pandas

import pandas as pd

data = '''\
Station   DateTime               Record
A         2017-01-01T00:00:00    20
A         2017-01-01T01:00:00    22
A         2017-01-01T02:00:00    20
A         2017-01-01T03:00:00    18
B         2017-01-01T01:00:00    22
B         2017-01-01T02:00:00    24'''

# Read the whitespace-separated table and parse DateTime as timestamps
fileobj = StringIO(data)
df = pd.read_csv(fileobj, sep=r'\s+', parse_dates=['DateTime'])

# Create a grouper and get the mean
g = df.groupby('DateTime')['Record']
df_out = g.mean()

# Divide by 2 where only one station has a record (the missing one counts as 0)
m = g.count() == 1
df_out.loc[m] = df_out.loc[m] / 2

# Reset index to get a dataframe format again
df_out = df_out.reset_index()

print(df_out)

Returns:

             DateTime  Record
0 2017-01-01 00:00:00    10.0
1 2017-01-01 01:00:00    22.0
2 2017-01-01 02:00:00    22.0
3 2017-01-01 03:00:00     9.0