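My question is related to this one, but the solutions there did not work for me.
I have a DataFrame df, shown below. I want to compute the weighted average of elevation and width, using counts as the weights, grouped by building and day. How can I do that?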
#Sample data
import pandas as pd
df = pd.DataFrame({
'building': ['A1', 'A1', 'A1', 'A1'],
'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
'id': ['alak', 'ldau', 'laud', 'lkdu'],
'counts': [1, 2, 3, 7],
'elevation': [5.7, 7.8, 8.7, 6.9],
'width':[1.2, 2.4, 3.4, 2.7]
})
df
  building         day    id  counts  elevation  width
0       A1  2019-07-02  alak       1        5.7    1.2
1       A1  2019-07-02  ldau       2        7.8    2.4
2       A1  2019-07-03  laud       3        8.7    3.4
3       A1  2019-07-03  lkdu       7        6.9    2.7
# What I want to get:
  building         day  elevation  width
0       A1  2019-07-02        7.1    2.0
1       A1  2019-07-03        7.4    2.9
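For reference, the target numbers follow from the usual weighted-mean formula, sum(weight * value) / sum(weight), assuming counts is the weight column (which is what the desired output implies). A minimal check in plain Python for the first group; the helper name weighted_mean is only illustrative:

```python
# Rows for building A1 on 2019-07-02, copied from the sample data.
counts = [1, 2]
elevation = [5.7, 7.8]
width = [1.2, 2.4]

def weighted_mean(values, weights):
    # sum(weight * value) / sum(weight)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

print(round(weighted_mean(elevation, counts), 2))  # 7.1
print(round(weighted_mean(width, counts), 2))      # 2.0
```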
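Answer 0 (score: 4)
You can use the reindex and repeat trick, which repeats each row counts times and then takes a plain mean (so it only works when counts holds non-negative integers):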
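# drop(columns=...) is used because recent pandas no longer accepts the axis positionally;
# 'id' is dropped as well because mean() no longer silently skips string columns
df.reindex(df.index.repeat(df.counts)).drop(columns=['counts', 'id']).\
groupby(['building','day'],as_index=False).mean()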
Out[110]:
  building         day  elevation  width
0       A1  2019-07-02       7.10   2.00
1       A1  2019-07-03       7.44   2.91
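If the weights are not whole numbers, repeating rows is not an option. One possible alternative, sketched here under the assumption that NumPy is available, is to call np.average inside groupby(...).apply. The function name weighted_means is made up for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'building': ['A1', 'A1', 'A1', 'A1'],
    'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
    'id': ['alak', 'ldau', 'laud', 'lkdu'],
    'counts': [1, 2, 3, 7],
    'elevation': [5.7, 7.8, 8.7, 6.9],
    'width': [1.2, 2.4, 3.4, 2.7],
})

def weighted_means(group):
    # np.average accepts arbitrary numeric weights, not only integers
    return pd.Series({
        'elevation': np.average(group['elevation'], weights=group['counts']),
        'width': np.average(group['width'], weights=group['counts']),
    })

# Select only the columns the function needs, then apply it per group.
out = (df.groupby(['building', 'day'])[['elevation', 'width', 'counts']]
         .apply(weighted_means)
         .reset_index())
print(out.round(2))
```

This is usually slower than the vectorized approaches on large frames, since the function runs once per group.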
Answer 1 (score: 1)
I guess there is a better way, but this works:
df = pd.DataFrame({
'building': ['A1', 'A1', 'A1', 'A1'],
'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
'id': ['alak', 'ldau', 'laud', 'lkdu'],
'counts': [1, 2, 3, 7],
'elevation': [5.7, 7.8, 8.7, 6.9],
'width':[1.2, 2.4, 3.4, 2.7]
})
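# move the grouping keys into the index so the division below aligns on them
df = df.set_index(['building','day'])
sum_count = df.groupby(['building','day']).counts.sum()
# each row's contribution: value * weight / total weight of its group
df['w_elevation'] = df.elevation*df.counts /sum_count
df['w_width'] = df.width*df.counts / sum_count
# numeric_only=True keeps the string column 'id' out of the sum
df.groupby(['building','day']).sum(numeric_only=True)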
Output:
                     counts  elevation  width  w_elevation  w_width
building day
A1       2019-07-02       3       13.5    3.6         7.10     2.00
         2019-07-03      10       15.6    6.1         7.44     2.91
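The same per-row weighting can probably be done without moving the keys into the index, by using groupby(...).transform('sum') to broadcast each group's total count back onto its rows. A sketch of that variant; the names keys, share and weighted are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'building': ['A1', 'A1', 'A1', 'A1'],
    'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
    'id': ['alak', 'ldau', 'laud', 'lkdu'],
    'counts': [1, 2, 3, 7],
    'elevation': [5.7, 7.8, 8.7, 6.9],
    'width': [1.2, 2.4, 3.4, 2.7],
})

keys = ['building', 'day']
# Each row's share of its group's total weight; shares sum to 1 within a group.
share = df['counts'] / df.groupby(keys)['counts'].transform('sum')

# Scale the values by their share, then a plain per-group sum is the weighted mean.
weighted = (df.assign(elevation=df['elevation'] * share,
                      width=df['width'] * share)
              .groupby(keys, as_index=False)[['elevation', 'width']]
              .sum())
print(weighted.round(2))
```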
Answer 2 (score: 1)
You can do the following:
df_sum = df.copy()
# multiply each value by its weight
df_sum['elevation'] *= df_sum['counts']
df_sum['width'] *= df_sum['counts']
# per-group totals of the weighted values and of the weights
df_sum = df_sum.groupby(['building', 'day']).agg({'elevation': 'sum', 'width': 'sum', 'counts': 'sum'})
# divide by the total weight to get the weighted means
df_sum['elevation'] /= df_sum['counts']
df_sum['width'] /= df_sum['counts']
df_sum.reset_index(inplace=True)
df_sum.drop('counts', axis='columns', inplace=True)
The result is:
  building         day  elevation  width
0       A1  2019-07-02       7.10   2.00
1       A1  2019-07-03       7.44   2.91
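If this is needed for several column sets, the multiply, sum, divide steps could be wrapped in a small function. The helper below is hypothetical (its name and parameters are not from pandas) and is only a sketch; it assumes the weights are numeric and each group's weights do not sum to zero:

```python
import pandas as pd

def grouped_weighted_mean(frame, value_cols, weight_col, by):
    """Hypothetical helper: weighted mean of value_cols for each group in `by`."""
    weights = frame[weight_col]
    # value * weight for every row and every value column
    weighted = frame[value_cols].mul(weights, axis=0)
    keys = [frame[k] for k in by]
    numer = weighted.groupby(keys).sum()   # sum(weight * value) per group
    denom = weights.groupby(keys).sum()    # sum(weight) per group
    return numer.div(denom, axis=0).reset_index()

df = pd.DataFrame({
    'building': ['A1', 'A1', 'A1', 'A1'],
    'day': ['2019-07-02', '2019-07-02', '2019-07-03', '2019-07-03'],
    'id': ['alak', 'ldau', 'laud', 'lkdu'],
    'counts': [1, 2, 3, 7],
    'elevation': [5.7, 7.8, 8.7, 6.9],
    'width': [1.2, 2.4, 3.4, 2.7],
})

result = grouped_weighted_mean(df, ['elevation', 'width'], 'counts', ['building', 'day'])
print(result.round(2))
```

Unlike the in-place version, this leaves the input frame untouched, which matters if df is reused afterwards.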
Answer 3 (score: 1)
In case you need something more detailed than the previous answers:
result = df.copy()  # copy so the original df is not modified
# multiply each value by its weight
result['elevation'] = result['elevation'] * result['counts']
result['width'] = result['width'] * result['counts']
# sum the weighted values and the weights per building and day
result = result.groupby(['building', 'day'])[['elevation', 'width', 'counts']].sum()
# then divide by the summed weights to get the weighted averages
result['elevation'] = result['elevation'] / result['counts']
result['width'] = result['width'] / result['counts']
# final results
result[['elevation', 'width']]
Or, another way:
result = df.copy()  # copy so the original df is not modified
# first, get the sum of counts by building and day;
# the index will come in handy later
result = result.set_index(['building', 'day'])
counts = result['counts'].groupby(['building', 'day']).sum()
# multiply elevation and width by their weights
result[['elevation', 'width']] = result[['elevation', 'width']].apply(lambda x: x * result['counts'])
# finally, calculate the weighted average
result = result[['elevation', 'width']].groupby(['building', 'day']).sum()
result = result[['elevation', 'width']].apply(lambda x: x / counts)
result