I want to use groupby to compute the mean of several columns. Here is a toy example:
import pandas as pd

df = pd.DataFrame({
    'company': ['dell', 'microsoft', 'toshiba', 'apple'],
    'measure': ['sales', 'speed', 'wait time', 'service'],
    'category': ['laptop', 'tablet', 'smartphone', 'desktop'],
    '10/6/2015': [234, 333, 456, 290],
    '10/13/2015': [134, 154, 123, 177],
    '10/20/2015': [57, 57, 63, 71],
})
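For each row of df, I want the mean across the date columns. I thought the best way to do this with groupby was to rename the columns so that all the columns from the same month share one name, like this: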
import re

def maybe_rename(col_name):
    # Turn 'month/day/year' into 'monthyear'; leave other names untouched.
    if re.match(r'\d+/\d+/\d+', col_name):
        parts = re.split('/', col_name)
        return parts[0] + parts[2]
    else:
        return col_name

df = df.rename(columns=maybe_rename)
df
     company    measure    category  102015  102015  102015
0       dell      sales      laptop     234     134      57
1  microsoft      speed      tablet     333     154      57
2    toshiba  wait time  smartphone     456     123      63
3      apple    service     desktop     290     177      71
Then I tried to compute the mean like this:
df = df.groupby(df.columns, axis = 1).mean()
which raises the following error: DataError: No numeric types to aggregate
How can I fix this? The result I want looks like this:
df
     company    measure    category  102015
0       dell      sales      laptop  141.66
1  microsoft      speed      tablet  181.33
2    toshiba  wait time  smartphone  214.0
3      apple    service     desktop  179.33
Answer 0 (score: 1)
Try this:
import pandas as pd
df = pd.DataFrame({'company': ['dell', 'microsoft', 'toshiba', 'apple'],
'measure': ['sales', 'speed', 'wait time', 'service'], 'category': ['laptop',
'tablet', 'smartphone', 'desktop'], '10/6/2015': [234, 333, 456, 290],
'10/13/2015': [134, 154, 123, 177], '10/20/2015': [57, 57, 63, 71]})
columns_to_average = ['10/6/2015','10/20/2015','10/13/2015']
df['means'] = df[columns_to_average].mean(axis=1)
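The list of date columns above is written by hand. As a rough sketch of how that could be automated while keeping the per-month grouping from the question, the version below selects the date columns by pattern and groups only those. The names date_cols, id_cols, month_key and monthly_means are made up for illustration. It assumes the error in the question comes from the string columns being included in the groupby, so they are set aside before aggregating.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'company': ['dell', 'microsoft', 'toshiba', 'apple'],
    'measure': ['sales', 'speed', 'wait time', 'service'],
    'category': ['laptop', 'tablet', 'smartphone', 'desktop'],
    '10/6/2015': [234, 333, 456, 290],
    '10/13/2015': [134, 154, 123, 177],
    '10/20/2015': [57, 57, 63, 71],
})

# Split the columns into date-like names (month/day/year) and everything else.
date_pattern = re.compile(r'\d+/\d+/\d+')
date_cols = [c for c in df.columns if date_pattern.fullmatch(c)]
id_cols = [c for c in df.columns if c not in date_cols]

def month_key(col_name):
    # '10/6/2015' -> '102015' (illustrative helper, same idea as maybe_rename)
    month, _day, year = col_name.split('/')
    return month + year

# Transpose so the date labels form the index, group them by month,
# average within each group, then transpose back to one row per company.
monthly_means = df[date_cols].T.groupby(month_key).mean().T

# Put the untouched string columns back next to the monthly means.
result = pd.concat([df[id_cols], monthly_means], axis=1)
print(result)
```

Grouping on the transposed frame avoids passing axis=1 to groupby and keeps the non-numeric columns out of the aggregation entirely.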
If you have many date columns, I would suggest converting the data to time series form...
tdf = df[['category','10/6/2015','10/20/2015','10/13/2015']].transpose()
tdf = tdf.rename(columns=tdf.iloc[0]).drop(tdf.index[0])
print(tdf['laptop'].mean())
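Another way to get a time series layout, offered only as a sketch, is to reshape the frame to long format with melt, parse the column labels as real dates, and then group by the identifying columns plus a month key. The names id_cols, long_df and monthly are illustrative, and the date format '%m/%d/%Y' is an assumption based on the toy column names.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['dell', 'microsoft', 'toshiba', 'apple'],
    'measure': ['sales', 'speed', 'wait time', 'service'],
    'category': ['laptop', 'tablet', 'smartphone', 'desktop'],
    '10/6/2015': [234, 333, 456, 290],
    '10/13/2015': [134, 154, 123, 177],
    '10/20/2015': [57, 57, 63, 71],
})

id_cols = ['company', 'measure', 'category']

# Long format: one row per (company, date) observation.
long_df = df.melt(id_vars=id_cols, var_name='date', value_name='value')

# Parse the former column labels as dates and derive a month key like '102015'.
long_df['date'] = pd.to_datetime(long_df['date'], format='%m/%d/%Y')
long_df['month'] = long_df['date'].dt.strftime('%m%Y')

# The string columns act only as grouping keys, so just 'value' is averaged.
monthly = long_df.groupby(id_cols + ['month'], as_index=False)['value'].mean()
print(monthly)
```

With the dates stored as values rather than column names, adding more weeks or months only adds rows, and the same groupby keeps working.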