I have the following DataFrame in pandas:
+---------+-----------+------------+------------------------------------------------+
| keyword | frequency | avg weight | sum other keywords                             |
+---------+-----------+------------+------------------------------------------------+
| dog     | 3         | 0.14       | [cat, horse, pig, cat, horse, cat, horse, pig] |
| cat     | 1         | 0.5        | [dog, pig, camel]                              |
| horse   | 2         | 0.185      | [dog, camel, cat, camel]                       |
+---------+-----------+------------+------------------------------------------------+
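What I want to do is group by keyword and, in the same step, count each keyword's frequency, average its weight, and sum (concatenate) its other keywords. The result would look like this: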
{{1}}
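Right now I know how to do this with several separate operations: value_counts, groupby.sum(), groupby.mean(), and then merging the results. However, this is very inefficient and requires a lot of manual adjustment.
Is it possible to do it in a single operation?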
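For comparison, the multi-step approach described above might look roughly like the sketch below. The input frame is a hypothetical reconstruction (the original input table is not shown), assuming columns named keyword, weight and other keywords, with values chosen so the aggregated result matches the table above. The list column is flattened with itertools.chain rather than sum; that is an assumption made to keep the concatenation explicit.

```python
from itertools import chain

import pandas as pd

# Hypothetical input: one row per observation, chosen so that the
# aggregated result matches the table in the question.
df = pd.DataFrame({
    'keyword': ['dog', 'dog', 'dog', 'cat', 'horse', 'horse'],
    'weight': [0.10, 0.20, 0.12, 0.50, 0.17, 0.20],
    'other keywords': [
        ['cat', 'horse', 'pig'], ['cat', 'horse'], ['cat', 'horse', 'pig'],
        ['dog', 'pig', 'camel'],
        ['dog', 'camel'], ['cat', 'camel'],
    ],
})

# Three separate passes over the data, one per statistic...
freq = df['keyword'].value_counts().rename('frequency')
avg = df.groupby('keyword')['weight'].mean().rename('avg weight')
others = (df.groupby('keyword')['other keywords']
            .agg(lambda s: list(chain.from_iterable(s)))  # concatenate the lists
            .rename('sum other keywords'))

# ...followed by a manual join on the keyword index. value_counts orders
# by count, groupby orders alphabetically, so sort to get a stable order.
multi = (pd.concat([freq, avg, others], axis=1)
           .sort_index()
           .rename_axis('keyword')
           .reset_index())
print(multi)
```

Each statistic is computed in its own pass and the pieces have to be aligned by hand, which is the overhead the question wants to avoid.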
Answer 0 (score: 10)
You can use agg:
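import pandas as pd

# count rows per keyword, average the weights, concatenate the keyword lists
df = df.groupby('keyword').agg({'keyword': 'size', 'weight': 'mean', 'other keywords': 'sum'})
# set the new column order (reindex_axis was removed in pandas 1.0, so select the columns instead)
df = df[['keyword', 'weight', 'other keywords']]
# drop the index name so reset_index does not clash with the 'keyword' column, then reset the index
df = df.rename_axis(None).reset_index()
# set the new column names
df.columns = ['keyword', 'frequency', 'avg weight', 'sum other keywords']
print(df)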
keyword frequency avg weight \
0 cat 1 0.500
1 dog 3 0.140
2 horse 2 0.185
sum other keywords
0 [dog, pig, camel]
1 [cat, horse, pig, cat, horse, cat, horse, pig]
2 [dog, camel, cat, camel]
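On pandas 0.25 or later, named aggregation can express the same single groupby without aggregating the grouping column itself and without the rename_axis/reset_index workaround. The sketch below is an alternative to the answer above, not its method. The input frame is a hypothetical reconstruction (the original input table is not shown), and the list column is flattened with itertools.chain instead of 'sum', an assumption made so the concatenation does not depend on how a particular pandas version sums object columns.

```python
from itertools import chain

import pandas as pd

# Hypothetical input (the original input table is not shown).
df = pd.DataFrame({
    'keyword': ['dog', 'dog', 'dog', 'cat', 'horse', 'horse'],
    'weight': [0.10, 0.20, 0.12, 0.50, 0.17, 0.20],
    'other keywords': [
        ['cat', 'horse', 'pig'], ['cat', 'horse'], ['cat', 'horse', 'pig'],
        ['dog', 'pig', 'camel'],
        ['dog', 'camel'], ['cat', 'camel'],
    ],
})

# One groupby, one agg call. Each output column is declared as
# (input column, aggregation). Output names with spaces are not valid
# Python identifiers, so they are passed by unpacking a dict.
result = (
    df.groupby('keyword')
      .agg(**{
          'frequency': ('weight', 'size'),
          'avg weight': ('weight', 'mean'),
          'sum other keywords': ('other keywords',
                                 lambda s: list(chain.from_iterable(s))),
      })
      .reset_index()
)
print(result)
```

Because the output columns are named in the agg call, there is no separate reorder or rename step, and since 'keyword' is only the group key, reset_index can restore it as a column without a name clash.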