Question

I have a DataFrame with the following columns:

User_id PQ_played PQ_offered
 1           5        15
 2          12        75
 3          25        50

I need to divide PQ_played by PQ_offered to calculate the percentage of games played. This is what I have tried so far:

new_df['%_PQ_played'] = df.groupby('User_id').((df['PQ_played']/df['PQ_offered'])*100),as_index=True

I know I am doing this wrong.

Answer 1

This is much simpler than you think.

df['%_PQ_played'] = df['PQ_played'] / df['PQ_offered'] * 100

         PQ_offered  PQ_played  %_PQ_played
User_id                                     
1                15          5     33.333333
2                75         12     16.000000
3                50         25     50.000000
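
For anyone who wants to reproduce this, here is a minimal sketch. The way df is built (including setting User_id as the index) is my assumption, since the question only shows the printed table; the division itself is the same one-liner as above.

```python
import pandas as pd

# Sample values copied from the question; constructing the frame this way
# (and using User_id as the index) is an assumption for illustration.
df = pd.DataFrame(
    {"User_id": [1, 2, 3], "PQ_played": [5, 12, 25], "PQ_offered": [15, 75, 50]}
).set_index("User_id")

# Column arithmetic is element-wise: each row is divided on its own,
# so no groupby is required.
df["%_PQ_played"] = df["PQ_played"] / df["PQ_offered"] * 100

print(df)
```

Because the operation is row by row, it gives the same result whether or not User_id is the index.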

Answer 2

You can use a lambda function:

df.groupby('User_id').apply(lambda x: (x['PQ_played']/x['PQ_offered'])*100)\
.reset_index(1, drop = True).reset_index().rename(columns = {0 : '%_PQ_played'})

You get:

    User_id %_PQ_played
0   1       33.333333
1   2       16.000000
2   3       50.000000

Answer 3

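I completely agree with @mVChr and think you are overcomplicating what you need to do. If you are simply trying to add another column, his answer is spot on. If you really do need groupby, it is worth noting that it is typically used for aggregation, e.g. sum(), count(), etc. If, for example, you had several records with non-unique values in the User_id column, you could first create the additional column using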

df['%_PQ_played'] = df['PQ_played'] / df['PQ_offered'] * 100

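and then perform the aggregation. Say you wanted to know, for each user, the average percentage of offered games that were played; you could do something like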

new_df = df.groupby('User_id', as_index=False)['%_PQ_played'].mean()

This would yield (the numbers are arbitrary):

   User_id  %_PQ_played
0        1    52.777778
1        2    29.250000
2        3    65.000000
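
To make the aggregation case concrete, here is a small sketch with several rows per user. The data is hypothetical (made up for illustration, not taken from the question), so the resulting numbers will differ from the arbitrary ones shown above.

```python
import pandas as pd

# Hypothetical data: two records per user, invented for this example.
df = pd.DataFrame(
    {
        "User_id": [1, 1, 2, 2, 3, 3],
        "PQ_played": [5, 10, 12, 30, 25, 40],
        "PQ_offered": [15, 20, 75, 100, 50, 50],
    }
)

# Step 1: per-row percentage, exactly as in the one-liner above.
df["%_PQ_played"] = df["PQ_played"] / df["PQ_offered"] * 100

# Step 2: average the per-row percentages within each user.
# as_index=False keeps User_id as a regular column in the result.
new_df = df.groupby("User_id", as_index=False)["%_PQ_played"].mean()

print(new_df)
```

Here groupby does real work because each User_id appears more than once; with one row per user, the mean would just return the original percentages.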

Divide 2 columns and create a new column with the result
