Question

I need some help replacing iterrows when looping over a Pandas DataFrame. I have a DataFrame like this:

| cust_no | channel  | month1 | month2 |
|   1     | radio    | 0.7    | 0.4    |
|   1     | fb       | 0.1    | 0.5    |
|   1     | tv       | 0.2    | 0.1    |
|   2     | fb       | 0.5    | 0.25   |
|   2     | radio    | 0.4    | 0.25   |
|   2     | tv       | 0.1    | 0.5    |

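Grouping by cust_no, I need the channel with the maximum value for each month, joined together as a string in a new column of the same DataFrame. For example, starting from the DataFrame above:

For customer 1, radio has the maximum value in month1 and fb has the maximum value in month2, so I need the string: radio>fb

For customer 2, fb has the maximum value in month1 but tv has the maximum value in month2, so I need the string: fb>tv

Any help is appreciated. Thanks. Performance is really important.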

Answer 1

Make channel the index with DataFrame.set_index, then use DataFrameGroupBy.idxmax, and finally apply + join:

df1 = (df.set_index('channel')
         .groupby('cust_no')[['month1','month2']]
         .idxmax()
         .apply('>'.join, axis=1)
         .reset_index(name='new'))
print (df1)
   cust_no       new
0        1  radio>fb
1        2     fb>tv
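
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample data is retyped from the table in the question, and the intermediate name `best` is only there for illustration:

```python
import pandas as pd

# Sample data retyped from the table in the question
df = pd.DataFrame({
    "cust_no": [1, 1, 1, 2, 2, 2],
    "channel": ["radio", "fb", "tv", "fb", "radio", "tv"],
    "month1": [0.7, 0.1, 0.2, 0.5, 0.4, 0.1],
    "month2": [0.4, 0.5, 0.1, 0.25, 0.25, 0.5],
})

# With channel as the index, idxmax returns the channel label
# of the largest value per customer and month
best = (df.set_index("channel")
          .groupby("cust_no")[["month1", "month2"]]
          .idxmax())

# Join the per-month winners row by row into one string per customer
df1 = best.apply(">".join, axis=1).reset_index(name="new")
print(df1)
```

Printing `best` before the join is a quick way to check which channel was picked for each month.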

If the DataFrame has no other columns, the column selection for month1 and month2 can be dropped:

df1 = (df.set_index('channel')
         .groupby('cust_no')
         .idxmax()
         .apply('>'.join, axis=1)
         .reset_index(name='new'))
print (df1)
   cust_no       new
0        1  radio>fb
1        2     fb>tv
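
The question asks for the string as a new column in the same DataFrame, while the snippets above return one row per customer. One possible way to attach it back, as a sketch only: build the label per customer and map it onto cust_no. The names `best` and `labels` are illustrative, and the `+` concatenation assumes exactly two month columns. It avoids the row-wise apply, though whether that matters at your data size is not measured here.

```python
import pandas as pd

# Sample data retyped from the table in the question
df = pd.DataFrame({
    "cust_no": [1, 1, 1, 2, 2, 2],
    "channel": ["radio", "fb", "tv", "fb", "radio", "tv"],
    "month1": [0.7, 0.1, 0.2, 0.5, 0.4, 0.1],
    "month2": [0.4, 0.5, 0.1, 0.25, 0.25, 0.5],
})

# Channel label of the monthly maximum, one row per customer
best = (df.set_index("channel")
          .groupby("cust_no")[["month1", "month2"]]
          .idxmax())

# Assumption: exactly two month columns, so plain string concatenation works
labels = best["month1"] + ">" + best["month2"]

# Series.map looks up each cust_no in the index of labels
df["new"] = df["cust_no"].map(labels)
print(df)
```

Every row of a customer gets the same value in `new`, so the original row count is unchanged.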
