Question

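I've been at this for hours and unfortunately haven't managed to do it efficiently. Sorry, this seems like it should be simple. I need to group a dataframe on two columns and change a subset of its other columns based on the value of yet another column in the same dataframe.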

My dataframe looks like this:

state | binned_age   | mnth1 | mnth2 | key
 NSW  |  24-34       | 400   | 200   | 250
 VIC  |  65-150      | 150   | 200   | 450
 VIC  |  65-150      | 50    | 200   | 450
 VIC  |  65-150      | 600   | 200   | 450
 VIC  |  65-150      | 900   | 200   | 450

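I'm trying to transform this dataframe as follows: 1) group on state and binned_age; 2) wherever mnth1 or mnth2 is greater than key, replace the value with 1, otherwise replace it with 0.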

The final result should look like this:

state | binned_age   | mnth1 | mnth2 | key
 NSW  |  24-34       | 1     | 0     | 250
 VIC  |  65-150      | 0     | 0     | 450
 VIC  |  65-150      | 0     | 0     | 450
 VIC  |  65-150      | 1     | 0     | 450
 VIC  |  65-150      | 1     | 0     | 450

This is as far as I've got, but I'm not sure how to get from here to the dataframe above.

grouped_df = sample_cols.groupby(['state', 'binned_age'])
grouped_df.apply(lambda x: x.max_exp_1_mnth > x.max_exp_2_mnth)

Any help is appreciated.

Answer 1

I'm not sure you need groupby at all; you can do it like this:

import numpy as np
df[['mnth1', 'mnth2']] = np.where(df[['mnth1', 'mnth2']].gt(df.key, axis=0), 1, 0)
print(df)

   state     binned_age    mnth1  mnth2  key
0   NSW      24-34             1      0  250
1   VIC      65-150            0      0  450
2   VIC      65-150            0      0  450
3   VIC      65-150            1      0  450
4   VIC      65-150            1      0  450

Answer 2

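groupby is unnecessary here because nothing is compared per group. Instead, compare the selected columns against key with DataFrame.gt, then convert True/False to 1/0 with DataFrame.astype: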

cols = ['mnth1','mnth2']
df[cols] = df[cols].gt(df.key,axis=0).astype(int)
print(df)
  state binned_age  mnth1  mnth2  key
0   NSW      24-34      1      0  250
1   VIC     65-150      0      0  450
2   VIC     65-150      0      0  450
3   VIC     65-150      1      0  450
4   VIC     65-150      1      0  450
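
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The code that builds `df` is an assumption: it rebuilds the sample table from the question by hand and is not part of the original post.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed, not the asker's real data).
df = pd.DataFrame({
    "state": ["NSW", "VIC", "VIC", "VIC", "VIC"],
    "binned_age": ["24-34", "65-150", "65-150", "65-150", "65-150"],
    "mnth1": [400, 150, 50, 600, 900],
    "mnth2": [200, 200, 200, 200, 200],
    "key": [250, 450, 450, 450, 450],
})

cols = ["mnth1", "mnth2"]

# axis=0 aligns df["key"] with the row index, so every value in a row
# is compared with that row's own key. The boolean result is cast to 0/1.
df[cols] = df[cols].gt(df["key"], axis=0).astype(int)

print(df)
```

Without `axis=0`, pandas would try to align the Series index with the column labels instead of the rows, which is not the comparison wanted here.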

If performance matters, use this numpy alternative with broadcasting:

df[cols] = (df[cols].values > df.key.values[:, None]).astype(int)
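
To make the broadcasting step easier to follow, here is a rough sketch that splits the one-liner into its parts. The sample frame and the intermediate names `values`, `key_column` and `mask` are illustrative assumptions, and `to_numpy()` is used in place of `.values`, which should behave the same way for this data.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed).
df = pd.DataFrame({
    "state": ["NSW", "VIC", "VIC", "VIC", "VIC"],
    "binned_age": ["24-34", "65-150", "65-150", "65-150", "65-150"],
    "mnth1": [400, 150, 50, 600, 900],
    "mnth2": [200, 200, 200, 200, 200],
    "key": [250, 450, 450, 450, 450],
})
cols = ["mnth1", "mnth2"]

values = df[cols].to_numpy()                # 2-D array, shape (5, 2)
key_column = df["key"].to_numpy()[:, None]  # [:, None] adds an axis: shape (5, 1)

# NumPy stretches the (5, 1) key column across both month columns,
# so each row is compared with its own key.
mask = values > key_column                  # boolean array, shape (5, 2)

df[cols] = mask.astype(int)
print(df)
```

This skips pandas index alignment entirely, which is where the speed-up comes from, so it relies on the rows of `values` and `key_column` already being in the same order.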

Groupby and modify Pandas dataframe columns
