Question

I have a pandas DataFrame like this:

id, counts
1, 20
1, 21
1,15
1, 24
2,12
2,42
2,9
3,43
...

I want to add a label column that is 1 for the row with the highest count of each id and 0 otherwise:

id, counts, label
1, 20, 0
1, 21, 0
1,15, 0
1, 24, 1 # because 24 is the highest count for id 1
2,12, 0
2,42, 1 # because 42 is the highest count for id 2
2,9, 0
3,43, 
...

How can I do this with pandas?

Answer 1

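# Maximum count per id, as a two-column frame (id, Max)
maxes = df.groupby('id').counts.max().rename('Max').reset_index()
# Attach each row's group maximum; the left merge keeps the row order of df
df1 = df.merge(maxes, on='id', how='left')
# 1 where the row holds its group's maximum, else 0 (assumes df has the default RangeIndex)
df['label'] = (df1.counts == df1.Max) * 1
df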
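
The merge above can probably be avoided. The following is a minimal sketch, not part of the original answer, that uses groupby(...).transform("max") to broadcast each group's maximum back onto the rows. The sample frame is copied from the question, leaving out the rows elided with "...".

```python
import pandas as pd

# Sample data from the question (elided rows are omitted).
df = pd.DataFrame({
    "id":     [1, 1, 1, 1, 2, 2, 2, 3],
    "counts": [20, 21, 15, 24, 12, 42, 9, 43],
})

# transform("max") returns a Series aligned with df that holds,
# for every row, the maximum count of that row's id group.
group_max = df.groupby("id")["counts"].transform("max")

# Rows whose count equals their group maximum get 1, all others 0.
df["label"] = (df["counts"] == group_max).astype(int)
print(df)
```

Because the result of transform is aligned with the original index, this variant does not depend on df having a default index.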

Answer 2

This seems to work:

df['label'] = 0
df.loc[df.groupby('id').apply(lambda x: x['counts'].idxmax()).values, 'label'] = 1

But it is so ugly, and it does not follow good coding practice... I will try to improve it.

If you like the following, please upvote this answer (Merlin's answer to this question) to say thanks.

df['label'] = np.where(df.index.isin((df.groupby('id')['counts'].idxmax())), 1, 0)

IMHO, you should use Merlin's answer to solve this problem. My coding practice here is not good, and my approach also scales poorly compared to Merlin's.

Answer 3

Try this:

df["label"] = np.where(df.index.isin(df.groupby("id")["counts"].idxmax()), 1, 0)

   id  counts  label
0   1      20      0
1   1      21      0
2   1      15      0
3   1      24      1
4   2      12      0
5   2      42      1
6   2       9      0
7   3      43      1
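
One detail the output above does not show is what happens when an id reaches its maximum more than once. The sketch below uses hypothetical data with such a tie (not taken from the question) to compare the idxmax approach with a direct comparison against the group maximum. It also assumes the index is unique, which the index.isin test relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a tie: id 1 reaches its maximum (24) twice.
tied = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2],
    "counts": [24, 15, 24, 12, 42],
})

# idxmax returns one index label per group: the first row holding the maximum.
first_max = tied.groupby("id")["counts"].idxmax()
tied["label_first"] = np.where(tied.index.isin(first_max), 1, 0)

# Comparing each count with its group maximum marks every tied row.
group_max = tied.groupby("id")["counts"].transform("max")
tied["label_all"] = (tied["counts"] == group_max).astype(int)
print(tied)
```

With idxmax only the first of the tied rows is labeled, while the comparison labels both. Which behavior is wanted depends on the use case, since the question does not say how ties should be handled.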

Find the maximum value for each id and create a new column in pandas
