Question

I have a dataframe that looks like this:

col_1   |   col_2
-------------------
"red"   |    21
-------------------
"blue"  |    31
-------------------
"red"   |    12
-------------------
"blue"  |    99
-------------------
"blue"  |    102

I also have a list of values like this: label = [1,3,2]

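I want to build a third column, col_3, that is "Yes" wherever the color in col_1 is "red". The remaining rows should get 1, 3, 2 in that order. In other words, wherever the color is "blue", the values from label should be filled in one after another.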

Expected output:

col_1   |   col_2    | col_3
---------------------------
"red"   |    21      |  "Yes"
-----------------------------
"blue"  |    31      |  "1"
------------------------------
"red"   |    12      | "Yes"
------------------------------
"blue"  |    99      |  "3"
------------------------------
"blue"  |    102     |  "2"

My approach:

I tried to do the imputation with np.where(), like this:

np.where(df["col_1"]=="red","Yes",label)

but I get:

ValueError: operands could not be broadcast together with shapes

I think this is because df and the list have different sizes (5 vs. 3).
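
A minimal sketch that reproduces the error, assuming the dataframe is built as below (the construction is not shown in the post). np.where needs the condition and both alternatives to broadcast to a common shape, and a length-5 condition cannot be combined with a length-3 list:

```python
import numpy as np
import pandas as pd

# Assumed construction of the example data from the table above.
df = pd.DataFrame({
    "col_1": ["red", "blue", "red", "blue", "blue"],
    "col_2": [21, 31, 12, 99, 102],
})
label = [1, 3, 2]

cond = df["col_1"] == "red"   # 5 booleans, one per row

try:
    np.where(cond, "Yes", label)   # label has only 3 entries
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The scalar "Yes" broadcasts to any shape, so the only conflict is between the 5 rows and the 3 list entries.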

Can someone help me?

Thanks

Edit:

Added the expected output.
Fixed some mistakes I made in the "My approach" section.

Answer 1

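You can try a boolean mask. First assign Yes to the whole column, then build a boolean mask with Series.ne. In your case, the mask is True where col_1 is not equal to red; use that mask to fill in the values from label. The comparison is case-sensitive, and label must have exactly as many entries as the mask has True rows.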

df['col_3'] = 'Yes'            # default value for every row
m = df['col_1'].ne('red')      # ne -> not equal to; True for the non-red rows
df.loc[m, 'col_3'] = label     # fill those rows, in order, from label
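
For reference, here is a self-contained sketch of the mask approach on the question's data. The dataframe construction is assumed from the table in the question, and the labels are converted to strings, which is an assumption based on the quoted values ("1", "3", "2") in the expected output:

```python
import pandas as pd

# Assumed construction of the example data from the question.
df = pd.DataFrame({
    "col_1": ["red", "blue", "red", "blue", "blue"],
    "col_2": [21, 31, 12, 99, 102],
})
label = [1, 3, 2]

df["col_3"] = "Yes"                 # default for every row
m = df["col_1"].ne("red")           # True where the color is not red

# The assignment only works if there is one label per selected row.
assert m.sum() == len(label)

# Assumption: store the labels as strings, as in the expected output.
df.loc[m, "col_3"] = [str(v) for v in label]

result = df["col_3"].tolist()
print(df)
```

Keeping the column as strings throughout avoids mixing integers into a text column; drop the str() conversion if integer labels are wanted.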

Answer 2

Are you looking for something like this?

df['col_3'] = 'Yes'
df.loc[(df['col_1'] != 'red'), 'col_3'] = label

print(df)

   col_1  col_2 col_3
0   red     21   Yes
1  blue     31     1
2   red     12   Yes
3  blue     99     3
4  blue    102     2
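
Both answers depend on label having exactly one entry per non-red row. A minimal sketch that makes this requirement explicit follows; the helper name fill_unmatched and its parameters are hypothetical (not part of pandas), and converting the values to strings is an assumption taken from the quoted values in the expected output:

```python
import pandas as pd


def fill_unmatched(df, source_col, match_value, values, target_col, default="Yes"):
    """Hypothetical helper: put `default` where source_col equals match_value,
    and the entries of `values`, in order, on all other rows."""
    mask = df[source_col].ne(match_value)
    n_rows = int(mask.sum())
    if n_rows != len(values):
        # Fail early with a clear message instead of a shape error later on.
        raise ValueError(f"need {n_rows} values, got {len(values)}")
    out = df.copy()                      # leave the caller's frame untouched
    out[target_col] = default
    out.loc[mask, target_col] = [str(v) for v in values]
    return out


# Assumed construction of the example data from the question.
df = pd.DataFrame({
    "col_1": ["red", "blue", "red", "blue", "blue"],
    "col_2": [21, 31, 12, 99, 102],
})

filled = fill_unmatched(df, "col_1", "red", [1, 3, 2], "col_3")
print(filled)
```

Passing a list of the wrong length, for example [1, 2], raises the ValueError from the explicit check before anything is assigned.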

Fill a Dataframe column with a list of values where a condition on another column is not met