Given a DataFrame df:
Product_ID | Category_A | Category_B
1232       | 0          | 0
1343       | Unknown    | X
2543       | NaN        | 0
2549       | Y          | Y
0349       | X          | X
8533       | Y          | X
I want to create a new column, Category_Final, based on the following rules:
Expected output:
Product_ID | Category_A | Category_B | Category_Final
1232       | 0          | 0          | Unknown
1343       | Unknown    | X          | Unknown
2543       | NaN        | 0          | Unknown
2549       | Y          | Y          | 0
0349       | X          | X          | 0
8533       | Y          | X          | X
I managed to get the logic for 0 and X, but I don't know how to include the Unknown logic.
df['Category_Final'] = np.where(df['Category_A'] != df['Category_B'], 'X', '0')
Thanks!
Answer 0: (score: 1)
After your current line, try this:
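# Rows where Category_A is missing, 'Unknown', or zero
# (zero is matched both as the string '0' and as the integer 0)
mask = (df['Category_A'].isnull() |
        df['Category_A'].isin(['Unknown', '0', 0]))
df.loc[mask, 'Category_Final'] = 'Unknown'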
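Putting the two steps together, the approach might look like the sketch below. It assumes every category value is stored as a string and that the missing entry is a real NaN; the DataFrame is a hypothetical reconstruction of the question's table, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data (all values assumed to be strings)
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

# Step 1: the asker's existing line, which only distinguishes "0" from "X"
df["Category_Final"] = np.where(df["Category_A"] != df["Category_B"], "X", "0")

# Step 2: overwrite the rows whose Category_A is missing, "Unknown", or "0"
mask = df["Category_A"].isnull() | df["Category_A"].isin(["Unknown", "0"])
df.loc[mask, "Category_Final"] = "Unknown"

result = df["Category_Final"].tolist()
print(result)
```

Because the override runs last, it takes priority over the 0/X result from the first step.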
Answer 1: (score: 1)
You can use a nested np.where:
import numpy as np

# Outer np.where handles the Unknown cases; inner np.where picks '0' or 'X'
df['Category_Final'] = np.where(
    df['Category_A'].isnull() |
    (df['Category_A'] == 'Unknown') | (df['Category_A'] == '0'),
    'Unknown',
    np.where(df['Category_A'] == df['Category_B'], '0', 'X'))
Output
Product_ID Category_A Category_B Category_Final
0 1232 0 0 Unknown
1 1343 Unknown X Unknown
2 2543 NaN 0 Unknown
3 2549 Y Y 0
4 349 X X 0
5 8533 Y X X
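As an alternative to nesting, the same three-way rule could be written with np.select, which is a different NumPy function from the one this answer uses. This is a minimal sketch, assuming every category value is stored as a string and the missing entry is NaN; the DataFrame is a hypothetical reconstruction of the question's table.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data (all values assumed to be strings)
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

a, b = df["Category_A"], df["Category_B"]

# Conditions are checked in order; the first one that is True wins
conditions = [
    a.isnull() | a.isin(["Unknown", "0"]),  # unknown cases take priority
    a == b,                                 # both categories agree
]
choices = ["Unknown", "0"]

# Rows matching no condition fall through to the default "X"
df["Category_Final"] = np.select(conditions, choices, default="X")

result = df["Category_Final"].tolist()
print(result)
```

With a flat list of conditions, adding a further rule means adding one entry to each list rather than another level of nesting.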
Answer 2: (score: 1)
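import numpy as np

df['Category_Final'] = (
    df.apply(lambda _: "0", axis=1)  # start with "0" in every row
    .where(df['Category_A'] == df['Category_B'], "X")  # "X" where A and B differ
    # np.nan replaces np.NaN, an alias that was removed in NumPy 2.0
    .where(~df['Category_A'].isin(["0", "Unknown", np.nan]), "Unknown")
)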
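The same chain could also be wrapped in a small function that avoids the row-wise apply. The sketch below is one possible variant, not the answerer's code: categorize is a hypothetical helper name, the DataFrame is a reconstruction of the question's table, and all category values are assumed to be strings with the missing entry stored as NaN.

```python
import numpy as np
import pandas as pd

def categorize(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper returning the Category_Final values for df."""
    a, b = df["Category_A"], df["Category_B"]
    unknown = a.isin(["0", "Unknown", np.nan])  # isin also matches NaN entries
    return (
        pd.Series("0", index=df.index)  # start with "0" in every row
        .where(a == b, "X")             # keep "0" only where A equals B
        .mask(unknown, "Unknown")       # then override the unknown rows
    )

# Hypothetical reconstruction of the question's data (all values assumed to be strings)
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

df["Category_Final"] = categorize(df)
result = df["Category_Final"].tolist()
print(result)
```

Series.where keeps values where the condition holds, while Series.mask replaces them, so the final step can be written without negating the condition.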