Python Pandas: replace values in one column based on conditions in multiple other columns

Time: 2018-04-05 16:20:02

Tags: python pandas if-statement conditional nan

Given the DataFrame df:

Product_ID | Category_A | Category_B
1232         0            0
1343         Unknown      X
2543         NaN          0
2549         Y            Y
0349         X            X
8533         Y            X
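
For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is only a sketch under assumptions the table itself does not confirm: every column is taken to hold strings (so 0 is the string '0' and Product_ID keeps its leading zero), and the missing-looking entry in Category_A is taken to be a real missing value (np.nan) rather than a literal string.

```python
import numpy as np
import pandas as pd

# Assumption: all values are strings, and the missing entry is a true NaN.
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})
print(df)
```

If the real data stores 0 as an integer instead, the comparisons against '0' further down would need to be adjusted accordingly.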

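I want to create a new column, Category_Final, that follows these rules:

  • If Category_A is 0, Unknown, or NaN, Category_Final should be "Unknown"
  • If Category_A is the same as Category_B, Category_Final should be 0
  • If Category_A differs from Category_B, Category_Final should be X

Expected output: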

Product_ID | Category_A | Category_B | Category_Final
1232         0            0            Unknown
1343         Unknown      X            Unknown
2543         NaN          0            Unknown
2549         Y            Y            0
0349         X            X            0
8533         Y            X            X

I managed to get the logic for 0 and X working, but I don't know how to add the Unknown rule.

df['Category_Final'] = np.where(df['Category_A'] != df['Category_B'], 'X', '0')

Thanks!

3 answers:

Answer 0 (score: 1)

After your current line, try this:

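# Rows where Category_A is missing, 'Unknown', or zero.
# Both 0 and '0' are checked because the column's dtype is not stated.
mask = (df['Category_A'].isnull() |
        df['Category_A'].isin(['Unknown', 0, '0']))
df.loc[mask, 'Category_Final'] = 'Unknown'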
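
Put together with the line from the question, the two steps might look like the sketch below. The sample frame is rebuilt here under the assumption that all values are strings and that the missing entry is a real NaN; neither is confirmed by the question.

```python
import numpy as np
import pandas as pd

# Assumed sample data: string values, with one true NaN in Category_A.
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

# Step 1 (from the question): 'X' where the columns differ, '0' where they match.
df["Category_Final"] = np.where(df["Category_A"] != df["Category_B"], "X", "0")

# Step 2 (this answer): overwrite rows whose Category_A is missing, 'Unknown', or zero.
mask = df["Category_A"].isna() | df["Category_A"].isin(["Unknown", "0", 0])
df.loc[mask, "Category_Final"] = "Unknown"

print(df["Category_Final"].tolist())
```

The order matters: the Unknown rule is applied last so that it takes priority over the 0/X comparison, which is what the expected output in the question requires for the first row.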

Answer 1 (score: 1)

You can use a nested np.where:

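# Outer np.where handles the Unknown rule; the inner one handles 0 vs X.
# '0' is written as a string because NumPy would coerce a bare 0 to '0' next to 'X' anyway.
df['Category_Final'] = np.where(
    df['Category_A'].isnull()
    | (df['Category_A'] == 'Unknown')
    | (df['Category_A'] == '0'),
    'Unknown',
    np.where(df['Category_A'] == df['Category_B'], '0', 'X'),
)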

Output

Product_ID  Category_A  Category_B  Category_Final
0   1232    0            0            Unknown
1   1343    Unknown      X            Unknown
2   2543    NaN          0            Unknown
3   2549    Y            Y              0
4   349     X            X              0
5   8533    Y            X              X
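
As a flatter alternative to nesting, the same three rules could probably be expressed with np.select, which takes a list of conditions and returns the choice for the first one that matches. This is a different technique from the nested np.where above, not the answerer's method, and the sample frame is an assumption (string values, one true NaN).

```python
import numpy as np
import pandas as pd

# Assumed sample data: string values, with one true NaN in Category_A.
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

# Conditions are checked in order; the first match wins.
conditions = [
    df["Category_A"].isna() | df["Category_A"].isin(["Unknown", "0"]),  # Unknown rule
    df["Category_A"] == df["Category_B"],                               # columns match
]
choices = ["Unknown", "0"]

# Rows matching neither condition fall through to the default, 'X'.
df["Category_Final"] = np.select(conditions, choices, default="X")

print(df["Category_Final"].tolist())
```

Because the conditions are evaluated in list order, putting the Unknown rule first gives it priority, and adding a fourth rule later means appending one condition and one choice rather than nesting another call.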

Answer 2 (score: 1)

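df['Category_Final'] = (
    # Start with "0" in every row.
    df.apply(lambda _: "0", axis=1)
    # Keep "0" where the columns match; otherwise use "X".
    .where(df['Category_A'] == df['Category_B'], "X")
    # Keep the result unless Category_A is "0", "Unknown", or NaN.
    .where(~df['Category_A'].isin(["0", "Unknown", np.nan]), "Unknown")
)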