Question

I have a pandas DataFrame that looks like this:

Item    Status
123     B
123     BW
123     W 
123     NF
456     W
456     BW
789     W
789     NF
000     NF

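I need to create a new column, Value, that is 1 or 0 depending on the values in the Item and Status columns. Within each Item, the value 1 goes to the status that comes first in this order of priority: B, BW, W, NF. So, with the sample DataFrame above, the result should be: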

Item    Status    Value
123     B         1
123     BW        0
123     W         0
123     NF        0
456     W         0
456     BW        1
789     W         1
789     NF        0
000     NF        1

I'm using Python 3.7.

Answer 1

Taking the original DataFrame as the input DataFrame df, the following code produces the desired output:

import numpy as np

#dictionary assigning order of priority to status values
priority_map = {'B':1,'BW':2,'W':3,'NF':4}

#new temporary column that converts Status values to order of priority values
df['rank'] = df['Status'].map(priority_map)

#create dictionary with Item as key and lowest rank value per Item as value
lowest_val_dict = df.groupby('Item')['rank'].min().to_dict()

#new column that assigns the same Value to all rows per Item
df['Value'] = df['Item'].map(lowest_val_dict)

#replace Values where rank is different with 0's
df['Value'] = np.where(df['Value'] == df['rank'],1,0)

#delete rank column
del df['rank']
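
For reference, the same idea can be sketched more compactly with groupby(...).transform('min'), which broadcasts each Item's best rank back onto its rows and so avoids the temporary column and the dictionary. This is an alternative sketch, not the code above; the sample frame is rebuilt from the question, and storing Item as strings is an assumption made so that 000 keeps its leading zeros.

```python
import pandas as pd

# Sample data from the question; Item is kept as strings (an assumption)
df = pd.DataFrame({
    'Item': ['123', '123', '123', '123', '456', '456', '789', '789', '000'],
    'Status': ['B', 'BW', 'W', 'NF', 'W', 'BW', 'W', 'NF', 'NF'],
})

# Lower number = higher priority
priority_map = {'B': 1, 'BW': 2, 'W': 3, 'NF': 4}
rank = df['Status'].map(priority_map)

# Best (lowest) rank within each Item, aligned to the original rows
best_rank = rank.groupby(df['Item']).transform('min')

# 1 where a row holds its Item's best rank, 0 otherwise
df['Value'] = (rank == best_rank).astype(int)
print(df)
```

As with the code above, if an Item contains its top status more than once, every one of those rows gets a 1.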

Answer 2

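I'd prefer an approach where Status is an ordered pd.Categorical, because a) that's what it is, and b) it's much more readable: once you have that, you simply check whether a value equals the max of its group: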

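import pandas as pd

df['Status'] = pd.Categorical(df['Status'], categories=['NF', 'W', 'BW', 'B'],
                              ordered=True)
# group_keys=False keeps the original row index so the result aligns with df (needed on pandas 2.0+)
df['Value'] = df.groupby('Item', group_keys=False)['Status'].apply(lambda x: (x == x.max()).astype(int))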

#   Item Status  Value
#0   123      B      1
#1   123     BW      0
#2   123      W      0
#3   123     NF      0
#4   456      W      0
#5   456     BW      1
#6   789      W      1
#7   789     NF      0
#8     0     NF      1
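
To try this end to end, here is a minimal self-contained sketch. It rebuilds the question's sample data (with Item as strings, which is an assumption) and, instead of apply, compares the integer codes of the ordered categorical against the per-Item maximum code. That swap is my own variation on the answer, chosen because the codes are plain integers.

```python
import pandas as pd

# Sample data from the question; Item is kept as strings (an assumption)
df = pd.DataFrame({
    'Item': ['123', '123', '123', '123', '456', '456', '789', '789', '000'],
    'Status': ['B', 'BW', 'W', 'NF', 'W', 'BW', 'W', 'NF', 'NF'],
})

# Categories are listed from lowest to highest priority
status_order = ['NF', 'W', 'BW', 'B']
df['Status'] = pd.Categorical(df['Status'], categories=status_order, ordered=True)

# An ordered categorical knows its own ordering
top_status = df['Status'].max()

# Integer position of each status within status_order (higher = higher priority)
codes = df['Status'].cat.codes

# Highest code within each Item, aligned to the original rows
best_code = codes.groupby(df['Item']).transform('max')

df['Value'] = (codes == best_code).astype(int)
print(df)
```

Note that a status missing from status_order would become NaN in the categorical, so the category list needs to cover every status in the data.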

Answer 3

I can help you conceptually by explaining the steps I would take:

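Create the new column Value and fill it with zeros, for example with np.zeros() or DataFrame.fillna()
Group the DataFrame by item with groupby = df.groupby('Item')
Iterate over all the groups found: for name, group in groupby:
Using a simple function with ifs, a custom priority queue, a custom sort key, or any other method you prefer, determine which entry has the highest priority under the rule "the value 1 is assigned by the following order of priority: B, BW, W, NF", and assign 1 to its Value column: group.loc[entry, 'Value'] = 1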

Suppose we are looking at the group "123":
```
 Item    Status    Value
 -------------------------
 123     B         0 (before 0, after 1)
 123     BW        0
 123     W         0
 123     NF        0
```
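Since the row [123, 'B', 0] has the highest priority by your criteria, change it to [123, 'B', 1]
Once that is done, create a DataFrame from the groupby object and you are finished. There are many ways to do this; check here: Converting a Pandas GroupBy object to DataFrame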
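
The steps above could be sketched roughly as follows. This is one possible reading rather than a definitive implementation: the names priority, rank_of and entry are illustrative, Item is stored as strings (an assumption), and the row index is assumed to be unique. The sketch writes the 1 straight back into df with df.loc, so it does not need the final step of rebuilding a DataFrame from the groups.

```python
import pandas as pd

# Sample data from the question; Item is kept as strings (an assumption)
df = pd.DataFrame({
    'Item': ['123', '123', '123', '123', '456', '456', '789', '789', '000'],
    'Status': ['B', 'BW', 'W', 'NF', 'W', 'BW', 'W', 'NF', 'NF'],
})

# Highest priority first; rank_of maps each status to its position
priority = ['B', 'BW', 'W', 'NF']
rank_of = {status: position for position, status in enumerate(priority)}

# Step 1: start with zeros everywhere
df['Value'] = 0

# Steps 2-3: group by Item and visit each group
for name, group in df.groupby('Item'):
    # Step 4: find the row whose status has the smallest rank
    ranks = group['Status'].map(rank_of)
    entry = ranks.idxmin()  # index label of the highest-priority row
    df.loc[entry, 'Value'] = 1

print(df)
```

Unlike the vectorized answers, idxmin marks only the first row when an Item repeats its top status, and an explicit Python loop will be slower on large frames.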

Creating a binary column in a pandas DataFrame based on priority
