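I have a DataFrame df and want to change the values in the 'Status' column for each 'Id'.
The rule is: within each 'Id', the rows before the first 'High' become 'Before', the 'High' rows stay unchanged, and every other row becomes 'After'.
DataFrame df: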
Id Status
0 1 Low
1 1 Low
2 1 High
3 1 Low
4 2 Low
5 2 Low
6 2 High
7 2 Low
8 3 Low
9 3 Low
10 3 High
11 3 Low
12 3 High
13 3 Low
My expected df:
Id Status
0 1 Before
1 1 Before
2 1 High
3 1 After
4 2 Before
5 2 Before
6 2 High
7 2 After
8 3 Before
9 3 Before
10 3 High
11 3 After
12 3 High
13 3 After
Here is my code so far (I have not yet added the rule that changes the remaining rows to 'After'):
df.loc[df.groupby(['Id'])['Status'] == "High", df['Status'].shift(1)] = 'Before'
I get an error:
ValueError: cannot index with vector containing NA / NaN values
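A likely cause of this error, sketched on a small hypothetical frame (the four-row df below is an assumption, not the full data): the second argument of .loc selects columns, and here it receives df['Status'].shift(1), an object Series whose first element is missing. A row-wise boolean mask such as df['Status'].eq('High') is what .loc expects in the first position.

```python
import pandas as pd

# Hypothetical four-row sample standing in for one Id group
df = pd.DataFrame({"Id": [1, 1, 1, 1], "Status": ["Low", "Low", "High", "Low"]})

# shift(1) moves every value down one row; the first row has no
# predecessor, so it is filled with a missing value
shifted = df["Status"].shift(1)
print(shifted.isna().tolist())

# A boolean row mask: True where the row is 'High'
row_mask = df["Status"].eq("High")
print(row_mask.tolist())
```

Passing a Series that contains a missing value as an indexer is what the message about NA / NaN values refers to.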
Answer 0: (score: 2)
Use numpy.select: keep the High rows as High, set the rows before the first High of each group to Before, and set all remaining rows to After:
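import numpy as np
# m1: True for rows whose Status is 'High'
m1 = df['Status'].eq('High')
# m2: True while no 'High' has been seen yet within the same Id
m2 = m1.groupby(df['Id']).cumsum() == 0
df['Status1'] = np.select([m1, m2], ['High', 'Before'], default='After')
print(df)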
Id Status Status1
0 1 Low Before
1 1 Low Before
2 1 High High
3 1 Low After
4 2 Low Before
5 2 Low Before
6 2 High High
7 2 Low After
8 3 Low Before
9 3 Low Before
10 3 High High
11 3 Low After
12 3 High High
13 3 Low After
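For readers who want to run this end to end, here is a self-contained sketch that rebuilds the sample data from the question and writes the result back into Status instead of a new column. The variable names is_high and seen_high are illustrative, and the explicit astype(int) is an assumption added to make the running count unambiguous.

```python
import numpy as np
import pandas as pd

# Sample data from the question: 4 rows each for Id 1 and 2, 6 rows for Id 3
df = pd.DataFrame({
    "Id": [1] * 4 + [2] * 4 + [3] * 6,
    "Status": ["Low", "Low", "High", "Low"] * 2
              + ["Low", "Low", "High", "Low", "High", "Low"],
})

# True for rows that are 'High'
is_high = df["Status"].eq("High")

# Running count of 'High' rows within each Id; 0 means no 'High' seen yet
seen_high = is_high.astype(int).groupby(df["Id"]).cumsum()

# Conditions are checked in order: 'High' rows first, then rows before the
# first 'High'; everything else falls through to the default 'After'
df["Status"] = np.select(
    [is_high, seen_high.eq(0)],
    ["High", "Before"],
    default="After",
)
print(df)
```

The order of the conditions matters: np.select takes the first condition that is true for each row, so the 'High' rows are settled before the Before/After split is applied.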
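Answer 1: (score: 0)
You can use the map function:
df['Status'] = df['Status'].map({'High': 'After', 'Low': 'Before'})
Note that map replaces values one for one (every 'High' becomes 'After' and every 'Low' becomes 'Before') without looking at a row's position within its Id, so it does not reproduce the expected df shown in the question.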