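I need to replace values in column x of a dataframe so that the result looks like x_new. In detail: the x values in the rows where y is 1 or 255 must be kept. Between a 1 and the following 255, each x value must be replaced with the x value from the row where y is 1. Values between a 255 and the next 1 should stay unchanged. How can I get the x_new column?
I guess it could be done with replace and some condition, but I don't know how to combine them. I would appreciate any help and hints.
My dataframe looks like this, for example: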
x y z x_new
12.28 1 1 12.28
11.99 0 1 12.28
11.50 0 1 12.28
11.20 0 1 12.28
11.01 0 1 12.28
9.74 255 0 9.74
13.80 0 0 13.80
15.2 0 0 15.2
17.8 0 0 17.8
12.1 1 1 12.1
11.9 0 1 12.1
11.7 0 1 12.1
11.2 0 1 12.1
10.3 255 0 10.3
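For anyone who wants to reproduce the problem, the table above can be built as follows. This is only a sketch of the setup; the values are copied from the table, and x_new is included as the desired result to compare against.

```python
import pandas as pd

# Sample data copied from the table above; x_new is the desired output.
df = pd.DataFrame({
    "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80,
          15.2, 17.8, 12.1, 11.9, 11.7, 11.2, 10.3],
    "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
    "z": [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    "x_new": [12.28] * 5 + [9.74, 13.80, 15.2, 17.8] + [12.1] * 4 + [10.3],
})
```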
Answer 0 (score: 2)
Try:
# mark the occurrences of 1 and 255
df['is_1_255'] = df.y[(df.y==1)|(df.y==255)]
df['x_n'] = float('nan')  # start as a float column so the later ffill keeps a numeric dtype
# copy the 1's
df.loc[df.is_1_255==1,'x_n'] = df.loc[df.is_1_255==1,'x']
# fill is_1_255 with markers,
#255 means between 255 and 1, 1 means between 1 and 255
df['is_1_255'] = df['is_1_255'].ffill()
# update the 255 values
df.loc[df.is_1_255==255, 'x_n'] = df.loc[df.is_1_255==255,'x']
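# forward-fill the 1 values over the rows between 1 and 255
# (assign the result back; inplace=True on a column selection may not update df)
df['x_n'] = df['x_n'].ffill()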
Output:
+-----+-------+-----+---+-------+----------+-------+
| idx | x | y | z | x_new | is_1_255 | x_n |
+-----+-------+-----+---+-------+----------+-------+
| 0 | 12.28 | 1 | 1 | 12.28 | 1.0 | 12.28 |
| 1 | 11.99 | 0 | 1 | 12.28 | 1.0 | 12.28 |
| 2 | 11.50 | 0 | 1 | 12.28 | 1.0 | 12.28 |
| 3 | 11.20 | 0 | 1 | 12.28 | 1.0 | 12.28 |
| 4 | 11.01 | 0 | 1 | 12.28 | 1.0 | 12.28 |
| 5 | 9.74 | 255 | 0 | 9.74 | 255.0 | 9.74 |
| 6 | 13.80 | 0 | 0 | 13.80 | 255.0 | 13.80 |
| 7 | 15.20 | 0 | 0 | 15.20 | 255.0 | 15.20 |
| 8 | 17.80 | 0 | 0 | 17.80 | 255.0 | 17.80 |
| 9 | 12.10 | 1 | 1 | 12.10 | 1.0 | 12.10 |
| 10 | 11.90 | 0 | 1 | 12.10 | 1.0 | 12.10 |
| 11 | 11.70 | 0 | 1 | 12.10 | 1.0 | 12.10 |
| 12 | 11.20 | 0 | 1 | 12.10 | 1.0 | 12.10 |
| 13 | 10.30 | 255 | 0 | 10.30 | 255.0 | 10.30 |
+-----+-------+-----+---+-------+----------+-------+
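Answer 1 (score: 2)
Assuming clean data in which 1 and 255 always come in pairs, we can form groups for the runs from 1 up to 255 and use groupby to fill the data.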
s = (df.y.eq(1).cumsum() == df.y.eq(255).cumsum()+1)
df['xnew'] = df.groupby(s.ne(s.shift()).cumsum().where(s)).x.transform('first').fillna(df.x)
x y z xnew
0 12.28 1 1 12.28
1 11.99 0 1 12.28
2 11.50 0 1 12.28
3 11.20 0 1 12.28
4 11.01 0 1 12.28
5 9.74 255 0 9.74
6 13.80 0 0 13.80
7 15.20 0 0 15.20
8 17.80 0 0 17.80
9 12.10 1 1 12.10
10 11.90 0 1 12.10
11 11.70 0 1 12.10
12 11.20 0 1 12.10
13 10.30 255 0 10.30
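The one-liner above is dense, so here is a step-by-step sketch of the same idea. It is a rearrangement rather than the exact code above: it groups by the running count of 1s and picks the result with where, which avoids NaN group keys. The names ones, closes, in_run and first_x are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80,
          15.2, 17.8, 12.1, 11.9, 11.7, 11.2, 10.3],
    "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
})

# Running count of opening markers (1) and closing markers (255).
ones = df["y"].eq(1).cumsum()
closes = df["y"].eq(255).cumsum()

# A row lies in an open run when exactly one more 1 than 255 has been seen.
# The 255 row itself is excluded because its own count is already included.
in_run = ones.eq(closes + 1)

# Each block starting at a 1 shares the same count, so the first x of the
# block is the x of the row where y == 1.
first_x = df.groupby(ones)["x"].transform("first")

# Use the block's first x inside a run, the original x everywhere else.
df["xnew"] = first_x.where(in_run, df["x"])
```

Like the original, this assumes every 1 is closed by a 255 before the next 1 appears.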
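That said, for something like this you should really write thorough unit tests, because this kind of logic can get very tricky and problematic on malformed input.
Answer 2 (score: 2)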
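As one possible starting point for such tests, the sketch below checks the pairing assumption before any filling is done. The helper markers_are_paired is hypothetical (it is not part of pandas), and it assumes that a trailing 1 without a closing 255 is still acceptable; tighten that if your data requires it.

```python
import pandas as pd

def markers_are_paired(y: pd.Series) -> bool:
    """Return True if the 1/255 markers in y strictly alternate, starting with 1."""
    markers = y[y.isin([1, 255])].tolist()
    # Expected pattern is 1, 255, 1, 255, ... truncated to the number of markers.
    expected = [1, 255] * (len(markers) // 2 + 1)
    return markers == expected[:len(markers)]

clean = pd.Series([1, 0, 0, 255, 0, 1, 0, 255])
double_open = pd.Series([1, 0, 1, 0, 255])      # second 1 before a 255
starts_closed = pd.Series([255, 0, 1, 0, 255])  # 255 before any 1

assert markers_are_paired(clean)
assert not markers_are_paired(double_open)
assert not markers_are_paired(starts_closed)
```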
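This takes several steps, but it works. Find the index of the rows from a y of 255 up to the next 1 and save it in idx. Then create x_new1 using idx and two other conditions (y == 1 or y == 255). Finally, forward-fill the rest.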
import numpy as np
# Index of rows between 255 and 1 in column y
idx = df.loc[df['y'].replace(0, np.nan).ffill() == 255, 'y'].index
# Create x_new1 and assign value of x where index is idx or y == 1 or y ==255
df.loc[idx, 'x_new1'] = df['x']
df.loc[(df['y'] == 1) | (df['y'] == 255) , 'x_new1'] = df['x']
# ffill rest of the values in x_new1
df['x_new1'] = df['x_new1'].ffill()
x y z x_new x_new1
0 12.28 1 1 12.28 12.28
1 11.99 0 1 12.28 12.28
2 11.50 0 1 12.28 12.28
3 11.20 0 1 12.28 12.28
4 11.01 0 1 12.28 12.28
5 9.74 255 0 9.74 9.74
6 13.80 0 0 13.80 13.80
7 15.20 0 0 15.20 15.20
8 17.80 0 0 17.80 17.80
9 12.10 1 1 12.10 12.10
10 11.90 0 1 12.10 12.10
11 11.70 0 1 12.10 12.10
12 11.20 0 1 12.10 12.10
13 10.30 255 0 10.30 10.30
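The mark-then-forward-fill idea can also be condensed with where and ffill. This is a sketch of a variant, not code from the answers above; marker and keep are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80,
          15.2, 17.8, 12.1, 11.9, 11.7, 11.2, 10.3],
    "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
})

# Carry the most recent marker (1 or 255) forward over the zero rows.
marker = df["y"].where(df["y"].isin([1, 255])).ffill()

# Keep x on the rows where y == 1 and on every row whose latest marker is 255.
keep = df["y"].eq(1) | marker.eq(255)

# Blank out the remaining rows, then fill them from the preceding y == 1 row.
df["x_new1"] = df["x"].where(keep).ffill()
```

Rows before the first marker, if there are any, stay NaN here, so whether they should keep their original x is something to decide for your data.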