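This question is similar to several existing questions about conditionally filling columns, but my df is slightly more complicated.

I have a df whose columns contain floats and strings. I am trying to conditionally fill the columns that contain floats, based on the strings.

Based on the df below:

If a value in Code starts with A, I want to keep the values as they are.

If a value in Code starts with B, I want to keep the first value and return nan's for the following rows, until the next value in Code.

If a value in Code starts with C, I want to keep the same first value until ['Numx','Numy']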
import pandas as pd
import numpy as np
d = ({
'Code' :['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
'Numx' : [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2, 40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
'Numy' : [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2]
})
df = pd.DataFrame(data = d)
Output:
    Code  Numx  Numy
 0    A1  30.2   1.9
 1    A1  30.5   2.3
 2        30.6   2.5
 3    B1  35.6   2.2
 4    B1  40.2   2.5
 5    A2  45.5   3.1
 6    A2  46.1   3.4
 7        48.1   3.6
 8    B2  48.5   3.7
 9    B2  42.2   5.4
10         nan   nan
11    A3  30.5   2.3
12    A3  30.6   2.5
13    A3  35.6   2.2
14        40.2   2.5
15    B1  45.5   3.1
16         nan   nan
17    B4  48.1   3.6
18    B4  48.5   3.7
19    A2  42.2   5.4
20    A2  40.1   6.5
21    A1  48.5   8.5
22    A1  42.2   2.2
23         nan   nan
24    B4  48.5   8.5
25    B4  42.2   2.2
26    C1  43.1   2.3
27    C1  44.1   2.5
28         nan   nan
29         nan   nan
30    D1   nan   nan
31         nan   nan
32    B2  45.1   3.2
When the value in Code is B, I was thinking of something along these lines:
df['Numx'] = np.where(df['Code'] == 'B-'.ffill())
df['Numy'] = np.where(df['Code'] == 'B-'.ffill())
So my intended output is:
    Code  Numx  Numy
 0    A1  30.2   1.9
 1    A1  30.5   2.3
 2        30.6   2.5
 3    B1  35.6   2.2
 4    B1   nan   nan
 5    A2  45.5   3.1
 6    A2  46.1   3.4
 7        48.1   3.6
 8    B2  48.5   3.7
 9    B2   nan   nan
10         nan   nan
11    A3  30.5   2.3
12    A3  30.6   2.5
13    A3  35.6   2.2
14        40.2   2.5
15    B1  45.5   3.1
16         nan   nan
17    B4  48.1   3.6
18    B4   nan   nan
19    A2  42.2   5.4
20    A2  40.1   6.5
21    A1  48.5   8.5
22    A1  42.2   2.2
23         nan   nan
24    B4  48.5   8.5
25    B4   nan   nan
26    C1  43.1   2.3
27    C1  43.1   2.3
28        43.1   2.3
29        43.1   2.3
30    D1  43.1   2.3
31        43.1   2.3
32    B2  45.1   3.2
Answer 0 (score: 0)
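I think you need:
# Keep only the AA/BB labels and forward-fill them, so each row inherits the last AA or BB above it
df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()
# Blank out Numx/Numy on every row except the first occurrence of each label
df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated())
# Within the BB rows, carry the first value forward
mask = df['Code_new'] == 'BB'
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()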
print(df)
   Code  Numx  Numy  Code_new
0    AA  30.2   1.9        AA
1         NaN   NaN        AA
2         NaN   NaN        AA
3    BB  35.6   2.2        BB
4        35.6   2.2        BB
5        35.6   2.2        BB
6        35.6   2.2        BB
7    CC  35.6   2.2        BB
8        35.6   2.2        BB
9    DD  35.6   2.2        BB
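To make the intermediate steps of this snippet easier to follow, here is a minimal sketch on a toy frame. The frame `toy` and its values are hypothetical (they are not the input used for the printout above), and the comments show what each step produces for that toy data only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the steps.
toy = pd.DataFrame({
    'Code': ['AA', '', 'BB', '', 'CC', ''],
    'Numx': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Anything that is not AA or BB becomes NaN, then inherits the label above it.
toy['Code_new'] = toy['Code'].where(toy['Code'].isin(['AA', 'BB'])).ffill()
# Code_new: AA, AA, BB, BB, BB, BB

# duplicated() is False on the first occurrence of a label and True afterwards.
dup = toy['Code_new'].duplicated()
# dup: False, True, False, True, True, True

# Blank out every row that is not the first of its label.
toy['Numx'] = toy['Numx'].mask(dup)
# Numx: 1.0, NaN, 3.0, NaN, NaN, NaN

# Inside the BB rows only, carry the first value forward.
is_bb = toy['Code_new'] == 'BB'
toy.loc[is_bb, 'Numx'] = toy.loc[is_bb, 'Numx'].ffill()
# Numx: 1.0, NaN, 3.0, 3.0, 3.0, 3.0
print(toy)
```

Note that `duplicated()` looks at the whole column, so it only marks "first row of a block" correctly when each label appears in a single block.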
Or:
# Turn the string 'nan' into a real missing value
df = df.replace('nan', np.nan)
df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()
# Blank out only the repeated AA rows
m1 = df['Code_new'].duplicated() & (df['Code_new'] == 'AA')
df[['Numx','Numy']] = df[['Numx','Numy']].mask(m1)
# Within the BB rows, fill missing values from the previous row
m2 = df['Code_new'] == 'BB'
df.loc[m2, ['Numx','Numy']] = df.loc[m2, ['Numx','Numy']].ffill()
print(df)
   Code  Numx  Numy  Code_new
0    AA  30.2   1.9        AA
1         NaN   NaN        AA
2         NaN   NaN        AA
3    BB  35.6   2.2        BB
4        40.2   2.5        BB
5        45.5   3.1        BB
6        45.5   3.1        BB
7    CC  45.5   3.1        BB
8        45.5   3.1        BB
9    DD  42.2   5.4        BB
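The snippets in this answer use the labels AA and BB, whereas the frame in the question uses codes such as A1, B4 and C1, some of which repeat. The following is a rough sketch of how the same where/ffill idea might be adapted to those prefixes; it is not the answerer's code. It rests on two assumptions that the question does not state outright: rows whose Code is blank or starts with another letter (such as D1) belong to the last A/B/C code above them, and each consecutive run of the same code is handled separately. The names `group_code`, `run_id`, `first_in_run`, `kind`, `vals` and `c_vals` are made up for illustration.

```python
import numpy as np
import pandas as pd

d = {
    'Code': ['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
    'Numx': [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2,40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
    'Numy': [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2],
}
df = pd.DataFrame(d)
cols = ['Numx', 'Numy']

# The question's data marks missing numbers with ''; coerce them to NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# Assumption: rows whose Code does not start with A, B or C (blanks, D1)
# inherit the last A/B/C code above them.
starts_abc = df['Code'].str[:1].isin(['A', 'B', 'C'])
group_code = df['Code'].where(starts_abc).ffill()

# Number each consecutive run of the same code, so a code that reappears
# later (B1, A2, ...) starts a fresh run instead of counting as a duplicate.
run_id = group_code.ne(group_code.shift()).cumsum()
first_in_run = ~run_id.duplicated()
kind = group_code.str[0]

vals = df[cols]
# B runs: keep the first row, blank out the rest of the run.
vals = vals.mask(kind.eq('B') & ~first_in_run)
# C runs: repeat the first row's values across the whole run.
c_vals = vals.where(first_in_run).groupby(run_id).ffill()
vals = vals.where(~kind.eq('C'), c_vals)
# A runs are left untouched.
df[cols] = vals
print(df)
```

On the question's data this should reproduce the intended output listed there. If a blank row after a B code ought to keep its own value, or if two runs of the same code separated only by blanks should count as separate, the grouping step would need to change.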