Conditional fill based on conditions in a pandas df

Time: 2018-05-29 06:34:39

Tags: python pandas where apply fill

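This question is similar to several existing questions on conditional filling. I am trying to conditionally fill the columns according to the following rules.

If a value in Code starts with A, I want to keep the values as they are.

If a value in Code starts with B, I want to keep the first value and return NaNs for the following rows, until the next value in Code.

If a value in Code starts with C, I want to repeat the first value until the next float in ['Numx','Numy'].
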
import pandas as pd
import numpy as np


d = ({                          
      'Code' :['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
      'Numx' : [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2, 40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
      'Numy' : [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2]
      })

df = pd.DataFrame(data = d)
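
A side note on the data: because `Numx` and `Numy` mix floats with empty strings, pandas stores them as object columns rather than floats. The following is a minimal sketch, assuming the blanks are meant to be missing values, of coercing such columns to numeric so that they hold real NaNs. The frame `toy` and the list `num_cols` are illustrative names, not part of the original code.

```python
import pandas as pd

# Toy frame mirroring the question's layout: '' marks a missing number.
toy = pd.DataFrame({
    'Code': ['A1', '', 'B1', ''],
    'Numx': [30.2, 30.6, '', 35.6],
    'Numy': [1.9, 2.5, '', 2.2],
})

# Mixed floats and strings give object dtype, so coerce to numeric;
# entries that cannot be parsed (here the empty strings) become NaN.
num_cols = ['Numx', 'Numy']
toy[num_cols] = toy[num_cols].apply(pd.to_numeric, errors='coerce')
```

After this step `isna()` and `ffill()` treat the former blanks as missing values.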

Output:

   Code  Numx Numy
0    A1  30.2  1.9
1    A1  30.5  2.3
2        30.6  2.5
3    B1  35.6  2.2
4    B1  40.2  2.5
5    A2  45.5  3.1
6    A2  46.1  3.4
7        48.1  3.6
8    B2  48.5  3.7
9    B2  42.2  5.4
10        nan  nan       
11   A3  30.5  2.3
12   A3  30.6  2.5
13   A3  35.6  2.2
14       40.2  2.5
15   B1  45.5  3.1
16        nan  nan     
17   B4  48.1  3.6
18   B4  48.5  3.7
19   A2  42.2  5.4
20   A2  40.1  6.5
21   A1  48.5  8.5
22   A1  42.2  2.2
23        nan  nan      
24   B4  48.5  8.5
25   B4  42.2  2.2
26   C1  43.1  2.3
27   C1  44.1  2.5
28        nan  nan      
29        nan  nan   
30   D1   nan  nan      
31        nan  nan        
32   B2  45.1  3.2

I used code posted in another question, but it returns too many NaNs:

df['Code_new'] = df['Code'].where(df['Code'].isin(['A1','A2','A3','A4','B1','B2','B4','C1'])).ffill()

df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated())
mask = df['Code_new'] == 'A1'
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()

This produces the following output:

   Code  Numx Numy Code_new
0    A1  30.2  1.9       A1
1    A1  30.2  1.9       A1
2        30.2  1.9       A1
3    B1  35.6  2.2       B1
4    B1   NaN  NaN       B1
5    A2  45.5  3.1       A2
6    A2   NaN  NaN       A2
7         NaN  NaN       A2
8    B2  48.5  3.7       B2
9    B2   NaN  NaN       B2
10        NaN  NaN       B2
11   A3  30.5  2.3       A3
12   A3   NaN  NaN       A3
13   A3   NaN  NaN       A3
14        NaN  NaN       A3
15   B1   NaN  NaN       B1
16        NaN  NaN       B1
17   B4  48.1  3.6       B4
18   B4   NaN  NaN       B4
19   A2   NaN  NaN       A2
20   A2   NaN  NaN       A2
21   A1  30.2  1.9       A1
22   A1  30.2  1.9       A1
23       30.2  1.9       A1
24   B4   NaN  NaN       B4
25   B4   NaN  NaN       B4
26   C1  43.1  2.3       C1
27   C1   NaN  NaN       C1
28        NaN  NaN       C1
29        NaN  NaN       C1
30   D1   NaN  NaN       C1
31        NaN  NaN       C1
32   B2   NaN  NaN       B2

My desired output is:

   Code  Numx Numy
0    A1  30.2  1.9
1    A1  30.5  2.3
2        30.6  2.5
3    B1  35.6  2.2
4    B1   nan  nan
5    A2  45.5  3.1
6    A2  46.1  3.4
7        48.1  3.6
8    B2  48.5  3.7
9    B2   nan  nan
10        nan  nan        
11   A3  30.5  2.3
12   A3  30.6  2.5
13   A3  35.6  2.2
14       40.2  2.5
15   B1  45.5  3.1
16        nan  nan         
17   B4  48.1  3.6
18   B4   nan  nan
19   A2  42.2  5.4
20   A2  40.1  6.5
21   A1  48.5  8.5
22   A1  42.2  2.2
23        nan  nan      
24   B4  48.5  8.5
25   B4   nan  nan
26   C1  43.1  2.3
27   C1  43.1  2.3
28       43.1  2.3   
29       43.1  2.3   
30   D1  43.1  2.3   
31       43.1  2.3         
32   B2  45.1  3.2

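I think the line I need to change is mask = df['Code_new'] == 'A1'. The code works, but it only applies to rows where the code is 'A1'. Is it as simple as adding all the other values here, i.e. A1-A4, B1-B4, C1?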
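
For illustration, a mask covering several codes can be built either with `isin` or by matching a prefix. This is only a sketch on a hypothetical Series named `code_new`, shaped like the forward-filled `Code_new` column. Widening the mask only changes which rows get forward-filled; it does not restore values that the `duplicated()` step has already blanked.

```python
import pandas as pd

# Hypothetical helper column, shaped like the forward-filled Code_new above.
code_new = pd.Series(['A1', 'A1', 'B1', 'B1', 'A2', 'C1', 'C1'])

# Option 1: list every code explicitly.
mask_isin = code_new.isin(['A1', 'A2', 'A3', 'A4'])

# Option 2: match on the leading letter; na=False treats missing codes as no match.
mask_prefix = code_new.str.startswith('A', na=False)
```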

1 Answer:

Answer 0: (score: 2)

I believe you need:

m2 = df['Code'].isin(['A1','A2','A3','A4','B1','B2','B4','C1'])

# build a helper column that gives each consecutive run of codes a unique label
df['Code_new'] = df['Code'].where(m2).ffill()
df['Code_new'] = (df['Code_new'] + '_' + 
                  df['Code_new'].ne(df['Code_new'].shift()).cumsum().astype(str))

# blank out every row after the first in each run, except in runs whose code starts with A
m1 = df['Code_new'].str.startswith(tuple(['A1','A2','A3','A4'])).fillna(False)
df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated() & ~m1)

# forward fill only the rows whose run label starts with C
mask = df['Code_new'].str.startswith('C').fillna(False)
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()

print (df)
   Code  Numx Numy Code_new
0    A1  30.2  1.9     A1_1
1    A1  30.5  2.3     A1_1
2        30.6  2.5     A1_1
3    B1  35.6  2.2     B1_2
4    B1   NaN  NaN     B1_2
5    A2  45.5  3.1     A2_3
6    A2  46.1  3.4     A2_3
7        48.1  3.6     A2_3
8    B2  48.5  3.7     B2_4
9    B2   NaN  NaN     B2_4
10        NaN  NaN     B2_4
11   A3  30.5  2.3     A3_5
12   A3  30.6  2.5     A3_5
13   A3  35.6  2.2     A3_5
14       40.2  2.5     A3_5
15   B1  45.5  3.1     B1_6
16        NaN  NaN     B1_6
17   B4  48.1  3.6     B4_7
18   B4   NaN  NaN     B4_7
19   A2  42.2  5.4     A2_8
20   A2  40.1  6.5     A2_8
21   A1  48.5  8.5     A1_9
22   A1  42.2  2.2     A1_9
23                     A1_9
24   B4  48.5  8.5    B4_10
25   B4   NaN  NaN    B4_10
26   C1  43.1  2.3    C1_11
27   C1  43.1  2.3    C1_11
28       43.1  2.3    C1_11
29       43.1  2.3    C1_11
30   D1  43.1  2.3    C1_11
31       43.1  2.3    C1_11
32   B2  45.1  3.2    B2_12
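
The key difference from the code in the question is the run counter appended to `Code_new`. Without it, a code that reappears later (for example A1 at rows 0 and 21) counts as a duplicate of its first occurrence, which is why the earlier attempt produced too many NaNs. A minimal sketch of that labeling step on a toy Series; the names `codes`, `starts`, `run_id`, `labels` and `first_of_run` are illustrative only.

```python
import pandas as pd

codes = pd.Series(['A1', 'A1', 'B1', 'A1', 'A1', 'B1'])

# True wherever the value differs from the previous row (the first row always counts).
starts = codes.ne(codes.shift())

# The cumulative sum of the start flags numbers the consecutive runs 1, 2, 3, ...
run_id = starts.cumsum()

# Appending the run number makes each run's label unique.
labels = codes + '_' + run_id.astype(str)

# duplicated() is now False only on the first row of each run.
first_of_run = ~labels.duplicated()
```

With these labels, `duplicated()` marks only the rows that follow the first row of their own run, so each new block of B codes keeps its first value.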